
Bjorn Nyland's test of Tesla Vision

>>…it's not 'vision' that needs solving, it's to a degree 'intelligence'<<

Quite.
Much of the AI around now is certainly Artificial but short on Intelligence even though it may simulate it pretty well.
I suspect (a) FSD is going to come up against a brick wall until there’s REAL AI and (b) if the latter ever arises a whole new can of worms will be opened.
 
I'm not sure what you think "vision" is, or what's been solved there, to conclude it's now just a question of intelligence. Vision is still fooled.

We spent some time looking at Watson, the IBM intelligent thing. They're trying to make out it has AGI characteristics, but it's more like a set of aggregated ANI services wrapped into something that covers more than a single use case. Look at the Jeopardy game thing: it was trained for the show. Give the same system a Pointless question and it wouldn't have a clue where to start, nor could it naturally work out how to answer within the rules of the show without being trained, something a human can pick up from a description of the rules and three or four example questions and answers.

All existing AI approaches are at their core statistical models, with layers of convolution and filters to transform the input and give a different lens on the situation, the filters themselves being totally deterministic. Chatbots are still pretty dumb and are narrow AI; yes, they're better than they were, but that's because we train them to look for more things. It's like image recognition: you can train a model to see a face, dog, cat, elephant, speed sign, car, etc., and the next level is the same model seeing a smiling face, a happy face, an angry dog, a barking dog, and so on.
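To illustrate the point about the filter layers being deterministic transforms, here's a minimal hand-rolled convolution sketch; the kernel and the toy input are made-up values, the only point being that the same input patch always produces the same output:

```python
# Illustrative only: a hand-rolled 2D convolution showing that the filter step
# itself is a fixed, deterministic transform of the input. The 3x3 edge-detect
# kernel and the toy 'image' are made-up values, not anything from a real model.
import numpy as np

def convolve2d(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            # The same input patch and the same kernel always give the same output value.
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

image = np.random.rand(8, 8)            # stand-in for a tiny camera frame
edge_kernel = np.array([[-1, -1, -1],
                        [-1,  8, -1],
                        [-1, -1, -1]])  # classic edge-detection kernel
print(convolve2d(image, edge_kernel))
```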

No system has passed the Turing test, and the Google "it's starting to think for itself" thing is really just someone getting a little carried away.

Anyone selling AGI at the moment is pretty much selling snake oil, and anyone buying it is probably an idiot, given the number of use cases where simpler applications can deliver significant benefits.

But back to the vision aspect, the old adage of rubbish in, rubbish out applies, and if all you've got feeding the system are half a dozen low resolution cameras then no matter how good the AI, it's compromised. It's like telling Lewis Hamilton to drive without having any tactile feedback through his hands, his body, his hearing etc. The more input you provide the better the understanding of the environment, and that's why taking away any sensor is a retrograde step.
 
It's like telling Lewis Hamilton to drive without having any tactile feedback through his hands, his body, his hearing etc.

That's actually a very good demonstration of how powerful our brains are even with very limited inputs...

GT ran at a lower resolution in 2009 than what the Tesla AP cameras generate, and certainly without 360-degree views, yet a 'capable' human brain could turn that very artificial environment into real-world improvements.

 
taking away any sensor is a retrograde step
I'm not sure that this is always true, although it certainly could be in some cases. A great sensor for certain situations might be useless in others... hence the attraction of complementary sensor suites. But when their outputs disagree - as they may well do in exactly the circumstances where you really need multiple sensors - you have to have a means of reliably deciding which input to believe.

I think that is likely why ditching radar and trying to depend on the one absolutely indispensable (vision) input makes sense if you could get it to work.

Another issue is that (human / living) intelligence isn't bound to be rational; living things aren't bound to act rationally either. But I don't see how AI will ever be able to act irrationally against its own implicitly rationalised view.

I think that's why designing single- or limited-function systems that can make quite a lot of assumptions, and that are designed to take (only) appropriate risks, often makes sense. If you can determine when to activate the windscreen wipers without learning to recognise every visible effect that suggests it is likely to be raining, that is a better approach imo.

Same with headlight control.
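As a rough sketch of what that kind of single-purpose subsystem looks like in practice (the sensor reading, the thresholds and the wiper speeds below are invented placeholders, not any real hardware API):

```python
# Hypothetical sketch of a dedicated rain-sensor wiper controller. The moisture
# reading, thresholds and speed names are made-up; the point is how few
# assumptions such a limited-function system has to make.
from enum import Enum

class WiperSpeed(Enum):
    OFF = 0
    INTERMITTENT = 1
    LOW = 2
    HIGH = 3

def wiper_speed_from_moisture(moisture: float) -> WiperSpeed:
    """Map an optical rain-sensor reading (0.0 = dry, 1.0 = soaked) to a wiper speed."""
    if moisture < 0.05:
        return WiperSpeed.OFF
    if moisture < 0.2:
        return WiperSpeed.INTERMITTENT
    if moisture < 0.5:
        return WiperSpeed.LOW
    return WiperSpeed.HIGH

# No need to visually recognise spray, drizzle, tunnels or washer fluid:
# the sensor directly measures the one thing this subsystem cares about.
print(wiper_speed_from_moisture(0.3))   # -> WiperSpeed.LOW
```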
 
Anyone selling AGI at the moment is pretty much selling snake oil,

I have close to zero understanding of what the likes of Google (or even Tesla) are trying to do. But I would suspect the editors of Nature do, and being able to generate Nature papers almost on demand would suggest to me that someone somewhere is making progress.

Personally I like to view the world from the point of view of what's possible. Every human generation has seen step changes in understanding that 99.999% of the population at the time refused to believe or accept; Galileo wasn't wrong, yet he was persecuted for it.

The people working in AI at present will, I suspect, be the ones to really deliver on the fourth industrial revolution. Whether Tesla FSD is any part of that, who knows.
 
But I don't see how AI will ever be able to act irrationally against its own implicitly rationalised view.

But that's the exciting bit: just because you, I, or anyone else cannot see a way of achieving something, that doesn't make it impossible.

Trying to achieve the impossible is surely what we all should aim for? Otherwise you will always settle for the status quo and never advance.

A $5 rain/light sensor is a pretty extreme example of it, but nevertheless I can see the reasoning. But clearly failure is something you have to accept, as there are some things in life that are impossible...
 
Trying to achieve the impossible is surely what we all should aim for?

Like making a yoke do a BETTER job than a tried and tested steering wheel?

The danger is that you use the quest for the impossible as a selling tool to differentiate your product or mask shortcomings. I agree you should not be scared of attempting the seemingly impossible when there is a clear enough justification, but not just for the sake of it.

A $5 rain/light sensor is a pretty extreme example

It is indeed! Along with basic controls and features at least staying static or improving, rather than the lottery update system that Tesla seem to operate, where it is not at all obvious what their real objective is... new features that could be added to existing hardware aren't, bugs / retrograde steps seem to come as often as genuine improvements, and long-standing shortcomings remain unaddressed for months or years.
 
I'm not sure that this is always true, although it certainly could be in some cases. A great sensor for certain situations might be useless in others... hence the attraction of complementary sensor suites. But when their outputs disagree - as they may well do in exactly the circumstances where you really need multiple sensors - you have to have a means of reliably deciding which input to believe.

I think that is likely why ditching radar and trying to depend on the one absolutely indispensable (vision) input makes sense if you could get it to work.

Another issue is that (human / living) intelligence isn't bound to be rational; living things aren't bound to act rationally either. But I don't see how AI will ever be able to act irrationally against its own implicitly rationalised view.

I think that's why designing single- or limited-function systems that can make quite a lot of assumptions, and that are designed to take (only) appropriate risks, often makes sense. If you can determine when to activate the windscreen wipers without learning to recognise every visible effect that suggests it is likely to be raining, that is a better approach imo.

Same with headlight control.
Well I agree it's not true if the sensor is unreliable.

But if the sensor is telling you something reasonable, then mixing that with all the other sensor info in your AI model will help. Tesla seem unable to do it; whether the radar was junk or they just gave up is anyone's guess, but if the radar is telling you there is a car in front and the cameras aren't, which do you believe? Let's look at it a different way: Tesla have 3 forward-facing cameras, so if they disagree, what do they do? Take the answer which 2 of them think is right? Then why is the other one wrong? But in practice, training a neural net, it wouldn't be you doing it anyway; you'd feed it all into the model and the model would work it all out. The radar has attributes the vision doesn't have, depth being the main one. The radar could be just helping confirm depth, which might be useful to distinguish between a small 50 speed limit sign on the back of a lorry close to you and a 50 speed limit roundel at the side of the road but further away, both of which to a pure vision system could look the same size.
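To make the depth point concrete, here's a rough sketch of the apparent-size ambiguity (the sign sizes and distances are made-up numbers for illustration only):

```python
# Minimal sketch of the "same apparent size, different distance" ambiguity.
# The sign diameters and distances below are made-up illustrative numbers.
import math

def angular_size_rad(real_diameter_m: float, distance_m: float) -> float:
    """Angle a circular sign subtends at the camera."""
    return 2 * math.atan(real_diameter_m / (2 * distance_m))

# A small 50 roundel on the back of a lorry, close to the car...
lorry_sign = angular_size_rad(real_diameter_m=0.45, distance_m=9.0)
# ...and a larger roadside 50 roundel much further away.
roadside_sign = angular_size_rad(real_diameter_m=0.75, distance_m=15.0)

print(f"lorry sign:    {lorry_sign:.4f} rad")
print(f"roadside sign: {roadside_sign:.4f} rad")
# Both subtend ~0.05 rad, so to a single camera they occupy about the same
# number of pixels; an independent range measurement (e.g. radar) separates them.
```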

While we're getting deep on this, an image isn't "an image" either in the way we might understand it; it's effectively a million or so pixels of data in an array, and if a pixel fails in the camera the thing will still generally work. In other words a camera isn't one sensor, it's a million-odd sensors stitched together.
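A trivial sketch of that idea, using an arbitrary 1,280 x 960 frame (a real camera's layout and failure behaviour will of course differ):

```python
# Toy illustration: treat a frame as a plain array of per-pixel sensor readings.
# The resolution and the 'dead pixel' position are arbitrary made-up values.
import numpy as np

frame = np.random.rand(960, 1280)        # roughly a 1.2 MP greyscale frame
frame[500, 640] = 0.0                    # one photosite fails and reads nothing

# One dead reading out of ~1.2 million changes the overall picture negligibly;
# downstream processing (filters, neural nets) still sees essentially the same scene.
print(frame.size, "individual readings per frame")
print("mean brightness with dead pixel:", frame.mean())
```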
 
But back to the vision aspect, the old adage of rubbish in, rubbish out applies, and if all you've got feeding the system are half a dozen low resolution cameras then no matter how good the AI, it's compromised. It's like telling Lewis Hamilton to drive without having any tactile feedback through his hands, his body, his hearing etc. The more input you provide the better the understanding of the environment, and that's why taking away any sensor is a retrograde step.
Apparently the current cameras are only 1.2 megapixel. Tesla have reportedly entered into a multi-billion-dollar contract with Samsung to supply 5 megapixel cameras. Is this an admission that the current hardware (never mind the software) is incapable of delivering FSD, and if so, will all our cars be upgraded? Every Tesla sold since 2018 has been sold with the promise that it has the hardware needed for FSD, and that autonomy will be delivered through future software upgrades.
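For a rough sense of scale, here's a back-of-envelope comparison (the frame rate, camera count and bytes per pixel are assumed round numbers, not Tesla's actual figures):

```python
# Back-of-envelope comparison of 1.2 MP vs 5 MP cameras. Frame rate, camera
# count and bytes-per-pixel are assumed round numbers, not real specifications.
CAMERAS = 8
FPS = 36
BYTES_PER_PIXEL = 1.5   # assumed for a raw sensor format

for label, pixels in [("current (1.2 MP)", 1.2e6), ("rumoured (5 MP)", 5e6)]:
    bytes_per_sec = pixels * BYTES_PER_PIXEL * FPS * CAMERAS
    print(f"{label}: ~{bytes_per_sec / 1e9:.2f} GB/s raw across {CAMERAS} cameras")

# ~4x more pixels also means ~2x finer angular resolution per axis, i.e. the
# same sign or car stays resolvable at roughly twice the distance.
```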
 
if the sensor is unreliable

I don't think anyone would set out to use an unreliable sensor. Also, there are many ways of viewing unreliability. B-pillar cams are obviously unreliable... or maybe undependable... given the fogging / condensation problem. But the output could also be unreliable / undependable due to insufficient resolution for the application. And many more examples too, I'm sure.

But assuming all these kinds of unreliability are understood, you still get a crossover when you use multiple sensors that could be seeing the same view differently.

But if the sensor is telling you something reasonable

And of course you somehow have to decide what's reasonable based on two or more potentially conflicting views.

While it might seem like an overly deep analysis, it is imo a very fundamental point and one that Tesla must be grappling with as evidenced by camera spec and radar changes.
 
I've long thought the sensor suite was insufficient. It was spec'd as one of a number of components: the compute power has been upgraded twice, the software restarted more than once, and the radar has now been dropped, so why would we think the cameras are the only bit they got right and good enough?

It's not just the resolution, low-light performance and anti-glare in sunlight, but also the ability to clean the cameras, stop fogging, and provide redundancy.

They've already added an interior camera, but not retrospectively. So why? It's either needed or it's not.

Level 3 might well be possible, but Levels 4 and 5… not so sure.
 
Tesla have 3 forward facing cameras, now if they disagree,

We are getting some way off lighting issues, but given that lighting control depends on interpretation of vision, I guess we are still in the frame somewhere!

Your point here is totally on the mark. You can't have three vision sensors of the same category of input device giving you contradictory data. There is clearly only one actual 'vision'-based reality. The glitches with AP lane changes, where pulling back into the left lane in front of a truck makes AP suddenly abort, are likely due to the view of the truck switching from one camera to another.

I have no idea what our (UK) cars are running, but I understood some time ago there was a move to switch to an amalgamated view (like a stitched image), so that the views of the multiple cameras were combined to produce a single 360-degree image and hence get rid of anomalies as objects move from camera to camera.
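A very rough sketch of that idea of expressing detections in one vehicle-centred frame (the camera names, mounting angles and detection values are invented, and the small offsets between camera positions are ignored):

```python
# Rough sketch of the amalgamated-view idea: express every camera's detections
# in one vehicle-centred frame so an object keeps a single identity as it
# crosses camera boundaries. All names and numbers here are illustrative.
import math
from dataclasses import dataclass

@dataclass
class Detection:
    bearing_deg: float   # angle of the object within that camera's own view
    range_m: float       # estimated distance from the car

# Assumed yaw of each camera relative to the car's forward axis (made-up values).
CAMERA_YAW_DEG = {"main": 0.0, "right_pillar": 55.0}

def to_vehicle_frame(camera: str, det: Detection) -> "tuple[float, float]":
    """Convert a per-camera detection into (x, y) metres in the car's frame."""
    theta = math.radians(CAMERA_YAW_DEG[camera] + det.bearing_deg)
    return det.range_m * math.cos(theta), det.range_m * math.sin(theta)

# The same truck seen by two cameras lands at (almost) the same point, so the
# planner tracks one object instead of one that vanishes and reappears.
print(to_vehicle_frame("main", Detection(bearing_deg=30.0, range_m=12.0)))
print(to_vehicle_frame("right_pillar", Detection(bearing_deg=-25.0, range_m=12.0)))
```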

if the radar is telling you there is a car in front and the cameras aren't

Did you see greentheonly's videos from 2 or 3 years ago that appeared to clearly show what I interpret as secondary radar reflections (say from overhead signs or sides of trucks) that happened to coincide with visible shadows on the road surface?

If such an occurrence (reasonably) leads the car (based on concurrence of both radar image and visible image - shadow and radar point reflections) to conclude that there is an object in front of the car, what can the car do other than brake?

Although less critical, operation of wipers based on vision only seems needlessly difficult. Seems like a waste of processing resources too.

Serial inputs (like stitched video images) become less reliable precisely because they depend on multiple inputs: any obscured camera leaves part of the full image unknown, and you have to decide whether you can rely on an incomplete view. Parallel inputs give a degree of redundancy through overlap, but then you have the issue of which image to believe when they don't agree.
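A quick back-of-envelope illustration of that serial vs parallel difference (the per-camera availability figure is an arbitrary assumption):

```python
# Quick illustration of the serial vs parallel reliability point. The per-camera
# availability figure is an arbitrary assumption, not a measured value.
p_camera_ok = 0.99   # assumed probability any one camera gives a usable view
n_cameras = 3

# Serial / stitched: the combined view needs EVERY camera to be usable.
p_stitched_ok = p_camera_ok ** n_cameras

# Parallel / overlapping: the area is covered if ANY one camera is usable.
p_overlap_ok = 1 - (1 - p_camera_ok) ** n_cameras

print(f"stitched view fully usable:  {p_stitched_ok:.4%}")   # ~97.03%
print(f"overlapping coverage usable: {p_overlap_ok:.6%}")    # ~99.9999%
```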

Why bother with any of that for subsystems like wiping a dirty or wet windscreen when there are other simple ways of dealing with it reliably?
 
Did you see greentheonly's videos from 2 or 3 years ago that appeared to clearly show what I interpret as secondary radar reflections (say from overhead signs or sides of trucks) that happened to coincide with visible shadows on the road surface?

If such an occurrence (reasonably) leads the car (based on concurrence of both radar image and visible image - shadow and radar point reflections) to conclude that there is an object in front of the car, what can the car do other than brake?

I didn't see it, no, but I can imagine such a thing exists. Tesla even used reflected radar to work out whether there was a car in front of the car in front, to improve emergency braking; that's been quickly forgotten.


A couple of examples also tell a story. On this view you'd probably go with the vision, as it's more or less a happy-day scenario so long as that sun doesn't blind the camera (which we know can happen).

i1.jpg



But what about these... is there a car there in the same lane? Maybe one 100 m ahead?

i2.jpg


Or this: I can see something, but is it stopping?

i3.jpg


Or this one... I can't really see anything, but I couldn't say my confidence is high on that; I believe most of us would slow down if this were any worse.

i4.jpg


The equation presumably isn't as simple as vision and radar each saying yes or no to something in front, so what do you do if they differ?

In a dumbed-down way, the equation is likely to be: vision says yes with 85% confidence, radar says yes with 90% confidence - correlate the two - that's a yes.
Maybe vision says no with 60% confidence and radar says yes with 80% confidence - I might go yes there, because vision isn't that confident.
Or vision says no with 85% confidence and radar says yes with 55% confidence - I might be thinking no.

You get the gist, I hope. And using the examples above you can see vision may have a low confidence, while a secondary cross-check from a sensor with a different performance envelope might be more confident. The radar is likely to be great with a low sun, might get a bit noisy with light rain, and might be useless in heavy rain and traffic, but so long as it can work out its own confidence factor, the data is usable.
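A minimal sketch of that kind of confidence weighting, using the three worked cases above (the weighting scheme and the decision threshold are invented for illustration, not a real fusion stack):

```python
# Minimal sketch of the confidence-weighting idea described above. The sensor
# weights, example confidences and decision threshold are all invented
# illustrative numbers.
def fuse(vision_says_yes: bool, vision_conf: float,
         radar_says_yes: bool, radar_conf: float,
         vision_weight: float = 1.0, radar_weight: float = 1.0) -> bool:
    """Combine two yes/no opinions into one decision, weighted by confidence."""
    score = 0.0
    score += (1 if vision_says_yes else -1) * vision_conf * vision_weight
    score += (1 if radar_says_yes else -1) * radar_conf * radar_weight
    return score > 0   # net evidence points towards "object ahead"

# The three worked cases from the post:
print(fuse(True, 0.85, True, 0.90))    # both agree           -> True (yes)
print(fuse(False, 0.60, True, 0.80))   # weak no, stronger yes -> True (yes)
print(fuse(False, 0.85, True, 0.55))   # strong no, weak yes   -> False (no)
```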
 
I'm not sure what you think "vision" is, or what's been solved there, to conclude it's now just a question of intelligence. Vision is still fooled.

We spent some time looking at Watson, the IBM intelligent thing. They're trying to make out it has AGI characteristics, but it's more like a set of aggregated ANI services wrapped into something that covers more than a single use case. Look at the Jeopardy game thing: it was trained for the show. Give the same system a Pointless question and it wouldn't have a clue where to start, nor could it naturally work out how to answer within the rules of the show without being trained, something a human can pick up from a description of the rules and three or four example questions and answers.

All existing AI approaches are at their core statistical models, with layers of convolution and filters to transform the input and give a different lens on the situation, the filters themselves being totally deterministic. Chatbots are still pretty dumb and are narrow AI; yes, they're better than they were, but that's because we train them to look for more things. It's like image recognition: you can train a model to see a face, dog, cat, elephant, speed sign, car, etc., and the next level is the same model seeing a smiling face, a happy face, an angry dog, a barking dog, and so on.

No system has passed the Turing test, and the Google "it's starting to think for itself" thing is really just someone getting a little carried away.

Anyone selling AGI at the moment is pretty much selling snake oil, and anyone buying it is probably an idiot, given the number of use cases where simpler applications can deliver significant benefits.

But back to the vision aspect, the old adage of rubbish in, rubbish out applies, and if all you've got feeding the system are half a dozen low resolution cameras then no matter how good the AI, it's compromised. It's like telling Lewis Hamilton to drive without having any tactile feedback through his hands, his body, his hearing etc. The more input you provide the better the understanding of the environment, and that's why taking away any sensor is a retrograde step.
Roger Penrose has repeatedly pointed out that consciousness/intelligence has to be non-computational (i.e. non-algorithmic) due to limitations imposed by Gödel's incompleteness theorem. In simple terms, we humans can "see" that there are truths/insights that a formal system cannot prove within itself. If true (and I think it probably is), then AI in its current form will never do anything close to what a human can do. There is some additional concept that we have yet to understand.
 
>>I'm not sure that this is always true, although it certainly could be in some cases. A great sensor for certain situations might be useless in others... hence the attraction of complementary sensor suites. But when their outputs disagree - as they may well do in exactly the circumstances where you really need multiple sensors - you have to have a means of reliably deciding which input to believe.<<

A good example would be when you are driving in traffic with the windows closed and you hear an ambulance siren. You have the aural warning but 9 times out of 10 you have no idea from which direction it’s coming. You want to get out of the way but until you KNOW the location of the ambulance you’re stuck because the right thing to do might be accelerate, stop, turn left, turn right or whatever. It’s only when you spot the flashing lights that a decision can be made - an unconscious switch of input sensors.
We “poor, unreliable” humans do this instinctively yet apparently the radar is being removed because the system can’t be made to do similar things.
I suspect Karpathy went because he eventually told Musk some unacceptable truths.
 
Have we discussed the fact that since mid-July the Vienna Convention has been amended, paving the way for Level 3 autonomous driving in the EU/UK?
So why haven't we heard of Tesla applying for Level 3 yet, which would allow the use of Autosteer on motorways without keeping hands on the wheel (and therefore bring EAP/FSD slightly more on par with the US)?