Tesla replacing ultrasonic sensors with Tesla Vision

It’s pretty wild. And it explains why there are such strong disagreements on the subject. It’s a shame, and I wish I knew a way to improve it, because it’s really off-putting.
I'm kinda surprised, but at the same time I understand it. In overly simple terms, neural networks are supposed to emulate organic neural function. If you took identical twins and sent them both to the same driving school, with the same instructor, in the same car, would they both drive exactly the same way when presented with the same route? With all the differing stimuli that can occur on the same route, the neural nets in the Tesla are getting different input every time they drive that stretch. This time there was a pedestrian, another time there were more cars, and another time the sun was in a different part of the sky, generating different shadows and highlighting different details and features of the objects around it.

However, this should only account for minor variations in the drive. When one person can drive 10 miles of city streets and have a near-flawless drive, and another person can drive a similar route and have a terrible drive (loaded with phantom braking (PB), running red lights and stop signs, hitting curbs, etc.), I too wonder what the difference between the cars is.
 
Time source 1 says 1:43 pm
Celestial body source says 2:12 pm
What time is it?
No one who takes automation seriously uses only one sensor per type - read about the Boeing 737 MAX story. In your case, you should have an odd number of timers and celestial bodies. Worst-case scenario, you remove the outliers (or that class of sensors) and average the results (or do some sort of weighted average, based on the type of conflict or an ML model). Basic portfolio problem.
The Voyager spacecraft (built in the early 1970s) use inertial navigation (several independent gyros), a radio beam to Earth, and celestial-body orientation. Still flying and sending back data. So they must have known something about automation and how to resolve sensor conflicts. Only recently have they reluctantly started disabling instruments because of dwindling power.
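To make the "remove the outliers and average" idea concrete, here is a minimal sketch in Python - purely illustrative, with made-up weights and tolerance, not anything an actual vehicle or spacecraft runs:

```python
from statistics import median

def fuse_readings(readings, weights, outlier_tolerance):
    """Fuse redundant sensor readings of the same quantity.

    readings:          list of measurements (e.g. clock times in minutes)
    weights:           per-sensor trust factors (higher = more trusted)
    outlier_tolerance: max allowed distance from the median before a
                       reading is discarded as an outlier
    """
    m = median(readings)
    kept = [(r, w) for r, w in zip(readings, weights)
            if abs(r - m) <= outlier_tolerance]
    if not kept:                      # every sensor disagrees wildly
        return None                   # caller must fall back / fail safe
    total_weight = sum(w for _, w in kept)
    return sum(r * w for r, w in kept) / total_weight

# Three clocks reporting minutes past noon: 1:43 pm, 2:12 pm, 1:45 pm
print(fuse_readings([103, 132, 105], [1.0, 1.0, 1.0], outlier_tolerance=10))
# -> 104.0 (the 2:12 pm reading is rejected as an outlier)
```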
 
So you want three sensor types (or two with a double) all with some failure rate?

Now expand from a single value to a variable number of potential objects in three-dimensional space.

Sensor 1: object at 5, 16, 99
Sensor 2: no objects detected
Sensor 3: no objects detected
Is there an object? If so, where is it?

Sensor 1: object at 5, 16, 99
Sensor 2: object at 7, 25, 106
Sensor 3: no objects detected
Is there an object? If so, where is it?

Sensor 1: object at 5, 16, 99
Sensor 2: object at 7, 25, 106
Sensor 3: object at 9, 34, 113
Is there an object? If so, what is its direction?
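One naive way to answer those questions is a 2-out-of-3 agreement check with a distance gate: declare an object only when at least two sensors report detections close to each other, and average the agreeing positions. A rough sketch of that rule (the gate size is an arbitrary made-up value):

```python
import math
from itertools import combinations

GATE = 20.0  # max distance (same units as the coordinates) to call two detections "the same object"

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def fuse_detections(detections):
    """detections: one entry per sensor, each either None or an (x, y, z) tuple.
    Returns a fused (x, y, z) if at least two sensors agree within GATE, else None."""
    hits = [d for d in detections if d is not None]
    for a, b in combinations(hits, 2):
        if dist(a, b) <= GATE:
            agreeing = [d for d in hits if dist(d, a) <= GATE or dist(d, b) <= GATE]
            n = len(agreeing)
            return tuple(sum(c) / n for c in zip(*agreeing))
    return None  # zero or one detections, or no two detections agree

print(fuse_detections([(5, 16, 99), None, None]))                  # None: single unconfirmed hit
print(fuse_detections([(5, 16, 99), (7, 25, 106), None]))          # (6.0, 20.5, 102.5)
print(fuse_detections([(5, 16, 99), (7, 25, 106), (9, 34, 113)]))  # (7.0, 25.0, 106.0)
```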
 
Looks like you're using Casio sensors!

But seriously, it depends on why the data is different.

Faulty sensors
Incorrect / unsuitable sensors
External factors affecting the data not taken into account.
Processing error.

Years ago I worked on an early railroad signalling project that moved away from electromechanical hardware. We had best of 3 and if 2 agreed, the 3rd was shut down.

Trouble was that more hardware = greater chance of error. So also a chance of all 3 disagreeing, or of 2 agreeing with each other but both being wrong! So in fact the better approach was to make the systems more reliable and only have 2. In the (less likely) event of disagreement, you shut both down to a fail-safe status.

Not sure there is a direct comparison with car systems, but having the minimum number of highest-quality complementary sensors to handle the task, with solid logic to validate outputs against each other, would seem a good start.

The simplicity of a 'mono-culture' (like a single sensor type) is often appealing but always has an Achilles' heel.
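The two arrangements described above - 2-out-of-3 voting versus two hardened channels that fail safe on any disagreement - can be sketched roughly like this (illustrative only; real signalling logic is far more involved):

```python
def vote_2oo3(a, b, c):
    """2-out-of-3: output the majority value; the dissenting channel is shut down.
    Returns (output, healthy_channels), or (None, []) if all three disagree."""
    if a == b or a == c:
        return a, [ch for ch, v in (("A", a), ("B", b), ("C", c)) if v == a]
    if b == c:
        return b, ["B", "C"]
    return None, []                  # total disagreement: fail safe

def vote_1oo2_failsafe(a, b):
    """Two highly reliable channels: any disagreement drops to a safe state."""
    return a if a == b else None     # None means "fail safe" (e.g. signals to red)

print(vote_2oo3("clear", "clear", "danger"))    # ('clear', ['A', 'B'])
print(vote_2oo3("clear", "danger", "stop"))     # (None, [])  -> fail safe
print(vote_1oo2_failsafe("clear", "danger"))    # None        -> fail safe
```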
 
I had PB with the radar; now I have PB with vision only. I had just learned the circumstances in which the car would phantom-brake, and now I have to learn them again. Not sure if it is less or more frequent; it is different and definitely still present.
The argument for removing sensors because of contradicting input is flawed at its core. Pretty much everywhere else, systems have decision-making algorithms to resolve conflicts. It has been established that multiple sensors with conflicting information are better than a single sensor with less information. A simple portfolio problem. The aerospace industry solved this decades ago. No need to reinvent the wheel (literally) when the wipers are not working.
This was discussed a while back: with radar, phantom braking can happen if there is a mismatch in target association between vision and radar. Overpasses and bridges are also issues. You can see the linked video (23:53) for specific examples. For people where this was the cause of their phantom braking, Vision was an improvement.
Tesla.com - "Transitioning to Tesla Vision"
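A toy illustration of the "target association" mismatch being described: each radar return is gated against the nearest vision detection, and a return the camera cannot corroborate (say, a strong reflection from an overhead bridge) is flagged as suspect rather than braked for. This is only a sketch of the general idea, not Tesla's implementation; the gate threshold is invented:

```python
import math

ASSOCIATION_GATE_M = 3.0   # invented: max distance for a radar/vision match

def associate(radar_targets, vision_objects):
    """radar_targets / vision_objects: lists of (x, y) positions in metres.
    Returns (matched, unmatched_radar). Unmatched radar returns are the
    classic phantom-braking candidates (overpasses, manhole covers, ...)."""
    matched, unmatched = [], []
    for r in radar_targets:
        best = min(vision_objects, key=lambda v: math.dist(r, v), default=None)
        if best is not None and math.dist(r, best) <= ASSOCIATION_GATE_M:
            matched.append((r, best))
        else:
            unmatched.append(r)      # camera sees nothing here: treat with suspicion
    return matched, unmatched

# Radar sees a strong return 80 m ahead (a bridge); vision sees only a car at ~40 m.
print(associate([(0.0, 80.0), (0.5, 40.0)], [(0.3, 40.5)]))
```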
The examples others raised show it's not as simple as you are trying to make it out to be. Aerospace examples are irrelevant because, as you point out, they typically use an odd number of sensors so they can implement simple voting. In this case there are only two sensors, the camera and the radar. They necessarily have to pick one to prioritize when there are conflicts; there is no simple solution.

When they improved vision to the point where it could determine the velocity of objects, they eliminated radar. Of course vision has its own issues that can cause phantom braking. But it's a moving target (the occupancy network is a completely new thing that could lead to a lot of improvements).

Note that if you don't have FSD Beta, you don't actually get a preview of the latest that Tesla is working on. The legacy code for highway AP is much less sophisticated, and Tesla has not merged the code bases yet (the so-called "single stack").
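On "determining the velocity of objects using vision": in the simplest possible terms, that means turning per-frame position estimates into a rate of change and smoothing out the frame-to-frame noise. A trivial sketch of that idea (not how Tesla's networks actually do it - they reportedly predict velocity directly from video):

```python
def estimate_velocity(positions, dt, alpha=0.5):
    """Estimate speed along one axis from per-frame position estimates.

    positions: distance to an object (metres) in consecutive frames
    dt:        time between frames (seconds)
    alpha:     exponential-smoothing factor to damp per-frame noise
    """
    v_smoothed = None
    for prev, curr in zip(positions, positions[1:]):
        v_raw = (curr - prev) / dt    # finite-difference velocity
        v_smoothed = v_raw if v_smoothed is None else alpha * v_raw + (1 - alpha) * v_smoothed
    return v_smoothed

# Noisy per-frame range estimates to a lead car pulling away, sampled at ~30 fps:
print(estimate_velocity([50.0, 50.3, 50.5, 50.9, 51.2], dt=1 / 30))  # ~9.4 m/s relative
```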
 
I have to say I’ve been following this thread closely because I too am very concerned about losing the radar and ultrasonic sensors my 2017 has installed. *However*, after receiving FSD Beta (finally!) last week, and seeing how perceptive and frankly amazing the Tesla Vision software is in actual driving… I’ve decided that Elon may well be right that vision alone is the best solution for today’s cars. The visualizations are stunning.
 
And cameras aren't perfect or identical to human vision. Vision-only (VO) = only as good as vision can support, and you have to allow for degradation of vision due to a whole range of relatively common scenarios.



Which is the other Tesla characteristic that seems to have been evident all along: the car / AP / FSD system seems always ready to doubt and contradict its view of a situation it was happy with a split second earlier. Happy to see objects glitch in and out of view. Edges of highways suddenly twitch. Of course that is based on what we get shown in the visualisations, but why show a changing / glitchy visualisation if somewhere there is a nice stable one? The whole notion of 'confidence' seems missing, and that's reflected in how the car handles itself. I vary my driving style and speed based on confidence. Tesla's model seems to still be ultimately limited / controlled by rules.
You are in the UK, so you don't have FSD Beta, which has the latest tech. The production code most of us are using has problems when objects straddle two camera views (objects can blink in and out). The marking of the roadway is also less sophisticated. Also, before Vision, although there was some temporal smoothing, everything was evaluated basically frame by frame (which can also cause objects to blink in and out). There is also the fact that the consumer-facing UI basically loads an existing set of models, so if an object is not in that set, the visualization may not show it, but that does not necessarily mean the code is not taking it into account.
While there has been so much talk about additional dimensions and building a time element into the processing of consecutive images, can an NN have a concept of how confident it is, or does it just give fixed determinations? Even if it does have a notion of confidence, is that helpful if only one 'answer' is allowed? I.e., if the NN is only marginally confident that it's a trash bin rather than a mobility scooter, how confident should it be before it slows down because the object is very near the highway? Trash bin + person, or + windy weather: it could move by itself, so slow down. Mobility scooter moving on a sidewalk close to the highway: slow down. Trash bin + nothing around: ignore, as long as driveable space is not impacted.

The 'living in the moment', or more like 'living in the second', behaviour with no notion of 'confidence' seems at the heart of several often-discussed issues imo.
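For what it's worth on the "how confident is it" question: a classifier's last layer produces a score per class, and a softmax turns those scores into something confidence-like that downstream logic can threshold. A made-up illustration of the trash-bin vs mobility-scooter dilemma (generic Python, not Tesla's code; class names and thresholds are invented):

```python
import math

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

CLASSES = ["trash_bin", "mobility_scooter"]
SLOW_DOWN_IF_SCOOTER_CONF_ABOVE = 0.30   # invented: err on the side of caution near the road edge

logits = [1.2, 0.8]                       # raw network scores, nearly a coin flip
confidences = dict(zip(CLASSES, softmax(logits)))
print(confidences)                        # roughly {'trash_bin': 0.60, 'mobility_scooter': 0.40}

if confidences["mobility_scooter"] > SLOW_DOWN_IF_SCOOTER_CONF_ABOVE:
    print("marginal call near the roadway -> slow down")
```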
NNs by design have built-in confidence. The issue is that the older approach relies on classifying every object. The occupancy network (ON) eliminates a lot of the issues of the previous object recognition. Instead of trying to identify what the object is, the ON only determines whether a given block of space is occupied (then additional labels can be put on blocks if necessary). This makes processing much faster, and it allows for avoiding objects that the other NN has not classified.

I suggest reading up on the ON before commenting (I think a lot of people need to read up on it):
A Look at Tesla's Occupancy Networks

Note this is only being rolled out in the latest version of FSD Beta. There was a leak that Tesla is testing using ON data to replace USS data in the latest update, but they appear not to have put it in the UI yet (so cars with USS still don't appear to see it yet).
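For readers who have not gone through the ON material: the core idea is that, instead of asking "what is this object?", the space around the car is divided into small 3D cells and the network only asks "is this cell occupied?". A very stripped-down sketch of the data structure (just the concept, not Tesla's network; cell size and threshold are invented):

```python
from collections import defaultdict

CELL_SIZE_M = 0.5          # edge length of each voxel
OCCUPIED_THRESHOLD = 0.6   # invented confidence cut-off

def voxel_of(point):
    """Map a 3D point (metres, vehicle frame) to its voxel index."""
    return tuple(int(c // CELL_SIZE_M) for c in point)

def build_occupancy(points_with_conf):
    """points_with_conf: iterable of ((x, y, z), confidence) from perception.
    Keeps the max confidence seen per voxel and returns the occupied set."""
    grid = defaultdict(float)
    for point, conf in points_with_conf:
        v = voxel_of(point)
        grid[v] = max(grid[v], conf)
    return {v for v, conf in grid.items() if conf >= OCCUPIED_THRESHOLD}

# An unclassified blob 3 m ahead still blocks driveable space:
occupied = build_occupancy([((3.0, 0.2, 0.4), 0.9), ((3.1, 0.3, 0.4), 0.8),
                            ((12.0, 5.0, 0.2), 0.2)])  # far, low-confidence return is ignored
print(occupied)                                        # -> {(6, 0, 0)}
```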
 
Looks like you're using Casio sensors!

But seriously, it depends on why the data is different.

Faulty sensors
Incorrect / unsuitable sensors
External factors affecting the data not taken into account.
Processing error.

Years ago I worked on an early railroad signalling project that moved away from electromechanical hardware. We had best of 3 and if 2 agreed, the 3rd was shut down.

Trouble was that more hardware = greater chance of error. So also a chance of all 3 disagreeing, or of 2 agreeing with each other but both being wrong! So in fact the better approach was to make the systems more reliable and only have 2. In the (less likely) event of disagreement, you shut both down to a fail-safe status.

Not sure there is a direct comparison with car systems, but having the minimum number of highest-quality complementary sensors to handle the task, with solid logic to validate outputs against each other, would seem a good start.

The simplicity of a 'mono-culture' (like a single sensor type) is often appealing but always has an Achilles' heel.
It is not just the number/type of sensors; it is also contextual. If I have radars and cameras and it is foggy (the cameras can detect that), I would put more weight on the radars. On the other hand, if the radars get spurious pings back (you can detect that), more weight could be put on the cameras.

Actually, it works the other way around - more sensors lower the random errors because they cancel each other out. The challenge is that this approach increases the cost; hence there is a point of diminishing returns - but it is certainly more than one sensor. In fact, you made the system less reliable by removing the third sensor. With only two sensors you have a higher probability of falling back to "fail safe" than with three (considering random errors). Granted, 3 sensors introduce more errors than 2; however, those errors average out better the more sensors you have.

On the other hand, if you have another class of sensors, they will introduce a different error distribution, which could overlap nicely with the first set and reduce errors even further - e.g., cameras have better resolution; radar has better wave propagation.

Again, look at aerospace. For example, to measure altitude they use a barometer, GPS, and radar. No one is making the case to remove one class because it introduces errors - and, believe me, they rarely agree completely.

You have a good point about "fail safe". That is the commonly accepted approach - if the system has an unresolvable conflict, it falls back to the human, who has a wider spectrum of abilities to resolve conflicts. Some new research uses ML to find patterns and thus reduce the need to fail safe. However, that has so far only been demonstrated in relatively simple or severely restricted environments.
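The "errors average out" claim is easy to demonstrate numerically. A small simulation (made-up noise levels) showing that the spread of a fused estimate shrinks as independent sensors are added, with each sensor weighted by the inverse of its variance - putting more weight on the radar in fog, or on the camera when the radar pings are spurious, is just a matter of changing those variances:

```python
import random
import statistics

def fuse(true_value, sigmas):
    """One fused measurement from sensors with the given noise levels,
    combined with inverse-variance weights."""
    readings = [random.gauss(true_value, s) for s in sigmas]
    weights = [1.0 / s ** 2 for s in sigmas]
    return sum(r * w for r, w in zip(readings, weights)) / sum(weights)

def spread(sigmas, trials=20000, true_value=100.0):
    """Empirical standard deviation of the fused estimate."""
    return statistics.stdev(fuse(true_value, sigmas) for _ in range(trials))

random.seed(0)
print(spread([2.0]))             # one sensor          ~2.0
print(spread([2.0, 2.0]))        # two sensors         ~1.4
print(spread([2.0, 2.0, 2.0]))   # three sensors       ~1.15
print(spread([2.0, 5.0]))        # add a noisier, different class: still better than 2.0 alone
```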
 
NNs by design have built-in confidence. The issue is that the older approach relies on classifying every object. The occupancy network (ON) eliminates a lot of the issues
That's kind of my point. That NN confidence is not like human confidence. I doubt anyone knows how a given NN is 'NN confident' about anything (i.e. what characteristics it finally used to determine something), but based on vision only it has a sensory deprivation compared to humans.

It makes sense, with NNs having such a limitation, to restrict your objective to 'locating clumps of stuff' / 'volumetric driveable space' / the ON.

So I agree ON could be the basis for solving a lot of issues.

But the myopic vision-only approach does not make as much sense as other multi-sensor approaches imo, since cameras get obscured, and often - by sun, by rain / road spray / fog, by dirt / mud splashes / bird excrement, etc. - especially for AEB, for example.

I suggest reading up

Not sure Cohen adds much to Ashok's presentation, which I watched end to end multiple times a couple of months back.

Cohen said 'These algorithms are currently not in the Autopilot [or] FSD software, but could be in a very near future!'

I'm not sure either how much of this, if anything, is in which deployed software.

Of course we are unlikely to see FSD Beta in the UK any time soon (thanks @stopcrazypp for reminding me), but my main concern is how often AP / FSD functions on my car are compromised due to camera 'obstruction'. Also, how frustrating it is to see Tesla's determination to use vision only for everything rather than integrate proven approaches.
 
more sensors lower the random errors because they cancel each other out. The challenge is that this approach increases the cost; hence there is a point of diminishing returns

Not from any personal experience, but I can also see how, since different sensors can have very different failure modes / intrinsic limitations, as long as you have a solid algorithm for working out which to trust, you must be better off than with a single sensor.

Taking multiple camera inputs to create a single ON / 3D view would seem to have the opposite characteristic: it needs multiple very similar camera feeds all working simultaneously to create your (single) valid 3D view. More cameras mean more chance that at least one of them gets compromised. Then, can you trust your aggregated view?
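That intuition is easy to put numbers on: if each camera independently has some chance of being compromised on a given drive, the chance that at least one of N cameras is compromised grows quickly with N, whereas a genuinely redundant pair only fails when both channels fail at once. Illustrative arithmetic only - the per-sensor probability is invented:

```python
def p_any_fail(p_single, n):
    """Probability that at least one of n independent sensors is compromised."""
    return 1 - (1 - p_single) ** n

def p_all_fail(p_single, n):
    """Probability that all n independent redundant sensors fail together."""
    return p_single ** n

p = 0.02  # invented: chance a given camera is blinded/dirty on a given drive
print(p_any_fail(p, 1))   # ~0.02   - one camera
print(p_any_fail(p, 8))   # ~0.15   - 8 cameras that must ALL work for a full view
print(p_all_fail(p, 2))   # ~0.0004 - two truly redundant channels both failing
```

The catch, of course, is that the cameras are not redundant copies of each other: each covers a different field of view, so losing one degrades coverage rather than being voted out.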
 
This is all nice, but the occupancy network is currently not available, so all of this is speculation based on presentations and beta products. As opposed to USS, which are not only in production but standard in the industry.
As a general rule, one does not remove functionality until there is (at least) an equivalent replacement.
Another rule that I personally use in my own professional work, and which Elon also seems to be using, is to burn your bridges. That ensures that you will be even more determined to focus on realizing the next step.
 


From AK’s 'recruitment presentation' referred to a few pages back in this thread, where he goes into the radar + vision issues, he rhetorically asks why you would waste time coding to combine vision and radar data when the vision data is so clean / good. I wish he had then shown the same data with reduced visibility. My guess is he would soon have found an answer to his question.

However, the fact that there is clear recognition that 'phantom braking' is real and has identifiable causes is reassuring, especially against the background noise of owners claiming their car isn't affected so it isn't a 'thing'.

Going VO does seem, obviously and unavoidably, to mean a restricted operational environment and sensor limitations. I'll take some convincing that reducing sensor diversity, rather than fixing integration, will deliver overall better results.
 
I wish he had then shown the same data with reduced visibility. My guess is he would soon have found an answer to his question.

However, the fact that there is clear recognition that 'phantom braking' is real and has identifiable causes is reassuring, especially against the background noise of owners claiming their car isn't affected so it isn't a 'thing'.

Going VO does seem, obviously and unavoidably, to mean a restricted operational environment and sensor limitations. I'll take some convincing that reducing sensor diversity, rather than fixing integration, will deliver overall better results.
The answer to your question is very simple. Driving is, and always will be, vision-based. So in a scenario where visibility is poor, there will be no Autopilot or FSD. Sure, the radar will tell you whether there is an obstruction ahead, but it cannot tell you how to stay between the lane markers.
 
The answer to your question is very simple. Driving is, and always will be, vision-based. So in a scenario where visibility is poor, there will be no Autopilot or FSD. Sure, the radar will tell you whether there is an obstruction ahead, but it cannot tell you how to stay between the lane markers.

Yes. The perspective for my post was that of a UK FSD owner eagerly awaiting their pumpkin turning into a robotaxi!
 
Hi,
That video shows that even when he covered all the forward-facing cameras, the car was still able to drive on FSD.
Did he forget that the car has GPS and therefore should know where it is on a map?
This means it should know where the roads are - but not be able to see objects, pedestrians, cars etc.
Regarding phantom braking under bridges - you would imagine that the system knows from GPS that a bridge is present.
It also knows the time of day, the direction and position of the sun relative to the front of the car, the outside temperature (an indication of whether it is potentially a sunny day), and the external light level sensor reading - again an indication of a sunny day.
With all the above data and the different camera views - you would imagine that the likelihood of a shadow being present under a bridge is quite predictable.
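As a rough illustration of how those inputs could be combined: given the sun's azimuth and elevation (computable from GPS position, date, and time via a standard ephemeris - assumed here as ready-made inputs) and the car's heading, a simple geometric test says whether low sun is roughly ahead of the car, the situation where glare and hard shadows under an overpass are most likely. Purely a sketch of the idea, not anything Tesla is known to do:

```python
def low_sun_ahead(sun_azimuth_deg, sun_elevation_deg, car_heading_deg,
                  cone_half_angle_deg=35.0, max_elevation_deg=25.0):
    """True if the sun is low on the horizon and roughly in front of the car."""
    # smallest angle between where the car points and where the sun is
    diff = abs((sun_azimuth_deg - car_heading_deg + 180) % 360 - 180)
    return sun_elevation_deg < max_elevation_deg and diff < cone_half_angle_deg

# Late afternoon, sun low in the west-southwest, car heading west towards an overpass:
print(low_sun_ahead(sun_azimuth_deg=250, sun_elevation_deg=12, car_heading_deg=270))  # True
print(low_sun_ahead(sun_azimuth_deg=250, sun_elevation_deg=12, car_heading_deg=90))   # False (sun behind)
```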
However - I remember an early bad accident that somebody had in a Tesla (Model S) which was on AP and ploughed through the side of a truck that was across the road in front of it.
Maybe a shadow and a sideways truck look similar to cameras (but quite different to a radar sensor!)
Cheers
Steve