
Tesla replacing ultrasonic sensors with Tesla Vision

Sounds like a way to replace hardware with software to make cheaper cars
Looking at the service manual section on the ultrasonic sensors, there is a lot of wiring harness for both the front and rear, with a lot of additional little steps. Just looking at the rear fascia, which has wiring that only exists for the ultrasonic sensors, there are 2 connector locks, 6 harness connectors, 10 clips, plus the brackets that hold each sensor and the ultrasonic sensor itself, both of which need to be the correct color.

The cost savings aren't just the parts installed on the car; they also come from simplified inventory management, manufacturing time, and service.

It has yet to be proven, but removal can also come with actual benefits, and they might arrive sooner than they otherwise would because the Autopilot team is now on the hook to make it happen "as soon as possible."
 
I don't believe Vision can ever use this picture info to determine the exact distance to it
Exact distance from a single image would be difficult to do consistently, especially with poles that can be arbitrary sizes. But in this scenario, there are 3 cameras with different positions and fields of view (35°, 50°, 120°) as well as a history of approaching the pole. There aren't many examples of Autopilot predictions shown alongside the 3 camera views, but here's a somewhat similar situation from AI Day 2021 with objects close to the front:

[Image: multi-camera detection visualization from AI Day 2021]


Even though there aren't explicit distances shown on this visualization, would you at least agree that Autopilot has the data to beep/display inches to the closest object?

With the subtle differences in views across the 3 cameras shown above, the Occupancy network probably already takes that into account without anyone explicitly designing for it, as its training data happens to always come with the appropriately positioned triple-camera view along with the ground-truth position and size of occupancy/objects.

Similarly, the video data with ground truth trains the Occupancy network to learn that objects "grow" a certain way as they get closer, with subtle differences across the 3 cameras allowing it to better predict the actual size of and distance to objects like arbitrarily sized poles. Effectively this can provide better-than-human two-eyed stereoscopic depth perception (even though humans are actually pretty bad at measuring exact distances just by "eyeballing it").
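To make the "objects grow as they get closer" idea concrete, here's a minimal back-of-the-envelope sketch (not Tesla's actual network, which learns this end to end; the pinhole camera model and all numbers are made up for illustration):

```python
# Minimal sketch (not Tesla's actual network): estimating distance to a fixed
# object from how fast it "grows" in the image as the car approaches, assuming
# a pinhole camera and ego motion known from wheel odometry.
# All numbers below are made up for illustration.

def distance_from_scale_change(h1_px: float, h2_px: float, ego_advance_m: float) -> float:
    """Apparent height grows from h1_px to h2_px while the car moves
    ego_advance_m toward the object. For a pinhole camera h ~ 1/Z, so the
    remaining distance is Z2 = ego_advance_m / (h2_px / h1_px - 1)."""
    scale = h2_px / h1_px
    if scale <= 1.0:
        raise ValueError("object is not getting closer")
    return ego_advance_m / (scale - 1.0)

# Example: a pole grows from 120 px to 150 px tall while the car rolls 1.0 m forward.
print(distance_from_scale_change(120, 150, 1.0))  # -> 4.0 m remaining to the pole
```

In practice a learned network fuses many such cues across cameras and time, but the geometry shows why approach video alone already constrains the distance.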
 
One might say phantom braking is a safer response to ambiguity than colliding.
In 90% of the cases, yes. In the other 10%, no. If we use humans, the success rate goes way above 90%. The fact that there is ambiguity means that there is no default safer alternative. Giving up and defaulting to one "safe state" is typical of non-intelligent automation like elevators (always reset to the first floor) or traffic lights (always blink red in the US and yellow in the EU). More intelligent systems attempt to reduce the problem to one they know how to solve. That requires using and evaluating as many inputs as possible, because there are many "safe" alternatives and the AI is selecting the best one. From that perspective, Tesla AP is as dumb as an elevator.
One should look at aerospace, where they have been tackling this problem, under much simpler conditions, for at least 50 years. The environment is simpler, aircraft talk to each other, ATC is much more stringent than the highway patrol, and everything is heavily regulated. Still, there is a person to take over when the AP (mind you, a real AP) cannot solve the problem. That AP fails much more gracefully than FSD: ample warning and multiple failure options considered, as opposed to just "Land!". Oh, and they have an array of different sensors to rely upon. I have not heard anyone saying, "We have a ground proximity radar, so let's remove the barometric altimeter because we get confusing data from them."
FSD will remain a pipe dream for years to come, no matter how much Kool-Aid Elon keeps selling. I love it as a driver-assistance feature. But FSD it is not.
 
One might say phantom braking is a safer response to ambiguity than colliding.

I kinda agree. My EAP has done far more phantom braking in the last 2 months. If there is a car ahead of me but in an adjacent lane, it is more careful and slower to accelerate when starting from a stop, and if cruising, it sometimes slows down to the speed of that adjacent car before slowly accelerating again.

This behavior is close to a careful human's. Not bad. However, there are many instances when it brakes AFTER it has passed a car or a cyclist (both have happened to me). That behavior is annoying, because there is nothing in front of it anymore, and it still brakes.
 
Exact distance from a single image would be difficult to do consistently, especially with poles that can be arbitrary sizes. But in this scenario, there are 3 cameras with different positions and fields of view (35°, 50°, 120°) as well as a history of approaching the pole. There aren't many examples of Autopilot predictions shown alongside the 3 camera views, but here's a somewhat similar situation from AI Day 2021 with objects close to the front:

[Image: multi-camera detection visualization from AI Day 2021]

Even though there aren't explicit distances shown on this visualization, would you at least agree that Autopilot has the data to beep/display inches to the closest object?

With the subtle differences in views across the 3 cameras shown above, the Occupancy network probably already takes that into account without anyone explicitly designing for it, as its training data happens to always come with the appropriately positioned triple-camera view along with the ground-truth position and size of occupancy/objects.

Similarly, the video data with ground truth trains the Occupancy network to learn that objects "grow" a certain way as they get closer, with subtle differences across the 3 cameras allowing it to better predict the actual size of and distance to objects like arbitrarily sized poles. Effectively this can provide better-than-human two-eyed stereoscopic depth perception (even though humans are actually pretty bad at measuring exact distances just by "eyeballing it").
This is almost 200-year-old technology. It is called “visual ranging” and it was very, very popular until the 1950s. You look at an object from two separate cameras, and the difference between the pictures gives you the distance. That is how animals see in 3D.
However, it has limitations. In addition to the obvious ones where the camera cannot obtain a picture (fog, snow, obstruction), this method breaks down when the difference is too big (the object is too close), and it requires quite a bit of computation (it may be slow).
That is why since the 1950s it has largely been replaced by radar. A much simpler problem to solve (just measure time), and it works in fog, snow, etc. Resolution is much worse than cameras (you know how close the object is but not exactly where it is), but that was solved with phased arrays (still very expensive!).
It seems that combining two sensors, one that easily measures distance (radar) and one with high resolution (camera), is the way to go, at least for everyone else in the field.
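For reference, the triangulation behind that kind of visual ranging is just one line of math; a minimal sketch with invented camera numbers (not any real calibration):

```python
# Minimal sketch of stereo ("visual") ranging: two cameras separated by a
# baseline B see the same point shifted by a disparity d (in pixels).
# Depth Z = f * B / d for a pinhole model with focal length f in pixels.
# Camera numbers below are invented for illustration.

def stereo_depth_m(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    return focal_px * baseline_m / disparity_px

f_px, baseline = 1000.0, 0.15          # hypothetical: 1000 px focal length, 15 cm baseline
for d in (100.0, 10.0, 1.0):           # large disparity = close, small disparity = far
    print(f"disparity {d:5.1f} px -> depth {stereo_depth_m(f_px, baseline, d):6.1f} m")
# 100 px -> 1.5 m, 10 px -> 15 m, 1 px -> 150 m. Very close objects produce huge
# disparities that can fall outside the shared field of view or matching range
# (the "too close" limitation above), while tiny disparities at long range make
# depth very sensitive to matching error.
```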
 
This is almost 200-year-old technology. It is called “visual ranging” and it was very, very popular until the 1950s. You look at an object from two separate cameras, and the difference between the pictures gives you the distance. That is how animals see in 3D.
However, it has limitations. In addition to the obvious ones where the camera cannot obtain a picture (fog, snow, obstruction), this method breaks down when the difference is too big (the object is too close), and it requires quite a bit of computation (it may be slow).
That is why since the 1950s it has largely been replaced by radar. A much simpler problem to solve (just measure time), and it works in fog, snow, etc. Resolution is much worse than cameras (you know how close the object is but not exactly where it is), but that was solved with phased arrays (still very expensive!).
It seems that combining two sensors, one that easily measures distance (radar) and one with high resolution (camera), is the way to go, at least for everyone else in the field.
That's why a key feature of the vision system is object persistence so it can use the position it determined before the object was too close.

FWIW, phased arrays make non-mechanical beam steering possible, but azimuth and elevation resolution is still dependent on antenna size vs frequency.
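As a purely illustrative sketch of what object persistence could look like in principle (this is not Tesla's implementation): remember the obstacle's last vision-estimated position and keep re-expressing it in the car's frame from ego motion once the cameras can no longer see it.

```python
import math

# Purely illustrative sketch of object persistence, not Tesla's implementation:
# once an obstacle drops out of camera view, keep re-expressing its last known
# position in the car's frame using ego motion (distance travelled + heading change).

class PersistedObstacle:
    def __init__(self, x_m: float, y_m: float):
        # Last vision-estimated position in the car's frame (x forward, y left).
        self.x, self.y = x_m, y_m

    def apply_ego_motion(self, forward_m: float, yaw_rad: float) -> None:
        """Update the obstacle's position in the car frame after the car
        moves forward_m and rotates by yaw_rad (simple SE(2) update)."""
        x, y = self.x - forward_m, self.y          # translate into the moved car's frame
        c, s = math.cos(-yaw_rad), math.sin(-yaw_rad)
        self.x, self.y = c * x - s * y, s * x + c * y

# Example: a post seen 3 m ahead disappears below the hood; the car creeps
# forward 2.5 m in a straight line, so the post should now be ~0.5 m ahead.
post = PersistedObstacle(3.0, 0.0)
post.apply_ego_motion(2.5, 0.0)
print(round(post.x, 2))  # 0.5
```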
 
Even though there aren't explicit distances shown on this visualization, would you at least agree that Autopilot has the data to beep/display inches to the closest object?

The ultrasonics are more important, especially for me, in the rear direction, where there is only one monocular camera. There are no multiple views.

I back into a space with a glass door to park at home. It has reflections, possibly of the car, depending on the lighting conditions. Parking precision is important so I can close the gate, and I use the calibrated physical distances from the ultrasonics to let me know when I should stop.

It's difficult for me to believe monocular vision is going to work well.
 
You look at an object from two separate cameras, and the difference between the pictures gives you the distance. That is how animals see in 3D.
That only works for short distances, as the eyes are very close together. This is why you can judge distance well out to a meter (for example) versus things 10, 20, or 40 meters out. There are many other methods besides two eyes and their distance apart (different perspectives) that are being used.
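To put rough numbers on why the short baseline only works up close: triangulation error grows roughly with the square of distance. A back-of-the-envelope with a made-up focal length and roughly human eye spacing:

```python
# Rough back-of-the-envelope (made-up numbers): for triangulation with baseline B,
# focal length f (pixels) and a matching error of dd pixels, depth uncertainty is
# roughly dZ ~ Z^2 * dd / (f * B), i.e. it grows with the square of distance --
# which is why a narrow baseline (eyes, or cameras a few cm apart) only ranges
# well up close.

def depth_uncertainty_m(z_m: float, focal_px: float, baseline_m: float,
                        match_err_px: float = 1.0) -> float:
    return z_m ** 2 * match_err_px / (focal_px * baseline_m)

f_px, eye_baseline = 1000.0, 0.065     # ~6.5 cm, roughly human eye spacing
for z in (1.0, 10.0, 40.0):
    print(f"{z:4.0f} m -> +/- {depth_uncertainty_m(z, f_px, eye_baseline):6.2f} m")
# ~0.02 m at 1 m, ~1.5 m at 10 m, ~25 m at 40 m: fine at arm's length,
# nearly useless far away, matching the point above.
```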
 
@Mark II, with regard to your comment about the hood geometry -- is the visibility really that much worse than any other car? The only visibility issues I noticed on the test drive were with the mirrors, which I determined could be alleviated by looking more thoroughly over my shoulder if the camera failed for whatever reason. I've never had a great view of what's in front of the bumper of any car, beyond where the hood drops off.
- My first car was an '84 Corolla. Very square-shaped compared to later years. I could reliably park it, backward or forward, within 0.5 inch of a wall without touching the wall. No sensors or cameras.
- Next up was a 2001 Corolla, which was more curved. I found it a bit harder than the previous one, but I could still park within perhaps 2 inches.
- Then a 2016 Sonata. This was even more curved than the 2001 Corolla, but it had a backup camera* and sensors. I was less confident about distance, let's say 6 inches, but the sensors and camera made a huge difference.
- Then a 2018 Model S. The curve is significant, and I imagine without the sensors my confidence would be about 12 inches forward and perhaps 2 feet backward. The cameras and sensors are an absolute necessity.
- Finally, a 2022 Model S. About the same curve as the 2018, although wider in the back, so I need to be careful pulling up alongside pillars and garage doors. However, one major difference is the lower front spoiler. I hit that on curbs all the time when parking forward. Generally I park backward, but sometimes it is just much more convenient to pull in forward. Here the sensors don't actually help, and a forward camera would really make a difference. Yet another stupid design decision on the Model S refresh. If they were going to add a lower front spoiler, they could have also added a camera. Kind of like the dumb yoke that does not steer by wire. A half-done job.

*About the Sonata backup camera: I preferred it to any Tesla's. Firstly, it was centered; secondly, it had yellow and red lines that would bend to show exactly where the car would go and where it would end up, in addition to cross lines indicating distance. All very useful.
 
The ultrasonics are more important, especially for me, in the rear direction, where there is only one monocular camera. There are no multiple views.

I back into a space with a glass door to park at home. It has reflections, possibly of the car, depending on the lighting conditions. Parking precision is important so I can close the gate, and I use the calibrated physical distances from the ultrasonics to let me know when I should stop.

It's difficult for me to believe monocular vision is going to work well.
I also reverse park at home because of the charger location. The fit is very tight due to bicycles, tires, shelves, etc. I too found the backup camera insufficient, and even the sensors. So I ended up putting white tape on the garage floor. I was hopeful the Tesla would see it as a parking spot and park itself, or perhaps that it would help with Summon, but it did not. It did help me, though: I can more easily align the side cameras with the lines and the back line with the backup camera :). Now if only we had a true 360° view, I would not need to do any of that.
 
That's why a key feature of the vision system is object persistence so it can use the position it determined before the object was too close.

FWIW, phased arrays make non-mechanical beam steering possible, but azimuth and elevation resolution is still dependent on antenna size vs frequency.
For a given antenna size and frequency you can increase the azimuth and elevation resolution by adding more of the same antennas, stacked horizontally and vertically. So the beam you are steering gets narrower with more antennas added to the array, in addition to increased gain.
 
That only works for short distances, as the eyes are very close together. This is why you can judge distance well out to a meter (for example) versus things 10, 20, or 40 meters out. There are many other methods besides two eyes and their distance apart (different perspectives) that are being used.
I was trying to simplify it. Certainly, there are other visual ways to _estimate_ distance but all of them are less accurate and more computationally expensive than simple radar.
 
That's why a key feature of the vision system is object persistence so it can use the position it determined before the object was too close.

FWIW, phased arrays make non-mechanical beam steering possible, but azimuth and elevation resolution is still dependent on antenna size vs frequency.
Interesting point about object persistence. It reminds me of “dead reckoning” in navigation. I wonder at what point the system will accumulate so much error that the result is useless. Also, how would that work with objects that were never in the camera frame? The front camera is pretty high up and has a pretty good-sized blind spot. While this is a problem for humans as well, we are much better at experience-based inference, i.e. I may not see a parking spot threshold, but I expect it to be there.
In a few years they could probably get Tesla Vision somewhere close to what other manufacturers already do using various sensors. But at what cost? Instead of fixing real issues (the list is long), they will spend time, money, and effort to get back to where they started (and I am being optimistic here; in a few areas V11 is still worse than V10). How does that make sense?!
 
I have amblyopia (lazy eye). As such, while I have two fully functional eyeballs, both correctable to 20/20 vision, I have absolutely zero stereoscopic vision and, because of that, non-existent depth perception.

I lean on the USS so much it isn't even funny, and after being able to park my Model Y without fear or stress of hitting things, I'll never see myself purchasing a car without equivalent functionality. I'm hearing great things about 360° imaging radar that could replace USS... but having been born with two terrible misaligned cameras and a decent AI processor between them, I have a hard time trusting my car's exterior and surroundings... or your car... to a bunch of cheap low-frame-rate cameras, even with industry bleeding-edge AI.
 
in the rear direction, where there is only one monocular camera
Having multiple cameras see the same spot isn't necessary for the Occupancy network to make predictions, but it probably does help the network pick up on the subtle cues from the differences among the triple cameras. More important for single-camera views would be video/temporal context.

However, your particular situation is probably especially difficult with the glass door, so if Tesla doesn't have that addressed in the near term, you might need to put something on it to make a section of it opaque. Unless the door is really wide, there's probably a door frame or outer edge that can serve as a proxy, as FSD Beta should know it doesn't fit through a door-sized opening. But for Optimus… did somebody need to put something on the glass doors to prevent it from trying to walk through glass? 😝

[Image: occupancy network visualization near glass]


Maybe also requiring a bit more training: going in reverse isn't something FSD Beta has needed to handle much yet, so occupancy predictions for forward motion are probably already fairly well trained. But training for reverse is probably a similar process, where the shadow-mode data engine could just send back video from when the vehicle was moving backwards.
 
For a given antenna size and frequency you can increase the azimuth and elevation resolution by adding more of the same antennas, stacked horizontally and vertically. So the beam you are steering gets narrower with more antennas added to the array, in addition to increased gain.
Sure, but those stacks make the overall antenna larger (I'm not talking about the size of each element). Element spacing is usually a function of wavelength. If the distance between the furthest elements stays the same, you can't improve past the beam of an equivalently sized parabolic antenna.
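A rule-of-thumb illustration of that aperture point, using a 77 GHz automotive-radar wavelength (the aperture sizes are just examples, not any particular sensor):

```python
import math

# Rule-of-thumb illustration of the aperture point: the 3 dB beamwidth of a
# uniformly illuminated aperture is roughly 0.886 * wavelength / aperture
# (radians). Adding elements narrows the beam only insofar as it widens the
# overall aperture. Numbers are illustrative, not any specific radar.

C = 3.0e8                      # speed of light, m/s
f_hz = 77e9                    # typical automotive radar band
wavelength = C / f_hz          # ~3.9 mm

def beamwidth_deg(aperture_m: float) -> float:
    return math.degrees(0.886 * wavelength / aperture_m)

for aperture_cm in (2, 5, 10):
    print(f"{aperture_cm:3d} cm aperture -> ~{beamwidth_deg(aperture_cm / 100):4.1f} deg beam")
# 2 cm -> ~9.9 deg, 5 cm -> ~4.0 deg, 10 cm -> ~2.0 deg: a parabolic dish of the
# same size would do about the same, as noted above.
```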
 
Interesting point about object persistence. It reminds me of “dead reckoning” in navigation. I wonder at what point the system will accumulate so much error that the result is useless. Also, how would that work with objects that were never in the camera frame? The front camera is pretty high up and has a pretty good-sized blind spot. While this is a problem for humans as well, we are much better at experience-based inference, i.e. I may not see a parking spot threshold, but I expect it to be there.
In a few years they could probably get Tesla Vision somewhere close to what other manufacturers already do using various sensors. But at what cost? Instead of fixing real issues (the list is long), they will spend time, money, and effort to get back to where they started (and I am being optimistic here; in a few areas V11 is still worse than V10). How does that make sense?!

Error does build, but we are talking about the low-total-movement case where the cameras can't see. Speedometers are better than 2% accurate, so wheel distance sensors are better than 3 inches over 10 feet (USS stops reading exact distance at 12 inches), with the caveat that this is for straight travel. Once we add in tight turning, individual wheel slip/skid comes more into play and we need to rely more on the accelerometer and gyros in the SRS/airbag system.
There may be a close-in zone between the front and B-pillar views that is not viewable. If this zone were occluded on approach and the vehicle then turned around that point, an object would go unsensed, but that seems low probability (if the maneuver is even possible without doing an Austin Powers turn).
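To put rough numbers on how fast that dead-reckoning error builds during a parking creep (assumed error rates, not measured Tesla specs):

```python
# Illustrative numbers only (assumed error rates, not measured Tesla specs):
# how much dead-reckoning drift accumulates over a short parking creep if
# wheel-odometry error is proportional to distance travelled.

INCH = 0.0254  # meters

def drift_in(creep_m: float, odometry_error_frac: float = 0.02) -> float:
    """Worst-case straight-line drift in inches for a given creep distance."""
    return creep_m * odometry_error_frac / INCH

for feet in (3, 10, 20):
    creep = feet * 12 * INCH
    print(f"{feet:3d} ft creep -> ~{drift_in(creep):4.1f} in of drift at 2% odometry error")
# 3 ft -> ~0.7 in, 10 ft -> ~2.4 in, 20 ft -> ~4.8 in: small compared with the
# ~12 in point where USS stops reporting an exact distance, per the post above.
```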

As to cost, SW development has zero marginal cost after the problem is solved, whereas sensor cost exists on every car built. Tesla is currently at >1 million vehicles per year, aiming for 20M in 2030. For every $1 of sensor cost removed, Tesla saves $1 million per year. If FSD engineers are paid $500k a year, that $1 per vehicle pays for 2 more engineers to solve the replacement problem.
Now consider that the full USS system costs in the hundreds of dollars. For each $100 removed, they can hire 200 more (highly paid) engineers.
Project out to 2030. If Tesla is making 20 million cars, every $100 in part reduction is $2B in profit per year. If the ramp were linear, the total savings from now would be ~$8 billion per $100 removed. Averaged over that 8-year span, it could pay for a billion dollars a year in development costs.
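Spelling out that arithmetic with the same assumptions (linear ramp from ~1M to 20M vehicles/year by 2030, $500k per engineer):

```python
# The post's arithmetic spelled out, using its own assumptions
# (linear ramp from ~1M to 20M vehicles/year by 2030, $500k per engineer).

vehicles_now = 1_000_000        # vehicles per year today
vehicles_2030 = 20_000_000      # vehicles per year at the 2030 target
years = 8
engineer_cost = 500_000         # dollars per engineer per year
saving_per_car = 100            # dollars of parts removed per car

avg_volume = (vehicles_now + vehicles_2030) / 2   # linear ramp -> midpoint average
total_vehicles = avg_volume * years
total_saving = total_vehicles * saving_per_car

print(f"~{total_vehicles / 1e6:.0f}M vehicles over {years} years")
print(f"~${total_saving / 1e9:.1f}B total saving at ${saving_per_car}/car")
print(f"~${total_saving / years / 1e9:.2f}B per year, ~{total_saving / years / engineer_cost:,.0f} engineer-years")
# ~84M vehicles, ~$8.4B total, about a billion dollars a year -- in line with
# the "~$8 billion per $100 removed" estimate above.
```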
 
Error does build, but we are talking about the low-total-movement case where the cameras can't see. Speedometers are better than 2% accurate, so wheel distance sensors are better than 3 inches over 10 feet (USS stops reading exact distance at 12 inches), with the caveat that this is for straight travel. Once we add in tight turning, individual wheel slip/skid comes more into play and we need to rely more on the accelerometer and gyros in the SRS/airbag system.
There may be a close-in zone between the front and B-pillar views that is not viewable. If this zone were occluded on approach and the vehicle then turned around that point, an object would go unsensed, but that seems low probability (if the maneuver is even possible without doing an Austin Powers turn).

As to cost, SW development has zero marginal cost after the problem is solved, whereas sensor cost exists on every car built. Tesla is currently at >1 million vehicles per year, aiming for 20M in 2030. For every $1 of sensor cost removed, Tesla saves $1 million per year. If FSD engineers are paid $500k a year, that $1 per vehicle pays for 2 more engineers to solve the replacement problem.
Now consider that the full USS system costs in the hundreds of dollars. For each $100 removed, they can hire 200 more (highly paid) engineers.
Project out to 2030. If Tesla is making 20 million cars, every $100 in part reduction is $2B in profit per year. If the ramp were linear, the total savings from now would be ~$8 billion per $100 removed. Averaged over that 8-year span, it could pay for a billion dollars a year in development costs.
No way ultrasonic sensors cost “hundreds” of dollars per car.
 
No way ultrasonic sensors cost “hundreds” of dollars per car.
12 Bosch sensors, color matched
12 brackets
Bumper fascia punch outs
2 wire harnesses, each with 6x 3-pole and 1x 12-pole connectors, 8 circuits, 2 of which are multi-splice
Mating harnesses (unless direct to the ECU), each with 2 connectors and 8 circuits.
Possibly 2 ECUs, or at a minimum interface circuitry.
Shipping, handling, supplier markup, installation, warranty.

At $8 a sensor, that is about $100 on their own. Aftermarket puts them at $50-$60; at a 4x markup, that's still ~$12 apiece.

Even at a single hundred, that's $100 million a year in savings, which is a lot of engineering. Equivalent to ~4% of Q2's R&D annualized.
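And the per-car figures from the list above, using the post's own numbers:

```python
# Quick check of the per-car figures, using the post's own numbers.
sensors_per_car = 12
oem_cost_per_sensor = 8        # dollars, the post's volume-pricing estimate
aftermarket_price = 50         # low end of the quoted $50-$60 retail price
retail_markup = 4              # assumed retail markup from the post

print(sensors_per_car * oem_cost_per_sensor)   # 96   -> "about $100 on their own"
print(aftermarket_price / retail_markup)       # 12.5 -> "still ~$12 apiece"
print(100 * 1_000_000)                         # $100M/year at ~1M cars and $100/car removed
```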
 