
How does the new end-to-end FSD work? I need a block diagram of the data flow from the fleet to Dojo to an individual's car.

Absolutely incorrect.
It is prerelease software. It requires monitoring and intervention. It is expected.
Naw, I didn't ask for it, I don't want it, I'm just giving feedback that it can fail 5 times in 100 miles.

It is nowhere near ready for general release.

And it's not actually pre-release at this point, if I understand right?

The logic of "supervised" is self-defeating:

Tesla: "it can drive for you!!! Except where it can't and we don't actually know where that is. So if it's about to crash, intervene!!!"

Also Tesla: "AP has (IDK) x fewer crashes than human drivers!!!"

Me: This human prevented 5 FSD crashes in 100 miles. AP or FSD did not drive safely, I did.
 
It's not a perception issue if it "sees" the sign and even visualizes it! It's a logic issue. For speed, it's also observing the flow of traffic, which is why it may ignore speed signs. Your speed setting will also change behavior.

Basically there are a ton of issues to address that are unrelated to perception (lane selection and stopping behavior are mentioned in other threads). If you solve all those, will lidar and/or radar still matter? That is basically the logic Tesla is using.

You can talk about driving in inclement weather, but having a system that works in good weather is already very useful. Plus, if visibility is so bad that you can't drive visually, it's already not safe to drive in the first place.
Define "logic". How do you express "logic" in an ML e2e system? ;)

 
For one, processing all of those sources takes a lot of compute power.
And when the output of the "cameras" gives at least as much as Lidar or radar, why use them?
But the "cameras" give even more output, color, which tends to be very helpful for driving.
The complexity of a large neural net can be surprisingly uncorrelated with the size of its inputs. For example, AlphaGo used a gigantic neural network with tiny inputs. Adding two sensors (radar+lidar) to the existing 7-camera HW4 suite (not counting cabin camera) wouldn’t increase the input bandwidth all that much, relatively speaking.
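As a rough back-of-envelope (every figure below is an assumption for illustration, not a published Tesla spec), the raw data rates might compare something like this:

```python
# Back-of-envelope raw input data rates; all figures are illustrative assumptions.
MB = 1e6

cameras = 7 * 5e6 * 1 * 24      # 7 cameras x ~5 MP x ~1 byte/px x ~24 fps
lidar   = 1.2e6 * 16            # ~1.2M points/s x ~16 bytes per point
radar   = 5e4 * 16              # ~50k detections/s x ~16 bytes each

print(f"cameras: {cameras / MB:7.1f} MB/s")
print(f"lidar:   {lidar / MB:7.1f} MB/s")
print(f"radar:   {radar / MB:7.1f} MB/s")
print(f"lidar + radar add ~{100 * (lidar + radar) / cameras:.1f}% on top of the cameras")
```

Even if the per-sensor figures are off by a factor of a few, the extra inputs stay small next to the camera streams.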

Also, it's ironic that you mention color. 5% of the population is colorblind, yet they have no trouble driving. Doubly ironic, radar and Lidar are essentially equivalent to different color channels, radio (4mm) and infrared (1550nm) respectively, which have some superior characteristics for their specific tasks relative to visible light.
The other sources just don't add information that the car doesn't already know.
They emphatically do, in poor-visibility conditions. (Or even good-visibility ambiguous conditions, like distinguishing a shadow from an obstacle, which pure vision has trouble with.)
Does it make a difference that a car that's about 100 yards from you is 300.02 feet vs 300? Too much information is dangerous.
I'm pretty sure that if I made you look at all the data sources constantly, you wouldn't be able to drive.
It’s not about precision, it’s about disambiguation. Neural nets excel at discarding extraneous information and focusing on the essential features. There is nothing “dangerous” to a NN about “too much” information.
 
Okay, so how does the system determine which is correct? Lidar has absolutely no way to resolve this.
And let me add which is correct "even if there's a 0.0001% chance of an obstacle"
Lidar has every way to resolve this. It generates a 3D point cloud, which would look very different in the shadow case than in the obstacle case, whereas to the camera (or even to the human eye) the difference is far more subtle.
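A minimal sketch of the idea, assuming you already have a lidar point cloud and a mask for the suspicious patch of road (the function name and thresholds are made up for illustration):

```python
import numpy as np

def region_is_obstacle(points_xyz, region_mask, ground_z=0.0,
                       height_thresh_m=0.15, min_points=20):
    """Toy disambiguation: a painted shadow leaves the road surface flat,
    while a real obstacle produces lidar returns well above the ground plane.

    points_xyz  -- (N, 3) lidar points in the vehicle frame (z up)
    region_mask -- boolean mask selecting the points inside the suspect patch
    """
    region = points_xyz[region_mask]
    above_ground = region[region[:, 2] > ground_z + height_thresh_m]
    return len(above_ground) >= min_points   # True => treat it as an obstacle
```

A camera sees roughly the same dark patch in both cases; the point cloud's height profile is what breaks the tie.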
And you already said that the 2016 incident included radar.
The radar “saw” the side of the trailer, but had no way to distinguish it from an overhead sign, and the (human-written) software decided that it was an overhead sign. Lidar would have been able to disambiguate.
 
It's not a perception issue if it "sees" the sign and even visualizes it! It's a logic issue. For speed, it's also observing the flow of traffic, which is why it may ignore speed signs. Your speed setting will also change behavior.
The visualization network and control network are largely separate. That’s why the car sometimes seems to ignore signs (such as left-turn-only) that are obvious in the visualization.
Basically there are a ton of issues to address that are unrelated to perception (lane selection and stopping behavior are mentioned in other threads). If you solve all those, will lidar and/or radar still matter? That is basically the logic Tesla is using.
Even the best control networks won’t be of much use if you can’t properly see your surroundings. That’s where radar and Lidar come in.
You can talk about driving in inclement weather, but having a system that works in good weather is already very useful. Plus, if visibility is so bad that you can't drive visually, it's already not safe to drive in the first place.
The issue is more if you’re already driving in acceptable weather, and the visibility suddenly becomes compromised. (Unexpected downpour, or mud splashes, or sun glare, or sudden fog, or a rogue pigeon with good aim.) The car (particularly L4/Robotaxi) needs to be able to handle these situations safely.
 
I have about 100 miles on the 12.3.x free trial, and I had to slam on the brakes about 5x to avoid actually crashing. One time, with cars coming down a hill from the right at 45 mph, the car started to cross right into their path. It would have been a major T-bone 1 second later.

Another time it almost crashed into a tree at a fork in the road.

5 critical failures in 100 miles. It needs to go 600k to a million miles. How close is it?
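A rough back-of-envelope, taking the lower end of that range as the target:

```python
import math

observed_mtbf = 100 / 5    # ~20 miles per critical intervention, as reported above
target_mtbf = 600_000      # lower end of the range quoted above
gap = target_mtbf / observed_mtbf
print(f"roughly {gap:,.0f}x improvement needed "
      f"(~{math.log10(gap):.1f} orders of magnitude)")
```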

Spokane, Wa.
5 in 100 is at least 10X better than mine. I would go back to driving it if I could get 5 in 100.
 
This is a very good example. That being said, I recall the reason to move away was that even radar or LiDAR is not perfect; their readings can be ambiguous too. So now you have to figure out which of the two you will depend on, and when.
This is for the NN to figure out. As shown by the leap from v11 to v12, the NN can do a lot better job at this sort of thing than human coders can. If it turns out that the added sensors don’t add value (in terms of reliability), the NN will figure this out too, and will learn to ignore those inputs. But at least the experiment should be tried, IMO.
 
If it turns out that the added sensors don’t add value (in terms of reliability), the NN will figure this out too, and will learn to ignore those inputs.
That was something I was thinking about as well. What if they loaded a test car down with the kitchen sink of sensors, collected plenty of data with it, then trained the network and looked at the use of the inputs. It would be amazing if the software itself identified which sensors were needed for which driving scenarios.
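One common way to do that kind of analysis is permutation importance: shuffle one sensor's inputs and see how much the trained network's accuracy drops. A rough sketch, where `model`, the input layout, and the metric are all hypothetical placeholders:

```python
import numpy as np

def sensor_importance(model, inputs, targets, sensor_columns, metric):
    """Shuffle each sensor group's columns and measure how much the score drops.
    A drop near zero suggests the network learned to ignore that sensor."""
    baseline = metric(model.predict(inputs), targets)
    rng = np.random.default_rng(0)
    importance = {}
    for name, cols in sensor_columns.items():
        shuffled = inputs.copy()
        shuffled[:, cols] = rng.permutation(shuffled[:, cols], axis=0)
        importance[name] = baseline - metric(model.predict(shuffled), targets)
    return importance

# e.g. sensor_columns = {"cameras": slice(0, 700), "radar": slice(700, 740),
#                        "lidar": slice(740, 800)}   # hypothetical layout
```

Repeating this per driving scenario (fog, night, rain) would show which sensors the network actually leans on, and where.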
 
Define "logic". How do you express "logic" in an ML e2e system? ;)

Logic is what happens after perception, namely the planning stage. The system has already "seen" the object, but how it responds to that object is a whole other matter.

The speed example is a good one: when the car sees a speed limit sign (meaning the perception system has already recognized it as a speed limit sign and read the number), whether it follows it depends on the following variables (and maybe more; a rough sketch combining them follows the list):
1) speed offset or adjustment set by user
2) flow of traffic (general speed other cars are travelling)
3) is it slowing down from a previous higher speed limit (don't want to slam on the brakes)
4) what the map says the speed limit is in the given section of road
5) whether the limit makes sense (people have tested using fake speed limit signs and there are some sanity checks done, for example rejecting speed limits that are not in increments of 5 mph).
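Purely as a toy illustration of how those variables might be combined (this is not how Tesla's planner actually works, and all numbers are made up):

```python
def target_speed_mph(posted_limit, user_offset, traffic_flow,
                     current_speed, map_limit):
    """Toy heuristic combining the five factors above; illustrative only."""
    # 5) sanity check: ignore implausible signs (e.g. not a multiple of 5 mph)
    if posted_limit is None or posted_limit % 5 != 0:
        posted_limit = map_limit            # 4) fall back to the map's speed limit
    base = posted_limit + user_offset       # 1) apply the user's offset
    # 2) nudge the target toward the general flow of traffic
    target = 0.7 * base + 0.3 * traffic_flow
    # 3) don't slam on the brakes when dropping from a higher limit
    max_drop = 5.0                          # mph of slowdown allowed per planning step
    return max(target, current_speed - max_drop)
```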

A perception problem is when it doesn't see the object at all: for example, there is a stop sign or speed limit sign, but the car doesn't see it. But again, radar and lidar don't help in those cases, because they don't read signs.

Note Tesla's system is not one big black box. It is a modular system, meaning there are still separate modules to handle the different inputs, plus separate perception and planning modules (that's why the existing visualizations still work).
 
The visualization network and control network are largely separate. That’s why the car sometimes seems to ignore signs (such as left-turn-only) that are obvious in the visualization.
That's what I'm getting at. For example, going straight from a turn lane, or going into the "wrong" turn lane (say it needs to make a right turn right after the intersection, but it goes to the leftmost turn lane and then has to change lanes again afterward, instead of taking the second-from-left turn lane and not needing to change lanes). The car sees that it's in a turn lane (and, in the latter case, that both lanes are turn lanes), but it just makes a bad decision. That's not a perception problem (throwing in lidar/radar, or even higher-resolution cameras, will do nothing to solve it).

Even the best control networks won’t be of much use if you can’t properly see your surroundings. That’s where radar and Lidar come in.
As above, for the vast majority of current problems with FSD, radar/lidar will not help.
The issue is more if you’re already driving in acceptable weather, and the visibility suddenly becomes compromised. (Unexpected downpour, or mud splashes, or sun glare, or sudden fog, or a rogue pigeon with good aim.) The car (particularly L4/Robotaxi) needs to be able to handle these situations safely.
The behavior there is typically just to come to a stop (many times in lane and not necessarily to the side of the road) and under the SAE's L4 definition, that is an acceptable fallback. Waymo vehicles do that too (including pulling to a stop in its own lane).
https://www.sfchronicle.com/bayarea/article/san-francisco-waymo-stopped-in-street-17890821.php
Lidar won't see much better than cameras in that situation, and radar can't see the lane lines so you can't rely on it to continue driving. So you still need enough cameras with remaining visibility to be able to safely pull to a stop. The likelihood of all cameras being blinded at the same time is not very high (especially in a situation where other sensors are not affected, and also in a way that the car can't safely pull to a stop based on prior information from when it still had visibility).
 
That's what I'm getting at. For example, going straight from a turn lane, or going into the "wrong" turn lane (say it needs to make a right turn right after the intersection, but it goes to the leftmost turn lane and then has to change lanes again afterward, instead of taking the second-from-left turn lane and not needing to change lanes). The car sees that it's in a turn lane (and, in the latter case, that both lanes are turn lanes), but it just makes a bad decision. That's not a perception problem (throwing in lidar/radar, or even higher-resolution cameras, will do nothing to solve it).
Completely agreed; this is purely a NN/training failure, not a sensor suite limitation, and v12 currently has lots of such failures.
As above, for the vast majority of current problems with FSD, radar/lidar will not help.
True. But we are still orders of magnitude away from L4/L5. Once the low-hanging fruit is solved (largely NN/training), what remains will be largely cases that are hardware-limited. At that point I expect radar/lidar to become much more important for making continued progress on the march of 9's.
The behavior there is typically just to come to a stop (many times in lane and not necessarily to the side of the road) and under the SAE's L4 definition, that is an acceptable fallback.
The definition is that the car should achieve a "minimal risk condition". The example given in the spec of stopping in a lane involves complete system failure; for bad weather and such their examples involve pulling to the side of the road for added safety. To achieve L4, FSD will have to be able to do this at least as well as a skilled human driver. (I.e. in conditions where a decent human driver would not stop in the middle of the road, FSD should not either.)
Waymo vehicles do that too (including pulling to a stop in its own lane).
https://www.sfchronicle.com/bayarea/article/san-francisco-waymo-stopped-in-street-17890821.php
Yes, and they were rightly criticized for this incident (and subsequently announced they planned to make improvements to avoid it) because it was a situation where an ordinary human driver wouldn't have stopped in the middle of the road like that.
Lidar won't see much better than cameras in that situation, and radar can't see the lane lines so you can't rely on it to continue driving.
Cameras can still see the nearby lane lines (even in dense fog they can see a few meters away, which is enough), and combined with maps and GPS and radar to avoid obstacles just out of the cameras' range, this should be plenty of information to be able to pull over safely. And newer lidars can see a bit farther than cameras in dense fog. Radar of course can "see" the farthest in fog. High-resolution millimeter-wave radars such as this one could combine the best properties of both radar and Lidar.
So you still need enough cameras with remaining visibility to be able to safely pull to a stop. The likelihood of all cameras being blinded at the same time is not very high (especially in a situation where other sensors are not affected, and also in a way that the car can't safely pull to a stop based on prior information from when it still had visibility).
The question is whether "not very high" is "< 0.0001%". Beyond a certain point, any L4 failure will feel like a black swan event. But they should still be as rare as possible. IOW, the car should only have to pull to a stop when other drivers are doing the same.
 
Completely agreed; this is purely a NN/training failure, not a sensor suite limitation, and v12 currently has lots of such failures.

True. But we are still orders of magnitude away from L4/L5. Once the low-hanging fruit is solved (largely NN/training), what remains will be largely cases that are hardware-limited. At that point I expect radar/lidar to become much more important for making continued progress on the march of 9's.
The NN/training is not the low hanging fruit, it's the hardest part! It's the reason why even cars loaded to the gills with sensors can still fail. The idea Tesla is operating on is that if that part is solved, then radar/lidar become nice-to-haves instead of necessities. They can simply operate in a more narrow ODD if necessary.
The definition is that the car should achieve a "minimal risk condition". The example given in the spec of stopping in a lane involves complete system failure; for bad weather and such their examples involve pulling to the side of the road for added safety. To achieve L4, FSD will have to be able to do this at least as well as a skilled human driver. (I.e. in conditions where a decent human driver would not stop in the middle of the road, FSD should not either.)
But the spec critically does not require you to pull to the side; it's optional. That's why even Waymo and Cruise stop in lane plenty of times, even for incidents that are not complete system failures (the link I posted was a bad-weather case).
Yes, and they were rightly criticized for this incident (and subsequently announced they planned to make improvements to avoid it) because it was a situation where an ordinary human driver wouldn't have stopped in the middle of the road like that.
Cameras can still see the nearby lane lines (even in dense fog they can see a few meters away, which is enough), and combined with maps and GPS and radar to avoid obstacles just out of the cameras' range, this should be plenty of information to be able to pull over safely. And newer lidars can see a bit farther than cameras in dense fog. Radar of course can "see" the farthest in fog. High-resolution millimeter-wave radars such as this one could combine the best properties of both radar and Lidar.
The point is you still need cameras to be working to even safely pull over. Neither lidar nor radar can replace that. In a scenario that blinds all cameras, having those won't solve the problem.
The question is whether "not very high" is "< 0.0001%". Beyond a certain point, any L4 failure will feel like a black swan event. But they should still be as rare as possible.
I imagine that a situation where all cameras are blinded, yet radar/lidar is unaffected, and the car can't pull to a stop based on prior known info, is going to be pretty much on the order of a complete system failure.
IOW, the car should only have to pull to a stop when other drivers are doing the same.
Ideally that is the case, but it's very much not the case for current L4 cars, and that is okay for operation at the current stage of the technology (and some of it is expected to continue indefinitely, for example phoning in for remote assistance).
 
The complexity of a large neural net can be surprisingly uncorrelated with the size of its inputs. For example, AlphaGo used a gigantic neural network with tiny inputs. Adding two sensors (radar+lidar) to the existing 7-camera HW4 suite (not counting cabin camera) wouldn’t increase the input bandwidth all that much, relatively speaking.

Also, it's ironic that you mention color. 5% of the population is colorblind, yet they have no trouble driving. Doubly ironic, radar and Lidar are essentially equivalent to different color channels, radio (4mm) and infrared (1550nm) respectively, which have some superior characteristics for their specific tasks relative to visible light.

They emphatically do, in poor-visibility conditions. (Or even good-visibility ambiguous conditions, like distinguishing a shadow from an obstacle, which pure vision has trouble with.)

It’s not about precision, it’s about disambiguation. Neural nets excel at discarding extraneous information and focusing on the essential features. There is nothing “dangerous” to a NN about “too much” information.


So I assume that you work for Tesla and Tesla is about to announce that they are going to enable the external radar and add lidar.

But since I doubt that is true, all I can assume is that you are armchair quarterbacking and have very little knowledge of what is actually occurring.

Tesla didn't remove the radar by accident. They didn't remove the USS by accident. And they have basically shown that vision without lidar works.

Visualization really isn't a problem at this point. (unless you have evidence otherwise)
 
Completely agreed; this is purely a NN/training failure, not a sensor suite limitation, and v12 currently has lots of such failures.

True. But we are still orders of magnitude away from L4/L5. Once the low-hanging fruit is solved (largely NN/training), what remains will be largely cases that are hardware-limited. At that point I expect radar/lidar to become much more important for making continued progress on the march of 9's.

The definition is that the car should achieve a "minimal risk condition". The example given in the spec of stopping in a lane involves complete system failure; for bad weather and such their examples involve pulling to the side of the road for added safety. To achieve L4, FSD will have to be able to do this at least as well as a skilled human driver. (I.e. in conditions where a decent human driver would not stop in the middle of the road, FSD should not either.)

Yes, and they were rightly criticized for this incident (and subsequently announced they planned to make improvements to avoid it) because it was a situation where an ordinary human driver wouldn't have stopped in the middle of the road like that.

Cameras can still see the nearby lane lines (even in dense fog they can see a few meters away, which is enough), and combined with maps and GPS and radar to avoid obstacles just out of the cameras' range, this should be plenty of information to be able to pull over safely. And newer lidars can see a bit farther than cameras in dense fog. Radar of course can "see" the farthest in fog. High-resolution millimeter-wave radars such as this one could combine the best properties of both radar and Lidar.

The question is whether "not very high" is "< 0.0001%". Beyond a certain point, any L4 failure will feel like a black swan event. But they should still be as rare as possible. IOW, the car should only have to pull to a stop when other drivers are doing the same.

How come RADAR could not "see" a telephone pole?

 
RADAR is best for showing moving objects and relative speeds between objects; automotive radars typically filter out stationary returns (otherwise overhead signs and bridges would trigger constant false alarms), and a wooden pole is a weak radar reflector anyway. LIDAR is what is needed here - LIDAR would have shown the pole.
So a truck jackknifed across the road would not be picked up by radar. It would need to be LiDAR, which also uses light (like a camera) to identify objects. So in poor-visibility weather, having LiDAR and RADAR is an awesome technology to have. Or not?
 
Coming from a lifetime of "if then else" computer programming and Boolean hardware logic, it is difficult to understand how this works. So here is what I can come up with. BTW, I really love seeing it function, especially at unprotected left turns. It is unbelievable! So the computer in the car sees a situation from its cameras and relates that to situations matching what has been downloaded from Dojo. Hold on, there has to be some "logic" or neural nets in the car that can determine, for example, the speed of the approaching cars and the spacing of cars on the left and right to decide if it is safe to proceed. Also, BTW, I have observed the car doing this much faster than I can. Those are stressful situations. IMHO the average person has no clue about the value of this and other features until they experience it half a dozen or more times.
Deep down the NN is as deterministic as your traditional logic (since you could code an NN in any programming language if you wanted, though for performance reasons they are usually run on custom hardware). The difference is that an NN deals in probabilities, using a complex net of interconnected nodes that feed results from one layer to the next (doing weighted matrix math along the way). You feed the NN inputs, and the NN outputs its determination of how closely those inputs match a set of pre-determined criteria. (I'm simplifying a lot here.)

These criteria are expressed as fixed parameters of the NN, but the parameters are not hard-coded. Instead they are determined by a training process that uses a large selection of sample inputs and the desired output in each case. This is largely automatic, but it requires a large set of sample data to ensure the network accurately generates the desired results (and lots of computing power to grind through the huge set of parameter variables).

The result is a trained network that can (say) recognize a car in a photograph (and, just as importantly, say when there is NOT a car in the photo). When properly trained, such a network can recognize "car-like" things that it has never seen before (different angles, lighting, sizes, etc.). The inputs and outputs, of course, don't have to be static; they might include determining (say) the correct path to take when negotiating a junction, or whether two moving objects will collide based on their current velocities.
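To make the "weighted matrix math" and training concrete, here is a toy example in the same spirit: a tiny two-layer network trained by gradient descent to judge whether two objects on straight-line paths will pass dangerously close. It is purely illustrative; the real driving networks are vastly larger and take camera frames as input.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_example():
    # Inputs: [x, y, vx, vy] of another car relative to ours (toy units).
    x, y, vx, vy = state = rng.uniform(-1, 1, size=4)
    # Label: does the closest approach over the next 3 "seconds" dip below 0.2?
    t = np.linspace(0, 3, 50)
    return state, float(np.hypot(x + vx * t, y + vy * t).min() < 0.2)

X, Y = map(np.array, zip(*[make_example() for _ in range(5000)]))

# The network's parameters ("weights") start random and are shaped by training.
W1, b1 = rng.normal(0, 0.5, (4, 16)), np.zeros(16)
W2, b2 = rng.normal(0, 0.5, (16, 1)), np.zeros(1)

def forward(x):
    h = np.maximum(0, x @ W1 + b1)           # layer 1: matrix multiply + ReLU
    p = 1 / (1 + np.exp(-(h @ W2 + b2)))     # layer 2: probability of "conflict"
    return h, p

lr = 0.5
for step in range(2000):                     # training: nudge weights to reduce error
    h, p = forward(X)
    g = (p - Y[:, None]) / len(X)            # gradient of the cross-entropy loss
    gh = g @ W2.T * (h > 0)
    W2 -= lr * (h.T @ g)
    b2 -= lr * g.sum(0)
    W1 -= lr * (X.T @ gh)
    b1 -= lr * gh.sum(0)

_, p = forward(X)
print("training accuracy:", ((p[:, 0] > 0.5) == Y).mean())
```

Everything the network "knows" ends up encoded in the weights and biases; no one ever writes an if-then-else rule about gaps or speeds.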
 
The NN/training is not the low hanging fruit, it's the hardest part! It's the reason why even cars loaded to the gills with sensors can still fail. The idea Tesla is operating on is that if that part is solved, then radar/lidar become nice-to-haves instead of necessities. They can simply operate in a more narrow ODD if necessary.
NN/training is the limiting factor as of today, with MTBF in the tens of miles. As the NN/training improves, hardware will become the limiting factor. For Robotaxi to be a success as Tesla envisions it, the ODD will have to be pretty wide. I don't think they can get there with pure vision.
But the spec critically does not require you to pull to the side; it's optional. That's why even Waymo and Cruise stop in lane plenty of times, even for incidents that are not complete system failures (the link I posted was a bad-weather case).
As mentioned, they are rightly criticized when this happens, and take steps to fix it, because it is worse than a human would do in those situations. L4 systems need to not only technically meet the bare minimum of the spec, they need to do the task well enough to be accepted by society without backlash.
The point is you still need cameras to be working to even safely pull over. Neither lidar nor radar can replace that. In a scenario that blinds all cameras, having those won't solve the problem.
In practice, a multi-camera system can be mostly impeded (by fog or rain or dirt) but not completely blinded; no one is putting duct tape over the lenses. The vision generally stays good enough to see close-up details like lane lines a few meters away; what it can't do is see far-away details well enough to actually drive at the speed that a human could (or that a "good" L4 system would be expected to) in adverse conditions. Right now it just throws up the Red Hands of Death when the ODD is exceeded; they will have to figure out how to program it to reliably safe itself in these instances. (Maybe it already tries to; I haven't yet experimented with letting it keep driving in these scenarios.)
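To put rough numbers on "drive at the speed that a human could": standard stopping-distance math gives the fastest speed at which you can still stop within what you can see (the reaction-time and deceleration figures below are assumed, not measured):

```python
import math

def max_safe_speed_mps(visibility_m, reaction_s=0.5, decel_mps2=6.0):
    """Largest v with reaction distance + braking distance inside the visible
    range: v*t + v^2 / (2a) <= d, solved for the positive root."""
    a, t, d = decel_mps2, reaction_s, visibility_m
    return -a * t + math.sqrt((a * t) ** 2 + 2 * a * d)

for d in (10, 30, 60, 120):   # visibility in meters
    print(f"visibility {d:4d} m -> ~{max_safe_speed_mps(d) * 2.237:3.0f} mph")
```

With these assumed figures, ~30 m of visibility caps a safe speed around 35 mph, no matter the sensor suite, unless other sensors effectively extend the sight distance.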
I imagine that a situation where all cameras are blinded, yet radar/lidar is unaffected, and the car can't pull to a stop based on prior known info, is going to be pretty much on the order of a complete system failure.
If it's a situation where an ordinary human would not fail, then it is a sensor limitation failure, not a "system failure" (which typically implies a literal software bug/crash or a hardware fault).
Ideally that is the case, but it's very much not the case for current L4 cars, and that is okay for operation at the current stage of the technology (and some of it is expected to continue indefinitely, for example phoning in for remote assistance).
Teslas do not have the ability to be driven remotely. Tesla may have to have an AAA-like fleet of backup drivers roaming the city to rescue stuck Robotaxis for a while, which will be somewhat embarrassing if it keeps happening in conditions that humans wouldn't have problems with.