FSD Predictions for 2020

It doesn't visualize a green light. Exactly what is visualized and what is seen might be different things. For example, if the probabilities for red/yellow/green are
R: 0.1
Y: 0
G: 0.9

Will it visualize it as green? Or did they set the cutoff to 0.99? As for deciding when to drive, is it purely based on the traffic light, or also on other vehicles, pedestrians, time since green, or a lot of other factors? We don't know; it is hidden in the neural network. The traffic light output and its feature layers, fed from the hydra net into the RNN, are just one of many inputs.
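To make that concrete, here's a rough sketch of the kind of cutoff logic being described. The 0.99 threshold and the function are purely hypothetical -- we don't know what Tesla's visualization actually does.

```python
# Hypothetical sketch of a confidence cutoff deciding whether the UI shows a
# traffic-light color at all. The threshold value is made up.

from typing import Optional

VISUALIZATION_CUTOFF = 0.99  # assumption -- the real cutoff (if any) is unknown

def color_to_visualize(probabilities: dict) -> Optional[str]:
    """Return the color to draw, or None to draw the light without a color."""
    color, confidence = max(probabilities.items(), key=lambda kv: kv[1])
    return color if confidence >= VISUALIZATION_CUTOFF else None

# The example above: R 0.1, Y 0.0, G 0.9 -> not confident enough, so the light
# would be drawn gray/uncolored even though green is the most likely class.
print(color_to_visualize({"red": 0.1, "yellow": 0.0, "green": 0.9}))    # None
print(color_to_visualize({"red": 0.01, "yellow": 0.0, "green": 0.99}))  # green
```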

But this is a good example of data that Tesla might decide to collect: situations where FSD is uncertain and the driver decides to drive. They could just auto-label the clip with whatever the driver did.
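A guess at the shape of such a fleet trigger -- everything here is invented for illustration, not Tesla's actual data pipeline:

```python
# Hypothetical fleet-data trigger: if the network was uncertain about the light
# but the driver went anyway, save the clip and auto-label it with what the
# driver actually did. Names and thresholds are invented for illustration.

from dataclasses import dataclass
from typing import Optional

UNCERTAIN_LOW, UNCERTAIN_HIGH = 0.3, 0.99  # made-up "uncertain" band

@dataclass
class Snapshot:
    green_probability: float          # network's confidence the light is green
    driver_pressed_accelerator: bool  # what the human did at that moment

def maybe_collect(snapshot: Snapshot) -> Optional[dict]:
    """Return an auto-labeled training example, or None if not interesting."""
    uncertain = UNCERTAIN_LOW < snapshot.green_probability < UNCERTAIN_HIGH
    if uncertain and snapshot.driver_pressed_accelerator:
        # The driver's action becomes the label -- no human annotator needed.
        return {"clip": snapshot, "label": "safe_to_proceed"}
    return None

print(maybe_collect(Snapshot(green_probability=0.9, driver_pressed_accelerator=True)))
```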
First it is showing Red - and at 1:39 it turns gray.

As to when to start driving - I think initially they will ask the driver to press the accelerator to take the car off the hold and get going.
 
My prediction is that FSD will be able to decipher all this and park the car without getting a ticket. Or towed.

LA_parkingsigns.jpg
 
It doesn't recognize the green light at 1:33. Let us know if you figure out why.
Oh, in this case the traffic light visualization goes from "red" to "no color" even while the light is still red and the car pulls forward, so this probably isn't a case of the neural network classifying the light as the wrong thing. It's more likely that the light wasn't fully in view of the main camera anymore, so the neural network didn't even have a traffic light in its input to classify, but the visualization knows to keep showing the light even as the car gets closer, e.g., when passing through the intersection.

The fisheye "wide" camera can see higher and would be able to see this traffic light, but it looks like the deployed Autopilot doesn't visibly use the fisheye while driving yet. This is probably the same reason why Autopilot has trouble with the tight curves that are common in residential and hilly areas. One guess is that Tesla is training the main camera to be very accurate and then generating training data for the fisheye camera by doing some math to take the visual distortion into account.
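Something like the following is what that "doing some math" could look like: take a labeled point from the (roughly rectilinear) main camera, turn it into a viewing ray, and re-project it with a fisheye model. The intrinsics and the equidistant fisheye model below are assumptions, not the actual camera parameters.

```python
# Hypothetical re-projection of a labeled point from a pinhole ("main") camera
# into a fisheye ("wide") camera mounted at roughly the same spot, so only the
# lens model changes. Intrinsics and the equidistant fisheye model are assumed.

import math

# Made-up intrinsics: focal length (px) and principal point (px).
MAIN_FX, MAIN_CX, MAIN_CY = 1400.0, 640.0, 480.0   # narrow pinhole camera
FISH_F, FISH_CX, FISH_CY = 400.0, 640.0, 480.0     # wide fisheye camera

def main_pixel_to_ray(u, v):
    """Back-project a main-camera pixel to a unit viewing ray (pinhole model)."""
    x, y, z = (u - MAIN_CX) / MAIN_FX, (v - MAIN_CY) / MAIN_FX, 1.0
    norm = math.sqrt(x * x + y * y + z * z)
    return x / norm, y / norm, z / norm

def ray_to_fisheye_pixel(ray):
    """Project a viewing ray with an equidistant fisheye model: r = f * theta."""
    x, y, z = ray
    theta = math.acos(z)        # angle away from the optical axis
    phi = math.atan2(y, x)      # direction around the axis
    r = FISH_F * theta
    return FISH_CX + r * math.cos(phi), FISH_CY + r * math.sin(phi)

# A traffic-light label near the top of the main image lands at a different,
# more central spot in the fisheye image.
print(ray_to_fisheye_pixel(main_pixel_to_ray(700.0, 50.0)))
```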

Here's a screenshot from Autonomy Day showing the main camera not seeing the left traffic light while the fisheye sees it easily:
main vs fisheye.png
 
Oh, in this case the traffic light visualization goes from "red" to "no color" even while the light is still red and the car pulls forward, so this probably isn't a case of the neural network classifying the light as the wrong thing...

I don't think so, since the driver doesn't seem to pull forward until after he says "switched to green." Also, you can tell that by the movement of the map.
 
Something else to consider. Karpathy was talking about having a "black box" that takes in sensor input and spits out driving policy. That completely skips the vector space visible to humans, and may require HW4. The advantage is that you can train the NN on both sensor data and driver control inputs, which Tesla has loads of. The current "conventional" method of having a NN generate a vector space requires human-coded driving policy, which can be quite the bottleneck. To make matters worse, different regions have different driving cultures. Will that mean a different driving policy will be required for each area?
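To illustrate the contrast -- all of the types and functions below are placeholders, not anything Tesla has published:

```python
# Conceptual contrast between the two approaches described above.
# Everything here is a placeholder, not a real Tesla interface.

from typing import NamedTuple

class Controls(NamedTuple):
    steering: float
    accelerator: float
    brake: float

# --- Conventional pipeline: NN builds a "vector space", humans code the policy.
def perception_network(camera_frames: list) -> dict:
    """Neural net: raw pixels -> human-readable vector space (lanes, cars, lights)."""
    return {"lanes": [], "vehicles": [], "traffic_lights": []}

def hand_coded_policy(vector_space: dict) -> Controls:
    """Hand-written rules operating on the vector space -- the bottleneck,
    and the part that might need tuning per region / driving culture."""
    if any(light == "red" for light in vector_space["traffic_lights"]):
        return Controls(steering=0.0, accelerator=0.0, brake=1.0)
    return Controls(steering=0.0, accelerator=0.2, brake=0.0)

# --- "Black box" end to end: one network from sensors straight to controls,
# trainable on the (sensor, driver-control) pairs the fleet already logs.
def end_to_end_network(camera_frames: list) -> Controls:
    return Controls(steering=0.0, accelerator=0.0, brake=0.0)  # learned, not coded
```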
The problem is, Tesla has loads of bad driver inputs. The NN will learn how to drive aggressively, miss red lights and do ludicrous launches.
 
I don't think so, since the driver doesn't seem to pull forward until after he says "switched to green."
He says "switched to green" at 1:45, when the visualization is already not showing a color. At 1:38, both lights are already showing no color and the speedometer is still at 6 mph, reaching 0 mph at 1:42, so the car travels at least a few feet forward after already losing sight of the traffic lights.
 
The problem is, Tesla has loads of bad driver inputs. The NN will learn how to drive aggressively, miss red lights and do ludicrous launches.
Karpathy mentioned learning only from good drivers. This is apparently a much easier problem than actually getting an end-to-end NN to work.

Decent drivers generally "cluster" - so you can ignore the outliers when training. In the AI podcast, IIRC, George Hotz talks about this.
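A toy version of that filtering, assuming you can score each driver on a few behavior metrics -- the metrics and cutoff below are invented:

```python
# Toy illustration of keeping only the "clustered" decent drivers for training.
# Metrics and thresholds are invented; a real system would be far richer.

import statistics

# Per-driver stats: e.g. hard-braking events and red-light violations per 1000 miles.
drivers = {
    "a": (1.2, 0.0),
    "b": (1.5, 0.1),
    "c": (9.8, 2.5),   # aggressive outlier
    "d": (1.1, 0.0),
    "e": (1.4, 0.2),
}

def keep_for_training(stats, z_cutoff=1.5):
    """Drop drivers whose metrics sit far from the cluster of typical drivers."""
    kept = []
    for metric_index in range(2):
        values = [v[metric_index] for v in stats.values()]
        mean, stdev = statistics.mean(values), statistics.pstdev(values)
        kept.append({d for d, v in stats.items()
                     if abs(v[metric_index] - mean) <= z_cutoff * stdev})
    return set.intersection(*kept)

print(keep_for_training(drivers))  # {'a', 'b', 'd', 'e'} -- driver 'c' is dropped
```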
 
He says "switched to green" at 1:45, when the visualization is already not showing a color. At 1:38, both lights are already showing no color and the speedometer is still at 6 mph, reaching 0 mph at 1:42, so the car travels at least a few feet forward after already losing sight of the traffic lights.

I see what you're saying, but again, my point still holds: the car doesn't recognize the green light while at the stop line.
 
He says "switched to green" at 1:45, when the visualization is already not showing a color. At 1:38, both lights are already showing no color and the speedometer is still at 6 mph, reaching 0 mph at 1:42, so the car travels at least a few feet forward after already losing sight of the traffic lights.
Losing sight of the lights before even stopping at the stop line is a problem.
 
Losing sight of the lights before even stopping at the stop line is a problem.
Yeah, it seems like Autopilot development using the fisheye and pillar cameras is definitely lagging behind the main camera. Elon Musk said in the 2019 Q3 earnings call that they are currently focusing on traffic lights and winding narrow roads -- both of which seem to require using these additional cameras to handle correctly.

Here's another screenshot of what the cameras can see from one of greentheonly's Tokyo videos:
greentheonly tokyo street lights.jpg
Here the car is almost through the intersection, entering the crosswalk. Notice how the main camera (bottom middle) sees nothing related to the traffic light while the fisheye camera (top middle) easily sees the green light and pole. For the street light (?) at the right edge, the main camera sees only 2 horizontal rungs while the fisheye sees all 6 rungs plus the lights at the top, so the problem is not a sensor issue. Interestingly, the left pillar camera (top left) sees the green light overhead too, so maybe it could be used to detect traffic lights in some cases, but this might produce too many false positives for other directions' traffic lights (e.g., the right pillar sees the red light behind the van).
 
Caveat: George Hotz (the President of Comma AI) said that it was just a theory; that he suspects it's true but doesn't know for certain. (I think the discussion of imitation learning starts around the 1-hour mark.)
Are you going from memory? I don't recall him saying it was just a theory. Either way, given the challenges with getting end-to-end to work, I suspect weeding out bad drivers is a relatively easy problem.
 
Are you going from memory? I don't recall him saying it was just a theory.

The exact time code is 1:03:51.

Either way, given the challenges with getting end-to-end to work, I suspect weeding out bad drivers is a relatively easy problem.

End-to-end imitation learning is different from mid-to-mid imitation learning. End-to-end uses raw pixels and mid-to-mid uses representations produced by a computer vision neural network. These representations are what the FSD Visualization Preview shows on the screen.

What Tesla is doing is mid-to-mid imitation learning, at least for the majority of cases, such as lane changes.

Arguably path prediction is end-to-end imitation learning.
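To make the distinction concrete, here is roughly what a training example could look like in each case. The field names are invented for illustration; this is not Waymo's or Tesla's actual schema.

```python
# Illustrative (invented) training-example shapes for the two flavors of
# imitation learning discussed above.

from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class EndToEndExample:
    """End-to-end: raw sensor data in, recorded driver behavior out."""
    camera_pixels: bytes            # raw image(s)
    driver_steering: float          # what the human actually did
    driver_accelerator: float

@dataclass
class MidToMidExample:
    """Mid-to-mid: the perception net's output ("vector space") in,
    a short future trajectory (what the FSD preview could render) out."""
    lane_lines: List[List[float]]        # polylines in top-down coordinates
    other_vehicles: List[List[float]]    # boxes/tracks from the vision net
    traffic_light_state: str             # e.g. "red" / "green" / "unknown"
    future_trajectory: List[Tuple[float, float]]  # (x, y) waypoints the human drove
```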
 
More on mid-to-mid imitation learning:

Waymo's ChauffeurNet: mid-to-mid imitation learning

Yes - basically send raw output from all the sensors and NN tells you how to steer the car.

Come to think of it, I don't think path prediction is quite end-to-end since it's only one component in the overall planner.

Also, I'm not sure whether path prediction was trained on raw pixels or representations like road edges.
 
The exact time code is 1:03:51.

What Tesla is doing is mid-to-mid imitation learning, at least for the majority of cases, such as lane changes.

Arguably path prediction is end-to-end imitation learning.

They don't do mid-to-mid. In fact, there is no evidence of them doing any mid-to-mid. The lane changes are completely classical control CPU algorithms. The path prediction video is supervised learning/end-to-end, as you have said. It's the same way others also do path prediction, or what Mobileye also refers to as "holistic path planning" or "Holistic Lane Centering".

"Typically, a very large number of images are provided to the trained system during the training phase, and for each image a prestored path of the vehicle ahead of a respective present location of the vehicle is provided. The prestored path can be obtained by recording the future locations of the vehicle along the road on which the vehicle was traveling while the image was captured."

Yes - basically send raw output from all the sensors and NN tells you how to steer the car. That is more difficult than filtering out bad drivers.

This is precisely how Mobileye's path prediction network works: they train it using raw images and the future trajectory of the host car. Mobileye calls this "an end-to-end deep neural network, which may be trained to predict the correct short range path from an input image".


Here's one of their patents about it.
YaWz2wU.png
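The "prestored path" setup quoted from the patent is easy to sketch: for each logged frame, the label is simply where the car actually went over the next few seconds. The log format and horizon below are made up.

```python
# Sketch of turning a logged drive into (image, future-path) training pairs,
# as the quoted patent text describes. The log format here is invented.

def make_training_pairs(frames, positions, horizon=30):
    """frames[i] is the camera image at step i; positions[i] is the car's
    (x, y) at step i. The label for frame i is where the car went next."""
    pairs = []
    for i in range(len(frames) - horizon):
        future_path = positions[i + 1 : i + 1 + horizon]
        pairs.append((frames[i], future_path))
    return pairs

# Toy log: the "images" are just placeholders; the car drives straight then curves.
frames = [f"frame_{i}" for i in range(100)]
positions = [(float(i), 0.0 if i < 60 else (i - 60) * 0.5) for i in range(100)]
pairs = make_training_pairs(frames, positions)
print(len(pairs), pairs[0][1][:3])  # 70 pairs; first label starts at (1.0, 0.0)
```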
 
Come to think of it, I don't think path prediction is quite end-to-end since it's only one component in the overall planner.

Also, I'm not sure whether path prediction was trained on raw pixels or representations like road edges.
There will always be, practically speaking, some elements of procedural code. NOA, for example, could use the overall path given by routing over maps and traffic conditions - but within that path, short-distance driving is achieved by an end-to-end NN. The post above links to Mobileye, which calls it "an end-to-end deep neural network, which may be trained to predict the correct short range path from an input image".
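A rough sketch of that split between procedural routing and a learned short-range path -- all of the names and values are invented:

```python
# Invented sketch of the split described above: procedural code picks the
# overall route from maps, a learned model handles the short-range path.

from typing import List, Tuple

Waypoint = Tuple[float, float]

def route_from_maps(start: str, destination: str) -> List[str]:
    """Procedural/routing layer: ordinary graph search over map data."""
    return [start, "highway_on_ramp", "exit_42", destination]  # placeholder route

def short_range_path_nn(camera_frame: bytes) -> List[Waypoint]:
    """Learned layer: an end-to-end net predicting the next few meters of path
    from the image, as in the Mobileye quote above. Stubbed out here."""
    return [(0.0, 1.0), (0.1, 2.0), (0.2, 3.0)]

def drive_step(camera_frame: bytes, route: List[str]) -> List[Waypoint]:
    """Within the high-level route, the short-range NN path is what gets followed."""
    _current_leg = route[0]           # which leg of the route we're on
    return short_range_path_nn(camera_frame)
```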