Neural Networks

What does Karpathy mean in this video at 11:45 when he says “Build a single model”?


At 12:28, he re-states it: “Train 1 model to solve all tasks”.


I think he means there is only one model/NN that all developers are working with. There is not a separate driving model, sign-recognition model, object-recognition model, and lane-recognition model that are later combined; only one big NN. That means the training data and tests must be part of the repository and, at each check-in, the entire network gets rebuilt from scratch based on the latest changes and validated against all test cases.
No post-modification integration of different groups' efforts.
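
As a rough illustration of that workflow (entirely hypothetical; the task names, thresholds, and stub functions below are my own placeholders, not anything Tesla has described), a check-in gate might look something like this:

```python
# Hypothetical sketch of a "retrain from scratch and validate on every check-in" gate.
# Task names, thresholds, and the stubbed training/evaluation are placeholders.
from typing import Dict

REQUIRED_SCORE: Dict[str, float] = {
    "lane_detection": 0.95,
    "sign_recognition": 0.97,
    "object_detection": 0.90,
}

def train_single_model(dataset_version: str) -> dict:
    """Stub: retrain the one shared network from scratch on versioned training data."""
    return {"dataset": dataset_version}  # stands in for a trained model

def evaluate(model: dict, task: str) -> float:
    """Stub: run the task's full regression test set and return an accuracy score."""
    return 1.0  # stands in for a real metric

def checkin_gate(dataset_version: str) -> bool:
    model = train_single_model(dataset_version)
    # Accept the check-in only if the single model passes every task's test suite.
    return all(evaluate(model, task) >= threshold
               for task, threshold in REQUIRED_SCORE.items())

print(checkin_gate("2019-03-data"))  # -> True with the stubbed evaluation
```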
 
Some interesting numbers:

Traffic sign recognition: neural networks vs. humans

Question for @jimmy_d: if AKnet_V9 has 5x as many parameters as the previous version of AKnet, then how many parameters did the old AKnet have relative to GoogLeNet? GoogLeNet has about 7 million parameters. If the old AKnet is the same, then AKnet_V9 has 35 million parameters.

In a previous post, you said Tesla increased the parameters “more than 2x”. Taking that as roughly 2.5x: 7 million * 2.5 = 17.5 million, and 17.5 million * 5 = 87.5 million.

That doesn’t seem that big compared to these other neural networks that have 100 million, or 150 million, or 860 million parameters. And if I’m not mistaken, they all take less than 20 GFLOPs to run, so you could run one for each camera with less than 160 GFLOPs. Drive PX2 AutoCruise has 4 TFLOPs or 4,000 GFLOPs, so why is HW3 necessary if AKnet_v9 has under 100 million parameters?
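
Spelling out that back-of-the-envelope arithmetic (all multipliers here are the rough estimates quoted above, not measured figures):

```python
# Rough estimates only; the 2.5x and 5x multipliers are guesses quoted in the thread.
googlenet_params = 7e6                        # ~7M parameters in GoogLeNet
old_aknet_params = googlenet_params * 2.5     # "more than 2x" taken as ~2.5x
aknet_v9_params = old_aknet_params * 5        # the "5x" increase for AKnet_V9
print(f"AKnet_V9 ~ {aknet_v9_params / 1e6:.1f}M parameters")   # ~87.5M

per_camera_gflops = 20                        # "<20 GFLOPs" per network, per camera
total_gflops = per_camera_gflops * 8          # one network per camera, 8 cameras
print(f"~{total_gflops} GFLOPs vs. 4,000 GFLOPs on Drive PX2 AutoCruise")
```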

Yes, frame rate matters, but there's a more important item that you might be missing. Parameters in a CNN get used many times and how many times they get used per frame depends on the network shape. A single CNN parameter gets used HxWxC times per frame. H = layer input height, W = layer input width, C = layer input channels. For a network like inception all the layer H and W scale with the input frame H and W. The layer C values are unbounded - they can go very large without hitting utility boundaries. In the case of AKnet_V9 H and W are much larger than earlier versions (more than 2x) and C is generally also about 2x, so AKNET_V9 takes about 10x as much computation per parameter as the V8 networks (this varies by camera - the side cameras only get 1/4 as much processing as the front/rear cameras).

So if you have 5x as many parameters and 10x as much computation per parameter you need 50x as much computation. Those numbers are only for illustration by the way - just wanted to convey the notion that the layer I/O geometry is very important.
 
What does Karpathy mean in this video at 11:45 when he says “Build a single model”?


At 12:28, he re-states it: “Train 1 model to solve all tasks”.


This talk is an extended simile that likens a proposed approach to developing SW2 (NNs in this case) to test-driven development, and it relates many of the recommended practices to conventional programming best practices. In the section you cite, Karpathy is saying that, similar to how using a single repository for code and minimizing forks is best practice in SW1, in SW2 you want to pursue the similar objective of bringing all your improvements back to your core code branch whenever possible. In practice this means that, rather than making lots of versions of a core model optimized for different tasks, you instead push all of those requirements back into a single core model and work towards having that core model perform as many of the tasks as possible.

Of course, it's a simile and the benefits aren't exactly the same, but the general concept is: allow all the different functions that your code performs to each benefit from improvements in the others. For an NN this means that you want all the different outputs to each benefit from the training, optimization, and abstractions developed by all of the others. To do this in a simple way means to have a bias towards building a single NN to do lots of jobs rather than separate NNs for different jobs.

Notably, this is one of the key differences that AKNET_V9 shows compared to the V8 networks. In V8 (and incidentally in the networks that I believe are currently running V9) there are separate camera networks for each kind of camera - a total of six for main/narrow/fisheye/pillar/repeater/backup - and all of them are different. But AKNET_V9 is a single network that takes in all the cameras and generates several different output streams.
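
To make the "single network with several output streams" idea concrete, here is a minimal shared-backbone, multi-head sketch in PyTorch. The layer sizes, head names, and output dimensions are invented for illustration and have nothing to do with the actual AKNET_V9 architecture:

```python
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    """Toy shared-backbone network with several task heads (illustrative only)."""
    def __init__(self):
        super().__init__()
        # One shared feature extractor ("backbone") used by every task.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Separate lightweight heads all share that single backbone.
        self.lane_head = nn.Linear(32, 4)     # e.g. lane-boundary coefficients
        self.sign_head = nn.Linear(32, 10)    # e.g. sign classes
        self.object_head = nn.Linear(32, 20)  # e.g. object classes

    def forward(self, x):
        features = self.backbone(x)
        return {
            "lanes": self.lane_head(features),
            "signs": self.sign_head(features),
            "objects": self.object_head(features),
        }

model = MultiTaskNet()
out = model(torch.randn(1, 3, 96, 96))   # one input, several output streams
print({k: v.shape for k, v in out.items()})
```

The point of the structure is that improvements to the shared backbone benefit every head at once, which is the "single model" advantage described above.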
 
" One common misconception in this space is that developing a self-driving system is “just about the data”, with the implicit assumption that the team with the most data will win. Our experience suggests this is not the case. Pursuing this view can lead to the generation of tremendous numbers of low-value autonomy miles. Self-driving cars can generate terabytes of data per hour, far more than is useful to process. The teams that don’t thoughtfully scale data pipelines that extract value will drown in data and operational complexity."

- Chris urmson, Aurora / Creator of Google's self driving car.
 
Yes, frame rate matters, but there's a more important item that you might be missing. Parameters in a CNN get used many times and how many times they get used per frame depends on the network shape. A single CNN parameter gets used HxWxC times per frame. H = layer input height, W = layer input width, C = layer input channels. For a network like inception all the layer H and W scale with the input frame H and W. The layer C values are unbounded - they can go very large without hitting utility boundaries. In the case of AKnet_V9 H and W are much larger than earlier versions (more than 2x) and C is generally also about 2x, so AKNET_V9 takes about 10x as much computation per parameter as the V8 networks (this varies by camera - the side cameras only get 1/4 as much processing as the front/rear cameras).

So if you have 5x as many parameters and 10x as much computation per parameter you need 50x as much computation. Those numbers are only for illustration by the way - just wanted to convey the notion that the layer I/O geometry is very important.

Sorry this is wrong (doh). Parameters get used HxW times (not HxWxC). Kernels are KxKxC in size (so KxKxC parameters per kernel) and they get used on a HxWxC input frame. So total parameters in a set of kernels for a layer is KxKxCixCo (Ci is input frame channels and Co is output channels - one kernel per output channel). Computation for that layer is KxKxCixCoxHxW MACs - so the ratio of computation to parameters is just HxW. For a single layer.
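
In code form, the corrected accounting for a single conv layer looks like this (a generic formula for an ordinary convolution, not anything specific to AKNET_V9):

```python
def conv_layer_cost(H: int, W: int, K: int, Ci: int, Co: int):
    """Parameters and multiply-accumulates for one conv layer with an HxW input,
    KxK kernels, Ci input channels, and Co output channels (stride 1, 'same' padding)."""
    params = K * K * Ci * Co       # one KxKxCi kernel per output channel
    macs = params * H * W          # each parameter gets used H*W times per frame
    return params, macs

# Example: a 3x3 layer with 64 -> 128 channels on a 1280x960 input.
params, macs = conv_layer_cost(H=960, W=1280, K=3, Ci=64, Co=128)
print(params, macs, macs // params)   # compute-to-parameter ratio is just H*W
```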

So AKNET_V9 is using parameters more heavily than V8 networks by a ratio of (1280x960) / (640x416) for half of the cameras and (640x480) / (640x416) on the other half just based on the frame size.

But of course there's more to it. The first two inception layers on V8 come after 4 input frame reductions (from 640x416 to 320x208 to 160x104 to 80x52); V9 only has 3 reductions, so most of the layers in V9 see an additional 4x more parameter use than V8, beyond just the camera frame resolution difference, because V9 does less frame-resolution reduction than V8.

The net result is that AKNET_V9 uses about 600 GOPs across 66M parameters for a 1280x960 frame (on the order of 10,000 ops per parameter). V8 uses about 17 GOPs across 13M parameters, which is closer to 1,000 ops per parameter. I'm rounding off these numbers because there are various complicating factors that I don't want to bother with here, and these numbers are just intended to be illustrative.
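
Plugging the rounded figures above into that ratio (they are illustrative numbers, so the results are only order-of-magnitude):

```python
# Ops-per-parameter ratios from the rounded figures quoted above.
aknet_v9_ops_per_param = 600e9 / 66e6   # ~9,000 (order of 10,000)
v8_ops_per_param = 17e9 / 13e6          # ~1,300 (order of 1,000)
print(round(aknet_v9_ops_per_param), round(v8_ops_per_param),
      round(aknet_v9_ops_per_param / v8_ops_per_param, 1))
```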

When you add in the fact that AKNET_V9 is processing more cameras and that V8 uses even smaller parameter files for the side cameras (AKNET_V9 uses one model for all cameras) you find that AKNET_V9 needs about 10x more aggregate computation than a V8-style implementation that uses all cameras (which is what I think is currently being used in recent versions of the "V9" firmware).

Which is about the difference between HW3 and HW2. That might not be a coincidence.

I should probably make up a name for the non-AKNET_V9 networks which are included in the V9 firmware to avoid confusion. How about V9-non-AKNET_V9? It rolls right off the tongue.
 
I’m looking through old posts in this thread and I found this prescient snippet from April 12, 2018:

I'm hoping that recent performance tells us that on-ramp-to-off ramp could come soon (this year), and conceivably whitelisted highway L3 after that (next year), but that's such a different thing than FSD on surface streets that, IMHO, we have no idea at all how far away it is.
 
How is it on-ramp/off-ramp when it's only suggesting lane changes...

I am once again sensing that, with some folks, the most optimistic interpretation of any Tesla news is the one chosen, while for the competition it is the most pessimistic interpretation.

I mean, how else can one explain the solid enthusiasm for on-ramp-off-ramp-Level-3-next-year Navigate on Autopilot while at the same time voicing repeated concern over Waymo's autonomous stumbles and unclear leadership position? The first interpretation is exceedingly optimistic and the latter exceedingly pessimistic.

I wish there was a bit more balance.
 
How is it on-ramp/off-ramp when it's only suggesting lane changes...

It does take exits also, without driver confirmation even. Sometimes, though, it swerves into a short emergency pull-over lane at full speed, thinking it's an exit, also without driver confirmation. I'm not a YouTube personality so I have no viral videos to show, but NOA has tried to kill me several times. For a while I used it whenever possible out of curiosity, but I've quit using it entirely. It's just too frightening sometimes, and stressful all the time.
 
I'm not a YouTube personality so I have no viral videos to show, but NOA has tried to kill me several times
It's hard to take your analysis seriously when you make such obviously hyperbolic statements. This is characteristic of polarized, black-and-white positions. And you don't have to be a "YouTube personality" to post a simple video from the new TeslaCam functionality available to all owners now.
 
It's hard to take your analysis seriously when you make such obviously hyperbolic statements. This is characteristic of polarized, black-and-white positions. And you don't have to be a "YouTube personality" to post a simple video from the new TeslaCam functionality available to all owners now.

"Tried to kill me" should be taken to be tongue in cheek. Obviously I do not seriously impute murderous motives to my car. It's hard to take your criticism seriously when you choose to criticize such trivial things.

What I mean is that it has done dangerous things requiring me to take over immediately. I suppose I should have captured the dashcam footage, but I'm not used to even having that option, and anyway the things it has done would not show up very well on a forward dashcam -- it has come close to slamming me into a concrete barrier sideways by entering an exit too early and too quickly, for example, and it routinely does sudden braking maneuvers that could cause another car to rear-end me. Neither of those would show up well on a forward dashcam.

Suffice it to say that I do not enable the feature anymore, though I certainly would if I thought it were (a) useful, and (b) safe. It has proven to be neither of those things for me, no matter what type of highway I try it on.
 
Back to the topic of neural networks. I contacted Amir Efrati at The Information, but I have not been able to get to the bottom of this. Is Tesla working on a neural network for path planning? If so, is Tesla using some form of imitation learning to train its path planning neural network?

The full scoop: Tesla AI and behaviour cloning: what’s really happening?

A most curious and puzzling case...
 
After analyzing the options, I think the calibration phase for the cameras is to allow matching the main and narrow cameras to a high enough accuracy to enable stereo vision.
Hi Jimmy, this comment of yours is about a year old, but that's where I am at the moment: learning what I can about Tesla's NN. Given your quote above, what would you guess the difference (good or bad) would be between calibration at night and calibration during daylight? I drove home at night (9:30) after picking up my car in Mt Kisco, NY.
 
When I took delivery of my car (at the end of March), on the first drive from the SC home (~180 miles) I had no AP, and then the next day it was working. This pretty much reflects all the other reports from about that time, I believe.
Yes, I realize I am replying to your old post, sorry. An online PDF of the M3 manual from earlier this year said that after calibration the car had to be stopped and then restarted before EAP was active. The current one, and my car delivered last month, simply require Park and then Drive to enable EAP. I pulled into the breakdown lane, put the car in Park, then resumed in Drive with Nav on Autopilot.
 
Hi Jimmy, this comment of yours is about a year old, but that's where I am at the moment: learning what I can about Tesla's NN. Given your quote above, what would you guess the difference (good or bad) would be between calibration at night and calibration during daylight? I drove home at night (9:30) after picking up my car in Mt Kisco, NY.
Calibration is about gathering enough data to get a good estimate of how the camera is mounted and exactly what focal length, etc., the camera has.

Camera Calibration - Geometry of Image Formation | Coursera
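
For anyone curious what estimating "how the camera is mounted and what focal length it has" looks like in practice, here is a generic OpenCV checkerboard-calibration sketch. This is the textbook offline procedure covered by material like the course linked above, not how Tesla calibrates from driving footage, and the image directory is a placeholder:

```python
import glob

import cv2
import numpy as np

# Checkerboard with 9x6 inner corners; world coordinates of those corners (Z = 0 plane).
pattern = (9, 6)
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)

obj_points, img_points, image_size = [], [], None
for path in glob.glob("calib_images/*.jpg"):           # placeholder directory
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    image_size = gray.shape[::-1]                       # (width, height)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

if img_points:
    # Recovers the intrinsics (focal lengths, principal point), lens distortion, and
    # per-image extrinsics (how the camera is posed relative to the board).
    ret, K, dist, rvecs, tvecs = cv2.calibrateCamera(
        obj_points, img_points, image_size, None, None)
    print("intrinsic matrix:\n", K)
```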
 
It does take exits also, without driver confirmation even. Sometimes, though, it swerves into a short emergency pull-over lane at full speed, thinking it's an exit, also without driver confirmation. I'm not a YouTube personality so I have no viral videos to show, but NOA has tried to kill me several times. For a while I used it whenever possible out of curiosity, but I've quit using it entirely. It's just too frightening sometimes, and stressful all the time.

It's quite easy to think that a bit of simple added logic is a huge leap towards self-driving.

Simple logic such as: change into the new lane if navigation says that lane is an exit lane, or, when you approach a lane fork, change lanes to the left or right based on navigation.

If anyone considers that to be the state of the art in highway autonomy (as trent does, and frankly does for anything Tesla-related), then boy are we in for a long ride.
 