Neural Networks

Likewise, Tesla's network is likely to sit at exactly the limit of what HW3 is capable of providing, and will be chosen from the type of network architecture that their chip is most efficient at processing.

I would have thought it would be the inverse: they develop a network architecture that they think can do the job, and then design the hardware to run that architecture.

Maybe we will see a new architecture for HW4.
 
I would have thought it would be the inverse: they develop a network architecture that they think can do the job, and then design the hardware to run that architecture.

Maybe we will see a new architecture for HW4.

They certainly don't ignore the target application when they design the hardware. But the precise details of the hardware will be tuned to optimize cost-effective throughput for a set of operations that are a good match for the target application. You get into the ballpark of what you think is needed with the best compute/IO/storage ratios for your target.

Once you have the hardware in hand you might as well use 100% of its capability, and since bigger is generally better for NN capacity, you end up making an NN that takes as close to 100% of the hardware's capability as is easy to accomplish.
 
If you're looking at GPUs doesn't it make sense to consider TPUs?

Cloud TPUs - ML accelerators for TensorFlow | Cloud TPU | Google Cloud

I'd think the fastest way to explore a large training space would be to map it across a number of preemptible / spot-priced TPUs, saving intermediate results to a cloud database.
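For what it's worth, here's a rough sketch of what I mean by mapping the search space across preemptible workers. No real cloud APIs here -- the hyperparameter grid, the worker count, and the `save_result` stand-in are all made up; in practice you'd write to whatever cloud database or object store you're using, so results survive preemption.

```python
# Sketch of sharding a hyperparameter search across preemptible workers.
# All names and numbers below are illustrative, not a real setup.
import itertools
import json

grid = list(itertools.product([1e-4, 3e-4, 1e-3],   # learning rates
                              [64, 128, 256],       # layer widths
                              [4, 8]))              # depths

def shard(work, worker_id, num_workers):
    """Deterministically assign each trial to one worker, so a preempted
    worker can be restarted and pick up exactly the same slice."""
    return [w for i, w in enumerate(work) if i % num_workers == worker_id]

def save_result(trial, metrics):
    # Placeholder: in practice this would write to a cloud database so
    # intermediate results survive preemption.
    print(json.dumps({"trial": trial, "metrics": metrics}))

for trial in shard(grid, worker_id=0, num_workers=8):
    # train(trial) would go here; we just record the assignment.
    save_result(trial, {"val_loss": None})
```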

Yeah, cloud TPUs are beautiful.

I actually got offered the use of a TPU Pod for a month (105 TPUs, I think) and it's so very tempting.

But I'm doing RL these days so gradient descent isn't the main problem I'm facing. Sigh.
 
Does anyone have experience with or knowledge of using GPUs on AWS? I recently read that Google Brain used ~45,000 GPU hours of AutoML/Neural Architecture Search to create a neural network architecture called NASNet. I looked up AWS pricing for GPUs and it looks like it's 40 cents per hour. So, 45,000 GPU hours would only be $18,000.

Even for a small startup, that seems reasonable. For a big company, you could afford to pay much more. If you wanted to use 1000x as much computation as Google Brain, if you wanted to use 45 million GPU hours, it would cost $18 million. For Tesla, that feels like a drop in the bucket.

Does that sound right? This cost seems ridiculously low.

It's also interesting when you look at Efficient Neural Architecture Search (ENAS), which attempts to bring the computation cost of Neural Architecture Search (NAS) down by 1000x. If ENAS can achieve results as good as the NAS that Google Brain used, then with 45 million GPU hours you could do 1,000,000x as much search as Google Brain. Crazy.

Say Tesla really wanted to go nuts with AutoML and spend $180 million. That wouldn't be infeasible; Tesla could still stay profitable and cash-flow positive if it spent another $180 million on R&D in one quarter. With regular NAS, it could do 10,000x as much search as it took to find NASNet. With working ENAS, it could do 10,000,000x more.

Unless I'm getting the actual AWS pricing wrong. So please let me know!
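Here's the back-of-the-envelope math, using the ~$0.40/GPU-hour figure above (which, again, may not match the actual AWS pricing for the instance types you'd really want):

```python
# Quick sanity check of the numbers in this post. The $0.40/GPU-hour spot
# price is the assumption from above; real pricing varies by instance type,
# region, and spot market conditions.
GPU_HOUR_PRICE = 0.40          # USD per GPU-hour (assumed)
NASNET_SEARCH_HOURS = 45_000   # GPU-hours reportedly used for NASNet

def search_cost(gpu_hours, price=GPU_HOUR_PRICE):
    """Dollar cost of a given number of GPU-hours."""
    return gpu_hours * price

print(search_cost(NASNET_SEARCH_HOURS))           # 18,000      -> $18k for a NASNet-scale search
print(search_cost(NASNET_SEARCH_HOURS * 1_000))   # 18,000,000  -> $18M for 1,000x that
print(search_cost(NASNET_SEARCH_HOURS * 10_000))  # 180,000,000 -> $180M for 10,000x that
```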

Incidentally, this is why I want mad scientist Elon to stay in control of Tesla. Or at least for Tesla to have a Board that gives Elon the freedom to run the company. I have the feeling that many of the Boards of public companies (outside of the tech world, at least) would lack the imagination to approve of this kind of spending on a mad science project. Yeah, Elon is crazy, and I goddamn hope he stays that way.

Would still be vastly cheaper to have your own hardware. Cloud only makes sense if your needs are transient or highly elastic, and Tesla fits neither. Also keep in mind that Tesla has designed hardware for inference in their fleet. This means they probably have their own ASICs to handle training internally. I, of course, have no evidence of this, but it would make operational and financial sense. Google's TPU can be used for both training and inference, btw. Training is orders of magnitude higher cost, so the ROI on ASICs is great there.
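As a rough, purely illustrative break-even sketch (the GPU purchase price is an assumed number, and this ignores power, cooling, networking, and ops staff, which all push the real answer around):

```python
# Toy own-vs-rent comparison. Both prices are assumptions for illustration.
cloud_price_per_gpu_hour = 0.40   # USD, the figure quoted earlier in the thread
gpu_purchase_cost = 8_000.0       # USD, assumed price of one data-center GPU
hours_per_year = 24 * 365

break_even_hours = gpu_purchase_cost / cloud_price_per_gpu_hour
print(break_even_hours)                   # 20,000 GPU-hours to match the purchase price
print(break_even_hours / hours_per_year)  # ~2.3 years at 100% utilization
```

If your GPUs sit busy around the clock for years, as a steady training pipeline would, owning wins; if your demand is spiky, the cloud's elasticity is what you're paying for.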

Funny that there is a chip called the Nvidia Tesla P100. Did we run out of names?

It's actually a pretty old product line. I'm not sure which came first offhand, but I definitely knew of Nvidia's Tesla products before Tesla the car. That said, Tesla Inc's products certainly embody Nikola more.

You might want to consider that the final training run for a job is a small fraction of the total time used to develop a neural network. Often you run many variations of a training run over a long period of time before you find a formula that works and are able to generate the network you want. The advantage to having a lot of machines is that you get faster turnaround on your tests. If you have to do dozens of trial runs before you get what you're looking for, and each one of those takes a week or more, then the calendar time needed becomes excessive.

Large organizations often have a pool of resources that a bunch of researchers share for running big jobs and smaller dedicated machines (think 4 to 16 GPUs) for individual researchers. You can generally work out ideas for how you want to go about accomplishing something on a small machine. You go to the big machine once you have a pretty good plan and want to throw a lot of resources at it.
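To put rough numbers on the turnaround point (everything here is made up, just to show the scaling):

```python
# Why turnaround time dominates: hypothetical trial counts and run lengths.
trials = 30            # assumed number of trial runs before a formula works
days_per_trial = 7     # assumed wall-clock time of one training run
parallel_slots = 5     # how many runs the shared pool can keep in flight

serial_days = trials * days_per_trial
parallel_days = -(-trials // parallel_slots) * days_per_trial  # ceiling division

print(serial_days)    # 210 days if the runs go one after another
print(parallel_days)  # 42 days with 5 runs in flight at a time
```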

You're of course referring to the trial-and-error tweaking of variables and pruning of data (the part that keeps me from enjoying the NN space professionally; no offense, <3 your dedication to your interest, to each their own). This also makes me wonder: how much of Tesla's autopilot utilizes transfer learning techniques? Is there an anointed fully connected layer with the minions training the bottom? Apologies if this has been answered elsewhere.

I would have thought it would be the inverse: they develop a network architecture that they think can do the job, and then design the hardware to run that architecture.

Maybe we will see a new architecture for HW4.

I am not sure what new ops would be relevant here so much as simply growing the redundancy of the same type of execution units. Ergo, make the ASIC bigger. There shouldn't be fundamental changes to the ops, @jimmy_d can correct me if I'm off base. I don't follow this space as well as I should.
 
I am not sure what new ops would be relevant here so much as simply growing the redundancy of the same type of execution units. Ergo, make the ASIC bigger. There shouldn't be fundamental changes to the ops, @jimmy_d can correct me if I'm off base. I don't follow this space as well as I should.

Oh, I meant a new neural network architecture, not a new chip architecture! I can see how that was confusing.

I’m just spitballing and wondering if HW4 comes out in 2-3 years. Perhaps in the meantime, Karpathy’s team uses AutoML to design a new neural network architecture that outperforms AKnet. Maybe they’ll call it OKnet. Once they have the NN architecture decided, they’ll make sure the hardware team designs the HW4 chip to run OKnet.

AKnet is already designed to run on HW3, and HW3 is designed to run AKnet, so it might be hard to put a brand new NN architecture in HW3. Especially if it’s much bigger.

I wonder if the NNs we have today will be considered small by the standards of the not-too-distant future. Perhaps AutoML will allow us to create much larger NNs than humans can design? Does that make sense?

As in, if you had to translate the capabilities of AKnet into a Software 1.0 program, you couldn't, because it would be too inconceivably huge. There is a limit to what human minds can design. Maybe the same is true of neural networks themselves. If AutoML can be used to explore a possibility space much larger than what humans can imagine, we can discover much bigger and more complicated neural networks than we could code by hand.

Just a thought.
 
I'm not sure if this is a neural network question or not, but can someone explain why you can be completely stationary with all the cars around you also completely stationary, yet the icons showing vehicles show them dancing around instead of stationary?

Based on what I've read here, the NNs currently deployed are spitting out object recognition and distance 36 times a second, with a new reading each time. Each instance constitutes 2 frames separated by some number of milliseconds, which presumably allows some amount of parallax-based distance estimation. This distance estimation only works if the object is moving laterally relative to the camera. So when everything is still, I don't think the system can judge distance accurately. Now you have wildly different distance estimations spat out 36 times a second, which is probably causing the shakes. I haven't heard anything about the system (AKnet or conventional) keeping the recognized objects in some kind of persistent model of the surrounding area. It may not yet be smart enough to determine that an object is actually static without more precise data.
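To illustrate the parallax idea (this is a toy model, not Tesla's actual pipeline): depth falls out of the pixel disparity between the two frames, and with no lateral motion the disparity goes to zero and the estimate blows up, which fits the poor behavior while stationary.

```python
# Toy two-frame parallax depth estimate. All numbers are hypothetical.
#   depth = focal_length_px * lateral_shift_m / disparity_px
def depth_from_parallax(focal_length_px, lateral_shift_m, disparity_px):
    if abs(disparity_px) < 1e-6:
        return float("inf")   # no lateral motion: depth is unobservable this way
    return focal_length_px * lateral_shift_m / disparity_px

# Assumed values: 1000 px focal length, 0.5 m of lateral motion between
# frames, 20 px of apparent shift in the image -> about 25 m away.
print(depth_from_parallax(1000.0, 0.5, 20.0))  # 25.0
```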
 
Thanks.
So when everything is still, I don't think the system can judge distance accurately. Now you have wildly different distance estimations spat out 36 times a second, which is probably causing the shakes.
I still don't understand how it can have different distance estimations on effectively a static picture. An algorithm should spit out the same values every time if the input doesn't change, right? I don't think the pictures would even change by one pixel in that time frame.
 
I'm not sure if this is a neural network question or not, but can someone explain why you can be completely stationary with all the cars around you also completely stationary, yet the icons showing vehicles show them dancing around instead of stationary?
Apparently / supposedly, the on-screen visualization doesn't even get the full NN pass, but instead gets data from somewhere in the middle of the network - so it's not even fully processed. So there's no telling what kind of garbage it's trying to visualize. It's not a good visualization of what the NN really sees.
 
Thanks.

I still don't understand how it can have different distance estimations on effectively a static picture. An algorithm should spit out the same values every time if the input doesn't change, right? I don't think the pictures would even change by one pixel in that time frame.

But it does change, just due to noise in the pixels and subtly changing lighting. Even people inside the vehicles moving, clouds passing overhead, whatever. And one thing about NNs is that they can be very sensitive to very small changes in their inputs -- a bad NN architecture or a poorly trained NN may in fact behave like a chaotic system. So when uncertainty in the outputs is high, those very small changes in pixel values can create large differences in output from frame to frame.

When uncertainty is low (the network has a very clear idea of what's happening), the outputs should be more stable even with a little noise thrown in. So what the dancing cars tell us is that the network still has very little ability to be precise about the location of cars in the 360 view. Recognizing large objects close to the camera, which is what the 360 view must do, is very challenging. Recognizing them and getting a precise idea of their size and location, when half the vehicle may be out of the frame, is even harder.
 
Thanks
But it does change, just due to noise in the pixels and subtly changing lighting.
The noise in the pixels is the part that escaped me. Now that I think about how they work, it makes a lot more sense. I knew the on-screen display is not part of the final NN output but has its own rudimentary processing, yet I couldn't figure out what was different between frames. I guess if the software that does this processing were given a higher priority in development, they could also make it more stable.
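For what it's worth, one simple way a display layer could be made more stable is plain temporal smoothing of the per-frame estimates. This is just a sketch with made-up numbers, not anything Tesla actually does:

```python
# Exponential moving average over noisy per-frame estimates, so pixel-level
# noise doesn't translate directly into dancing icons.
def smooth(estimates, alpha=0.2):
    """Return the exponentially smoothed version of a stream of values."""
    smoothed = []
    current = None
    for value in estimates:
        current = value if current is None else alpha * value + (1 - alpha) * current
        smoothed.append(current)
    return smoothed

# Hypothetical noisy distance readings (meters) for a stationary car.
raw = [12.1, 11.4, 12.9, 11.8, 12.3, 11.6]
print(smooth(raw))  # hovers near ~12 m instead of jumping around each frame
```

The trade-off is lag: the heavier the smoothing, the slower the display reacts when a car actually moves.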
 
I have never in my life needed human speech or voice commands to follow public safety directions; I follow their hand gestures.

Based on Verygreen's unbiased analysis of the system, we know it doesn't detect general obstacles: cones, debris, barriers, curbs, guard rails, traffic lights, traffic signs, animals, road markings, etc.

Tesla's SFS, for example, is basic, while Mobileye's has 15 different categories, and they might have added more. It will tell you whether the edge of the road is flat, whether it's a curb, a concrete wall, a guard rail, or a concrete barrier.

Even EyeQ3 had a network that detected potholes, animals, and road debris. In fact, Audi is using it for automatic active suspension adjustment.

They also have networks for things such as lane detection, determining which lane you are currently in, and detecting upcoming lanes from afar. NOA could use a network like that, as the way it currently handles things is primitive. Right now it looks like they are running some sort of downstream algorithm that barely works, which is why it detects shoulders as a lane and also misses lanes, leading to missed exits and attempts to take them way too late.



ME also has a lane segmentation network that tells you what each lane means and where it leads: lane expansion, merge lane, lane split, lane collapse, exit lane, etc. All of this is in EyeQ4 and in production TODAY!



That remark is not from me, but from other reviews of NOA.

What is the disengagement rate of v9 on limited access freeways?

Again, Elon boasted that the AP2 chip was more than enough for Level 5 FSD, grandstanded that he could do a cross-country drive with his eyes closed, and promised FSD features would be released in early 2017. Have you forgotten all of this? How convenient!



It's not the NN that is the problem (when it comes to highway autonomy, for the most part); it's the motion planning and control algorithm. How is it that you people never seem to be able to differentiate the two?

Are you comparing a system that is currently available to the consumer (Tesla), to one that is not?
 
I'm not sure if this is a neural network question or not, but can someone explain why you can be completely stationary with all the cars around you also completely stationary, yet the icons showing vehicles show them dancing around instead of stationary?

The accuracy of the network sucks, which undermines everything that jimmy has been preaching. As @wk057 pointed out, AP1 doesn't exhibit this behavior because the NN detection in AP1 is accurate, even though it's still doing a prediction every frame.

The code that renders the car UI hasn't changed. The only thing that has changed is its input.

Wk even tried to create an HD map from lanes coming from AP2 cars, like he did for AP1, and couldn't because of the instability and inaccuracy of the detections.

@verygreen questioned the PR videos coming from ME, saying that ME must be applying some type of filter/cleanup to their videos. But as @wk057 and AP1 confirm, there is no filter. The network just produces good outputs.



Are you comparing a system that is currently available to the consumer (Tesla), to one that is not?

EyeQ4 is already in thousands of BMW, NIO, and VW cars this year.
 
The accuracy of the network sucks, which undermines everything that jimmy has been preaching. As @wk057 pointed out, AP1 doesn't exhibit this behavior because the NN detection in AP1 is accurate, even though it's still doing a prediction every frame.

The code that renders the car UI hasn't changed. The only thing that has changed is its input.

Wk even tried to create an HD map from lanes coming from AP2 cars, like he did for AP1, and couldn't because of the instability and inaccuracy of the detections.

@verygreen questioned the PR videos coming from ME, saying that ME must be applying some type of filter/cleanup to their videos. But as @wk057 and AP1 confirm, there is no filter. The network just produces good outputs.





EyeQ4 is already in thousands of BMW, NIO, and VW cars this year.

Were the bounding boxes for the pedestrians disabled? The video seems sort of swervy, especially after the turn when it headed toward the car ahead of the empty parking spot.

And are you comparing the AP2 UI output (of unknown tap point) to the ME final output?
 
The accuracy of the network sucks, which undermines everything that jimmy has been preaching. As @wk057 pointed out, AP1 doesn't exhibit this behavior because the NN detection in AP1 is accurate, even though it's still doing a prediction every frame.

The code that renders the car UI hasn't changed. The only thing that has changed is its input.

Wk even tried to create an HD map from lanes coming from AP2 cars, like he did for AP1, and couldn't because of the instability and inaccuracy of the detections.

@verygreen questioned the PR videos coming from ME, saying that ME must be applying some type of filter/cleanup to their videos. But as @wk057 and AP1 confirm, there is no filter. The network just produces good outputs.


I think they're fusing the IMU output of the vehicle with the predicted object boxes, i.e. transforming them by the relative motion of the vehicle. That's the whole point of the demo, isn't it? That they can build a parking map in memory from relative vehicle motion.
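Conceptually something like this (a toy 2D sketch with made-up numbers, not Mobileye's or Tesla's actual code): carry a previously detected box forward by the vehicle's own motion, so static objects stay put in a persistent map even when the camera can't re-measure them well.

```python
# Move a remembered detection from the previous vehicle frame into the
# current one, given ego-motion integrated from the IMU/odometry.
import math

def transform_point(x, y, dx, dy, dyaw):
    """x, y: point in the previous vehicle frame (meters).
    dx, dy: how far the vehicle moved; dyaw: how much it turned (radians)."""
    # Shift into the new origin, then rotate by the negative yaw change.
    tx, ty = x - dx, y - dy
    c, s = math.cos(-dyaw), math.sin(-dyaw)
    return c * tx - s * ty, s * tx + c * ty

# Hypothetical case: a parked car 10 m straight ahead; the vehicle then
# drives 2 m forward and turns 5 degrees. The remembered box lands at
# roughly (7.97, -0.70) in the new frame.
print(transform_point(10.0, 0.0, 2.0, 0.0, math.radians(5.0)))
```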