
Neural Networks

This is what Karpathy tweeted earlier today

“lots of exciting recent work in large-scale distributed training of neural nets: (very) large-batch SGD, KFAC, ES, population-based training / ENAS, (online) distillation, ... ”

Maybe @jimmy_d can explain that to the mere mortals like myself.

From twitter comments:
----
If you have a batch of 64, the most (trivial) parallelism you can get is to break the batch up across 64 processors. If you have a batch of 12k, you can now use 12k nodes. Coarse parallelism tends to scale best too, so you get closer to linear speed-up with nodes.
----

Basically faster processing, which results in faster network training in theory.
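
As a concrete illustration of the batch-splitting idea (a toy linear model in NumPy, nothing Tesla-specific): each "worker" computes the gradient on its own shard of a 64-example batch, and averaging the shard gradients reproduces the full-batch gradient exactly, so the work can be spread across processors without changing the result.

Code:
# Toy sketch, illustrative only: batch splitting with gradient averaging.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 10))      # one batch of 64 examples, 10 features each
y = rng.normal(size=(64, 1))
w = rng.normal(size=(10, 1))       # current model weights

def gradient(Xs, ys, w):
    """Mean-squared-error gradient for the examples in this shard."""
    return 2.0 * Xs.T @ (Xs @ w - ys) / len(Xs)

full_batch = gradient(X, y, w)                             # one machine does it all
shards = np.array_split(np.arange(64), 8)                  # 8 equal "workers"
averaged = np.mean([gradient(X[s], y[s], w) for s in shards], axis=0)

print(np.allclose(full_batch, averaged))                   # True: same answer, 8x the hardware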
 

Yup, I think you are right. It seems that most of the acronyms used were about different batching and approximation techniques for faster calculations.
 

Neural network development is, at this time, a highly empirical process. Because theory has a hard time explaining the detailed consequences of small tuning decisions, and because small tuning decisions can have a big impact on the performance of networks, practical development and most research require performing a lot of experiments. But experiments, especially on large cutting-edge networks, require a lot of computation. That has two effects: experiments can get expensive and they can take a lot of time.

To address the cost, a lot of new computer designs specialized for training NNs are coming along, and the size of a doable experiment has been increasing by about a factor of 10 per year for the last several years. To address the time (irrespective of the cost), there's another approach that can be taken: throw a lot of computers at a single experiment so that it takes less time to get results. A lot of experiments today can take weeks to get a result, so reducing that time translates directly into faster progress in research and development.

Unfortunately, a lot of experiments have been resistant to the brute-force hardware approach. Four years ago almost all experiments were performed on a single machine because, at that time, the best techniques were unavoidably memory-bandwidth dependent. A single GPU has hundreds of GB/s of bandwidth to its own memory, but bandwidth between GPUs in the same machine is about 10x slower, and between machines it is over 100x slower. This led to the unfortunate reality that using 100 machines was often no faster than using a single machine. Overcoming that required developing techniques that can efficiently partition an experiment across a lot of machines without being crippled by the bandwidth lost between them.

Karpathy's tweet is referring to recent and substantial advances in techniques that enable efficient partitioning of experiments across thousands of machines. For a lot of efforts, including probably Tesla's internal development, these techniques will translate directly into increasing the rate at which new NNs can be developed and deployed and, more importantly, will accelerate research into better ways to build and use networks. If you're a developer or researcher who has access to a sizable cluster this is really good news.
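
To put rough numbers on the bandwidth problem described above (assumed, illustrative figures: a 100M-parameter network, float32 gradients, round bandwidth values, not measurements from any real cluster):

Code:
# Back-of-the-envelope only: assumed round numbers, not measurements.
params = 100e6                   # assumed network size
grad_bytes = params * 4          # float32 gradients that must move every sync step

for link, gb_per_s in [("GPU to its own memory", 500),
                       ("GPU to GPU, same box ", 50),
                       ("machine to machine   ", 1)]:
    ms = grad_bytes / (gb_per_s * 1e9) * 1e3
    print(f"{link}: ~{ms:.1f} ms to move one gradient")

# If the compute for one step takes ~50 ms, a ~400 ms exchange over the slow
# link dominates everything, which is how 100 machines could end up no faster
# than one, and why the techniques Karpathy lists attack communication cost.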
 
I have a very dumb newbie question. I vaguely remember neural networks from when I was a university student 30+ years ago. Why is this a big thing now? It's not a new technology, is it?
Nothing that's happened in the last century or two, frankly, is "new" technology, depending on how you look at it.

With that said, one of the biggest things that happened in the last few years is that GPUs have gotten so massively powerful and affordable that it's opened the doors to kinds of neural network training that were simply infeasible in the past. That probably triggered this latest AI renaissance if you'd call it that… which also resulted in a lot more research into new neural network architectures, tools for building neural nets, training techniques, etc etc etc.

So yes, conceptually it's the same idea that was thought of in the 70's, but recent research and technological advances have dramatically widened what you could apply it to.
 

Thanks. I just spent some time reading about "deep learning". It's fascinating that you can "train" these things. And it's a little disturbing. Now I understand where the media hype is coming from.

I hope they're easier to train than my kids were!!!
 
I've been arguing for some time that the fears over the AI revolution were overblown. They've got a lot of potential, I think, but it's going to be a while before nobody has a job.

I can see it both ways. Certainly the doomsday where nobody has a job is far fetched IMO… but you're starting to see small examples of that here and there.

I've worked at a few places now where entry-level crash triage and bug triage jobs have progressively gotten replaced by more and more advanced AI's. At first they were laughable, but in the last year or two, they frequently do a better job than junior engineers trying to do that manually.

At this point, realistically, in terms of pattern recognition jobs, anything that takes less than a few months to teach a human engineer seems to be easily replaceable by a bot.

Maybe that's not doomsday yet, but it's one step towards that. I can only imagine what things like Google Duplex and chatbots will do for entry level Amazon or Comcast customer service. (Heck, is anyone else convinced that they're already robots? They're either robots or humans forced to read from a very static script)


EDIT: And I'd like to add too: At least in my line of work (software engineering), those who had their "work" replaced by a robot/AI have typically moved on to less menial/monotonous roles. Maybe it's different in other industries, but so far these AI's have liberated entry-level engineers of tasks they don't like to do, so they can work on fancier things that machines still cannot do.
 

Right. And I'm seeing similar things. And that's what I expect to continue: a slow creep of improvement over decades. And that's exactly what we've seen for 200 years since the start of the industrial revolution.
 
Thanks. I just spent some time reading about "deep learning". It's fascinating that you can "train" these things. And it's a little disturbing. Now I understand where the media hype is coming from.

I hope they're easier to train than my kids were!!!

The main difference vs. those that were popular in the '70s is the "deep" part. Those nets were, by and large, single-layer perceptrons. They looked really promising at first, but they generalize poorly and provably can't fit many functions (XOR being the classic example).

As mentioned, GPUs revolutionized everything, largely by making the backpropagation algorithm practical. Backpropagation provides a quick way to calculate how much of the error currently produced by the whole network comes from the intermediate layers (thus allowing those layers' weights to be updated). The algorithm existed way back when, but the hardware to actually use it in a non-trivial way didn't arrive until recent years.
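
As a toy illustration of both points (arbitrary layer sizes and learning rate), here is a one-hidden-layer net learning XOR, something a single-layer perceptron provably cannot fit, with the backpropagation step written out by hand:

Code:
# Toy example: the d_hidden line is the backpropagation step; it works out how
# much of the output error comes from the hidden layer so its weights can be updated too.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])

W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)     # input  -> hidden
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)     # hidden -> output
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for step in range(5000):
    hidden = sigmoid(X @ W1 + b1)                 # forward pass
    pred = sigmoid(hidden @ W2 + b2)

    d_out = (pred - y) * pred * (1 - pred)                 # output-layer error signal
    d_hidden = (d_out @ W2.T) * hidden * (1 - hidden)      # error blamed on hidden layer

    W2 -= hidden.T @ d_out;  b2 -= d_out.sum(axis=0)       # gradient descent, lr = 1
    W1 -= X.T @ d_hidden;    b1 -= d_hidden.sum(axis=0)

print(pred.round(2).ravel())                      # heads toward [0, 1, 1, 0]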
 
@jimmy_d got a few minutes to put down some thoughts on this?

AP2 seemingly is able to brake for stopped cars (e.g. sedans) in the distance. (There are videos showing this -- but someone correct me if this is wrong)

If this is true, then Tesla's vision NN is able to detect sedan taillight/body-shape configurations, presuming radar is used secondarily as it needs to be filtered due to all the other stationary reflections.

-- Then why is braking for other large stationary objects still unsolved? Fire trucks, dump trucks, crash barriers with warning paint, etc.?

Why isn't this a simple matter of training recognition on those objects?

The system is not even providing warning tones for these objects, which would indicate some (uncertain) degree of recognition...

In other words, to an engineer (but NN layman) this seems like a straightforward vision problem. This is also Tesla's biggest PR nightmare - as every crash is front page news and results in another NHTSA investigation.

This is a fraction of what comprises object recognition for true FSD. Why is this so hard?
 
AP2 seemingly is able to brake for stopped cars (e.g. sedans) in the distance. (There are videos showing this -- but someone correct me if this is wrong)

If this is true, then Tesla's vision NN is able to detect sedan taillight/body-shape configurations, presuming radar is used secondarily as it needs to be filtered due to all the other stationary reflections.

I don't think the ability to detect cars at a distance implies it is being done via visual means. Radar can also serve as a primary sensor.

-- Then why is braking for other large stationary objects still unsolved? Fire trucks, dump trucks, crash barriers with warning paint, etc.?

In many of these cases, the object was only revealed after a blocking vehicle moved out of the way. The system is also tuned to reduce false positives (to prevent collisions due to sudden brake activation) at the expense of increased false negatives (which the driver should be handling).
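
A toy illustration of that trade-off (the numbers are invented): the same detector scores produce fewer phantom-braking events but more missed hazards as the braking threshold is raised.

Code:
# Invented numbers, purely to illustrate the tuning trade-off described above.
real_hazards   = [0.92, 0.81, 0.64, 0.55]   # detector confidence for true obstacles
harmless_stuff = [0.70, 0.40, 0.35, 0.20]   # overhead signs, parked cars off to the side, ...

for threshold in (0.5, 0.75):
    missed  = sum(score <  threshold for score in real_hazards)    # false negatives
    phantom = sum(score >= threshold for score in harmless_stuff)  # false positives
    print(f"brake threshold {threshold}: {missed} missed hazard(s), {phantom} phantom brake(s)")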

Latest tweets indicate the AP team has been focused on safety aspects, so hopefully better collision avoidance is coming soon.
 
Hi,

I found this presentation by Andrej Karpathy (Director of AI at Tesla).

He describes his work at Tesla and the challenges.

Building the Software 2.0 Stack by Andrej Karpathy from Tesla

Oh wow. Thanks for the link. The best look at where Tesla is right now with their NN efforts. Interesting that the biggest issue is gathering appropriate datasets.

One thing that the talk does not address (which doesn't mean that Tesla isn't doing it, just that it isn't addressed) is sensor fusion: combining radar, ultrasonic sensors and vision into one sensor system, fused by a NN of some sort.
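
For what it's worth, one common shape such fusion can take is "late fusion", where each sensor gets its own feature extractor and a small network combines the per-sensor features. The sketch below is purely hypothetical, with made-up sizes; nothing here reflects Tesla's actual stack.

Code:
# Hypothetical late-fusion sketch; all sizes and sensors are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def mlp(sizes):
    """Random-weight stand-in for a trained feature extractor / fusion head."""
    return [0.1 * rng.normal(size=(a, b)) for a, b in zip(sizes[:-1], sizes[1:])]

def forward(layers, x):
    for W in layers:
        x = np.maximum(x @ W, 0.0)            # ReLU layers
    return x

camera_net = mlp([512, 64])                   # assumed per-sensor embedding sizes
radar_net  = mlp([32, 16])
ultra_net  = mlp([12, 8])
fusion_net = mlp([64 + 16 + 8, 32, 4])        # e.g. 4 fused outputs per object

cam, radar, ultra = rng.normal(size=512), rng.normal(size=32), rng.normal(size=12)
fused = forward(fusion_net, np.concatenate([forward(camera_net, cam),
                                            forward(radar_net, radar),
                                            forward(ultra_net, ultra)]))
print(fused.shape)                            # (4,)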

Also, the system seems to be used only for detection, not for control. I.e., it looks like they have standard engineered software for car control, as opposed to having a NN control the car directly. Before you could do that, you would have to get the NN to understand the rules of the road, i.e. find some way of teaching a NN complex rules instead of just labeling things. I think this is still an unsolved problem.
 