Neural Networks

total digression:

For the first several hours after I read that AlphaGo Zero paper I felt a kind of visceral dread. It was the first time, I think, that I've felt like AI might be moving too fast. This is not the kind of attitude I normally bring to this topic, but this paper was quite a shock - it arrived years too soon. Everything about AlphaGo has been 'years too soon', but this was really, really crazy too soon.

When we invented nuclear weapons one of the saving graces was that making one was not something that was doable by some pissed off asshole with a PhD, a basement, and a grudge. What would the world be like if you could make a nuke from stuff that anybody could get their hands on? Think about that for a minute.

I still believe that AI is unlikely to become the kind of technology that is the "superempowerment of the angry young man". But stuff like the AlphaGo Zero paper makes me wonder whether some critical breakthrough might actually be close at hand - one that could suddenly make these systems a million times more effective and turn the world into something none of us would recognize.

Returning to the topic of neural networks:

It's been a month since AlphaGo Zero. Time for DeepMind to revolutionize AI again:

Entire human chess knowledge learned and surpassed by DeepMind's AlphaZero in four hours

Google's AlphaZero Destroys Stockfish In 100-Game Match - Chess.com

And if you want to read the paper it's here: https://arxiv.org/pdf/1712.01815.pdf

I read it and it blew my mind. Again. AlphaZero, which isn't even really a Go program, beat AlphaGo Zero with half the training (under 36 hours). And AlphaGo Zero had beaten AlphaGo Master with 20x less training and no human assistance. AlphaGo Master was the program that beat 60 of the best Go players in the world back-to-back after training for about a month. And AlphaZero - this new program - is a general-purpose algorithm that can learn any board game without tuning, examples, or any kind of human intervention. In quick succession it bested the three hardest board games in the world (chess, shogi, and Go), each time taking just a few hours to learn the game starting with nothing but the rules.

Spacing of DeepMind's world-shaking papers: 12 months, then 9 months, then 6 months, then 3 months and now this one comes after only one month. Can't wait to see what January brings. Or will it be just 2 weeks this time?

One of the frequently made observations about the limits of AI is that it's narrow. Sebastian Thrun was fond of saying about AlphaGo that, even as amazing an achievement as it was, it was so narrow that it still couldn't play chess. Well now it can. It can play chess and if you give it a couple of hours it will master any other board game to a superhuman level.

I used to think (in ancient times - about 3 months ago) that RL (reinforcement learning) was a silly thing to use for training a self driving car. It would take way too long, you wouldn't get a good result, and nobody would understand what it was doing. If things keep going at this rate everything other than RL is going to be obsolete pretty soon.
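
To make "learning from nothing but the rules" a bit more concrete, here's a toy sketch of self-play reinforcement learning. It is massively simpler than AlphaZero - no neural network, no tree search, just a lookup table and game outcomes - and the game (Nim) and all the numbers are mine, not anything from the paper:

Code:
# Toy sketch of self-play reinforcement learning: the agent is given only the
# rules of the game (Nim: take 1-3 stones, whoever takes the last stone wins)
# and improves purely from the outcomes of games it plays against itself.
# A tabular value function stands in for AlphaZero's neural network.
import random

N_STONES = 15                     # starting pile size (arbitrary)
MOVES = (1, 2, 3)                 # legal moves
value = {0: 0.0}                  # state -> est. win probability for the player to move

def V(s):
    return value.get(s, 0.5)      # unknown states start at 50/50

def choose_move(state, epsilon):
    """Mostly-greedy self-play policy: leave the opponent the worst position."""
    legal = [m for m in MOVES if m <= state]
    if random.random() < epsilon:
        return random.choice(legal)
    return min(legal, key=lambda m: V(state - m))

def play_one_game(epsilon=0.1, lr=0.1):
    """Play one game against itself, then nudge values toward the outcome."""
    seen = []
    state = N_STONES
    while state > 0:
        seen.append(state)
        state -= choose_move(state, epsilon)
    outcome = 1.0                 # the player who moved from seen[-1] took the last stone
    for s in reversed(seen):
        value[s] = V(s) + lr * (outcome - V(s))
        outcome = 1.0 - outcome   # alternate winner/loser going backwards

for _ in range(20000):
    play_one_game()

# With enough games the table rediscovers Nim theory on its own:
# pile sizes that are multiples of 4 come out as losing positions.
for s in range(1, N_STONES + 1):
    print(s, round(V(s), 2))

Swap the lookup table for a deep network and add Monte Carlo tree search to the move selection and you have the basic shape of what AlphaZero is doing.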
 
AlphaZero still isn’t structured like a human brain. I find it very unlikely that a single NN architecture can process language, for instance, without separate NNs connected together in purposeful ways. BUT, there is no reason why you can’t do that, just no one has successfully tried yet. I honestly think it is possible to construct thinking, human-level intelligence. I just haven’t seen much progress towards that general AI goal yet, AlphaZero included.
 
Tough crowd.
 

Yeah, yeah. I do agree, though, that AlphaZero is pretty cool. Protein folding is an interesting application. There are underlying black-and-white physical mechanisms that govern the hows and whys of it. But it is way, way more complicated than the rules of chess. Will a NN of a size that can actually be run be able to tease out generalized rules for it? I am skeptical. Maybe they should try something between chess and proteins...
 

Biological processes are exponentially more complex. I think the super badass checker player analogy with respect to NNs is going to be the way autonomous driving gets solved.
 
I was just looking through vision code in 17.48 and noticed they included traffic sign detection (not sure if it's enabled yet).
We can clearly see libraries/libdetector/traffic_signs/traffic_sign_decoder.cu with useful messages like
Code:
Failure mapping traffic sign mask: %s
Failure mapping traffic sign status: %s
Failure mapping traffic sign speed limit: %s
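
For anyone wondering how strings like these get spotted: printf-style format strings sit as plain ASCII inside the compiled binaries, so a pass that pulls out runs of printable characters (what the Unix strings utility does) is enough to surface them. A rough sketch in Python - the filename below is just a placeholder, not the actual path inside the firmware image:

Code:
# Minimal stand-in for the Unix `strings` utility: scan a binary for runs of
# printable ASCII and keep the ones that look like traffic-sign format strings.
# The input path is a placeholder, not the real location in the firmware.
import re

def extract_strings(path, min_len=8):
    with open(path, "rb") as f:
        data = f.read()
    for match in re.finditer(rb"[\x20-\x7e]{%d,}" % min_len, data):
        yield match.group().decode("ascii")

for s in extract_strings("traffic_sign_decoder.bin"):
    if "%s" in s and "traffic sign" in s.lower():
        print(s)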

Are you excited yet? Hopefully comes soon ;)
 
This is the *first* firmware that has traffic sign vision code included?
 
I don't have a 17.46 sample, so I don't know whether it was in .46 as well. But this stuff was definitely not there in 17.44 and prior - that was the first thing I checked.
I don't see any sign that anything is propagated to the CID/IC display, though, which leads me to suspect this is not really active yet, or at least not on my car.
 
I got a chance to look at definition files for a new set of vision NNs which I understand to be the ones which are going out in 2018.10.4. I’m going to summarize the differences here. For background on what I found in earlier networks (2017.28, 2017.34, and 2017.44) please see this post from last November: Neural Networks

Cameras

I’ve seen three new networks which I’m going to refer to as main, fisheye, and repeater. These names come from filenames used for the network definitions as well as from variable names used inside the networks. I believe main is used for both the main and narrow forward facing cameras, that fisheye is used for the wide angle forward facing camera, and that repeater is used for both of the repeater cameras.

Overview

These network definition files that I’m talking about are used to describe how the ‘neurons’ in a neural network are arranged and interconnected - the network architecture. This architecture defines the inputs and outputs for the network and the shape of the tensors (the data) that flow through the network. By comparing the architecture to other well known networks it’s possible to understand what kind of processing is occurring, how the network is being trained, what kind of performance is possible, how much data of what kind is needed to train the network, and how much computer power is required to run the network.
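
To give a feel for what these definition files pin down, here is the same kind of information - layers, their connections, and the tensor shapes flowing between them - expressed in PyTorch. This is purely illustrative; it is not Tesla's format, and the input resolution and channel counts are made up:

Code:
# Illustration of what an architecture definition specifies: the layers, how
# they connect, and the shape of the data tensors that flow between them.
# Not Tesla's format - the input resolution and channel counts are invented.
import torch
import torch.nn as nn

stem = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),   # RGB camera frame in
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
    nn.Conv2d(64, 192, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
)

x = torch.zeros(1, 3, 416, 640)    # one hypothetical camera frame
print(stem(x).shape)               # torch.Size([1, 192, 104, 160])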

Review of previous network

2017.44 was an inception network closely modeled on GoogLeNet - an award winning vision processing network design that was invented by Google about 4 years ago. GoogLeNet (GLN) is probably the single most popular high performance vision network in use today because it combines high accuracy with good computational efficiency. It can be slow to train but it runs very fast when deployed. The architecture is well understood and flexible - it can easily be adapted to different kinds of imaging data. The foundation of 2017.44’s main, repeater, and pillar networks (actually, introduced in 2017.42) was almost identical to GLN with only the most minimal changes required to adapt to the camera type and to provide the particular kinds of outputs that AP2 needed. The fisheye_wiper (introduced with 17.44) was based on a truncated GLN with 3 inception layers instead of the normal 9.
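
For reference, this is roughly what a single GLN "inception layer" looks like, written in PyTorch with the channel split from the original GoogLeNet paper (the actual channel counts in Tesla's networks are different, and I'm not reproducing them here):

Code:
# Sketch of one GoogLeNet-style "inception" module: four parallel branches
# (1x1, 3x3, 5x5 convolutions and a pooled branch) concatenated along channels.
# Stacking ~9 of these makes up the GLN backbone; channel counts below are the
# ones from the original paper, used purely for illustration.
import torch
import torch.nn as nn

class Inception(nn.Module):
    def __init__(self, in_ch, ch1x1, ch3x3_reduce, ch3x3, ch5x5_reduce, ch5x5, pool_proj):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, ch1x1, kernel_size=1)
        self.branch2 = nn.Sequential(
            nn.Conv2d(in_ch, ch3x3_reduce, kernel_size=1),
            nn.Conv2d(ch3x3_reduce, ch3x3, kernel_size=3, padding=1),
        )
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_ch, ch5x5_reduce, kernel_size=1),
            nn.Conv2d(ch5x5_reduce, ch5x5, kernel_size=5, padding=2),
        )
        self.branch4 = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, pool_proj, kernel_size=1),
        )

    def forward(self, x):
        # Each branch preserves spatial size; outputs are stacked along channels.
        return torch.cat([self.branch1(x), self.branch2(x),
                          self.branch3(x), self.branch4(x)], dim=1)

block = Inception(192, 64, 96, 128, 16, 32, 32)     # split from the GoogLeNet paper
print(block(torch.zeros(1, 192, 28, 28)).shape)     # -> torch.Size([1, 256, 28, 28])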

All of these networks had custom output stages that took the high level abstractions generated by GLN and interpreted them in various ways that would be useful for downstream processing. The fisheye_wiper network only put out a simple value - presumably an indicator of how hard it was raining. The repeater and pillar networks identified and located six classes of objects (note that objects here can include not just discrete items like pedestrians and vehicles but also, for instance, areas of pavement). The main network (used twice for both main forward camera and narrow forward camera) had generic object outputs as well as some more specialized outputs (for instance, for identifying the presence of adjacent lanes and the road shoulder).
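
Sketching what those custom output stages amount to (the backbone channel count and layer choices here are my own illustration, not what's in the definition files):

Code:
# Rough sketch of the kinds of output heads described above, sitting on top of
# the shared GLN-style feature extractor. All sizes are illustrative only.
import torch.nn as nn

backbone_ch = 832                 # channels out of the inception stack (made up)
num_classes = 6                   # the six object classes mentioned above

seg_head  = nn.Conv2d(backbone_ch, num_classes, kernel_size=1)   # class score per pixel
box_head  = nn.Conv2d(backbone_ch, 4, kernel_size=1)             # bounding-box regression
rain_head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                          nn.Linear(backbone_ch, 5))              # fisheye_wiper's rain class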

Changes in 2018.10.4

As of 2017.44 - the most recent network I’ve seen that was a substantial departure from earlier versions - there were versions of main, fisheye, and repeater networks in use and also another network referred to as ‘pillar’, which was probably used for the b-pillar cameras. I understand that pillar is not present in 2018.10.4. This could mean that the b-pillar cameras were used in 44 but are not being used in 18.10.4, or it might not. In 44 the networks for the pillar and repeater cameras were identical in structure but had different parameters. It’s possible that they could be merged functionally, with a single network being used for both repeaters and for pillars. Merging them would reduce their accuracy but it could lead to procedural and computational efficiency gains.

Changes to the network for main and narrow cameras

The main network now uses about 50% larger data flows between layers, which will increase the number of parameters by more than 2x and will substantially increase the representational power of the network and the amount of data required to train it. All other things being equal this network will have a more ‘nuanced’ set of perceptions. The inputs are the same and the outputs are the same with one exception. A new output called super_lanes has been substituted for a previously unnamed output. Super_lanes summarizes the image into a 1000 dimensional vector output, which is interesting because it probably means that the output of super_lanes is being fed into another neural network.
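
The more-than-2x parameter growth follows directly from how convolution layers scale: a conv layer's weight count is roughly kernel_area x in_channels x out_channels, so widening both sides of every layer by ~1.5x multiplies the parameters by about 1.5 squared, or 2.25. A quick back-of-the-envelope check (the channel counts are invented):

Code:
# Why ~50% wider data flows mean >2x parameters: conv weights scale with
# in_channels * out_channels, so 1.5x on both gives ~1.5^2 = 2.25x.
def conv_params(k, c_in, c_out):
    return k * k * c_in * c_out + c_out        # weights + biases

old = conv_params(3, 256, 256)
new = conv_params(3, int(256 * 1.5), int(256 * 1.5))
print(old, new, round(new / old, 2))           # ratio comes out about 2.25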

(BTW - the internal name on this main network is now “aknet”. Andrej Karpathy Net?)

Changes to repeater network

The repeater network in 10.4 has been truncated to 4 inception layers where the previous repeater network was a full 9 inception layers. The outputs are the same as before - a six class segmentation map (labels each pixel in the camera view as one of 6 categories) plus bounding boxes for objects.

Changes to the fisheye_wiper network

This network remains a truncated GLN. It appears to have been rewritten in a syntax that is now similar to the other networks. The previous fisheye network was in a different syntax and seemed to have been a holdover from some earlier generation of development tools. The new fisheye has some small changes introduced to the earlier layers but it still has just 2 inception layers and it still outputs a single class value for rain (one of 5 choices which are probably various types/degrees of rain). (It recently occurred to me that snow or ice might be included in these categories.) The new version seems to break the field of view into 4 quadrants and output a class for each one where the old network did not subdivide the field of view. Maybe rain looks different against the road rather than the sky. Additionally, segmentation and bounding box outputs have been added for the fisheye, so it seems like the fisheye is also getting trained to recognize things other than rain. Which might mean that it’s also going to be scanning the field of view for cars and pedestrians, or it could mean that it’s specifically sensing stuff like bird poo and dead bugs so that it can respond appropriately.

Summary

So the main and narrow camera network is getting quite a bit more powerful, the repeater has been simplified, and fisheye has been remodeled with possibly some non-rain functions being included.

As a reminder - these networks are only for processing camera inputs. They take single frame images and interpret them individually. Downstream processing has to take these camera outputs and interpret them as a sequence, combine them with perception from other sensors, and make driving decisions. This is only a small part of the overall driving software suite, but vision is an important part of the vehicle’s perception capacity and changes to these networks might be a good indicator of progress in the development of AP.
 
Jimmy, you are a freaking genius