Neural Networks

It is. They've already improved it in the latest fw.
And basically every release along the way too. 10.4 was a huge improvement in recognizing stopped vehicles. The releases from roughly 2018.12 all the way through .18 were heavily tweaking how soon TACC started braking for perceived stopped cars. And .21.9 seemed to both recognize stopped cars earlier and detect partially lane-offset intruders earlier.

It’s fair to say they’ve been working on the safety aspect of Autopilot and delivering improvements as they become ready.
 
@jimmy_d

Where you talk about whitening the database, my theory is that this is why you need "emotions", if emotions are defined simply as a way to quantify the importance of a memory. IMO Google is somewhat doing that here: Enabling Continual Learning in Neural Networks | DeepMind
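(For reference, the DeepMind post linked above describes Elastic Weight Consolidation, where the "importance" is attached to individual weights rather than memories: weights that mattered a lot for an old task get anchored so new learning can't overwrite them. A minimal sketch of just that penalty term, assuming you already have a per-weight Fisher-information estimate; the names here are illustrative:)

Code:
import numpy as np

def ewc_penalty(theta, theta_old, fisher, lam=1000.0):
    # Elastic Weight Consolidation: weights that were important for the old
    # task (high Fisher information) resist being moved by the new task.
    # theta, theta_old, fisher are flat parameter vectors of equal length.
    return 0.5 * lam * np.sum(fisher * (theta - theta_old) ** 2)

# total training loss on the new task would then be roughly:
# loss = new_task_loss(theta) + ewc_penalty(theta, theta_old, fisher)

The Fisher term plays the "how much did this matter" role, which is roughly the emotion analogy above.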
I don't know much about this topic really, but why do we remember corner cases? Because they confuse us or, worse, become dangerous. Why do we remember a strong smell or pain?

And in your lengthy post from April, why do you say you see no evidence of FSD? I mean, perception is the hard part; path planning is easier than a board game. If you solve perception, you've got FSD. Granted, perception also needs to be able to anticipate what all those objects might do, but likely most of that is left for later.
 
EAP/the safety system really impressed me today. When DW was driving earlier, I think the system correctly identified a stopped car/distracted car as a potential hazard and beeped accordingly.

We were coming up to a light in the inner right turn lane. As we approached the light, we got a green right arrow, and the cars in the far right turn lane started moving/turning, as did the first car in the inner right turn lane. The second car in the inner right turn lane, which was in front of us, stayed put because the driver was distracted, and maybe a half second after my wife started braking harder because this car wasn't moving, the safety system gave a few warning beeps.

I believe the system correctly inferred that the stopped car was a possible hazard because of the green for the two right turn lanes and/or because the car wasn't following the flow of traffic. I know my wife has had much harder braking coming up to cars stopped at intersections without the car warning us about anything, so I don't think it warned us because my wife was braking harder than normal.
 
Google's recently announced TPU3 pods have a raw throughput of 100 petaflops (10^17 operations per second). To put that in perspective: various estimates of the computational performance of the human brain tend to fall into the range of 10^16 to 10^19 ops/second, so we are now in the zone (plus or minus a year) where cutting-edge NN training hardware runs at a capacity roughly comparable to the human brain. And at the current growth rate, in just a few years the best machines will be 100x the speed of the human brain, which is sort of the expected threshold where development of algorithms of human-brain complexity can be done.

The former director of the Human Brain Project, Henry Markram, at one point said he thought it would take exascale computing to simulate the human brain — in the ballpark of 1 exaflop. 1 exaflop is 1000 petaflops or 10^18 flops, and 10 exaflops is 10,000 petaflops or 10^19 flops.

However, estimates go a lot higher depending on which entities in the brain you think actually do computational work. If you want to go down to the individual molecule level it’s 10^43 flops (see page 80). From what I can tell, there seems to be little consensus among neuroscientists or cognitive scientists about which entities actually matter to cognition and which don’t.
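Just to keep the orders of magnitude straight, here's the arithmetic on the figures quoted above (nothing here beyond the numbers already mentioned):

Code:
# figures quoted above
tpu3_pod = 100e15                    # 100 petaflops = 1e17 ops/s
brain_low, brain_high = 1e16, 1e19   # range of human-brain estimates
markram_exascale = 1e18              # Markram's whole-brain-simulation ballpark

print(tpu3_pod / brain_low)          # 10.0  -> 10x the low-end brain estimate
print(tpu3_pod / brain_high)         # 0.01  -> 1/100th of the high-end estimate
print(markram_exascale / tpu3_pod)   # 10.0  -> ten pods' worth to reach exascale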

Until recently, neuroscientists thought glia — brain cells that outnumber neurons roughly 10 to 1 — played no role in cognition. But increasingly glia are thought to play a role in cognitive processes like learning and memory. Whoops! We missed about a trillion cells in our understanding of how cognition works!

I would add that even if you had infinite computing power, that doesn’t mean you will be able to develop human-level artificial general intelligence. Computation is necessary but not sufficient. One idea I find compelling is that progress in AI follows progress in neuroscience. AI is essentially an exercise in biomimicry. So, no matter how much computation we have, or data, or capital, or deep learning PhDs, we won’t reach human-level AGI until we have a better understanding of the brain and cognition. Once we understand how biological intelligence works, we can copy it.

Not saying this is certainly true, but it is consistent with what we know. Deep learning doesn’t seem to provide a path to human-level AGI, even though it’s possible (as Daniel Dennett has speculated) that the human brain implements deep neural network-like processes for low-level cognitive problems — and hence it makes sense that deep neural networks can match or surpass biological neural networks on many tasks.

It seems to be an open question how much artificial intelligence will (or even can) diverge from biological intelligence. It’s a profound question.

Maybe after Tesla launches full autonomy and the stock goes to a bazillion a share I will devote the rest of my life to working on high-level theories of cognition. :p
 
AI is essentially an exercise in biomimicry

This used to be more true, but the underlying structure of biological neurons and their activations isn't as capable as what we've built for our human-created AI. That might sound surprising, because, obviously, our man-made networks are nowhere near as capable on the whole at tasks like driving, etc. But nature has us bested in overall network structure (including the ability to adapt structure at inference time) and in sheer processing power vs. power requirements.
 
Really, why do you say that?

Early on in neural network design, people actively tried simulating the way human neurons work, with what's called a binary stochastic neuron. Those certainly do the job, in that you can make a functioning net entirely out of them. But other, man-made activation functions tend to perform better in a given architecture. ReLU (Rectified Linear Unit) is one of the most commonly used. For output neurons there's a wide array of choices, but they tend to just model the type of output expected, e.g. softmax for multiple mutually exclusive outputs, sigmoid if the output should be between 0 and 1, etc. But I've never actually seen binary stochastic neurons used in the field, because, while they're interesting, they don't perform as well.
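For anyone curious what those functions actually look like, here's a minimal NumPy sketch (purely illustrative) of ReLU, sigmoid, softmax, and a binary stochastic neuron that fires with probability sigmoid(x):

Code:
import numpy as np

def relu(x):
    # Rectified Linear Unit: pass positive values through, zero out negatives
    return np.maximum(0.0, x)

def sigmoid(x):
    # squashes any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    # turns a vector of scores into mutually exclusive class probabilities
    e = np.exp(x - np.max(x))   # subtract the max for numerical stability
    return e / e.sum()

def binary_stochastic(x, rng=np.random.default_rng()):
    # the more "brain-like" unit: outputs 0 or 1, firing with probability sigmoid(x)
    return (rng.random(np.shape(x)) < sigmoid(x)).astype(float)

x = np.array([-2.0, 0.5, 3.0])
print(relu(x))               # [0.  0.5 3. ]
print(softmax(x))            # class probabilities that sum to 1.0
print(binary_stochastic(x))  # random 0/1 pattern, mostly 1 for the large input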

Notably, while we know very little about the brain overall, we know a LOT about individual neurons. So it’s not surprising we’d be able to model them directly and improve on them.
 
Referencing this post: Seeing the world in autopilot, part deux

Am posting here because this is mostly about neural networks. Will make a link over there pointing to this post.

"Seeing the world in autopilot, part deux" observations:

First, some background:

Based on taking apart a set of AP2 binaries earlier this year, I came up with a general structure for how data flows through the system and was able to identify some intermediate processing products from the nature of the data structures, how they were being used, and the names of variables.

This general structure has a group of advanced CNN networks processing the output from each of the 7 navigation cameras (excluding the backup camera) followed by a second set of networks that I called post-processing networks. The camera networks were identifying and localizing several classes of objects in the field of view of all of the cameras. Among the types of objects that seem to be detected were vehicles, traffic signals, and lane markings. The second layer of networks generated outputs that seemed to be focused on identifying and understanding the shape of lanes, assigning vehicles to lanes, predicting whether other vehicles were moving, stopped, or parked, and also identifying physical landmarks (mainly poles and maybe the corners of buildings).
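To make that structure concrete, here's a toy sketch of the two-stage arrangement as I picture it (this is my mental model in PyTorch, not Tesla's code; every layer size and name below is made up):

Code:
import torch
import torch.nn as nn

class CameraNet(nn.Module):
    # stage 1: one vision CNN per camera, single frame in, per-camera outputs out
    def __init__(self, n_outputs=64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, n_outputs)   # stand-in for object/lane/signal outputs

    def forward(self, frame):                  # frame: (batch, 3, H, W)
        return self.head(self.backbone(frame).flatten(1))

class PostProcessingNet(nn.Module):
    # stage 2: consumes all the per-camera outputs and reasons about lane shape,
    # lane assignment, moving/stopped/parked, landmarks, etc.
    def __init__(self, n_cameras=7, cam_features=64, n_outputs=32):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(n_cameras * cam_features, 128), nn.ReLU(),
            nn.Linear(128, n_outputs),
        )

    def forward(self, per_camera_outputs):     # list of (batch, cam_features) tensors
        return self.fc(torch.cat(per_camera_outputs, dim=1))

cameras = [CameraNet() for _ in range(7)]
post = PostProcessingNet()
frames = [torch.randn(1, 3, 96, 160) for _ in range(7)]   # dummy camera frames
scene = post([net(f) for net, f in zip(cameras, frames)])
print(scene.shape)   # torch.Size([1, 32])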

So all of this was from looking at the code. I had started that analysis with the hope of finding a way to understand the system's capabilities, but, being unable to see the code in action, what I could come up with was pretty limited.

Now we see some beautiful output from the efforts of @verygreen and @DamianXVI, which extends my earlier observations by giving us examples of what comes out of the network.

So I'm going to interpret what's happening in @verygreen's video here in light of what I've seen in the code.

First: the annotations here seem to be output from the second layer of networks, not from the primary camera networks. There are various ways to show this, but a simple one is this: vehicle IDs in the video persist from frame to frame. It's not possible for the camera networks to do that, because they only process one frame at a time and have no knowledge of other frames or of any machine state beyond a single frame of camera output. Downstream networks have to correlate the output of successive camera-network frames in order to allow the IDs to persist.
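To illustrate the point: a per-frame detector has no memory, so something downstream has to match this frame's detections against the last frame's to keep an ID stable. A minimal sketch of that kind of frame-to-frame association (nearest-centroid matching; purely illustrative, not a claim about how Tesla actually does it):

Code:
import itertools
import math

class SimpleTracker:
    # keeps object IDs stable across frames by matching each new detection
    # to the closest previously tracked centroid within a gate distance
    def __init__(self, gate=2.0):
        self.gate = gate
        self.tracks = {}              # id -> (x, y) centroid from the last frame
        self._ids = itertools.count()

    def update(self, detections):     # detections: list of (x, y) centroids
        assigned = {}
        unmatched = dict(self.tracks)
        for det in detections:
            best = min(unmatched, key=lambda i: math.dist(unmatched[i], det), default=None)
            if best is not None and math.dist(unmatched[best], det) < self.gate:
                assigned[best] = det              # same object as last frame: ID persists
                del unmatched[best]
            else:
                assigned[next(self._ids)] = det   # new object gets a new ID
        self.tracks = assigned
        return assigned

tracker = SimpleTracker()
print(tracker.update([(0.0, 0.0), (5.0, 5.0)]))   # {0: (0.0, 0.0), 1: (5.0, 5.0)}
print(tracker.update([(0.4, 0.2), (5.3, 5.1)]))   # same IDs 0 and 1 persist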

The major categories of annotation are: predicted lane boundaries and boundary type, vehicles, trucks, pedestrians, motorcycles, bicycles, and driveable space.

Lane boundaries predict the left and right edges of the acceptable driving lane that the AP vehicle currently occupies, with color coding indicating whether a lane boundary separates oncoming or same-direction traffic. At junctions where a turn might optionally occur, multiple lane boundaries will be identified, representing the edges of the driving lane for each of the optional paths that appear. I never saw more than two options at once. Options are shown for the occupied lane, but otherwise the only lane boundaries predicted are the far boundaries of adjacent lanes.

It's notable that the lane boundaries aren't just identifying pavement markings or curbs. The boundaries are present even when pavement markings are absent, and both the left and right lane boundaries appear even when only one of them is easily identifiable from what's in the camera view. Aside from lane markings, AP seems to use the presence and state of other vehicles and the presence of obstacles to predict lane boundaries.

Objects all seem to carry a confidence value in %, which probably represents the system's confidence in its object class prediction. Identified objects optionally carry several attributes, including a lane assignment, motion state (moving, stopped, or stationary), distance, and relative velocity. It seems that objects are also labelled as to whether AP has a corresponding radar return associated with them. Notably, lane assignment includes making a distinction between a parking lane (off-road) and a driving lane, which requires a lot of context. Lane assignment also seems to have a lot of states, including not just whether the object is in your lane, to the left, or to the right, but also whether it's straddling lanes, plus something labeled IMM, which might be "immediately adjacent".
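Pulling those attributes together, each annotated object reads roughly like the record below. This is just my reconstruction of the fields visible in the video; the names and enum values are mine, not Tesla's:

Code:
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class MotionState(Enum):
    MOVING = "moving"
    STOPPED = "stopped"        # temporarily halted, e.g. at a light
    STATIONARY = "stationary"  # parked / not expected to move

class LaneAssignment(Enum):
    EGO = "our lane"
    LEFT = "left lane"
    RIGHT = "right lane"
    STRADDLING = "straddling lanes"
    PARKING = "parking lane (off-road)"
    IMM = "IMM (possibly 'immediately adjacent')"

@dataclass
class DetectedObject:
    object_id: int
    object_class: str                  # vehicle, truck, pedestrian, motorcycle, bicycle
    confidence_pct: float              # confidence in the object class prediction
    lane: LaneAssignment
    motion: MotionState
    distance_m: Optional[float]        # vision-derived even without a radar return
    rel_velocity_mps: Optional[float]
    has_radar_return: bool             # whether a radar return was associated with it

example = DetectedObject(42, "vehicle", 97.0, LaneAssignment.EGO,
                         MotionState.STOPPED, 35.0, -4.2, True)
print(example)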

Drivable space represents the unobstructed area that the AP vehicle has physical access to, and it is bounded by edge markings that indicate the kind of obstacle limiting the drivable space at that section of the edge. Vehicles and pedestrians are obstacles in a different class from other sorts of barriers. While traffic cones, bollards, and fencing aren't called out as discretely identified objects, it's clear that AP is seeing them and recognizing them functionally, because it adjusts the drivable space according to their presence.
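The drivable space annotation would then amount to something like a boundary whose edges carry the class of whatever is limiting them. Again, this encoding is my own illustration, not the actual representation:

Code:
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple

class EdgeLimit(Enum):
    VEHICLE = "vehicle"
    PEDESTRIAN = "pedestrian"
    BARRIER = "barrier"   # cones, bollards, fencing, curbs, etc.

@dataclass
class DrivableSpaceEdge:
    start: Tuple[float, float]   # (x, y) in metres, vehicle frame
    end: Tuple[float, float]
    limited_by: EdgeLimit

# a drivable space is just the ordered list of its boundary edges
drivable_space: List[DrivableSpaceEdge] = [
    DrivableSpaceEdge((0.0, -2.0), (40.0, -2.0), EdgeLimit.BARRIER),
    DrivableSpaceEdge((40.0, -2.0), (40.0, 2.0), EdgeLimit.VEHICLE),
    DrivableSpaceEdge((40.0, 2.0), (0.0, 2.0), EdgeLimit.BARRIER),
]
print(len(drivable_space), "boundary edges")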

And finally we have that beautiful, beautiful path prediction arrow in orange. I love this element because it probably gives us the most abstracted and subtle insight into what AP is 'thinking' as it moves around the world. I'll make the strong claim that the path prediction is the output of a neural network because it behaves probabilistically, seems to be affected by the full context of a scene, lacks hysteresis, and presents a continuous selection space. A human-written heuristic is unlikely to show this behavior. From the shape of the path prediction we can see that AP is making a nuanced prediction of the road shape extending out at least a couple of hundred meters and - and this is really amazing to me - is able to usefully predict the rising/falling shape of the road ahead and predict the probable path of road sections *which it cannot see*. So it estimates that a road around a blind curve will continue curving, and it estimates the direction a road takes over a blind rise even when the road shape leading up to the rise is pretty complicated. This latter is probably the critical capability that solved the disastrous 'cresting hill' failure, which finally went away when 2018.10.4 shipped.

So some interesting things we know from this:

  1. AP2 estimates distance to objects and their relative velocity based on vision even when no radar return is available for an object. It looks like radar is thus a fully redundant backup for vision capabilities. Whether that is from stereo vision processing or from scale estimates is still open, but there’s clearly a useful degree of distance estimation that is being extracted even for items that have no radar return signal.
  2. The FOV of the camera seems to be quite a bit wider than the FOV of the radar - objects lose their radar signal at the edges of the camera FOV. This seems to be a view from the main camera - if the wide angle camera has comparable recognition accuracy then the useful FOV of the vision system is going to be enormously larger than that of the radar.
  3. AP2 identifies vehicles even when they are substantially occluded, and at quite substantial distances.
  4. Strange backgrounds seem to confound identification as much as occlusion does - maybe more. Cyclists seen against a background of traffic have much lower confidence than cyclists with a background of pavement or buildings.
  5. Radar still seems to be bad at seeing cross traffic - though it seems like vision makes up for that pretty well.
  6. At a minimum we can see that radar and vision are being fused, since single objects are being given both radar and vision attributes (a minimal sketch of that kind of association follows after this list). Is forward sonar also being fused? There doesn't seem to be any good evidence of that here.
  7. This video doesn’t rule out the possibility that high definition maps are being used for driving, but it also doesn’t present any evidence to support it.
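On point 6: the simplest way to end up with single objects carrying both radar and vision attributes is to associate each radar return with the nearest vision detection inside some gate. A sketch of that general technique (not Tesla's implementation, just the idea):

Code:
import math

def fuse(vision_objects, radar_returns, gate_m=3.0):
    # vision_objects: dicts with an estimated "pos" = (x, y) in metres
    # radar_returns: dicts with a measured "pos" and a radial "velocity"
    # attach each radar return to the nearest vision object within gate_m
    for obj in vision_objects:
        obj["has_radar_return"] = False
    for ret in radar_returns:
        nearest = min(vision_objects,
                      key=lambda o: math.dist(o["pos"], ret["pos"]),
                      default=None)
        if nearest is not None and math.dist(nearest["pos"], ret["pos"]) < gate_m:
            nearest["has_radar_return"] = True
            nearest["radar_velocity"] = ret["velocity"]   # refine with radar data
    return vision_objects

objects = [{"pos": (30.0, 0.0)}, {"pos": (15.0, 3.5)}]
returns = [{"pos": (29.2, 0.3), "velocity": -4.1}]
print(fuse(objects, returns))   # only the first object picks up radar attributes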


I'd be really interested to know if there's any evidence of navigation data being fed into AP2 and used by the driving system. And it would also be interesting to know if there's any evidence of AP gathering data to be used to create HD maps. A lot of groups seem to be relying on HD maps as a critical part of their driver assistance systems (Comma.ai, Cruise, Waymo), but so far I haven't seen any evidence that Tesla is actually doing that, aside from some claims from a few years ago.