
Neural Networks

Discussion in 'Autopilot & Autonomous/FSD' started by lunitiks, Nov 5, 2017.

  1. jimmy_d

    jimmy_d Deep Learning Dork

    Joined:
    Jan 26, 2016
    Messages:
    416
    Location:
    San Francisco, CA
    Returning to the topic of neural networks:

    It's been a month since AlphaGo Zero. Time for DeepMind to revolutionize AI again:

    Entire human chess knowledge learned and surpassed by DeepMind's AlphaZero in four hours

    Google's AlphaZero Destroys Stockfish In 100-Game Match - Chess.com

    And if you want to read the paper it's here: https://arxiv.org/pdf/1712.01815.pdf

    I read it and it blew my mind. Again. AlphaZero, which is not even a go program really, beat AlphaGo Zero with half the training (under 36 hours). And AlphaGo Zero had beaten AlphaGo Master with 20x less training and no human assistance. AlphaGo Master was the program that beat the 60 best Go players in the world back-to-back after training for about a month. And AlphaZero - this new program - is a general purpose algorithm that can learn any board game without tuning or examples or any kind of human intervention. In quick succession it bested the 3 hardest games in the world, each time taking just a few hours to learn the game starting with nothing but the rules.

    Spacing of DeepMind's world-shaking papers: 12 months, then 9 months, then 6 months, then 3 months and now this one comes after only one month. Can't wait to see what January brings. Or will it be just 2 weeks this time?

    One of the frequently made observations about the limits of AI is that it's narrow. Sebastian Thrun was fond of saying about AlphaGo that, even as amazing an achievement as it was, it was so narrow that it still couldn't play chess. Well now it can. It can play chess and if you give it a couple of hours it will master any other board game to a superhuman level.

    I used to think (in ancient times - about 3 months ago) that RL (reinforcement learning) was a silly thing to use for training a self driving car. It would take way too long, you wouldn't get a good result, and nobody would understand what it was doing. If things keep going at this rate everything other than RL is going to be obsolete pretty soon.
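To make the self-play idea concrete at toy scale, here's a sketch of tabular reinforcement learning on the game of Nim (take 1 or 2 stones; whoever takes the last stone wins). It shares only the spirit of AlphaZero's "rules only" setup: no human examples, just outcomes of self-play. No deep network, no MCTS, and all names here are illustrative.

```python
import random

def train(stones=10, episodes=20000, alpha=0.5, eps=0.2):
    """Self-play Monte Carlo learning: one Q-table shared by both players."""
    Q = {}  # (stones_left, action) -> value for the player to move
    for _ in range(episodes):
        s = stones
        history = []  # (state, action) in move order, players alternating
        while s > 0:
            actions = [a for a in (1, 2) if a <= s]
            if random.random() < eps:  # epsilon-greedy exploration
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda a: Q.get((s, a), 0.0))
            history.append((s, a))
            s -= a
        # Whoever made the last move took the last stone and won.
        reward = 1.0
        for state, action in reversed(history):
            old = Q.get((state, action), 0.0)
            Q[(state, action)] = old + alpha * (reward - old)
            reward = -reward  # flip perspective for the other player's move
    return Q

def best_move(Q, s):
    return max([a for a in (1, 2) if a <= s],
               key=lambda a: Q.get((s, a), 0.0))
```

Given enough episodes it tends to rediscover the classic strategy (leave your opponent a multiple of 3) without ever being told it. That "strategy emerges from rules plus self-play" property is the part that scales up so dramatically in AlphaZero.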
     
    • Informative x 15
    • Like x 3
    • Love x 2
  2. Cosmacelf

    Cosmacelf Well-Known Member

    Joined:
    Mar 6, 2013
    Messages:
    8,229
    Location:
    San Diego
AlphaZero still isn’t structured like a human brain. I find it very unlikely that a single NN architecture can process language, for instance, without separate NNs connected together in purposeful ways. BUT there is no reason why you can’t do that; it's just that no one has successfully tried yet. I honestly think it is possible to construct thinking, human-level intelligence. I just haven’t seen much progress towards that general AI goal yet, AlphaZero included.
     
  3. ohmman

    ohmman Plaid-ish Moderator

    Joined:
    Feb 13, 2014
    Messages:
    9,876
    Location:
    North Bay, CA
    Didn't bother stopping in for some County Line, eh?
     
    • Funny x 1
  4. calisnow

    calisnow Banned

    Joined:
    Oct 11, 2014
    Messages:
    2,867
    Location:
    Los Angeles
    Tough crowd.
     
    • Funny x 5
  5. Cosmacelf

    Cosmacelf Well-Known Member

    Joined:
    Mar 6, 2013
    Messages:
    8,229
    Location:
    San Diego
Yeah, yeah. I do agree though that AlphaZero is pretty cool. Protein folding is an interesting application. There are underlying black-and-white physical mechanisms that govern the hows and whys of it. But it is way, way more complicated than the rules of chess. Will a NN of a size that can actually be run be able to tease out generalized rules for it? I am skeptical. Maybe they should try something between chess and proteins...
     
    • Like x 1
  6. buttershrimp

    buttershrimp Click my signature to Go Mad Max Mode

    Joined:
    Jun 17, 2017
    Messages:
    2,656
    Location:
    ATX
The complexity of biological processes is exponentially greater, but I think the super badass checker player analogy with respect to NNs is going to be the way autonomous driving gets solved
     
  7. EinSV

    EinSV Active Member

    Joined:
    Feb 6, 2016
    Messages:
    4,318
    Location:
    NorCal
Maybe not your style, but some of the lines from Taylor Swift's "Shake It Off" seem like a good fit for AP2. :)
     
  8. verygreen

    verygreen Curious member

    Joined:
    Jan 16, 2017
    Messages:
    2,897
    Location:
    TN
I was just looking through vision code in 17.48 and noticed they included traffic sign detection (not sure if it's enabled yet).
    We can clearly see libraries/libdetector/traffic_signs/traffic_sign_decoder.cu with useful messages like
    Code:
    Failure mapping traffic sign mask: %s
    Failure mapping traffic sign status: %s
    Failure mapping traffic sign speed limit: %s
    
    Are you excited yet? Hopefully comes soon ;)
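For anyone curious how strings like these get spotted: the usual trick is just pulling runs of printable ASCII out of the binary, the same idea as the Unix `strings` tool. A minimal Python version (nothing Tesla-specific about it):

```python
import re

def extract_strings(data: bytes, min_len=4):
    # Runs of printable ASCII at least min_len bytes long, same idea
    # as the Unix `strings` tool; format strings like the ones above
    # surface this way when poking through a compiled library.
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.decode("ascii") for m in re.findall(pattern, data)]
```

Run it over the bytes of a library file and grep the result for interesting keywords.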
     
    • Like x 12
    • Informative x 3
    • Love x 3
  9. calisnow

    calisnow Banned

    Joined:
    Oct 11, 2014
    Messages:
    2,867
    Location:
    Los Angeles
    This is the *first* firmware that has traffic sign vision code included?
     
    • Love x 1
  10. J1mbo

    J1mbo Active Member

    Joined:
    Aug 20, 2013
    Messages:
    1,565
    Location:
    UK
    That IS exciting! Maybe this is what Jon McNeill was talking about back in October.
     
  11. verygreen

    verygreen Curious member

    Joined:
    Jan 16, 2017
    Messages:
    2,897
    Location:
    TN
I don't have a 17.46 sample, so I do not know if it was in .46 as well or not. But this stuff definitely was not there in 17.44 and prior; that was the first thing I checked.
I don't see any sign that anything is propagated to the cid/ic display, though, which leads me to suspect this is not really active yet, or at least not on my car.
     
    • Like x 2
    • Helpful x 1
    • Informative x 1
  12. J1mbo

    J1mbo Active Member

    Joined:
    Aug 20, 2013
    Messages:
    1,565
    Location:
    UK
    Do you think Tesla send triggers out to specific cars to enable this stuff in "shadow mode", like they do with the image snapshots?
     
  13. verygreen

    verygreen Curious member

    Joined:
    Jan 16, 2017
    Messages:
    2,897
    Location:
    TN
    I only see triggers sent to my cars.
That said, I don't see a "shadow mode" trigger, whatever that might be. They can set triggers for specific circumstances to generate snapshots; that's about it.
     
    • Informative x 2
    • Helpful x 1
  14. buttershrimp

    buttershrimp Click my signature to Go Mad Max Mode

    Joined:
    Jun 17, 2017
    Messages:
    2,656
    Location:
    ATX
    • Like x 5
    • Informative x 1
    • Love x 1
  15. VT_EE

    VT_EE Active Member

    Joined:
    Apr 22, 2017
    Messages:
    2,016
    Location:
    Maryland
    • Like x 4
  16. Carl

    Carl Supporting Member

    Joined:
    Jan 12, 2013
    Messages:
    1,740
    Location:
    Belgium
    • Funny x 6
  17. jimmy_d

    jimmy_d Deep Learning Dork

    Joined:
    Jan 26, 2016
    Messages:
    416
    Location:
    San Francisco, CA
    I got a chance to look at definition files for a new set of vision NNs which I understand to be the ones which are going out in 2018.10.4. I’m going to summarize the differences here. For background on what I found in earlier networks (2017.28, 2017.34, and 2017.44) please see this post from last November: Neural Networks

    Cameras

    I’ve seen three new networks which I’m going to refer to as main, fisheye, and repeater. These names come from filenames used for the network definitions as well as from variable names used inside the networks. I believe main is used for both the main and narrow forward facing cameras, that fisheye is used for the wide angle forward facing camera, and that repeater is used for both of the repeater cameras.

    Overview

    These network definition files that I’m talking about are used to describe how the ‘neurons’ in a neural network are arranged and interconnected - the network architecture. This architecture defines the inputs and outputs for the network and the shape of the tensors (the data) that flow through the network. By comparing the architecture to other well known networks it’s possible to understand what kind of processing is occurring, how the network is being trained, what kind of performance is possible, how much data of what kind is needed to train the network, and how much computer power is required to run the network.
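The arithmetic involved in reading such a definition is mostly standard convolution bookkeeping: given a layer's kernel size, stride, padding, and channel counts, you can compute the tensor shapes and parameter counts layer by layer. A quick sketch of the standard formulas; the worked numbers are GoogLeNet's published 7x7/2 stem layer, not anything from Tesla's files:

```python
def conv2d_out(h, w, k, stride=1, pad=0):
    # Output spatial size of a k x k convolution: the formula you
    # apply repeatedly while walking a network definition file.
    return ((h + 2 * pad - k) // stride + 1,
            (w + 2 * pad - k) // stride + 1)

def conv2d_params(c_in, c_out, k):
    # One k x k kernel per (input channel, output channel) pair,
    # plus one bias per output channel.
    return c_out * (c_in * k * k + 1)

# GoogLeNet stem: 7x7 conv, stride 2, on a 224x224 RGB input.
stem_shape = conv2d_out(224, 224, 7, stride=2, pad=3)  # (112, 112)
stem_params = conv2d_params(3, 64, 7)                  # 9472
```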

    Review of previous network

    2017.44 was an inception network closely modeled on GoogLeNet - an award winning vision processing network design that was invented by Google about 4 years ago. GoogLeNet (GLN) is probably the single most popular high performance vision network in use today because it combines high accuracy with good computational efficiency. It can be slow to train but it runs very fast when deployed. The architecture is well understood and flexible - it can easily be adapted to different kinds of imaging data. The foundation of 2017.44’s main, repeater, and pillar networks (actually, introduced in 2017.42) was almost identical to GLN with only the most minimal changes required to adapt to the camera type and to provide the particular kinds of outputs that AP2 needed. The fisheye_wiper (introduced with 17.44) was based on a truncated GLN with 3 inception layers instead of the normal 9.
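For reference, the inception trick is running several parallel branches at different receptive fields and concatenating the results along the channel axis. The branch widths below are the published "inception (3a)" numbers from the GoogLeNet paper, shown only for the channel bookkeeping:

```python
# GoogLeNet "inception (3a)" branch widths from the original paper;
# the module's output is just the branches concatenated channel-wise.
branches = {"1x1": 64, "3x3": 128, "5x5": 32, "pool_proj": 32}
out_channels = sum(branches.values())  # 256 channels out
```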

    All of these networks had custom output stages that took the high level abstractions generated by GLN and interpreted them in various ways that would be useful for downstream processing. The fisheye_wiper network only put out a simple value - presumably an indicator of how hard it was raining. The repeater and pillar networks identified and located six classes of objects (Note that objects here can include not just discrete items like pedestrians and vehicles but also, for instance, areas of pavement) . The main network (used twice for both main forward camera and narrow forward camera) had generic object outputs as well as some more specialized outputs (for instance, for identifying the presence of adjacent lanes and the road shoulder).

    Changes in 2018.10.4

    As of 2017.44 - the most recent network I’ve seen that was a substantial departure from earlier versions - there were versions of main, fisheye, and repeater networks in use and also another network referred to as ‘pillar’, which was probably used for the b-pillar cameras. I understand that pillar is not present in 2018.10.4. This could mean that the b-pillar cameras were used in 44 but are not being used in 18.10.4, or it might not. In 44 the networks for the pillar and repeater cameras were identical in structure but had different parameters. It’s possible that they could be merged functionally, with a single network being used for both repeaters and for pillars. Merging them would reduce their accuracy but it could lead to procedural and computational efficiency gains.

    Changes to the network for main and narrow cameras

    The main network now uses about 50% larger data flows between layers, which will increase the number of parameters by more than 2x and will substantially increase the representational power of the network and the amount of data required to train it. All other things being equal this network will have a more ‘nuanced’ set of perceptions. The inputs are the same and the outputs are the same with one exception. A new output called super_lanes has been substituted for a previously unnamed output. Super_lanes summarizes the image into a 1000 dimensional vector output, which is interesting because it probably means that the output of super_lanes is being fed into another neural network.
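The "50% larger flows, more than 2x parameters" relationship follows from conv layer weights scaling with the product of input and output channels: grow both widths by 1.5x and the weight count grows by about 1.5^2 = 2.25x. A back-of-envelope check with made-up widths (not the actual AP2 layer sizes):

```python
# Illustrative widths only: a conv layer's weight count scales with
# c_in * c_out, so multiplying both by 1.5 multiplies the parameter
# count by 1.5 ** 2 = 2.25.
def conv_weights(c_in, c_out, k=3):
    return c_in * c_out * k * k

base = conv_weights(192, 256)   # hypothetical original widths
wide = conv_weights(288, 384)   # both widths scaled by 1.5x
ratio = wide / base             # exactly 2.25 here
```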

    (BTW - The internal name on this main network is now “aknet”. Andrej Karpathy Net ? )

    Changes to repeater network

    The repeater network in 10.4 has been truncated to 4 inception layers where the previous repeater network was a full 9 inception layers. The outputs are the same as before - a six class segmentation map (labels each pixel in the camera view as one of 6 categories) plus bounding boxes for objects.
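For anyone unfamiliar with what a segmentation map output amounts to at inference time: per-pixel scores for each class, collapsed to a class index by argmax. The six-class count comes from the network outputs; the shapes and function name below are illustrative:

```python
import numpy as np

def logits_to_segmap(logits):
    # logits: (num_classes, H, W) array of per-pixel class scores.
    # The segmentation map is the highest-scoring class per pixel.
    return logits.argmax(axis=0)
```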

    Changes to the fisheye_wiper network

    This network remains a truncated GLN. It appears to have been rewritten in a syntax that is now similar to the other networks. The previous fisheye network was in a different syntax and seemed to have been a holdover from some earlier generation of development tools. The new fisheye has some small changes introduced to the earlier layers but it still has just 2 inception layers and it still outputs a single class value for rain (one of 5 choices which are probably various types/degrees of rain). (It recently occurred to me that snow or ice might be included in these categories.) The new version seems to break the field of view into 4 quadrants and output a class for each one where the old network did not subdivide the field of view. Maybe rain looks different against the road rather than the sky. Additionally, segmentation and bounding box outputs have been added for the fisheye, so it seems like the fisheye is also getting trained to recognize things other than rain. Which might mean that it’s also going to be scanning the field of view for cars and pedestrians, or it could mean that it’s specifically sensing stuff like bird poo and dead bugs so that it can respond appropriately.
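One plausible (purely speculative) reading of the 4-quadrant rain output: pool each quadrant's per-pixel scores over the 5 rain classes and take the argmax per quadrant. The layout and the pooling here are my assumptions, not something recovered from the network files:

```python
import numpy as np

def quadrant_classes(scores):
    # scores: (5, H, W) per-pixel scores from a hypothetical rain head.
    # Average each quadrant's scores, then pick its best class.
    _, h, w = scores.shape
    quads = {"top_left": (slice(0, h // 2), slice(0, w // 2)),
             "top_right": (slice(0, h // 2), slice(w // 2, w)),
             "bottom_left": (slice(h // 2, h), slice(0, w // 2)),
             "bottom_right": (slice(h // 2, h), slice(w // 2, w))}
    return {name: int(scores[:, rs, cs].mean(axis=(1, 2)).argmax())
            for name, (rs, cs) in quads.items()}
```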

    Summary

    So the main and narrow camera network is getting quite a bit more powerful, the repeater has been simplified, and fisheye has been remodeled with possibly some non-rain functions being included.

    As a reminder - these networks are only for processing camera inputs. They take single frame images and interpret them individually. Downstream processing has to take these camera outputs and interpret them as a sequence, combine them with perception from other sensors, and make driving decisions. This is only a small part of the overall driving software suite, but vision is an important part of the vehicle’s perception capacity and changes to these networks might be a good indicator of progress in the development of AP.
     
    • Informative x 111
    • Love x 26
    • Like x 18
    • Helpful x 8
  18. croman

    croman Active Member

    Joined:
    Nov 21, 2016
    Messages:
    4,624
    Location:
    Chicago, IL
    @jimmy_d -- we are so spoiled to have you help us by taking the time to write such informative and timely posts. Thanks so much!
     
    • Like x 19
  19. scottrobertson

    Joined:
    Jul 4, 2017
    Messages:
    130
    Location:
    United Kingdom
@jimmy_d awesome stuff! Do you happen to know if they are using vision for the auto high-beam? And if so, have there been any changes? It's pretty terrible right now, and I would love to know if they are actively working on it.
     
    • Like x 2
  20. buttershrimp

    buttershrimp Click my signature to Go Mad Max Mode

    Joined:
    Jun 17, 2017
    Messages:
    2,656
    Location:
    ATX
    Jimmy, you are a freaking genius
     
    • Like x 6
    • Love x 2

  • About Us

    Formed in 2006, Tesla Motors Club (TMC) was the first independent online Tesla community. Today it remains the largest and most dynamic community of Tesla enthusiasts. Learn more.