
How does "fleet learning" work?

Discussion in 'Model S' started by sandpiper, Oct 24, 2016.

  1. sandpiper

    sandpiper Active Member

    Joined:
    Sep 25, 2014
    Messages:
    2,012
    Location:
    Ontario, Canada
    Apologies if this is in the wrong section....

    I have a few questions about how this fleet learning works, at a technical level, and I'm wondering if somebody out there might be able to shed some light?

    I get how these deep neural networks can be trained to drive by following a human driver. But doing that requires a LOT of information (and bandwidth) and a lot of processing power. I doubt that the cars have the processing power to do any meaningful training in-car, which implies that they must send information back to the mother ship to help refine the training.

    So...

    1. What do the cars send back? Clearly the cars can't all be sending continuous raw video and sensor data back. The bandwidth demands would be absurd.

    2. Is the data limited simply to the location of, and correct response to, sensed/known obstacles and road features? That would seem useful, but more limited. It wouldn't seem to help as much when the car is learning more complex behaviors: navigating intersections, construction areas and parking lots; learning how to deal with on-road obstacles like birds, junk dropped off of trucks, and so on.

    3. Are the cars smart enough to send back detailed video of exceptional circumstances, like dropped objects and so on?

    Just curious.
     
  2. calisnow

    calisnow Active Member

    Joined:
    Oct 11, 2014
    Messages:
    2,227
    Location:
    Los Angeles
    You're not getting any traction OP because this topic has been discussed and debated extensively many times here already. A quick search of the forum should bring up threads for you. If you can't find them I'll dig some up.
     
  3. Bladerskb

    Bladerskb Like how many times do i have to be right?

    Joined:
    Oct 24, 2016
    Messages:
    601
    Location:
    Michigan
    The car, with its 8 cameras, can create a holistic 3D view of its world in real time, with annotations.

    If an interesting situation were to take place, or an interesting intersection or stretch of road were marked on the map for recording, all that needs to happen is for the car to record the last 30 seconds of its encounter and send it over to Tesla HQ.

    But wait, you don't have to send raw video over. You people keep forgetting that the Nvidia PX2 runs the DNN in real time and already processes the raw video data.

    Tesla would only need the metadata. What do I mean by metadata? Everything already tagged and processed by the DNN: a numerical and temporal representation of the space around the car, the coordinates of the objects (other cars/obstacles/pedestrians/traffic signs/lanes/road edges/etc.) around the car and their velocities, and more.

    All of this information could come to less than 1 MB per encounter and can then be loaded into Tesla's simulator.

    They will have metadata for the exact position of every traffic light in every city. They will have metadata for the positions of traffic signs, stop signs, speed limits, lane markings, etc.

    They will have data on every parking lot and spot a Tesla has ever parked in during manual mode.

    So if a Tesla drives to and parks at a McDonald's or any other business in manual mode, the car will save data on the parking lot, its spots, and exactly how to navigate in and out of it, and beam it up to HQ.

    If the car encounters a place it fails at during shadow mode, it simply does what the Xbox does: records the last 30 seconds and beams it up to HQ.
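    To make it concrete, here is a rough Python sketch of the kind of payload I mean. Every name and field in it is made up by me for illustration - nobody outside Tesla knows their actual telemetry format:

    import json
    import time
    from collections import deque

    # One frame's worth of DNN output: no pixels, just tagged objects.
    # All field names are hypothetical -- a sketch of the idea, not
    # Tesla's actual format.
    def make_frame_metadata(timestamp, ego_speed, objects):
        return {
            "t": timestamp,          # seconds
            "ego_speed": ego_speed,  # m/s
            "objects": objects,      # list of tagged detections
        }

    class EncounterRecorder:
        """Rolling window of per-frame metadata (the 'last 30 seconds'
        idea), serialized when a trigger fires."""

        def __init__(self, window_seconds=30.0):
            self.window = window_seconds
            self.frames = deque()

        def add_frame(self, frame):
            self.frames.append(frame)
            # Drop frames that have aged out of the window.
            while self.frames and frame["t"] - self.frames[0]["t"] > self.window:
                self.frames.popleft()

        def snapshot(self):
            # Compact JSON: at ~30 Hz with a handful of objects per
            # frame, 30 seconds of this plausibly fits well under 1 MB.
            return json.dumps(list(self.frames), separators=(",", ":"))

    # On a disengagement or a shadow-mode mismatch, upload the snapshot.
    recorder = EncounterRecorder()
    recorder.add_frame(make_frame_metadata(
        time.time(), 13.4,
        [{"type": "car", "pos": [12.1, -3.2], "vel": [11.8, 0.0]}],
    ))
    payload = recorder.snapshot()  # send this to HQ instead of raw video

    The point is that the payload is tagged objects, not pixels, which is why it stays tiny.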
     
  4. calisnow

    calisnow Active Member

    Joined:
    Oct 11, 2014
    Messages:
    2,227
    Location:
    Los Angeles
    Nobody actually knows for sure. Mobileye had "curated" vision for tagging objects - the learning was done at Mobileye and downloaded to Teslas. Tesla however did its own sensor fusion and decision making.

    The *new* Tesla system runs on NVIDIA's Drive PX2 - and yes, NVIDIA claims it is capable of doing unsupervised learning, and gives OEMs a software stack for learning. However, in the case of Tesla, Tesla claims they wrote all their own software and that it can run on anyone's hardware. Whether or not the object recognition in Tesla's new software is all unsupervised learning, nobody at Tesla has publicly said, as far as I know.
     
  5. sandpiper

    sandpiper Active Member

    Joined:
    Sep 25, 2014
    Messages:
    2,012
    Location:
    Ontario, Canada
    Sorry... accidentally cut off my post. But you answered my question anyway!

    It would be fascinating to understand how Tesla incorporates this presumably huge stream of data into their training process. If you imagine that each jurisdiction has unique standards for traffic signs, road markers, pseudo-traffic signs (handicapped parking, for example), etc., I struggle to believe that the system could learn all of these variations without some sort of directed training process.

    And is the implication, then, that the system would be structured in two levels? A lower-level DNN responsible for recognizing and tagging the objects in the environment using video and sensors, and a second-level DNN that is fed the metadata vector (?) plus the high-level navigation intent vector and makes the driving command decisions?

    Or is that too simplistic?
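    To make my own question concrete, here is a toy Python sketch of the two-level structure I'm imagining. Every name in it is invented, and the "networks" are trivial stand-ins - this is just the shape of the idea, not anybody's real system:

    from dataclasses import dataclass
    from typing import List

    # --- Level 1: perception DNN turns raw sensors into tagged metadata ---
    @dataclass
    class DetectedObject:
        kind: str        # "car", "pedestrian", "stop_sign", ...
        position: tuple  # (x, y) metres in the car's frame
        velocity: tuple  # (vx, vy) m/s

    def perception_dnn(camera_frames, radar_returns) -> List[DetectedObject]:
        # Stand-in for the low-level network: raw pixels in, objects out.
        # A real system would run a trained DNN here.
        return [DetectedObject("pedestrian", (8.0, 0.5), (0.0, 1.2))]

    # --- Level 2: planning DNN turns metadata plus intent into commands ---
    @dataclass
    class DriveCommand:
        steering_angle: float  # radians
        acceleration: float    # m/s^2

    def planning_dnn(objects: List[DetectedObject],
                     navigation_intent: str) -> DriveCommand:
        # Stand-in for the high-level network. A toy rule in place of
        # learned behaviour: brake if anything is directly ahead.
        for obj in objects:
            x, y = obj.position
            if 0 < x < 15 and abs(y) < 2:
                return DriveCommand(steering_angle=0.0, acceleration=-3.0)
        return DriveCommand(steering_angle=0.0, acceleration=0.5)

    def control_loop(camera_frames, radar_returns, navigation_intent):
        objects = perception_dnn(camera_frames, radar_returns)
        return planning_dnn(objects, navigation_intent)

    cmd = control_loop(None, None, "continue straight")
    # -> DriveCommand(steering_angle=0.0, acceleration=-3.0): brakes for
    #    the pedestrian the stand-in perception layer reported.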
     
  6. calisnow

    calisnow Active Member

    Joined:
    Oct 11, 2014
    Messages:
    2,227
    Location:
    Los Angeles
    Yes, it would be fascinating, but the trouble is that most of this cutting-edge research is not happening at universities that publish papers, but at auto companies with a strong incentive to keep their secret sauce secret.

    Your two level model sounds plausible to me but I'm no computer scientist.
     
  7. sandpiper

    sandpiper Active Member

    Joined:
    Sep 25, 2014
    Messages:
    2,012
    Location:
    Ontario, Canada
    And further, I struggle to believe that they wouldn't have some sort of procedurally programmed input for traffic rules. Such as... "no right turn on red in this state". Or "don't park within X meters of a stop light". That's not the sort of thing that you'd want to learn the hard way.

    I suppose they could include a value for traffic tickets in the performance scoring function. :)
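    Something like this lookup table is what I have in mind - hand-coded rules layered over whatever the networks learn. The jurisdictions and values here are invented purely for illustration:

    # Hand-coded rule layer; a planner would consult these checks before
    # acting, so the car never "learns the hard way" via a ticket.
    TRAFFIC_RULES = {
        "ontario": {
            "right_on_red": True,
            "min_distance_to_signal_m": 9.0,  # no parking closer than this
        },
        "new_york_city": {
            "right_on_red": False,
            "min_distance_to_signal_m": 9.0,
        },
    }

    def may_turn_right_on_red(jurisdiction: str) -> bool:
        # Unknown jurisdiction: be conservative and say no.
        return TRAFFIC_RULES.get(jurisdiction, {}).get("right_on_red", False)

    def may_park_here(jurisdiction: str, distance_to_signal_m: float) -> bool:
        rules = TRAFFIC_RULES.get(jurisdiction, {})
        return distance_to_signal_m >= rules.get("min_distance_to_signal_m", 15.0)

    assert not may_turn_right_on_red("new_york_city")
    assert may_park_here("ontario", 12.0)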
     
  8. maxell2hd

    maxell2hd Member

    Joined:
    Oct 24, 2016
    Messages:
    8
    Location:
    Canada
    Nvidia has a paper that briefly describes the data training process. It seems far from unsupervised:

    http://images.nvidia.com/content/tegra/automotive/images/2016/solutions/pdf/end-to-end-dl-using-px.pdf
     
  9. abasile

    abasile Independent Software Eng.

    Joined:
    Oct 21, 2012
    Messages:
    533
    Location:
    San Bernardino Mts., CA (Elev. 6100' / 1800m)
    That seems to be the "traditional" approach and is certainly what would make sense to me. The idea is that object detection would need to take place at a relatively low level, to "boil down" all of the raw sensor data, essentially gobs of pixels, to higher level constructs. This would include geo-rectifying the image/video data.

    Wow, in that paper, a CNN (convolutional neural network) is used to "map raw pixels from a single front-facing camera directly to steering commands". On the face of it, this seems quite simple, yet could be feasible only because of recent improvements in GPU-based processing. Tesla is using eight cameras pointing in all directions, so Tesla's approach would obviously be more complex.

    Personally, I find the whole idea of CNNs quite interesting, as this could reduce the need for explicit object detection. As that paper points out, a good deal of experimentation would be needed to choose optimal kernels. And the initial normalization phase would be pretty important to nail down, as one would not want to have to re-train the entire system every time the camera hardware changes slightly.

    That said, I agree with sandpiper that some rule-based decision making would be needed for traffic laws and the like. To support that, objects like speed limit signs, lane markings, and traffic lights would need to be recognized explicitly. I wouldn't be surprised if Tesla utilizes multiple, separate neural networks (including CNNs) in parallel, with perhaps a rule-based system to integrate everything at the top level.
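    For anyone curious, here is roughly what that paper's single-camera network looks like transcribed into PyTorch. The layer sizes follow the paper's description; the names and the dummy input are my own filler, so treat it as a sketch for discussion rather than anyone's production code:

    import torch
    import torch.nn as nn

    # End-to-end CNN in the spirit of the linked NVIDIA paper:
    # raw pixels in, a single steering value out.
    class EndToEndSteering(nn.Module):
        def __init__(self):
            super().__init__()
            self.features = nn.Sequential(
                # Strided 5x5 convs, then unstrided 3x3 convs.
                nn.Conv2d(3, 24, kernel_size=5, stride=2), nn.ReLU(),
                nn.Conv2d(24, 36, kernel_size=5, stride=2), nn.ReLU(),
                nn.Conv2d(36, 48, kernel_size=5, stride=2), nn.ReLU(),
                nn.Conv2d(48, 64, kernel_size=3), nn.ReLU(),
                nn.Conv2d(64, 64, kernel_size=3), nn.ReLU(),
            )
            self.head = nn.Sequential(
                nn.Flatten(),
                nn.Linear(64 * 1 * 18, 100), nn.ReLU(),
                nn.Linear(100, 50), nn.ReLU(),
                nn.Linear(50, 10), nn.ReLU(),
                nn.Linear(10, 1),  # steering command
            )

        def forward(self, x):
            # x: batch of 66x200 frames, already normalized (the
            # normalization step I mentioned above).
            return self.head(self.features(x))

    model = EndToEndSteering()
    frame = torch.randn(1, 3, 66, 200)  # one dummy camera frame
    steering = model(frame)             # shape (1, 1)

    Training it is supervised in the straightforward sense: minimize the error between the network's output and the human driver's recorded steering angle.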
     
