One of the main take-aways for me is that HW3 is on track for end of Q1 – *this is crazy soon*. Andrej Karpathy talked about how the 10x increase in compute allows for the deployment of a much larger (and more accurate) neural network.
@neroden, I remember you posted this in the luvb2b-thread about your pessimism for FSD, and I've been brewing on a response, so I hope it's cool with y'all that I clutter the celebration up with some attempted substance:
I think you might be missing some knowledge *collective gasp* about deep learning and the techniques Tesla, and specifically Andrej Karpathy, bring to the table. If one had to specify the problem explicitly (as one normally does in CS), you would be right – that would take forever. But one of the major things deep learning takes off your hands is exactly that *problem specification*: the network figures out how to best approximate the desired behaviour itself.
One of the key factors in deep learning though – besides the specific architecture of the network – is figuring out how you can measure *when the network makes a mistake*. This is captured by the so-called "loss function", which tells the network how wrong it was relative to the desired output. The weights of each neuron can then be updated to better approximate the desired result. The difficulty, then, is figuring out *when you are wrong* in the first place.
This, however, is quite a bit simpler than specifying self-driving – and shadow mode is exactly that: it lets you train a network against baseline "perfect" human behaviour. They do other stuff too, of course, but shadow mode has the potential to be immensely helpful.
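To make the loss-function / shadow-mode idea a bit more concrete, here's a toy sketch of what that kind of training step looks like. This is purely my own illustration – the network size, the feature vector, and the MSE loss are placeholders, not anything Tesla has shown:

```python
import torch
import torch.nn as nn

# Toy illustration only: a tiny "planner" network whose loss is simply
# the gap between its predicted steering and what the human driver
# actually did (the shadow-mode idea: the human is the label).
# Sizes and the MSE loss are my assumptions, not Tesla's code.

planner = nn.Sequential(
    nn.Linear(128, 64),   # 128 = made-up feature vector from perception
    nn.ReLU(),
    nn.Linear(64, 1),     # output: predicted steering angle
)
loss_fn = nn.MSELoss()                      # "how wrong was I?"
optimizer = torch.optim.SGD(planner.parameters(), lr=1e-3)

features = torch.randn(32, 128)             # a batch of driving scenes
human_steering = torch.randn(32, 1)         # what the driver actually did

prediction = planner(features)
loss = loss_fn(prediction, human_steering)  # the loss function in action
loss.backward()                             # gradients: blame per weight
optimizer.step()                            # nudge weights to be less wrong
```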
In case you guys haven't read Andrej Karpathy's thoughts on Software 2.0 (you might have seen his Autopilot lecture, which covers some of the same ground), this is a good primer – albeit a bit old:
Software 2.0 – Andrej Karpathy – Medium
Basically the FSD network will do something along these lines:
Labeling each image > estimating the 3D position of objects > figuring out how to drive (okay, simplifying here)
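Just to illustrate, here's a deliberately toy version of that pipeline with each stage as its own little learned component. All the names and tensor shapes are placeholders I made up, not Tesla's actual design:

```python
import torch
import torch.nn as nn

class Labeler(nn.Module):            # stage 1: image -> object labels
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                                 nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                 nn.Linear(8, 16))
    def forward(self, img):
        return self.net(img)

class DepthEstimator(nn.Module):     # stage 2: labels -> rough 3D positions
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(16, 12)  # e.g. 4 objects x (x, y, z)
    def forward(self, labels):
        return self.net(labels)

class Planner(nn.Module):            # stage 3: 3D scene -> steering command
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(12, 1)
    def forward(self, scene):
        return self.net(scene)

frame = torch.randn(1, 3, 64, 64)     # one fake camera frame
steering = Planner()(DepthEstimator()(Labeler()(frame)))
```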
Each of these components is either an independent software 2.0 process, as described in the Medium post, or they may even all be merged into one network. I recall Andrej mentioning before that he believed "a single network to rule them all" was superior, but that was back when he was a PhD student and he was very coy about the technical details. He might have been working on some paper that never saw the light of day because he started working for Tesla. He might still believe it though – and the recent architecture improvements to the NN actually seem to suggest that he does.
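And here's the same toy example "merged into one network": a single shared backbone with one head per task, trained jointly. Again, this is just my own sketch of the idea, not the actual Autopilot architecture:

```python
import torch
import torch.nn as nn

class SharedBackboneNet(nn.Module):
    def __init__(self):
        super().__init__()
        # one shared feature extractor feeding every task head
        self.backbone = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                                      nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.label_head = nn.Linear(8, 16)   # object labels
        self.depth_head = nn.Linear(8, 12)   # 3D positions
        self.drive_head = nn.Linear(8, 1)    # steering

    def forward(self, img):
        features = self.backbone(img)
        return (self.label_head(features),
                self.depth_head(features),
                self.drive_head(features))

labels, scene_3d, steering = SharedBackboneNet()(torch.randn(1, 3, 64, 64))
```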
Regardless, I think Tesla is (and has been for a long time) at the absolute forefront of applied deep learning, and the sudden exponential improvement that has happened on other problems tackled with deep learning seems to be on the cusp of happening with FSD. Not that we'll be there in Q1 next year, but things really seem to be picking up steam, so I doubt it'll take a decade.