I was completely blown away by a post over at reddit yesterday.
A guy called PM_YOUR_NIPS_PAPER - who says he is / was training Tesla's computer vision network - claimed, among other things, that the FSD demo video was faked (his individual claims are quoted in the breakdown below).
I don't know if this guy's for real or not. Nevertheless, I find what he's saying a bit scary. Could it be true? What pieces of evidence or indications do we have either way?
A quick Google search tells me there are several software tools out there that could be used for this kind of object tracking / annotation. Even YouTube seems to have a tool for this (see the sketch after the links below).
YouTube-BB Dataset | Google Research
ViTBAT - Video annotation tool
Random video:
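To show how little effort the overlay part actually takes, here's a toy Python/OpenCV sketch of painting boxes onto already-recorded footage. The file names and box coordinates are all made up; the point is just that once an annotation tool has exported per-frame boxes, an "AI view" overlay is a trivial post-processing step:

```python
# Toy sketch: painting bounding boxes onto already-recorded footage.
# Nothing here is Tesla's code - the file names and box data are made up.
import cv2

# Pretend an annotation tool exported boxes as {frame_index: [(x, y, w, h, label), ...]}
annotations = {
    0: [(120, 80, 60, 40, "car")],
    1: [(124, 81, 60, 40, "car")],
}

reader = cv2.VideoCapture("demo_drive.mp4")  # hypothetical input file
fps = reader.get(cv2.CAP_PROP_FPS)
width = int(reader.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(reader.get(cv2.CAP_PROP_FRAME_HEIGHT))
writer = cv2.VideoWriter("demo_drive_annotated.mp4",
                         cv2.VideoWriter_fourcc(*"mp4v"), fps, (width, height))

frame_index = 0
while True:
    ok, frame = reader.read()
    if not ok:
        break
    # Draw whatever boxes the (offline) annotation assigned to this frame.
    for (x, y, w, h, label) in annotations.get(frame_index, []):
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(frame, label, (x, y - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
    writer.write(frame)
    frame_index += 1

reader.release()
writer.release()
```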
I really hope this isn't the case. Unfortunately that dude's post got me thinking ...
Tesla Self-Driving Demonstration
I think it helps to break the statement down into parts.
First thing to keep in mind is that there were two videos. The first one has already been documented to have been done with many runs and disengagements, while the second video had no disengagements. I exhaustively documented all the details of the timeline here:
FSD may require a hardware upgrade...
"That video was pretty much faked. The car was hard-coded with specific GPS positions and (x,y,z) locations of traffic lights on a very controlled route."
First of all, doing this does not conflict with the self-driving definition. The SAE definition excludes strategic effort from the dynamic driving task (page 29): "Strategic effort involves trip planning, such as deciding whether, when and where to go, how to travel, best routes to take, etc. ... The definition of DDT provided above (3.4) includes tactical and operational effort but excludes strategic effort."
https://wiki.unece.org/download/attachments/40009763/(ITS_AD-10-08) SAE_J3016_Taxonomy and Definitions for Terms Related to Driving Automation Systems.pdf?api=v2
Also, pre-knowledge of traffic light locations is quite common in self-driving cars (you should know this already from all the previous discussions we have had).
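For what it's worth, "hard-coded (x,y,z) locations of traffic lights" is basically what the industry calls a map prior. Here's a toy Python sketch of what such a lookup might look like - every name, coordinate, and the 150 m radius are invented for illustration, not anything from Tesla's stack:

```python
# Toy sketch of a traffic-light map prior - the kind of pre-surveyed
# knowledge the quoted claim calls "hard-coded". All values invented.
import math
from dataclasses import dataclass

@dataclass
class TrafficLight:
    lat: float
    lon: float
    height_m: float  # the "z" in the claimed (x, y, z) locations

# A pre-surveyed map of the demo route (made-up positions).
LIGHT_MAP = [
    TrafficLight(37.3948, -122.1503, 5.2),
    TrafficLight(37.3971, -122.1466, 5.0),
]

def distance_m(lat1, lon1, lat2, lon2):
    """Approximate ground distance in meters (fine at city scale)."""
    dlat = math.radians(lat2 - lat1)
    dlon = math.radians(lon2 - lon1)
    mean_lat = math.radians((lat1 + lat2) / 2)
    return 6371000 * math.hypot(dlat, dlon * math.cos(mean_lat))

def upcoming_lights(car_lat, car_lon, radius_m=150):
    """Which mapped lights should the perception stack expect right now?"""
    return [tl for tl in LIGHT_MAP
            if distance_m(car_lat, car_lon, tl.lat, tl.lon) <= radius_m]

print(upcoming_lights(37.3950, -122.1500))
```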
"They held 3-4 trial runs and picked the one that worked the best."
How to parse this depends on which video it refers to. For the first video, we already know they did multiple runs (most of those days had rain) and picked the one that didn't have any disengagements. There was even speculation that the video was pieced together from multiple runs (though no evidence).
If it refers to the second video, we know there were no disengagements, so the multiple runs were likely just standard practice in commercial video production: you shoot several takes so you have choices, even if all of them were "good" takes. That one did have some interesting parts, like the car slowing down for some joggers.
"All the rectangular bounding boxes and colored line/region segmentations on the 'AI view' were done after the car had made the trip, for visualization and hype purposes."
I don't find this surprising. If you zoom in on the video, you'll notice that inside the car there was none of that visualization, just the same UI as in regular AP. The AI doesn't need the visualization to operate, so the fact that they spent no effort on an FSD UI for human consumption doesn't really tell you much.
The core question, rather, is whether the car was actually doing the object detection/classification, lane line mapping, and road surface pixel labeling under the hood. If it wasn't, that would be worrisome. But the person never addressed this point directly.
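Since that distinction is the crux, here's a hedged sketch of the alternative to post-hoc annotation: running a stock detector live on every frame, with torchvision's off-the-shelf Faster R-CNN standing in for whatever network Tesla actually runs (the file name is made up too). If the car does something like this in real time, the overlay is just a rendering choice; if it doesn't, the overlay is theater.

```python
# Toy sketch of the opposite of post-hoc annotation: running a detector
# live on each frame, which is what a real perception stack has to do.
# The model is an off-the-shelf stand-in, not Tesla's network.
import cv2
import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

cap = cv2.VideoCapture("dashcam.mp4")  # hypothetical input file
with torch.no_grad():
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # BGR uint8 -> RGB float tensor in [0, 1], as the model expects.
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        tensor = torch.from_numpy(rgb).permute(2, 0, 1).float() / 255.0
        detections = model([tensor])[0]
        # Boxes exist the moment the frame does - no offline pass needed.
        for box, score in zip(detections["boxes"], detections["scores"]):
            if score < 0.5:
                continue
            x1, y1, x2, y2 = box.int().tolist()
            cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
        cv2.imshow("live detections", frame)
        if cv2.waitKey(1) == ord("q"):
            break
cap.release()
cv2.destroyAllWindows()
```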
"I can't exactly say my sources, but if that video was true over a year ago, why is the current autopilot much worse than what is portrayed in the video?"
One easy answer is that the two have different codebases, or that FSD lives in a branch that has yet to be released. From what @verygreen has released so far, the AP2 code appears to be mainly emulating the functionality of the old Mobileye chip, which doesn't make sense as an architecture for FSD. The emulation is probably there so they can more easily maintain both AP1 and AP2 (don't forget they still have to support AP1 owners even though Tesla broke with Mobileye).
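To make the "emulation" idea concrete, here's a toy sketch of the adapter pattern this would imply: reformat the new network's output into whatever message shape the old Mobileye chip used to emit, so the downstream code can be shared between AP1 and AP2. Every class and field below is invented purely for illustration; it just shows why emulation is a sensible maintenance move but a strange foundation for FSD.

```python
# Toy sketch of the "emulation" idea: wrap the new network's output in the
# message format the old Mobileye-era code consumed, so downstream control
# code doesn't have to change. Every name and field here is invented -
# this illustrates the architectural pattern, not Tesla's actual code.
from dataclasses import dataclass

@dataclass
class LegacyLaneMessage:
    """Shape of what the old chip used to put on the bus (hypothetical)."""
    left_offset_m: float
    right_offset_m: float
    curvature: float

class NewVisionStack:
    def lane_model(self):
        # Stand-in for the new network's richer output.
        return {"left": -1.7, "right": 1.8, "poly": [0.0, 0.001, -1.7]}

class MobileyeEmulator:
    """Adapter: new perception in, legacy-format messages out."""
    def __init__(self, vision: NewVisionStack):
        self.vision = vision

    def emit(self) -> LegacyLaneMessage:
        lanes = self.vision.lane_model()
        return LegacyLaneMessage(
            left_offset_m=lanes["left"],
            right_offset_m=lanes["right"],
            curvature=lanes["poly"][1],  # crude: quadratic term as curvature
        )

# Downstream AP1-era consumer keeps working unchanged:
print(MobileyeEmulator(NewVisionStack()).emit())
```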