I was completely blown away by a post over at Reddit yesterday. A user called PM_YOUR_NIPS_PAPER - who says he is/was training Tesla's computer vision network - claimed:

I don't know if this guy's for real or not. Nevertheless, I find what he's saying a bit scary. Could it be true? What pieces of evidence or indications do we have either way?

A quick Google search tells me there are several software tools out there that could be used for object tracking/annotation. Even YouTube seems to have a tool for this.

YouTube-BB Dataset | Google Research
ViTBAT - Video annotation tool

Random video:

I really hope this isn't the case. Unfortunately that dude's post got me thinking...

Tesla Self-Driving Demonstration
I’d be surprised if it were otherwise, and never assumed it was. It was a marketing demo... basically a commercial. They didn’t claim to have a functional system at that time, and at no point did they make any statements about the techniques used to have the car drive the route.
I truly hope the boxes were at least made by the same algorithm that drove the car - even if done in post-processing. If they were made on some completely different piece of software, that would totally be faking it. As for the use of co-ordinates, I don't automatically consider that faking. If the car didn't have the ability to follow the navigation system yet, so be it. Now hard-coding traffic lights etc. certainly does seem like a mapping approach, not what people tend to attribute to Tesla...
I kind of think I know what you mean, but I don't have a clue how they would manage that. Look at the Demo video - it's not just bounding boxes. The "car" is even "detecting" trees!!!
Drive the footage recorded by the car through the same visual NN in post-processing and use it to apply labels? That would of course require their then-FSD codebase to have been able to recognize all those things.
Object permanence: the lines don't stop existing because a car is blocking them. Real-world knowledge: lines are fairly straight and can be extrapolated. Motion data: the white arrows show displacement in the image, and the lines track those also.

Regarding the Reddit post: high-resolution mapping is how lidar works, so even if they did use high-grade maps, what's the issue? (The use of it in the DARPA Grand Challenge was sort of lame, IMHO.) Mapping intersections also seems like a good idea. As a human driver, I unknowingly blew through a traffic control signal in Indianapolis because it was mounted to the bottom of a skywalk.

My laptop can do the bounding-box thing and motion map, no need for post-processing. There is also the spot where the car slows by people near the side of the road. Not post-processed. What would be missing is the ability to take a brand new route, which gives the other mapping systems trouble, I think.

If I were trying to fake a Tesla self-driving video, I'd just put a set of driving controls in the rear seat and tint the windows. What is claimed sounds like progress beyond that level: it drove on a path, and it found and obeyed traffic lights.

As to the full vs. enhanced issue: if the FSD changes were to be included/linked with EAP, then every change/attempt would need full regression testing. And changes to EAP, while better for it, might not be what helps FSD. Better to apply the modifications that best improve EAP to EAP, while doing all the tear-up needed to allow FSD to achieve its goals. Analogy warning: a golfer averages 20 over. Do you force them to relearn their entire mechanics to get to par (while scoring worse in the interim), or give them pointers to improve their game in the short term?
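The "lines are fairly straight and can be extrapolated" point above can be sketched in a few lines of code: fit a straight line to the lane-marking points that are still visible, then extend it through the span an occluding car covers. This is a minimal illustration with made-up data, not anything from Tesla's actual stack.

```python
# Minimal sketch: least-squares fit of visible lane points, then
# extrapolation behind an occlusion. Data and names are illustrative.

def fit_line(points):
    """Least-squares fit of y = m*x + b through (x, y) points."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    m = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - m * sx) / n
    return m, b

# Lane-marking points still visible ahead of the occluding car
visible = [(0, 10.0), (1, 12.1), (2, 13.9), (3, 16.0)]
m, b = fit_line(visible)
print(m * 6 + b)  # predicted lane position behind the occlusion
```

The same idea generalizes to curved roads by fitting a low-order polynomial instead of a line, which is what the "real world knowledge" constraint amounts to: the road geometry changes slowly, so nearby evidence predicts the hidden part.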
It is common practice to post-process a video that shows the output of deep learning algorithms for visualization purposes. For example, the onboard system does not need to plot GPS Lat/Long on a map to use the position information.
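To make the post-processing point concrete: a typical workflow logs the network's detections during the drive (frame index, label, normalized box) and only converts them to pixel rectangles later, when rendering the visualization video. The log format and function names below are assumptions for illustration, not Tesla's actual format.

```python
# Hypothetical sketch: turning a detection log captured during a drive
# into pixel rectangles for a post-processed overlay video.

def to_pixel_box(det, frame_w, frame_h):
    """Convert a normalized [0,1] center-format box to pixel corners."""
    x, y, w, h = det["box"]  # normalized center-x, center-y, width, height
    left = round((x - w / 2) * frame_w)
    top = round((y - h / 2) * frame_h)
    right = round((x + w / 2) * frame_w)
    bottom = round((y + h / 2) * frame_h)
    return left, top, right, bottom

# One logged detection per object per frame, as the onboard net might emit it
log = [
    {"frame": 0, "label": "car",  "box": (0.5, 0.6, 0.2, 0.1)},
    {"frame": 0, "label": "tree", "box": (0.1, 0.3, 0.1, 0.4)},
]

for det in log:
    print(det["label"], to_pixel_box(det, 1280, 720))
```

The point is that nothing in this step re-runs perception: the boxes come straight from what the onboard network logged, and the rendering is purely cosmetic.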
This is kind of hard to phrase correctly, but isn’t the point of the visualization to portray exactly what the car detects? I mean, shouldn’t the visualization video be a direct function of the car’s neural net input and output? To put it another way: wouldn’t running MPEG video recordings through some generic labeling/tracking software after the fact be considered cheating? Not saying Tesla did it, but IF they did?
Me too. After all, Nvidia was already showing this off in 2015... Model X mule(s) show signs of nVidia Tegra X1 Drive PX platform - no rear mirror!
Yes, it would be cheating if they did it like that. The only way doing this in post-processing would not be cheating is if the post-processing also ran on AP2-equivalent hardware/software and used the same video and other data as the car - i.e., it was just a technical solution to run the visual marking process after the fact, not some completely separate visual NN...
There is no reason for the onboard software to render a visualization of the neural network output. The onboard output needs to be something like, "A stop sign has been detected 250 ft ahead on the current route." The route planner can then adjust the vehicle speed as required.
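A toy sketch of that idea: the planner consumes a symbolic detection ("stop sign, 250 ft ahead") rather than any rendered image, and decides when to start braking from simple kinematics. The comfort threshold and function names are assumptions for illustration only.

```python
# Hedged sketch: a planner acting on a symbolic detection, no pixels involved.

COMFORT_DECEL = 8.0  # assumed comfortable deceleration limit, ft/s^2

def required_decel(speed_ft_s, distance_ft):
    """Constant deceleration needed to stop within distance_ft: v^2 / (2d)."""
    return speed_ft_s ** 2 / (2 * distance_ft)

def should_start_braking(speed_ft_s, distance_ft):
    """Start braking once the needed deceleration reaches the comfort limit."""
    return required_decel(speed_ft_s, distance_ft) >= COMFORT_DECEL

# 45 mph is about 66 ft/s; stop sign reported 250 ft ahead
print(should_start_braking(66.0, 250.0))
```

Note that nothing here needs bounding boxes or an image at all, which is the post's point: the overlay graphics exist for the human audience, not for the control loop.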
True. Hence doing it in post-processing is just fine IF it is an accurate representation of how the car internally labelled things during its drive. If the car did not label things like that internally (I don't mean the video rendition, but the internal "seeing"), then it would be a misrepresentation and cheating. I still think Tesla probably just used e.g. Nvidia's code on a hardcoded route - so nothing that could be generalized, but OTOH it really happened... and otherwise they did not "cheat". Maybe I'm naive after everything that has happened... my bad, if so.
I always figured that they manually did things that they assumed they could automate later. For me, as long as they didn't have a guy in back driving the car and the hard-coded stuff could come from automatic HD mapping, it would be an honest demo.
Boxing threats is for driver warning systems like night vision, pedestrian alert, and cross-traffic alert. Since the seat vibrates and/or an audio warning comes on, you need to see WHERE the deer, pedestrian, or car is to determine a correct course of action. Now if FSD has a driver warning system, then sure, those boxes have value. But with an AV, it's the threat it CAN'T evaluate that is the issue, not the ones it can. You can't turn on a warning buzzer without knowing there is an obstacle.
Yes @AnxietyRanger, it did really happen. So I guess my thread title is a little misleading. Should've called it "FSD Demo Video Visualization Graphics Completely Fake"? That was IMO the most impressive part of the whole video. Now I don't know how to feel anymore; I'm just confused.
Obviously it is a preplanned route with a lot of hard coding. After all, it is a future-capability demo. The visualizations are a representation of what the system infers, very similar to the color images we see in astronomy of nebulae. As long as there is no human driving it in REAL TIME, sitting inside the car or even remotely, it is not fake.