Welcome to Tesla Motors Club
Discuss Tesla's Model S, Model 3, Model X, Model Y, Cybertruck, Roadster and More.

lunitiks

I was completely blown away by a post over at reddit yesterday.

A guy called PM_YOUR_NIPS_PAPER - who says he is / was training Tesla's computer vision network - claimed:
That video was pretty much faked. The car was hard-coded with specific GPS positions and (x,y,z) locations of traffic lights on a very controlled route. They held 3-4 trial runs and picked the one that worked the best.

All the rectangular bounding boxes and colored line/region segmentations on the "AI view" were done after the car had made the trip, for visualization and hype purposes.

I can't exactly say my sources, but if that video was true over a year ago, why is the current autopilot much worse than what is portrayed in the video?

I don't know if this guy's for real or not. Nevertheless, I find what he's saying a bit scary. Could it be true? What pieces of evidence or indications do we have either way?

A quick Google search tells me there are several software tools out there that could be used for object tracking / annotation. Even YouTube seems to have a tool for this.

YouTube-BB Dataset | Google Research
ViTBAT - Video annotation tool
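For a sense of how little machinery after-the-fact annotation actually needs, here's a toy sketch (pure numpy, invented coordinates - nothing to do with whatever Tesla ran) that stamps a bounding box onto a video frame:

```python
import numpy as np

def draw_box(frame, x0, y0, x1, y1, color=(0, 255, 0), thickness=2):
    """Draw a rectangular bounding box onto an RGB frame (H x W x 3 array)."""
    frame[y0:y0 + thickness, x0:x1] = color  # top edge
    frame[y1 - thickness:y1, x0:x1] = color  # bottom edge
    frame[y0:y1, x0:x0 + thickness] = color  # left edge
    frame[y0:y1, x1 - thickness:x1] = color  # right edge
    return frame

# Annotate a blank 720p frame with one made-up "detected object" box.
frame = np.zeros((720, 1280, 3), dtype=np.uint8)
draw_box(frame, 400, 200, 600, 350)
```

Point being: any off-the-shelf annotation tool is doing little more than this, frame by frame, with a tracker deciding where the box goes.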

Random video:

I really hope this isn't the case. Unfortunately that dude's post got me thinking ...



Tesla Self-Driving Demonstration
 

I’d be surprised if it were otherwise, and never assumed it was. It was a marketing demo... basically a commercial. They didn’t claim to have a functional system at that time, and at no point did they make any statements about the techniques used to have the car drive the route.
 
I truly hope the boxes were at least made by the same algorithm that drove the car - even if done in post-processing. If they were made on some completely different piece of software, that would totally be faking it.

As for the use of co-ordinates, I don't automatically consider that faking. If the car didn't have the ability to follow the navigation system yet, so be it. Now hard-coding traffic lights etc. certainly does seem like a mapping approach, not what people tend to attribute to Tesla... ;)
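For what it's worth, the kind of hard-coding the Reddit poster describes would be trivial to implement - something like this sketch (all names and coordinates invented, purely to illustrate the claimed approach, not anything Tesla confirmed):

```python
import math

# Hypothetical hard-coded route data: pre-surveyed traffic lights with
# GPS position and an (x, y, z) mount location, as the Reddit post claims.
TRAFFIC_LIGHTS = [
    {"lat": 37.3947, "lon": -122.1503, "xyz": (12.0, 3.5, 5.2)},
    {"lat": 37.3952, "lon": -122.1489, "xyz": (15.1, -2.0, 5.0)},
]

def nearby_light(lat, lon, radius_m=50.0):
    """Return the pre-surveyed light within radius_m of the car, if any."""
    for light in TRAFFIC_LIGHTS:
        # Equirectangular approximation - fine at city scale.
        dx = (lon - light["lon"]) * 111_320 * math.cos(math.radians(lat))
        dy = (lat - light["lat"]) * 111_320
        if math.hypot(dx, dy) <= radius_m:
            return light
    return None

hit = nearby_light(37.3947, -122.1503)  # car sitting right at the first light
```

No perception needed at all for the lights on that one controlled route - which is exactly why it wouldn't generalize.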
 
I kind of think I know what you mean, but I don't have a clue how they would manage that. Look at the Demo video - it's not just bounding boxes. The "car" is even "detecting" trees!!!

Run the footage recorded by the car through the same visual NN in post-processing and use it to apply labels?

That would, of course, require their then-current FSD codebase to have been able to recognize all those things.
 
And would the car "detect" road surface and lane lines through a car?

Object permanence: the lines don't stop existing because a car is blocking them.
Real world knowledge: lines are fairly straight and can be extrapolated.
Motion data: the white arrows show displacement in the image, and the lines track that too.
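The "fairly straight, can be extrapolated" part really is cheap. A toy numpy sketch (invented pixel coordinates): fit the visible lane-line points and evaluate the fit across the span an occluding car hides:

```python
import numpy as np

# Visible lane-line pixel coordinates; the middle of the line is hidden
# behind another car, so we only see points before and after it.
visible_y = np.array([700, 650, 600, 300, 250, 200])
visible_x = np.array([640, 630, 620, 560, 550, 540])

# "Lines are fairly straight": fit a low-order polynomial to what is
# visible, then evaluate it across the occluded span.
coeffs = np.polyfit(visible_y, visible_x, deg=1)
occluded_y = np.arange(350, 600, 50)
extrapolated_x = np.polyval(coeffs, occluded_y)
```

A real lane tracker would use a higher-order fit and temporal smoothing, but the principle of drawing lines "through" a car is just this.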

Regarding the Reddit post: high-resolution mapping is how lidar-based systems work, so even if they did use high-grade maps, what's the issue? (Its use in the DARPA Grand Challenge was sort of lame, IMHO.)

Mapping intersections also seems like a good idea. As a human driver, I unknowingly blew through a traffic control signal in Indianapolis because it was mounted to the bottom of a skywalk.

My laptop can do the bounding-box thing and a motion map in real time; no need for post-processing.
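Frame differencing is about the cheapest version of that motion map - easily real-time on a laptop. A minimal sketch with synthetic frames (not any particular tool):

```python
import numpy as np

def motion_bbox(prev_frame, frame, threshold=25):
    """Bounding box of pixels that changed between two grayscale frames,
    or None if nothing moved. Crude frame differencing - cheap enough to
    run live, no post-processing required."""
    diff = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16))
    ys, xs = np.nonzero(diff > threshold)
    if ys.size == 0:
        return None
    return xs.min(), ys.min(), xs.max(), ys.max()

prev_frame = np.zeros((480, 640), dtype=np.uint8)
frame = prev_frame.copy()
frame[100:150, 200:260] = 255         # a synthetic "moving object"
box = motion_bbox(prev_frame, frame)  # its bounding box
```

Real detectors are NNs, of course, but the point stands: putting boxes on moving things live is not exotic.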

There is also the spot where the car slows by people near the side of the road. Not post processed.


What would be missing is the ability to take a brand-new route, which gives the other map-reliant systems trouble, I think.

If I were trying to fake a Tesla self driving video, I'd just put a set of driving controls in the rear seat and tint the windows. What is claimed sounds like progress beyond that level.

Drove on path
Found and obeyed traffic lights.

As to the full vs. enhanced issue: if the FSD changes were to be included in or linked with EAP, then every change or attempt would need full regression testing. And changes to EAP, while better for it, might not be what helps FSD. Better to apply the modifications that best improve EAP to EAP, while doing all the tear-up needed to let FSD achieve its goals.

Analogy warning: a golfer averages 20 over, do you force them to relearn the entire mechanics to get to par (while scoring worse in the interim) or give them pointers to improve their game in the short term?
 
This is kind of hard to phrase correctly, but isn’t the point of the visualization to portray exactly what the car detects? I mean, shouldn’t the visualization video be a direct function of the car’s neural net input and output?

To put it another way: Wouldn’t running mpeg-video recordings through some generic labeling/tracking software after the fact, be considered cheating?

Not saying Tesla did it, but IF they did?
 
To put it another way: Wouldn’t running mpeg-video recordings through some generic labeling/tracking software after the fact, be considered cheating?

Yes, it would be cheating if they did it like that.

The only way doing this in post-processing would not be cheating is if the post-processing also ran on AP2-equivalent hardware/software and used the same video and other data as the car - i.e., it was just a technical workaround to run the visual marking process after the fact, not some completely separate visual NN...
 
This is kind of hard to phrase correctly, but isn’t the point of the visualization to portray exactly what the car detects? I mean, shouldn’t the visualization video be a direct function of the car’s neural net input and output?

To put it another way: Wouldn’t running mpeg-video recordings through some generic labeling/tracking software after the fact, be considered cheating?

Not saying Tesla did it, but IF they did?

There is no reason for the onboard software to render a visualization of the neural network output. The onboard output needs to be something like, "A stop sign has been detected 250 ft ahead on the current route." The route planner can then adjust the vehicle speed as required.
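In other words, the onboard interface can be as plain as a structured message the planner consumes. A toy sketch (the types and the 300 ft slow-down rule are invented for illustration):

```python
from dataclasses import dataclass

@dataclass
class Detection:
    kind: str          # e.g. "stop_sign", "traffic_light"
    distance_ft: float

def speed_limit_for(detection, current_limit_mph):
    """Toy planner rule: start slowing once a stop sign is within 300 ft,
    scaling target speed down linearly with remaining distance."""
    if detection.kind == "stop_sign" and detection.distance_ft < 300:
        return current_limit_mph * detection.distance_ft / 300
    return current_limit_mph

cruise = speed_limit_for(Detection("stop_sign", 250), 45)
```

No rendered frames anywhere in that loop - the pretty overlay is strictly for human viewers.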
 
There is no reason for the onboard software to render a visualization of the neural network output. The onboard output needs to be something like, "A stop sign has been detected 250 ft ahead on the current route." The route planner can then adjust the vehicle speed as required.

True. Hence doing it in post-processing is just fine IF it is an accurate representation of how the car internally labelled things during its drive.

If the car did not label things like that internally (don't mean video rendition, but internal "seeing"), then it would be a misrepresentation and cheating.

I still think Tesla probably just used e.g. Nvidia's code on a hardcoded route, so nothing that could be generalized but OTOH it really happened... and otherwise did not "cheat".

Maybe I'm naive after everything that has happened... my bad, if so.
 
Boxing threats is for driver warning systems like night vision, pedestrian alert, and cross-traffic alert.
Since the seat vibrates and/or an audio warning comes on, you need to see WHERE the deer, pedestrian, or car is to determine a correct course of action.

Now if FSD has a driver's warning system, then sure, those boxes have value. But with AV, it's the threat it CAN'T evaluate that is the issue, not the ones it can. Can't turn on a warning buzzer without knowing there is an obstacle.
 