
Waymo’s “commercial” ride-hailing service is... not yet what I hoped

What I do not think happens is consumer cars recording driver inputs and camera inputs and then using those to train NNs to drive or even positively see. I don’t think the cars learn as such.

Let's make this clearer. Do you mean training each individual car how to drive, or training the NN program? If it's the latter, I can't imagine anyone developing NNs not using such a data advantage. Waymo has even been using simulated driving data to train their NNs. I'm sure they would be very excited if they could get their hands on such quality real world data. Even for the former, there is probably some degree of that happening. We all know certain AP functions are not available until the car has been driven for some miles.
 
I'm sure they would be very excited if they could get their hands on such quality real world data.
It looks like you don't really realize how poor most of Tesla's "real world data" is.

We all know certain AP functions are not available until the car has been driven for some miles.
This is just working out each individual car's camera (mis)alignment, because they cannot hold any sort of reasonable tolerance at install time (don't tell me you've already forgotten how they even needed to increase the tolerances in software and clip even more of the image out, because alignment was wildly outside what they had imagined possible).
 
Do yourself a favor and search Tesla videos online before making your silly comments again.
Not even a link this time? If you go to the official Tesla YouTube channel you will quickly find that there's nothing of substance there. The Autopilot section is mostly stuffed with old fluff videos 2+ years old, with the notable exception of a four-week-old video manual on how to use Navigate on Autopilot.

Since you appear to prefer non-forum posts as arguments, this post has a pretty coherent explanation of why the Tesla fleet's "valuable footage" is actually mostly garbage: Why Tesla's FSD Approach is Flawed
 
Not even a link this time? If you go to the official Tesla YouTube channel you will quickly find that there's nothing of substance there. The Autopilot section is mostly stuffed with old fluff videos 2+ years old, with the notable exception of a four-week-old video manual on how to use Navigate on Autopilot.

Since you appear to prefer non-forum posts as arguments, this post has a pretty coherent explanation of why the Tesla fleet's "valuable footage" is actually mostly garbage: Why Tesla's FSD Approach is Flawed

WT is that? Waymo gets quality mapping data from Waze users? Huh? And he knows a neural net could solve 99% or 99.9% of it but not 99.99% or 99.9999%? To be honest, I'd rather waste my time reading your fabricated stories than read an article like that. At least it saves me a click.
 
WT is that? Waymo gets quality mapping data from Waze users? Huh? And he knows a neural net could solve 99% or 99.9% of it but not 99.99% or 99.9999%? To be honest, I'd rather waste my time reading your fabricated stories than read an article like that. At least it saves me a click.
Let's not move goalposts, OK? The topic at hand is the wealth of data Tesla cars upload to Tesla that is then used to train NNs. The problem: Tesla has little control over what the cars upload, and then needs a lot of labor to actually label the uploads (and otherwise select which ones are good and which are bad for inclusion into training sets).

The mapping data is a separate issue, and Tesla does get some valuable stuff there, though not as much as services like Waze that have more users and hence more data.
 
You do know Waze only gets location data and location-derived speed from phones, right? That article was a bigger joke than even some of the things said here.

Since one of the subjects here is whether Tesla gets quality data from its cars and how it is utilizing it, here it is from the horse's mouth. Chris Lattner wrote this (since deleted, but luckily some have recorded it) after he left Tesla, so there is no motivation for him to cover for Tesla. Matter of fact, if there is any motivation, it would only be to badmouth Tesla, as a lot of people have motivation to do.
"One of Tesla's huge advantages in the autonomous driving space is that it has tens of thousands of cars already on the road," Lattner wrote. "We built infrastructure to take advantage of this, allowing the collection of image and video data from this fleet, as well as building big data infrastructure in the cloud to process and use it."
An ex-Tesla exec reveals how the company is transforming itself into a data powerhouse
 
You do know Waze only gets location data and location-derived speed from phones, right? That article was a bigger joke than even some of the things said here.

Since one of the subjects here is whether Tesla gets quality data from its cars and how it is utilizing it, here it is from the horse's mouth. Chris Lattner wrote this (since deleted, but luckily some have recorded it) after he left Tesla, so there is no motivation for him to cover for Tesla. Matter of fact, if there is any motivation, it would only be to badmouth Tesla, as a lot of people have motivation to do.
"One of Tesla's huge advantages in the autonomous driving space is that it has tens of thousands of cars already on the road," Lattner wrote. "We built infrastructure to take advantage of this, allowing the collection of image and video data from this fleet, as well as building big data infrastructure in the cloud to process and use it."
An ex-Tesla exec reveals how the company is transforming itself into a data powerhouse
None of that matters though.

Yes, they built infrastructure to collect the data and store it. I can confirm it. He does not say they ARE collecting significant amounts of data, nor does he say they boosted their headcount to go through the immense numbers of images (which needs lots of labor) they are already set up to collect.

As for Waze, when your goal is to see where the traffic jams are and which routes people travel, you only need location data. And Tesla is doing this too (as a separate effort); this separate mapping effort, as outlined in the post, is okay in data quality. It's the visual data collection that people tend to overvalue.
 
he does not say they boosted their headcount to go through the immense numbers of images (which needs lots of labor) they are already set up to collect.

The Information reported that “Tesla pays firms that employ people to digitally label the cars and other objects within that imagery.”

Tesla could have a large number of contractors and it would be indiscernible in its R&D spending. 5,000 people * $15/hour * 40 hours/week * 50 weeks/year / 4 quarters per year = $37.5 million per quarter
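For what it's worth, here is that arithmetic as a minimal sketch (Python; the headcount and wage are the illustrative assumptions above, not reported figures):

```python
# Back-of-envelope quarterly labeling cost. All inputs are illustrative
# assumptions from the post above, not reported Tesla numbers.
def quarterly_labor_cost(headcount, hourly_wage, hours_per_week=40, weeks_per_year=50):
    annual = headcount * hourly_wage * hours_per_week * weeks_per_year
    return annual / 4  # split the annual wage bill into quarters

print(f"${quarterly_labor_cost(5_000, 15) / 1e6:.1f}M per quarter")  # -> $37.5M per quarter
```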
 
The Information reported that “Tesla pays firms that employ people to digitally label the cars and other objects within that imagery.”

Tesla could have a large number of contractors and it would be indiscernible in its R&D spending. 5,000 people * $15/hour * 40 hours/week * 50 weeks/year / 4 quarters per year = $37.5 million per quarter
Ok. Though labeling cars and other objects is not the same as evaluating what happened and such; that would probably cost more than $15/hour.
 

“If your lidar sensor says there's nothing in your path, then there's nothing in your path, especially when you have two independent sensors looking in the direction of travel.”​

If this is true, why did that Uber prototype in Arizona kill that pedestrian?

“Tesla's fleet advantage is no advantage at all. You can easily collect road imagery for less than $1/mile.”

The metric that matters most is not sheer volume of images, but (labelled) images of unique objects per semantic class. Variety and rarity of data is important too, not just volume.
 
Ok. Though labeling cars and other objects is not the same as evaluating what happened and such; that would probably cost more than $15/hour.

I’ve tried to look up wages for this kind of work on Glassdoor but couldn’t find anything.

The average entry-level wage for a paralegal is $26/hour. At that wage, 5,000 image labellers working 40 hours a week, 50 weeks a year, would cost $65 million per quarter. That’s 18.5% of the Q3 2018 R&D budget.

I haven’t found a good estimate of how long it takes to do semantic segmentation labelling for a single still image or a frame of video. That would help us estimate throughput. Say it takes 30 minutes per image. With 5,000 labellers working 10 million hours per year, the throughput would be 20 million images semantically segmented per year.

Of course, if it’s 3 minutes per image then it’s 200 million images, and if it’s 300 minutes, it’s 2 million images. So I would like very much to find a number from a reliable source for that key variable.
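To make the key variable explicit, here is the same throughput estimate as a small sketch (the labeller count and hours are the assumptions above; minutes per image is the unknown):

```python
# Annual semantic-segmentation throughput as a function of minutes per image.
# Labeller count and working hours are the assumptions from the post above.
LABELLERS = 5_000
TOTAL_HOURS = LABELLERS * 40 * 50          # 10 million labour-hours per year

for minutes_per_image in (3, 30, 300):
    images_per_year = TOTAL_HOURS * 60 / minutes_per_image
    print(f"{minutes_per_image:>3} min/image -> {images_per_year / 1e6:,.0f}M images/year")
# 3 -> 200M, 30 -> 20M, 300 -> 2M
```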
 
If this is true, why did that Uber prototype in Arizona kill that pedestrian?
because the sensor could still be ignored? I think that's what happened there? We don't have a way to independently verify this anyway unless you happen to have the raw data and can share?

The metric that matters most is not sheer volume of images, but (labelled) images of unique objects per semantic class. Variety and rarity of data is important too, not just volume.
True. So if you have a (trained) person driving who presses a button any time something strange happens (even better if there are multiple buttons for different strange things), versus a car randomly or semi-randomly deciding to capture fixed-length fragments of video and pictures, which one do you think would result in better quality per mile driven and per frame? Assume that per-frame analysis is going to be the same in both cases.

The average entry-level wage for a paralegal is $26/hour. At that wage, 5,000 image labellers working 40 hours a week, 50 weeks a year, would cost $65 million per quarter. That’s 18.5% of the Q3 2018 R&D budget.
Don't forget benefits (those aren't part of hourly wages).

Depending on complexity and quality, the labeling might be quite a skilled task and priced accordingly (I've no idea either way; the "highlight all cars in this frame" type is most likely low-skill and relatively cheap).
 

A few things that bothered me about this blog post:
  • the author complains that Jimmy isn’t a domain expert, but the author also appears not to be a domain expert (for all I know, Jimmy has far more knowledge and experience with neural nets than the author)
  • the author makes assertions about the level of accuracy needed from neural networks and the fundamental upper limit of neural network accuracy without (in my opinion) doing enough to clarify and justify these claims
  • on a similar note, the author seemingly self-contradictorily says that neural networks aren’t good enough for full self-driving while also saying that full self-driving is coming “perhaps sooner than people expect” — maybe the author is overlooking the problem that lidar can’t reliably identify lane lines, traffic signs, traffic lights, and other features of the environment that don’t have depth
  • the author cites a casual, non-scientific analysis of fatality rates from random anonymous people, including a Twitter troll
  • the author claims, without evidence, that partial autonomy isn’t safer, despite two independent findings from the IIHS and NHTSA that HW1 Autopilot is correlated with reduced insurance claims and reduced airbag deployments, respectively
  • the tone of the post is rude and condescending
 
A few things that bothered me about this blog post:
  • the author complains that Jimmy isn’t a domain expert, but the author also appears not to be a domain expert (for all I know, Jimmy has far more knowledge and experience with neural nets than the author)
  • the author makes assertions about the level of accuracy needed from neural networks and the fundamental upper limit of neural network accuracy without (in my opinion) doing enough to clarify and justify these claims
  • on a similar note, the author seemingly self-contradictorily says that neural networks aren’t good enough for full self-driving while also saying that full self-driving is coming “perhaps sooner than people expect” — maybe the author is overlooking the problem that lidar can’t reliably identify lane lines, traffic signs, traffic lights, and other features of the environment that don’t have depth
  • the author cites a casual, non-scientific analysis of fatality rates from random anonymous people, including a Twitter troll
  • the author claims, without evidence, that partial autonomy isn’t safer, despite two independent findings from the IIHS and NHTSA that HW1 Autopilot is correlated with reduced insurance claims and reduced airbag deployments, respectively
  • the tone of the post is rude and condescending
I just linked the post because I agree with the "potential value of data that Tesla can get from its cars for NN-training purposes (not mapping!) is way overestimated" part of it, and I don't need to explain it myself for the Nth time. It was obvious to me from early on that it was mostly useless without a huge labor investment, AND Tesla's behavior sort of confirmed it (they enabled wide random data collection and then quickly curtailed it; you can argue they were just trying to prove out their infrastructure, I guess, but I think that's only part of the reason).

There are some inaccuracies in the argument that Tesla has no control over the data collected (they do, via triggers), but otherwise I think it's mostly good. The fantasies of people who think reinforcement learning blah blah and superior labeling NNs would train themselves are unfounded. Those same people also often bring generated worlds into this without seemingly understanding that this undermines the original argument of advanced NNs doing their thing by "just observing".
 
because the sensor could still be ignored? I think that's what happened there?

Yeah, that’s what was reported. If true, it shows that the author’s claim that lidar provides certainty with regard to obstacle detection is incorrect.

Depending on complexity and quality, the labeling might be quite a skilled task and priced accordingly (I've no idea either way; the "highlight all cars in this frame" type is most likely low-skill and relatively cheap).

Yeah, I agree. Semantic segmentation labelling and bounding box labelling don't seem more complex than paralegal work. It seems like you are fundamentally just drawing boxes and labelling pixels in an image: road, sidewalk, car, pedestrian, etc. I could be wrong, but it seems that way to me.

I have no idea what kind of labelling might be used for behaviour prediction, or for path planning. Or how complex it is.


True. So if you have a (trained) person driving who presses a button any time something strange happens (even better if there are multiple buttons for different strange things), versus a car randomly or semi-randomly deciding to capture fixed-length fragments of video and pictures, which one do you think would result in better quality per mile driven and per frame?

Waymo: ~11 million miles
Tesla: ~2.8 billion HW2 miles

So, Tesla is working with ~250x more miles. If HW2 Teslas are only 1/125th as good as Waymos at capturing interesting data on a per mile basis, on an aggregate basis Tesla has 2x more interesting data from its HW2 fleet.
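To make the break-even explicit, a trivial sketch (the mile counts are the rough figures above; the 1/125 quality factor is purely an assumption for illustration):

```python
# Fleet-data comparison from the rough mile counts above. The per-mile
# "quality" factor is an assumption chosen to illustrate the break-even point.
waymo_miles = 11e6
tesla_hw2_miles = 2.8e9

mile_ratio = tesla_hw2_miles / waymo_miles      # ~254x more miles
quality_per_mile = 1 / 125                      # assumed Tesla vs. Waymo
print(f"aggregate interesting-data ratio: ~{mile_ratio * quality_per_mile:.1f}x")  # ~2x
```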

For some purposes, I don’t think the data has to be particularly strange or interesting. You just need a lot of images of unique objects in various semantic classes, e.g. cars, semi trucks, pedestrians, crosswalks, stop signs.

I wonder if Waymo can use Google Street View images for this purpose. I don't know whether it would need to, or prefer to, use images captured by the same camera configuration as is used in Waymo vehicles. Apparently self-driving cars can be sensitive to that; Cruise reportedly was having trouble just switching from one version of its test vehicle to the next.

Although now that I know Street View was compiled with 10 million miles of driving, I’m rethinking the ceiling on image collection...

Apparently there are 4.1 million miles of roadways in the United States. So you could drive every roadway in both directions in 8.2 million miles, and it would take no more than 32.8 million miles (4.1 * 8) to drive every lane. With 98.4 million miles, you could drive every lane at least three times.

For the U.S., the ceiling for static features of the environment should be 100 million miles. At an average of 25 mph, driving 100 million miles would take 4 million hours. If you paid people $25/hour to do this, it would cost $100 million. For $100 million, you could capture comprehensive images of every roadway in the United States. Hm.

To do 8.2 million miles, it would cost $8.2 million. In theory, that would be enough to capture one or two images of every unique fixed, static object on U.S. roadways.

I wonder if any company like Waymo, Tesla, Mobileye, etc. has already done this? I guess it might be pointless because the volume might be too high.

8.2 million miles / 25 miles per hour = 328,000 hours, or 1.18 billion seconds

1.18 billion seconds * 30 frames per second = 35.4 billion frames

30 minutes to annotate each frame -> 17.7 billion hours of labour

5 minutes to annotate each frame -> 2.9 billion hours of labour

1 minute to annotate each frame -> 590 million hours of labour

If you were able to get annotation time down to 1 minute per frame, and if you outsourced the work to poor countries and paid people $2/hour, you could do it with $1.2 billion. A feasible amount for Waymo or Tesla, if spread out over multiple years. But on less optimistic assumptions it’s going to be tens of billions or hundreds of billions of dollars.

Incidentally, similar math applies to making an HD map of the entire United States, since it’s a similar task.
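Pulling the whole back-of-envelope together in one place (every constant is one of the assumptions stated above, not measured data):

```python
# Cost to image every US roadway in both directions, then annotate the frames.
# All constants are the assumptions from the post above.
US_ROAD_MILES = 4.1e6
BOTH_DIRECTIONS = 2 * US_ROAD_MILES     # 8.2M miles
AVG_SPEED_MPH = 25
DRIVER_WAGE = 25                        # $/hour
ANNOTATOR_WAGE = 2                      # $/hour, outsourced
FPS = 30

drive_hours = BOTH_DIRECTIONS / AVG_SPEED_MPH                  # 328,000 hours
print(f"driving: {drive_hours:,.0f} h, ~${drive_hours * DRIVER_WAGE / 1e6:.1f}M")

frames = drive_hours * 3600 * FPS                              # ~35.4B frames
for minutes_per_frame in (30, 5, 1):
    labour_hours = frames * minutes_per_frame / 60
    print(f"{minutes_per_frame:>2} min/frame -> {labour_hours / 1e9:.2f}B hours, "
          f"${labour_hours * ANNOTATOR_WAGE / 1e9:.2f}B at ${ANNOTATOR_WAGE}/h")
```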
 
Yeah, that’s what was reported. If true, it shows that the author’s claim that lidar provides certainty with regard to obstacle detection is incorrect.
Huh, if the sensor provided correct info and was ignored, how does that disprove the author's point?

8.2 million miles / 25 miles per hour = 328,000 hours, or 1.18 billion seconds

1.18 billion seconds * 30 frames per second = 35.4 billion frames

30 minutes to annotate each frame -> 17.7 billion hours of labour

5 minutes to annotate each frame -> 2.9 billion hours of labour

1 minute to annotate each frame -> 590 million hours of labour
This is a very strange calculation you have here. It's not like the roadways are static even for a minute, let alone days. There are all sorts of dynamic objects and even static objects change all the time.
Waymo: ~11 million miles
Tesla: ~2.8 billion HW2 miles

So, Tesla is working with ~250x more miles. If HW2 Teslas are only 1/125th as good as Waymos at capturing interesting data on a per mile basis, on an aggregate basis Tesla has 2x more interesting data from its HW2 fleet.
Those are different miles (Tesla's are more diverse), BUT we know they don't trigger all the time (Waymo's likely don't either?), and it's not really needed. We know that unknown stuff is unlikely to be caught by Tesla (simply because there's no good way to do it), but when you have a trained driver it's much easier. That is not to say Tesla does not have test cars on the roads with trained drivers; those miles are a lot more valuable.

Based on what we have seen, we can assume perhaps 20-30 snapshots per Tesla HW2+ car per week at most (covering no more than 10 seconds per snapshot). Sure, a single snapshot could be more than one frame, but it still covers only a short distance.
So with this in mind you can calculate a ceiling on the number of miles Tesla has actually captured.
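As a rough illustration of that ceiling (the fleet size and average speed here are my own assumptions, not known figures):

```python
# Ceiling on miles of footage Tesla actually captures via snapshots, using the
# snapshot estimate above. Fleet size and average speed are assumptions.
FLEET = 150_000                   # rough HW2+ fleet size (assumption)
SNAPSHOTS_PER_CAR_WEEK = 30       # upper end of the estimate above
SECONDS_PER_SNAPSHOT = 10
AVG_SPEED_MPH = 40                # assumption

hours_per_week = FLEET * SNAPSHOTS_PER_CAR_WEEK * SECONDS_PER_SNAPSHOT / 3600
miles_per_week = hours_per_week * AVG_SPEED_MPH
print(f"~{miles_per_week:,.0f} captured miles/week (~{miles_per_week * 52 / 1e6:.0f}M miles/year)")
```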
 
One framework for thinking about the potential value of HW2 fleet data is compiling massive libraries of training data for neural networks. This is mostly the way I’ve been thinking about it.

Another framework is to think about the development process as a feedback loop where Autopilot driving errors (flagged by disengagements, aborts, and crashes) lead engineers to identify a problem. That leads the engineers to design a trigger to collect sensor data relevant to that problem.

A slide from an Andrej Karpathy talk:

[Slide image: JzOthbA.jpg]


With Tesla’s approach, you can actually test lane keeping for 600 million miles, catch errors, and try to fix the failure modes that cause those errors. Then push your fix to ~150,000 cars, watch how the fix works for 100 million miles (driven in under 3 months), and iterate.
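Purely as an illustration of what such a trigger-driven loop might look like on the car side (the event names and buffer length here are invented for the example, not Tesla's actual implementation):

```python
# Illustrative trigger-based snapshot collector: keep a rolling buffer of
# recent frames and return a clip when a trigger event (e.g. a disengagement)
# fires, so it can be uploaded, labelled, and used to fix the failure mode.
from collections import deque

class SnapshotTrigger:
    def __init__(self, seconds=10, fps=30):
        self.buffer = deque(maxlen=seconds * fps)

    def on_frame(self, frame, disengaged=False, abort=False, crash=False):
        self.buffer.append(frame)
        if disengaged or abort or crash:
            return list(self.buffer)   # clip to upload for triage/labelling
        return None
```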
 
Huh, if the sensor provided correct info and was ignored, how does that disprove the author's point?

If the sensor bounces a laser pulse off the object, but the software can’t reliably determine whether the object is an obstacle or not, lidar doesn’t solve the problem of obstacle detection.

Similarly, photons bounce off of objects into cameras, but that doesn’t mean the software can reliably determine whether there is an obstacle in the road.

The author’s argument was that lidar provides many nines of reliability for obstacle detection, but I don’t see what evidence there is to support that argument. It might be right for all I know, but what’s the evidence?

Mobileye also believes it’s possible to do full autonomy with just cameras in the near term, so Tesla is not alone in its thinking. The “everyone else is doing it” argument is only half-true.

This is a very strange calculation you have here. It's not like the roadways are static even for a minute, let alone days. There are all sorts of dynamic objects and even static objects change all the time.

Yes, but in terms of capturing images of many, many unique objects per semantic class for fixed objects like lane lines, crosswalks, traffic signs, traffic lights, and so on, it would be very comprehensive. I think it's reasonable to guess that you could capture images of 90%+ of all the unique objects in these classes in the entire United States.

The upper limit on the number of training examples for a neural network is the number of examples that exist in the world.
 