because the sensor could still be ignored? I think that's what happened there?
Yeah, that’s what was reported. If true, it shows that the author’s claim that lidar provides certainty about obstacle detection is incorrect.
Depending on the complexity and quality required, the labeling might be quite a skilled task and as such priced accordingly too (I've no idea either way; the "highlight all cars in this frame" type is most likely low-skill and relatively cheap)
Yeah, I agree. Semantic segmentation labelling and bounding-box labelling don’t seem more complex than paralegal work. Fundamentally, you’re just drawing boxes and labelling pixels in an image: road, sidewalk, car, pedestrian, etc. I could be wrong, but it seems that way to me.
I have no idea what kind of labelling might be used for behaviour prediction, or for path planning. Or how complex it is.
True. So suppose you have a (trained) person driving who presses a button any time something strange happens (even better if there are multiple buttons for different strange things), versus a car randomly or semi-randomly deciding to take fixed-length fragments of video and pictures. Which one would you think results in better data quality per mile driven and per frame?
Waymo: ~11 million miles
Tesla: ~2.8 billion HW2 miles
So, Tesla is working with ~250x more miles. If HW2 Teslas are only 1/125th as good as Waymos at capturing interesting data on a per mile basis, on an aggregate basis Tesla has 2x more interesting data from its HW2 fleet.
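To make the break-even explicit, here’s a quick back-of-envelope sketch in Python (the mileage figures are the ones quoted above; the 1/125 per-mile quality factor is just an assumption for illustration):

```python
# Back-of-envelope: aggregate "interesting data" comparison.
waymo_miles = 11e6    # ~11 million Waymo miles (quoted above)
tesla_miles = 2.8e9   # ~2.8 billion Tesla HW2 miles (quoted above)

mileage_ratio = tesla_miles / waymo_miles  # ~255x more miles for Tesla

# Assumption: HW2 Teslas capture interesting data 1/125th as well per mile.
quality_factor = 1 / 125
aggregate_ratio = mileage_ratio * quality_factor  # ~2x in Tesla's favour

print(f"mileage ratio: ~{mileage_ratio:.0f}x, aggregate: ~{aggregate_ratio:.1f}x")
```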
For some purposes, I don’t think the data has to be particularly strange or interesting. You just need a lot of images of unique objects in various semantic classes, e.g. cars, semi trucks, pedestrians, crosswalks, stop signs.
I wonder if Waymo can use Google Street View images for this purpose. I don’t know whether it would need to, or prefer to, use images captured by the same camera configuration as is used in Waymo vehicles. Apparently self-driving cars can be sensitive to that; Cruise reportedly was having trouble just switching from one version of its test vehicle to the next.
Although now that I know Street View was compiled with 10 million miles of driving, I’m rethinking the ceiling on image collection...
Apparently there are 4.1 million miles of roadways in the United States. So you could drive every roadway in both directions in 8.2 million miles, and it would take no more than 32.8 million miles (4.1 × 8, assuming no roadway has more than eight lanes) to drive every lane. With 98.4 million miles, you could drive every lane at least three times.
For the U.S., then, the ceiling for capturing static features of the environment should be about 100 million miles of driving. At an average of 25 mph, driving 100 million miles would take 4 million hours. If you paid people $25/hour to do this, it would cost $100 million. For $100 million, you could capture comprehensive images of every roadway in the United States. Hm.
To do 8.2 million miles, it would cost $8.2 million. In theory, that would be enough to capture one or two images of every unique fixed, static object on U.S. roadways.
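For anyone who wants to check the arithmetic, here’s a minimal sketch of the coverage math (the eight-lane cap, 25 mph average, and $25/hour wage are the assumptions above):

```python
# Coverage-driving cost, using the figures from this thread.
road_miles = 4.1e6   # total U.S. roadway miles
avg_speed_mph = 25
wage_per_hour = 25

scenarios = {
    "both directions": road_miles * 2,                  # 8.2M miles
    "every lane (<=8 lanes assumed)": road_miles * 8,   # 32.8M miles
    "every lane, three passes": road_miles * 8 * 3,     # 98.4M miles
}

for label, miles in scenarios.items():
    hours = miles / avg_speed_mph
    cost = hours * wage_per_hour
    print(f"{label}: {miles/1e6:.1f}M miles, {hours/1e6:.2f}M hours, ${cost/1e6:.0f}M")
```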
I wonder if any company like Waymo, Tesla, Mobileye, etc. has already done this? I guess it might be pointless because the volume of footage to annotate might be too high.
8.2 million miles / 25 miles per hour = 328,000 hours, or 1.18 billion seconds
1.18 billion seconds * 30 frames per second = 35.4 billion frames
30 minutes to annotate each frame -> 17.7 billion hours of labour
5 minutes to annotate each frame -> 2.9 billion hours of labour
1 minute to annotate each frame -> 590 million hours of labour
If you were able to get annotation time down to 1 minute per frame, and if you outsourced the work to poor countries and paid people $2/hour, you could do it for about $1.2 billion. A feasible amount for Waymo or Tesla, if spread out over multiple years. But on less optimistic assumptions it’s going to be tens of billions or hundreds of billions of dollars.
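Same sketch extended to the annotation side (the 30 fps frame rate, the per-frame annotation times, and the $2/hour outsourced wage are the assumptions from above):

```python
# Annotation labour for 8.2M miles of footage, using the assumptions above.
miles = 8.2e6
avg_speed_mph = 25
fps = 30
wage_per_hour = 2  # assumed outsourced rate

seconds = miles / avg_speed_mph * 3600  # ~1.18 billion seconds of video
frames = seconds * fps                  # ~35.4 billion frames

for minutes_per_frame in (30, 5, 1):
    labour_hours = frames * minutes_per_frame / 60
    cost = labour_hours * wage_per_hour
    print(f"{minutes_per_frame} min/frame: {labour_hours/1e9:.2f}B hours, ${cost/1e9:.1f}B")
```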
Incidentally, similar math applies to making an HD map of the entire United States, since it’s a similar task.