
Waymo’s “commercial” ride-hailing service is... not yet what I hoped

Well here's the thing. The reason EyeQ3 implementations have sucked is that development is outsourced to Tier 1s who do the bare minimum. For example, Delphi supplies almost all of the ADAS features for almost every car company. The very few companies that don't outsource it and bring it all in-house actually deliver great products (Tesla with AP1 and GM with Super Cruise).

For example, GM has around 20 engineers working on Super Cruise, while Tesla famously had 100+ engineers working on AP1.

Other car companies simply drop in whatever junk Delphi gives them.
Not only that, we know that Mobileye's EyeQ4 is used in FSD cars as the main vision system, so we know they are capable of much more.
So the logic that @strangecosmos uses to scoff at Mobileye's tech just doesn't add up.

Sure, we can't just look at what some companies have done with MobilEye tech to assess that tech; we always need to look further. Anyone can use a good component poorly, that's just basic tech 101. Good chips are used in poor products all the time.

Autopilot 1 and GM Super Cruise are great examples of what the last-generation EyeQ3 could already do when mated to a competent ADAS development effort. And now we are in the EyeQ4 generation on the chip end.
 
A question here: how many useful miles does Tesla collect from users? I don't think the cars have the bandwidth to stream all data constantly, do they? Even if it were a matter of using Wi-Fi when parked, it's 1 GB+ of data for every battery charge. I guess it is event-based, and logs only the critical situations. But a lot of critical situations are not detected by Autopilot.
So how many of the Autopilot-driven miles are actually recorded and useful?

Based on info provided by @verygreen, it's possibly about 0.1% of the miles you drive. A frame from each camera is about 100 MB.
Another thing I wanted to note: it's interesting that Waymo has hundreds of millions of labeled pics of vehicles alone.

That means Waymo has amassed “hundreds of millions” of vehicle labels alone. To help put that in context, Waymo’s head of perception Arnoud estimated that a person labeling a car every second would take 20 years to reach 100 million. Operating every hour of every day of every week, and hitting 10 labels a second, it still takes Waymo’s machines four months to scroll through that entire dataset during its training process, Arnoud says.
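Those figures roughly check out with some back-of-the-envelope arithmetic (my own rough numbers, assuming an 8-hour work day for the human case):

```python
# Rough sanity check of the labeling figures quoted above (my assumptions, not Waymo's).
SECONDS_PER_YEAR = 3600 * 24 * 365
SECONDS_PER_WORK_YEAR = 3600 * 8 * 250      # assuming 8 h/day, ~250 work days/year

labels = 100_000_000                        # 100 million vehicle labels

# One person labeling one car per second:
print(labels / SECONDS_PER_YEAR)            # ~3.2 years of nonstop labeling
print(labels / SECONDS_PER_WORK_YEAR)       # ~14 work-years; ~20 years is plausible with breaks

# Machines reading 10 labels per second, around the clock:
print(10 * 86_400 * 120 / 1e6)              # ~104 million labels scanned in ~4 months
```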
 
A frame from each camera is about 100 MB.
Nah, a frame is ~2.5 MB. ~100 MB gives you 10 frames at 1 fps from every camera. It's a lot better when the frames are compressed.

Another thing I wanted to note: it's interesting that Waymo has hundreds of millions of labeled pics of vehicles alone.
And that compares to other players how? Tesla does not make this info public, and I am sure there are well-labeled image databases up for sale as well.
 
Stating that Tesla might be ahead of the pack is not even funny.

I said that we don’t know which company has the neural networks with the highest accuracy on computer vision tasks relevant to vehicle autonomy. This includes, for example, neural networks running on GPU clusters in Tesla’s offices in Palo Alto that no one outside of Tesla has seen. If you don’t work for Tesla, how do you know whether these neural networks perform worse, the same, or better than neural networks used by Waymo — which, unless you work for Waymo, you also haven’t seen?

I don’t think it should be that controversial an idea that you don’t know how well a neural network performs if you haven’t measured its performance, and if you haven’t even seen it in action for so much as a single moment.

The reason I went through the trouble of citing the research from Facebook and Google was to show that neural network performance on image classification continues to increase with more training examples, even into the hundreds of millions and billions. HW2 Teslas have driven an estimated ~2.8 billion miles and seem to upload something like ~140 MB per day on average. If a single still image from one HW2 camera is 1.2 MB, and ~95% of the data uploaded were still images, that would mean Tesla has uploaded around 10 billion images.

That would be ~3x larger than the 3.5 billion-image dataset used by Facebook in the research I cited, which is the largest training-image dataset I’ve heard of. Whereas the ImageNet challenge uses 1,000 semantic classes, autonomous vehicles only need to classify something like 50-100 semantic classes, so autonomous driving needs roughly 1/10th to 1/20th as many classes as ImageNet. A 10 billion-image dataset, then, would have around 30x to 60x as many training images per semantic class as the Facebook dataset.

It seems to me that the metric that matters most for image classification is training images of unique objects per semantic class. I think Tesla, by driving ~2.8 billion miles and taking 3-4 single-camera snapshots per mile on average, has the potential to capture many more images of unique objects than Waymo has by driving 11 million miles and recording every frame, especially since something like 1/6th of Waymo's miles are in the same geofenced areas in the Phoenix suburbs.
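To make the arithmetic explicit, here it is in one place. Every input below is one of the rough estimates from this post, not a confirmed figure:

```python
# Back-of-the-envelope version of the estimates above; all inputs are rough guesses.
fleet_miles = 2.8e9              # estimated HW2 fleet miles
snapshots_per_mile = 3.5         # ~3-4 single-camera snapshots per mile on average
image_size_mb = 1.2              # one still image from one HW2 camera

images_uploaded = fleet_miles * snapshots_per_mile
print(f"{images_uploaded:.1e} images")                 # ~9.8e9, i.e. roughly 10 billion

# Cross-check against the ~140 MB/day upload estimate:
daily_upload_mb = 140
still_image_fraction = 0.95
print(daily_upload_mb * still_image_fraction / image_size_mb)   # ~111 images per car per day

# Training images per semantic class vs. the Facebook dataset:
facebook_per_class = 3.5e9 / 1000               # 3.5 billion images, 1,000 classes
for n_classes in (50, 100):                     # assumed range of driving-relevant classes
    ratio = (images_uploaded / n_classes) / facebook_per_class
    print(n_classes, round(ratio))              # ~56x and ~28x respectively
```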

This is a reason to think Tesla’s neural networks might have higher accuracy on image classification, although I don’t claim to know whether that’s the case. I can only speculate. I don’t actually know.

But I think if you want to think about which company might be ahead on image classification, you have to consider the factors that we know from the research impact neural network performance. To dismiss well-founded first-principles considerations about neural network performance out of hand is... unjustifiable.

To say that you know how well a neural network performs without ever observing or measuring its performance is to confuse knowledge and speculation. Knowledge requires clear evidence.

To rudely dismiss someone’s opinion without even attempting to cite reasoning or evidence is bad behaviour, not real technical discussion. It is possible — and necessary — to disagree on technical topics respectfully, and to cite reasoning and evidence to substantiate your argument.
 
This includes, for example, neural networks running on GPU clusters in Tesla’s offices in Palo Alto that no one outside of Tesla has seen.

I should add that this isn’t a hypothetical. Tesla’s Director of AI Andrej Karpathy recently said this:

“...my team trains all of the neural networks that analyze the images streaming in from all the cameras for the Autopilot. For example, these neural networks identify cars, lane lines, traffic signs and so on. The team is incredibly excited about the upcoming upgrade for the Autopilot computer which Pete briefly talked about.​

This upgrade allows us to not just run the current neural networks faster, but more importantly, it will allow us to deploy much larger, computationally more expensive networks to the fleet. The reason this is important is that, it is a common finding in the industry and that we see this as well, is that as you make the networks bigger by adding more neurons, the accuracy of all their predictions increases with the added capacity.​

So in other words, we are currently at a place where we trained large neural networks that work very well, but we are not able to deploy them to the fleet due to computational constraints. So, all of this will change with the next iteration of the hardware. And it's a massive step improvement in the compute capability. And the team is incredibly excited to get these networks out there.”​
 

Thanks for sharing this. Useful for thinking about why visual HD maps are important:

“While other sensors such as radar and LiDAR may provide redundancy for object detection – the camera is the only real-time sensor for driving path geometry and other static scene semantics (such as traffic signs, on-road markings, etc.). Therefore, for path sensing and foresight purposes, only a highly accurate map can serve as the source of redundancy.”​

Last night, I was trying to figure this out with regard to Mobileye’s approach: unless HD maps use human annotation, how do they provide redundancy, since the same neural networks are doing inference for HD mapping and real-time perception?

Other than Mobileye, I think most (all?) companies that make visual HD maps upload images, and then get humans to label them. The redundancy comes from the human labeler. Since Mobileye just uploads a few kilobytes of metadata about what the car thinks it sees, that redundancy isn’t there.

The best answer I could come up with to justify Mobileye’s approach is that if a vehicle makes a real-time perception error 1 in 20 times, then it can weigh that inference against the HD maps, which represent other vehicles coming to a different conclusion 19 out of 20 times. But I’m not sure if this makes sense in practice.

For example, what if the environment changes? Will the vehicle assume its real-time perception is wrong, and the (now outdated) HD maps are right? How do you deal with disagreements between real-time perception and HD maps (which, without annotation, are essentially non-real-time perception)?
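One way to make that 1-in-20 intuition concrete, with entirely made-up numbers: if each pass's perception is wrong independently 5% of the time, a map aggregated from many prior passes is wrong far less often, but only as long as the world hasn't changed since those passes were recorded:

```python
# Toy illustration of why an aggregated map can serve as redundancy (invented numbers).
from math import comb

p_error = 0.05    # hypothetical per-drive perception error rate (1 in 20)
n_passes = 19     # hypothetical number of prior passes baked into the map

# Probability that a simple majority of those independent passes got it wrong:
p_map_wrong = sum(comb(n_passes, k) * p_error**k * (1 - p_error)**(n_passes - k)
                  for k in range(n_passes // 2 + 1, n_passes + 1))
print(p_map_wrong)   # ~6e-9, versus 0.05 for a single real-time observation

# Caveat from the discussion above: if the environment changed after those passes,
# the observations are no longer independent and all agree on the wrong answer.
```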

This also got me thinking about lidar:

“While other sensors such as radar and LiDAR may provide redundancy for object detection – the camera is the only real-time sensor for driving path geometry and other static scene semantics (such as traffic signs, on-road markings, etc.).”​

If one of the hard parts of computer vision for vehicle autonomy is recognizing depthless features of the environment like painted lane lines and traffic signs, then this provides context to Elon’s comments about lidar and cameras. To get to full autonomy, you need the car to flawlessly perceive lane lines, traffic signs (e.g. stop signs), traffic lights, turn signals, crosswalks, painted arrows, and so on. You need advanced camera-based vision.

Once you get to that point with camera-based vision, your neural networks might be so good at object detection (e.g. vehicle, pedestrian, and cyclist detection) using camera input that you no longer need lidar to achieve human-level or superhuman performance.

I say “human-level” because even if object detection is only as good as the average human, autonomous cars will still be safer because their reaction time is faster. Human reaction time under ideal conditions is 200-300 milliseconds. Braking reaction time might be more like 530 milliseconds. For people age 56 and up, the same study found a reaction time of 730 milliseconds. The actual number might be 2 seconds+. In contrast, we already have AEB systems with reaction times below 200 milliseconds.

Factoring in unideal conditions — distracted, drowsy, or drunk driving — the average for humans is probably much worse.
 
The best answer I could come up with to justify Mobileye’s approach is that if a vehicle makes a real-time perception error 1 in 20 times, then it can weigh that inference against the HD maps, which represent other vehicles coming to a different conclusion 19 out of 20 times. But I’m not sure if this makes sense in practice.
Sounds sensible to me... I can imagine a traffic sign, a pothole, or lane lines that my car misses - or has low confidence on - because it’s snowing, it’s «pitch dark», there is dense fog, or a strange shadow or glare or something else is obscuring my camera.

A frequently updated (crowd-sourced) HD map could help, no? Making my car self-drive more safely, and/or be more agile?

What if a bird pooped on my windshield and my wipers couldn’t get that stuff away?

I’m sure an HD map could provide much-needed redundancy in many different scenarios like these, where you «lose» your primary sensor. If nothing else, for just basic emergency action.

I imagine an HD map could also provide «foresight» in the sense that my car «knows» what’s 200 meters up ahead even though my vision/radar/lidar system can’t see it. So that could make a difference on e.g. how fast my car decides it’s safe to go.

I guess it’s basically down to the driving logic algorithms. How to handle sensor/input disagreements and confidence levels. I suppose this is where Kalman filtering weighs in.

Of course HD maps have their shortcomings, just like any other sensor. The world actually changes, i.e. the maps become outdated while your cameras could be seeing just fine. So you can get a disagreement where the «cameras are right» instead of «wrong». Which is, in principle, the same issue as when your lidar, radar or other non-camera sensor inputs «bad» data.

Kalman filtering
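For anyone who hasn't played with one, here is a minimal 1-D sketch of what a Kalman-style measurement update does: instead of one sensor overriding the other, each estimate is weighted by its uncertainty. All numbers below are invented for illustration; a real AV filter tracks a full state (position, velocity, etc.) over time.

```python
# Minimal 1-D example of variance-weighted fusion (the Kalman measurement update).

def fuse(mean_a: float, var_a: float, mean_b: float, var_b: float):
    """Fuse two independent Gaussian estimates of the same quantity."""
    k = var_a / (var_a + var_b)           # Kalman gain: how much to trust estimate b
    mean = mean_a + k * (mean_b - mean_a)
    var = (1 - k) * var_a
    return mean, var

# Hypothetical range-to-lead-car estimates:
camera_range, camera_var = 52.0, 9.0      # camera: noisier at estimating distance
radar_range, radar_var = 49.5, 1.0        # radar: very good at range and closing speed

print(fuse(camera_range, camera_var, radar_range, radar_var))
# -> (49.75, 0.9): close to the radar value, with lower variance than either input
```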
 
And of course the usual fallacies that come up in all Tesla discussions rear their heads again in the previous posts.

Moving the goalposts, and the McNamara fallacy (you pick one quantitative observation (or metric) that is easy to measure and use it alone to make decisions, while ignoring all others; the reason given is often that these other observations cannot be proven).

Moving the goalposts

How many times have we heard this before?
When AP2 is released, your Model 3 will deliver itself!
They just started collecting billions of data points, give them time!
They already have self-driving software. Didn't you see that one video?!
Wait till the cross-country drive, Elon will shock the world!
Andrej has only been there a couple of months, wait till he has full control of the helm!
Wait till NOA gets here, Mad Max will be L3!
That's FAKE NEWS! Wait till HW3 gets here, then they can use their REAL NN!

Also, from: Don't compare a demo fleet with internal research software against production AP software in consumers' hands!
To: Don't compare the production NN against the internal research NN, because Tesla's internal NN has the best accuracy!


McNamara fallacy

Tesla has billions of images. With more data, NNs are more accurate; therefore Tesla must have the best NN accuracy. Since Tesla has the best NN internally, "Tesla has an immense lead in self-driving."


This of course ignores the vast majority of the observations we have seen.

  • We have seen the evolution of their NN, thanks to guys like @verygreen, since it dropped in 2017, so we know how advanced it was at every stage, from when it sucked badly to when it became okay.
  • Engineers and employees that led AP development have dates in their profiles that correspond to what they worked on, including the internal timeline. This includes creating and getting the foundation software stack up and running by November 2016, contrary to Elon's statements that EAP would be here in December. It also includes development timelines for things like the NN for all 8 cameras (late 2017), the wiper NN, when the AP3 hardware chip began development, when the new radar began development, and when software for the new chip began development. We have all these timelines, so we know what they were working on internally, which also lines up with what we later saw in production and continue to see.
  • The current AP NN still lacks dozens of detections (traffic lights, etc.).
  • The current AP NN still produces a vast number of false positives and false negatives.
  • Current AP still runs into cars and obstacles.
  • You don't need to crowd-source images from hundreds of thousands of cars to do accurate object detection, as proven by Mobileye.
  • Your AP1 doesn't forget what a car, bike, etc. looks like when you go from city to city, state to state, and country to country.
  • In the last 9 months of Waymo's 2017 disengagement report, only 2 disengagements were because of perception problems.
  • https://www.dmv.ca.gov/portal/wcm/c...-97f6f24b23cc/Waymofull.pdf?MOD=AJPERES&CVID=
  • Waymo defines a Perception Discrepancy as:
  • Perception Discrepancy - In this type of event, a component of the vehicle’s perception system (e.g., camera, lidar, radar) fails to detect an object correctly. An example was the failure of our self-driving car to recognize that a “no right on red” lighted sign was activated. This sign is only active for a specific time period during the day. The driver disengaged to prevent the car from making a right on red even though there was no risk of collision.
  • https://www.dmv.ca.gov/portal/wcm/c...994c1e125c77/Waymo_supp.pdf?MOD=AJPERES&CVID=
  • Waymo removed the driver in some percentage of rides in Phoenix, so they must be confident enough in their perception to do that.
  • Elon would never withhold even a minor advancement or bit of progress from being released.
  • Etc.

In conclusion, a company could tomorrow release an L4 car that works on every paved road in the US, available for purchase at a dealership, and the response from a hardcore Tesla fan would be: well, Tesla could have an internal NN that works in Canada, Europe, etc., not just in the US, so we still don't know who's ahead. Trent will never admit Tesla is behind in anything, under any circumstance or evidence.
 
And that compares to other players how? Tesla does not make this info public, and I am sure there are well-labeled image databases up for sale as well.

I actually agree with you. Getting data is quite easy. It's simply a logistics problem.
Hiring 100 people to drive around multiple cities 8 hours a day for a month, with a snapshot taken only every 10 seconds, will yield 8,640,000 images.

You can even partner with taxi, truck, bus, and other transportation businesses.

Comma.ai, for example, has over 165,000 hours of 30 fps video from its users. If it took only one frame every 10 seconds to build its image database, it would have 59,400,000 images.

That's how easy it is to get data. Then you apply augmentations to the dataset you want to train on, and your final dataset could reach into the hundreds of millions very easily.

Compared to the other topics being brought up here, getting data is very easy. You can collect it, buy it (raw or labeled), label it yourself, or outsource the labeling.
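A quick check of the arithmetic above (the 165,000-hour figure is as quoted earlier; I can't verify it):

```python
# Sanity check of the image-collection numbers above.

# 100 hired drivers, 8 hours/day for 30 days, one snapshot every 10 seconds:
drivers, hours_per_day, days, interval_s = 100, 8, 30, 10
print(drivers * hours_per_day * 3600 // interval_s * days)   # 8,640,000 images

# 165,000 hours of dashcam video, keeping one frame every 10 seconds:
print(165_000 * 3600 // 10)                                   # 59,400,000 images
```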

Real-time HD mapping, on the other hand, is very hard. You have companies like Nvidia, HERE, TomTom, Bosch (radar map), Ushr, Civil Maps, DeepMap, Carmera, lvl5, etc. trying to do it and failing because they don't have the algorithms to automatically add semantics to what their mapping system is seeing, meaning they end up doing everything 100% manually.

Driverless cars: mapping the trouble ahead | Financial Times

I guess it’s basically down to the driving logic algorithms. How to handle sensor/input disagreements and confidence levels. I suppose this is where Kalman filtering weighs in.

The reason why I like Amnon is that he thinks outside the box and doesn't just follow what everyone else is doing; everyone is basically more or less following Waymo. For example, instead of doing sensor fusion, he says it's better to have a system that drives completely on vision only and one that drives completely on radar and lidar only. Which makes sense, because if something happens to one sensor, you won't have a battle over which sensor is correct; you simply drive the car using the other sensor modality. So instead of it being a "sensor/input disagreement", it will be two completely separate systems disagreeing on the output, kinda like how it is on an airplane.
 
I can totally feel where @electronblue is coming from. Stating that Tesla might be ahead of the pack is not even funny.
I said that we don’t know which company has the neural networks with the highest accuracy on computer vision tasks relevant to vehicle autonomy.

Unfortunately that is not what you said, @strangecosmos. You replied saying Tesla could generally be ”far ahead” of Waymo and MobilEye, in response to my comment about the ”perception, mapping and other things” they have already implemented. Here is what you actually said, what I disagreed with, and what @caligula666 is referring to. My bolded emphasis added for clarity:
How far behind is Tesla with their perception, mapping and other things Waymo and MobilEye have already implemented, i.e. how much of any perceived advantage is used to simply catch up?
I think this is an unanswerable question. For all we know, Tesla could be far behind Waymo and Mobileye, or Tesla could be far ahead. I just don't see how we can tell either way.
I think it is completely detached from reality to suggest Tesla could be far ahead in perception and mapping. In my view it is a belittling and unrealistic view of the work and the results shown by Waymo and MobilEye. I think this helps explain why I feel you veer into overt optimism on Tesla and overt pessimism about the competition in some of the posts we talked about.
Basically you presented a position that the autonomy implemented by Waymo and MobilEye at this time could be far behind Tesla at this time. I disagreed with that possibility strongly. Then you shut down the conversation with a curt reply, without argumentation.
To rudely dismiss someone’s opinion without even attempting to cite reasoning or evidence is bad behaviour, not real technical discussion. It is possible — and necessary — to disagree on technical topics respectfully, and to cite reasoning and evidence to substantiate your argument.

I fear we also disagree on who rudely dismissed whom without argumentation. I responded at length, had explained my position and argumentation in a multitude of posts, and was not the one who shut the discussion down either. I also acknowledged the merits of your theory, for example by acknowledging that Tesla has some advantages that could allow them to catch up faster.
 
instead of doing sensor fusion, he says it's better to have a system that drives completely on vision only and one that drives completely on radar and lidar only. Which makes sense, because if something happens to one sensor, you won't have a battle over which sensor is correct; you simply drive the car using the other sensor modality.
Brilliant. Did you know that a Tesla also has 12 sonars (establishing a protective cocoon)? So that’s a dozen redundant sensor modalities right there. Take that, schmobileye
 
Seriously though, sensor fusion is about more than redundancy, no? It’s also about information gain, compared to using each sensor individually. So with camera & radar fusion, you should get a more accurate reading of the traffic (esp. wrt. relative speeds and distances) than with camera only. Wouldn’t you agree?
 
Seriously though, sensor fusion is about more than redundancy, no? It’s also about information gain, compared to using each sensor individually. So with camera & radar fusion, you should get a more accurate reading of the traffic (esp. wrt. relative speeds and distances) than with camera only. Wouldn’t you agree?

Conventionally, yes. But it's how you use the information gain that matters.
Amnon is saying that if you fuse everything up front, then you end up stuck with one sensing system and you are forced to use all the information from every modality at all times. So if any of the sensors disagree at any moment, then you start panicking.

But if you do the sensor fusion afterwards, you can selectively apply the information from specific sensors ONLY when you need it, in certain situations and from certain angles.

For example, say you are at a Michigan U-turn with cars going 70 mph and you are making a LEFT at the STOP sign (green/red path).
Your completely separate vision system would then, at that moment, accept information from the right corner radar and maybe the right lidar to detect fast-approaching cars.

[Image: diagram of a Michigan left turn]


I think that's what Amnon is trying to get at, which to me is genius.
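Here is a toy sketch of that "late fusion" idea as I understand it (purely illustrative; not Mobileye's actual architecture, and all the names and numbers are mine): each modality builds its own independent picture of the world, and the driving policy only consults the extra modalities in the scenarios where it needs them.

```python
# Toy illustration of "late" sensor fusion: each sensing subsystem reaches its own
# conclusion independently, and the planner decides when to consult which one.
from dataclasses import dataclass

@dataclass
class WorldView:
    source: str
    oncoming_gap_s: float        # time gap to the nearest fast oncoming vehicle

def vision_only_view() -> WorldView:
    return WorldView("vision", oncoming_gap_s=4.0)         # hypothetical output

def radar_lidar_view() -> WorldView:
    return WorldView("radar+lidar", oncoming_gap_s=2.5)    # hypothetical output

def safe_to_turn(scenario: str, min_gap_s: float = 3.0) -> bool:
    gap = vision_only_view().oncoming_gap_s
    if scenario == "michigan_left":
        # High-speed cross traffic: also consult the independent radar/lidar system
        # and act on the more conservative of the two independent estimates.
        gap = min(gap, radar_lidar_view().oncoming_gap_s)
    return gap > min_gap_s

print(safe_to_turn("michigan_left"))   # False: the radar/lidar view vetoes the turn
```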
 
@electronblue

It's quite clear to everyone on this forum that Trent @strangecosmos has a distorted view of reality, will downplay anything non-Tesla, and can't handle it when anyone calls him out on his BS. I think even @S4WRXTTCS and @lunitiks (for the first time ever) can agree with me on this.

You see, I can admit I'm pro-Mobileye, but even I don't drink my own Kool-Aid, I just sell it *cough* I mean... I know Waymo has a currently diminishing lead.

When you have a guy who says Elon releasing Level 5 software in 2019 has the same probability of happening as Level 5 taking till 2025, that should tell you everything you need to know about him and his bias.

Trent loves screaming abuse, but it's just that he can't handle people who challenge his Kool-Aid stand license.

Really? Aren’t you the person who keeps being disingenuous by comparing demo videos to commercially available systems?
 
Once you get to that point with camera-based vision, your neural networks might be so good at object detection (e.g. vehicle, pedestrian, and cyclist detection) using camera input that you no longer need lidar to achieve human-level or superhuman performance.

I say “human-level” because even if object detection is only as good as the average human, autonomous cars will still be safer because their reaction time is faster. Human reaction time under ideal conditions is 200-300 milliseconds. Braking reaction time might be more like 530 milliseconds. For people age 56 and up, the same study found a reaction time of 730 milliseconds. The actual number might be 2 seconds+. In contrast, we already have AEB systems with reaction times below 200 milliseconds.

Factoring in unideal conditions — distracted, drowsy, or drunk driving — the average for humans is probably much worse.

Even at low speeds, 100 milliseconds (0.1 seconds) of reaction time can make a big difference. 10 metres per second is only 36 km/h (22 mph). So 100 milliseconds of reaction time = 1 metre added to the stopping distance. That can be the difference between hitting someone and stopping well clear of them.

25 metres per second is 90 km/h (56 mph). 100 ms of reaction time = 2.5 metres added to the stopping distance. 200 ms = 5 metres. 1 second = 25 metres.
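Putting the reaction-distance arithmetic in one place (reaction times are the rough figures quoted above; this is only the distance covered before braking even begins, on top of the actual braking distance):

```python
# Distance travelled during the reaction delay, before any braking happens.
def reaction_distance_m(speed_kmh: float, reaction_time_s: float) -> float:
    return speed_kmh / 3.6 * reaction_time_s

for speed_kmh in (36, 90):                        # 10 m/s and 25 m/s
    for reaction_s in (0.1, 0.2, 0.5, 1.0, 2.0):  # AEB-like through distracted-human
        d = reaction_distance_m(speed_kmh, reaction_s)
        print(f"{speed_kmh} km/h, {reaction_s:.1f} s reaction -> {d:.1f} m")

# e.g. 36 km/h: 0.1 s -> 1.0 m; 90 km/h: 0.1 s -> 2.5 m, 1.0 s -> 25.0 m
```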