Does Tesla have a big math problem?

I’ve been thinking. I wonder if Tesla really does have the correct hardware to do FSD.

Does a 1280x960 camera at 36 fps have enough resolution and speed to operate at 75 mph? That's 110 feet per second, or about 3 feet per frame.

For safety reasons, the car has to take action in less than one second, so it has only a limited number of frames in which to act.

Fewer frames to work with means more processing power and bandwidth are required to get through all the neural-net iterations in time.

So, while big frames-per-second numbers look good, are they really good enough?

Second question: is 1280x960 enough resolution? People see at significantly higher resolution, and we can differentiate objects at greater distances than a camera of this resolution can. That means Tesla has fewer frames in which it even has a chance to identify a car coming at you at a T intersection at 60-70 mph.

At the advertised 80 m side-camera range (not a long way), a car coming at 70 mph will reach you in about 2.6 seconds. That means the car has to analyze the situation, react, and make it past the perpendicular opposing lanes within about 93 camera frames. Think of all the times you had to gun it so you weren't stuck at a T intersection for minutes.
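
A quick way to sanity-check those numbers (a minimal sketch in Python; the 80 m range, 70 mph speed, and 36 fps come from the figures above, the rest is unit conversion):

```python
# Back-of-envelope frame budget for a car approaching from the side.
RANGE_M = 80.0     # advertised side-camera range
SPEED_MPH = 70.0   # oncoming car's speed
FPS = 36.0         # camera frame rate

speed_ms = SPEED_MPH * 1609.344 / 3600   # mph -> m/s (~31.3 m/s)
time_to_arrival = RANGE_M / speed_ms     # ~2.56 s
frame_budget = time_to_arrival * FPS     # ~92-93 frames, depending on rounding

print(f"closing speed:   {speed_ms:.1f} m/s")
print(f"time to arrival: {time_to_arrival:.2f} s")
print(f"frames available: {frame_budget:.0f}")
```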

HW3 sounds like a big step along the way, but I don't see how the current camera setup can handle high-speed-differential situations in a variety of conditions when there are no stereo side cameras to assist in image processing.

Can someone tell me how I’m wrong?
 
To illustrate why I think it's a problem, I reference the study below.

Almost all drivers wait for gaps of more than three seconds when turning across oncoming traffic unless they're driving aggressively, which Tesla currently does not. This doesn't even count the instances in which people knowingly creep into the lane to take a shorter gap.

https://escholarship.org/content/qt8455h5gq/qt8455h5gq.pdf
 

My back-of-the-envelope calculation, using the side camera and its effective range of 80 meters: to cross two lanes of traffic in a Model X you need to travel 36.6 feet.

An oncoming car traveling at 75 mph will be seen about 2.4 seconds before it arrives. At maximum acceleration the Model X could have traveled 78 feet in that time, roughly twice the needed distance; it could have made it while accelerating at only half the maximum.
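
For anyone who wants to redo that envelope, here is the arithmetic (a sketch; the 36.6 ft crossing distance and the acceleration implied by "78 feet in 2.4 seconds" are the assumptions above, not measured values):

```python
# Can a Model X clear two lanes before an oncoming car arrives?
CROSS_FT = 36.6     # assumed distance to clear two lanes
RANGE_M = 80.0      # side-camera range
SPEED_MPH = 75.0    # oncoming traffic

speed_fps = SPEED_MPH * 5280 / 3600       # 110 ft/s
range_ft = RANGE_M * 3.28084              # ~262 ft
time_available = range_ft / speed_fps     # ~2.39 s

# Distance from rest at constant acceleration: d = a * t^2 / 2.
# "78 feet in 2.4 s" implies a ~ 27 ft/s^2 (~0.84 g).
accel = 2 * 78 / 2.4**2
dist_full = 0.5 * accel * time_available**2         # ~77 ft at full throttle
dist_half = 0.5 * (accel / 2) * time_available**2   # ~39 ft at half throttle

print(f"time available: {time_available:.2f} s")
print(f"full accel:  {dist_full:.0f} ft (need {CROSS_FT} ft)")
print(f"half accel:  {dist_half:.0f} ft")
```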
 
What do you mean by NN iterations?

He obviously doesn't understand how neural nets are used in autonomous driving systems, which is common unless you're on a team doing actual ground-breaking work. The thought that those who designed the system didn't calculate the required camera resolution is laughable. The peanut gallery loves to throw peanut shells, but the people with their noses to the grindstone just ignore the uninitiated.
 
At the advertised 80 m side-camera range (not a long way), a car coming at 70 mph will reach you in about 2.6 seconds.
Can someone tell me how I'm wrong?

I don't think it is clear what exactly those ranges mean. Take a look at the (compressed) front dashcam footage, which comes from a camera with a quoted 150 m range. You can easily identify vehicles at 450 meters, and with multiple frames, moving vehicles seem detectable at 1000 m or more.

If we triple the 80 m for the side camera, you get 7.8 seconds. Plenty of time, right?
 
I've been thinking. I wonder if Tesla really does have the correct hardware to do FSD. [...] Can someone tell me how I'm wrong?
This is speculative FUD.
 
It's an interesting question.

At 80 m with a 90-degree field of view you get about 0.1 m per pixel; at 160 m, about 0.2 m per pixel; at 240 m, about 0.3 m per pixel.

A Model 3 is 1.85 m wide, so you would get about 18 pixels across it at 80 m, but only 9 pixels at 160 m.

A motorcycle is about 0.8 m wide, so you might get 8 pixels at 80 m.
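
Those pixel counts check out if you assume the 1280 horizontal pixels are spread evenly over a 90-degree field of view (a sketch; real lens distortion will shift the numbers somewhat):

```python
import math

H_PIXELS = 1280    # horizontal resolution
FOV_DEG = 90.0     # assumed horizontal field of view

rad_per_px = math.radians(FOV_DEG / H_PIXELS)   # ~0.00123 rad/pixel

def pixels_on_target(width_m: float, dist_m: float) -> float:
    """Approximate pixels subtended by an object near the image center."""
    angle = 2 * math.atan(width_m / (2 * dist_m))
    return angle / rad_per_px

for dist in (80, 160, 240):
    print(f"{dist:>3} m: Model 3 (1.85 m) -> {pixels_on_target(1.85, dist):4.1f} px, "
          f"motorcycle (0.8 m) -> {pixels_on_target(0.8, dist):4.1f} px")
# 80 m: ~18.8 px / ~8.1 px, matching the estimates above.
```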

But if you sit there long enough you start to get an idea of how the road bends and how many lanes it has along each section, and knowing the number of lanes you can estimate how far away a stretch of road is from the number of pixels it spans.

Then we can start to estimate what type of vehicle is moving along it and how fast it is traveling.

So this question has convinced me that the cameras, even single ones, can collect enough info to make good decisions.

Thanks for posing it and making me think.
 
I trust that the engineers at Tesla analyzed these issues long before they announced the HW2 hardware. They must be confident the camera setup is good enough or they would not have used it. And they know the exact characteristics of the cameras and the software, which we don't; we are only making a best guess based on public information.

Now, there might be other issues with the HW2 hardware, but something as basic as detecting a fast-moving car is probably not one of them.
 
I've been thinking. I wonder if Tesla really does have the correct hardware to do FSD.

Does a 1280x960 camera at 36 fps have enough resolution and speed to operate at 75 mph? That's 110 feet per second, or about 3 feet per frame.

Can someone tell me how I'm wrong?

Can you drive? If so, you pretty well just proved Tesla correct.

When you look at the physics of the eye and brain, you'll find that you really don't do much better than that. Increasing the resolution and scan rate is actually extremely detrimental to being able to process the data, because the number of calculations starts to skyrocket.

Try getting an old VHS tape and watching someone drive; I suspect that (except for wanting a wider field of view) you will feel quite comfortable at 640x480.
Also, a large number of people actually don't have the vision to resolve 1280x960, yet they still manage to be relatively safe on the road.

And by the way, the car's response times are hugely better than yours. That's quite evident from the number of people who are sure the car is about to do something wrong and take over. In reality the car was okay; it just didn't need as much time to recover.
 
There's definitely a math problem as far as the 80 m figure is concerned. It's simply not far enough to avoid a crash at typical highway speeds when turning 90 degrees from a side road.

I posted the maths in another thread here:

Take the slowest FSD car, the Model X 60D, which accelerates 0-100 km/h in 6.2 seconds. That equates to an acceleration of 4.5 m/s².

I'm going to be kind and assume that the Model X can turn 90 degrees in 0 seconds. Something that is impossible...

But anyway...

A car on the road is travelling at 100 km/h, which is 27.77 m/s.

Assume the oncoming car is 80.1 m away, just beyond the range at which it can be detected.

Solving for when the cars crash is pretty easy.

27.77t − 80.1 = ½ × 4.5 × t²

That rearranges to the quadratic 2.25t² − 27.77t + 80.1 = 0.

Taking the smaller root, the cars crash 4.595 seconds after the Model X 60D decides it's all clear and turns.

That's assuming instant decision making and an instant 90-degree turn, so in reality the cars crash earlier. Also note that this is 100 km/h (62 mph); if you're looking at 75 mph it's even worse...
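
The same numbers checked numerically (a sketch; the 4.5 m/s² and 80.1 m values are the assumptions laid out above, and the smaller root of the quadratic is the first moment the gap closes):

```python
import math

ACCEL = 4.5              # m/s^2, Model X 60D (0-100 km/h in 6.2 s)
V_ONCOMING = 100 / 3.6   # 100 km/h in m/s (~27.78)
GAP = 80.1               # m, just beyond the quoted 80 m camera range

# Oncoming car position: v*t - GAP; Tesla position: a*t^2/2.
# Collision when (a/2)*t^2 - v*t + GAP = 0.
a, b, c = ACCEL / 2, -V_ONCOMING, GAP
disc = b * b - 4 * a * c
if disc >= 0:
    t_crash = (-b - math.sqrt(disc)) / (2 * a)   # smaller root
    print(f"crash at t = {t_crash:.3f} s")       # ~4.6 s
    print(f"Tesla speed then: {ACCEL * t_crash * 3.6:.0f} km/h")
else:
    print("no collision: the Tesla outruns the gap")
```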

So the side cameras either have to see much further than 80 m, or FSD is never going to work on these roads (and will need to be limited to specific intersections and roads with lower limits). I also have no idea how they are going to deal with sun glare at sunrise/sunset with a single side camera. Maybe limit the hours the car can drive itself too...

I agree that the hardest case is probably motorcycles due to the small size and limited number of pixels to work with. Also potentially the worst outcome if the Tesla pulls out in front of one.
 
I think the question is better phrased more generally as "Is a resolution of 1280x960 sufficient for any level of FSD?" followed by "Is 36 fps sufficient at any given resolution for reliable highway FSD?"

For the first, I'm highly confident that no, 1280x960 is not sufficient for either Level 4 or Level 5 autonomous driving, because it is barely enough resolution to see what is just in front of you. In order to make safe lane changes and to consistently differentiate between background "noise" and near-term traffic, you need higher resolution. In fact, if you want to read traffic signs at any meaningful distance in front of you, you need higher resolution. This is likely one of the reasons sign recognition hasn't been enabled yet: they cannot get it to work reliably because there is not enough resolution to consistently read signs until you are practically on top of them (which leaves no reaction time). I'd estimate that if you want confident L4 or L5 performance, a minimum resolution of 3840x2160 would be required to "zoom in" on far-away signs, or even to read "dirty" signs that require significant image enhancement on the fly. I also have a nagging suspicion that false-positive braking events would go down as resolution increases, as it would better enable the NN to differentiate whatever is currently causing "confusion".
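
To put rough numbers on the sign-reading point (a sketch with loudly labeled assumptions: a ~50° field of view for the main forward camera, a standard 24-inch US speed-limit sign, and a guess that you need on the order of 20 pixels across the sign to read it reliably):

```python
import math

H_PIXELS = 1280
FOV_DEG = 50.0    # ASSUMED field of view of the main forward camera
SIGN_W = 0.61     # m, 24-inch US speed-limit sign
MIN_PX = 20       # ASSUMED pixels across the sign needed to read it

rad_per_px = math.radians(FOV_DEG / H_PIXELS)

def px_on_sign(dist_m: float) -> float:
    return 2 * math.atan(SIGN_W / (2 * dist_m)) / rad_per_px

# Small-angle estimate of the farthest legible distance.
max_dist = SIGN_W / (MIN_PX * rad_per_px)
print(f"sign readable out to ~{max_dist:.0f} m")   # ~45 m under these assumptions
for d in (30, 50, 100):
    print(f"{d:>3} m: {px_on_sign(d):.1f} px across the sign")
```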

As for 36 fps, that is a very strange frame rate. I wonder if it is a limitation imposed to keep dashcam/sentry file sizes down to a rational value. Otherwise, it feels like you want the frame rate as fast as allowable to improve the overall accuracy of the system. I'd think 60 to 120 fps would probably be ideal, combined with a resolution of 3840x2160.

So ultimately, I think HW 3.0 gets us to some confident L3-equivalent Autopilot. There will need to be lots of manual-override heuristics to create the illusion of better performance, and that will likely wind up being a mixed bag that limits approval for offering a robotaxi service or anything like it.

Hopefully, HW 4.0+ will invest heavily in camera resolution and frame rate processing.

That said, I am skeptical that Tesla will offer HW 3.0 owners upgrades to HW 4.0, as they will consider "feature complete" with aggressive heuristics to be sufficient.
 
I think the question is better phrased more generally as "Is a resolution of 1280x960 sufficient for any level of FSD?" [...] Hopefully, HW 4.0+ will invest heavily in camera resolution and frame rate processing.

The problem with much higher resolutions is that they require a lot more data processing, so they would be much more taxing on the computer. I am guessing Tesla calculated that 1280x960 is the sweet spot: good enough for FSD but not too taxing on the AP2 computer.
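
The processing cost scales directly with pixel throughput, which is easy to quantify for the resolutions discussed above (plain arithmetic, no Tesla-specific assumptions):

```python
# Pixels per second per camera at the resolutions discussed in this thread.
for w, h, fps in ((1280, 960, 36), (3840, 2160, 36), (3840, 2160, 60)):
    mpix = w * h * fps / 1e6
    print(f"{w}x{h} @ {fps} fps: {mpix:.0f} Mpx/s")
# 1280x960 @ 36 is ~44 Mpx/s; 4K @ 36 is ~6.8x that; 4K @ 60 is ~11x.
```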
 
There's definitely a math problem as far as the 80 m figure is concerned. It's simply not far enough to avoid a crash at typical highway speeds when turning 90 degrees from a side road. [...] The cars crash 4.595 seconds after the Model X 60D decides it's all clear and turns.


You are assuming the oncoming traffic is going to rear-end a car that pulls out and only makes it to 73 mph before the 75 mph oncoming traffic catches up? While that will occasionally happen because human drivers suck, the oncoming traffic would be entirely at fault in that collision.

1 - I don't think I've ever seen a road with a 75 mph speed limit where people enter via a stop sign. A car going 75 on a road with stop signs is probably going 15-20 mph over the speed limit.

2 - If you turn onto the road and get anywhere close to the speed limit, you are not going to be rear-ended, and if you are, it is the car in the rear's fault.

In a more realistic scenario, the speed limit is 55 and oncoming traffic is going 60. As long as you make it to 50 before the other cars catch up, you are fine. That still might be hard for the cameras, but it is not nearly as hard as assuming the car has to worry about people going way over the speed limit while also paying no attention to merging traffic.
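
Under that reading, the relevant check is how much the gap shrinks while the Tesla gets up to traffic speed, not whether it outruns the oncoming car (a sketch reusing the 4.5 m/s² Model X 60D figure from earlier in the thread; the traffic speeds are just examples):

```python
# How much gap does an oncoming car consume while the Tesla accelerates
# from a stop up to traffic speed?  With t = v / a, the gap shrinks by
# v*t - (v/2)*t = v^2 / (2a).
ACCEL = 4.5   # m/s^2, Model X 60D figure from above

for mph in (55, 60, 75):
    v = mph * 1609.344 / 3600         # m/s
    shrink = v ** 2 / (2 * ACCEL)     # meters of gap consumed
    print(f"{mph} mph traffic: gap shrinks ~{shrink:.0f} m over {v / ACCEL:.1f} s")
```

Notably, at 60 mph the gap consumed comes out to almost exactly 80 m, which puts the quoted side-camera range right on the edge even in this more forgiving scenario.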
 
1 - I don't think I've ever seen a road with a 75 mph speed limit where people enter via a stop sign. A car going 75 on a road with stop signs is probably going 15-20 mph over the speed limit.
Huh? The maths I showed was for 62 mph (100 km/h). And I gave the Tesla impossibly good conditions: the ability to turn 90 degrees at full throttle instantaneously (in 0 seconds).

All I'm trying to point out is that cars at speed cover large distances quickly, and 80 m was never enough from a theoretical standpoint.

2 - If you turn onto the road and get anywhere close to the speed limit, you are not going to be rear-ended, and if you are, it is the car in the rear's fault.

On what planet? Are you honestly saying that if a car pulls out in front of oncoming traffic and causes a crash, it is somehow the other car's fault!?

You can't rely on the other car taking evasive action to avoid a crash. Would you really want to be driven by a car that relies on the other party jumping on the brakes? What if they are looking down to adjust the radio or the temperature? It only takes a couple of seconds of inattention.
 
I honestly don't see how Tesla gets to Level 5 on the current hardware. They need ridiculous detail at long distances on the side cameras to make a turn at a T intersection, and it appears I'm not the only one who thinks so.

That doesn’t make Tesla bad or evil, it’s just reality.

For those who think 70 mph is an unrealistic scenario, I encourage you to try a right on red on CA-76, or to consider needing to see an oncoming car potentially running a red light.
 
I honestly don't see how Tesla gets to Level 5 on the current hardware. They need ridiculous detail at long distances on the side cameras to make a turn at a T intersection, and it appears I'm not the only one who thinks so.

"ridiculous detail"? The car needs to be able to detect moving objects [vehicles]. Stationary objects don't matter. You are stopped. If the car incorrectly decides that it is not clear, it checks again 1/36th of a second later. This decision can be biased heavily toward don't go. If an eagle or something is confused for a car, so what? It will be gone in a second or two. There is no "phantom braking" vs "collision" tradeoff being made. It is sit here safely mildly inconveniencing people vs collision.

We will see in the coming years what they are able to do with the sensors we have.

For those who think 70mph is an unrealistic scenario, I encourage you to do a right on red on CA-76 or having the ability to see an oncoming car potentially running a red light.

For a while, I lived in a small town where the only way out was a right or left turn from a stop sign onto a four-lane highway with a 65 mph speed limit. Crashes happened. In foggy conditions (and at night), you pretty much have to hope people have their headlights on.
 