Firmware 9 in August will start rolling out full self-driving features!!!

Do we know anything about the connections and bandwidth of the wiring harnesses to the cameras? Being able to swap the camera for a higher resolution one with more dynamic range doesn't seem particularly difficult, especially if they designed it with that in mind. Even if they only replaced the center trifocal cluster, the system could recognize objects significantly farther out.

I'm not saying HW3 will need this, but it doesn't seem like it would be significantly more difficult/expensive than swapping out the processor.
I can't imagine the dynamic range is that different between the different cameras, otherwise they'd need a different neural network?
 
I can't imagine the dynamic range is that different between the different cameras, otherwise they'd need a different neural network?

I'm wondering about camera replacements in general. Not just for 2.0 to 2.5, but even from 2.5 to a hypothetical 3.0.

I don't think the increased dynamic range should be a problem for the neural network. It should be able to accommodate a wide variety of lighting conditions and a higher dynamic range should just allow it to see more in the shadows/highlights. A higher resolution camera should also be possible on the same system because the same network would have to identify objects of varying sizes in the frame. It would just be able to make sense of it sooner.

Other benefits might include better low-light sensitivity or higher frame rates to reduce motion blur of high-speed objects.
 
My point being, there was probably less than 1% chance Tesla or SpaceX would survive. When you're repeatedly told something is impossible, only to survive or thrive, that hubris can morph 20% into 100%. I don't think it's bordering on criminal if you have experience with throwing more people/brainpower, money, and time into a problem to solve it more effectively than most other companies.
That sounds like a gambling problem. Just because you won before doesn't make it legal to offer an investment fund that you claim will double investors' money by the next day when you know the chance of doubling it is only 20%. Imagine someone offering an investment that will double your money in one day, and that someone then bets it all on a roulette color twice in a row. Mathematically there is a 23% chance of quadrupling the money, in which case the investors get paid and the fund manager pockets an equal share of the profit. The other 77% of the time you lose it all, but the fund manager believed he could win because last week he bet on roulette and won three times in a row, so twice in a row should be a cakewalk. Does that make it legal to sell such an investment while telling people you'll double their money, without disclosing that it's not guaranteed?
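
For reference, a quick check of that roulette figure, assuming an American double-zero wheel where a color bet wins 18 of 38 outcomes:

# Probability of doubling the stake twice in a row on a single-color roulette bet.
# Assumes an American double-zero wheel: 18 winning pockets out of 38.
p_win_once = 18 / 38
p_quadruple = p_win_once ** 2        # must win both spins to quadruple the money
p_lose_all = 1 - p_quadruple         # losing either spin wipes out the whole stake
print(f"chance of quadrupling: {p_quadruple:.1%}")   # ~22.4%
print(f"chance of losing it all: {p_lose_all:.1%}")  # ~77.6%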

All that said, I think we can all agree communications seems to be a weak link with Tesla. Clear policies on prepayment of features and refund conditions should be part of the buying experience.
I absolutely agree, except the problem then becomes: what if you bought the car solely for that feature? Do you get a refund for the whole car? For example, someone who upgraded a one-year-old AP1 car to AP2 in order to get EAP and FSD, and who after the 3 years is trading it in without ever getting EAP: do they get a refund for the upgrade, since they could have just kept driving the old AP1 car instead? Maybe instead of refunds, Tesla should just sell things as they are and phrase it like a Kickstarter campaign, telling people that they are giving Tesla money to experiment and that, if it works out, they'll get a reward in the form of a feature. Of course, I suspect they'd sell a lot less this way (I wouldn't have bought a 20% chance that my car would make 691 hp instead of 400 hp for $25K, for example).

Tesla has been given a lot of slack by early adopters because they've consistently made up for these shortcomings in other areas (e.g. uncorking or unexpected firmware features). Early adopters are also more likely to want to see the company thrive. The backlash will only get worse as we see the release of the $35k Model 3.
Yea, I was one of those early adopters who gave them slack. I don't plan to buy the $35K Model 3, but I'm also no longer on board with giving them slack. I feel they screwed me on the P85D power; AP1 also doesn't even provide reliable blind spot detection, which was an official feature of AP1 before Tesla erased it; and I still hate the v9 "cheap tablet"-like software with its tiny buttons, having to click through ever more menu levels while driving to do things like change suspension level, turn on headlights, or make a phone call, not being able to see split apps (or see apps at the top), the browser being completely dead, and the SDK never materializing so I could integrate my radar detector and other functions like a few physical buttons. For completeness, I did get one thing from Tesla that was a surprise bonus I did not pay for: they uncorked our 2017 S75D's acceleration for free, even though they didn't have to. The service center has always gone above and beyond too; if it weren't for that I would not have been a repeat customer. That said, unless something changes drastically, I will not be buying more Teslas. Since I like Tesla, I hope things do change.
 
I'm wondering about camera replacements in general. Not just for 2.0 to 2.5, but even from 2.5 to a hypothetical 3.0.

I don't think the increased dynamic range should be a problem for the neural network. It should be able to accommodate a wide variety of lighting conditions and a higher dynamic range should just allow it to see more in the shadows/highlights. A higher resolution camera should also be possible on the same system because the same network would have to identify objects of varying sizes in the frame. It would just be able to make sense of it sooner.

Other benefits might include better low-light sensitivity or higher frame rates to reduce motion blur of high-speed objects.
I don't think that it's at all obvious that changing the input to a neural network will work all that easily. There's a large body of research around fooling neural networks with pretty small changes to images that in most cases humans don't notice, e.g. mistaking a turtle for a rifle.
 
The AP2/2.5 radar resolution is notoriously poor, in that it cannot differentiate a stopped fire truck 100m ahead in your motorway lane from the iron railings along the roadside, so current AP simply ignores that input and without qualms ploughs into the former if the driver at just that moment happens to be fighting with the shitty USB playlists or whatever on the touchscreen.

IMO, the only reason to use RADAR at all is for parking, and even then, it's just a fallback. For the most part, you should be able to get all the data you need from combining pairs of cameras with depth mapping.


From what we have seen in Verygreen's extracted video feeds, the resolution and automatic exposure control both seem to be pretty subpar, and it is notable that MobilEye is going with much better cameras for its EyeQ4 systems, AFAIK in the production 2019 Audi A8:
Sony Releases the Industry's Highest Resolution 7.42 Effective Megapixel Stacked CMOS Image Sensor for Automotive Cameras

Just because they are going with them doesn't mean that higher resolution is actually needed. IMO, it is at least as likely that Sony gave MobilEye a good deal. :)

Adding 6x more pixels to represent the same visual angle doesn't necessarily give you any more useful information than you already have, but it does mean you need (at least) 6x the processing power to process it. The folks analyzing HW3 have concluded that it will do somewhere in the neighborhood of four times as many operations per second as what MobileEye plans to deliver in EyeQ5 in 2020. That means the MobileEye design can process one frame of data for every 24 frames that the Tesla design can process. The Tesla design is said to be able to handle over 200 FPS per camera with HW3. That means the MobileEye design, assuming similarly complex self-driving software, would only be able to process 8–10 FPS at full resolution, which would be wholly inadequate for real-world self driving.

Assuming those numbers are correct, then IMO, the only way that a 7.2 MP camera has a prayer of being usable for self-driving with their hardware would be if they sub-sampled it down to a much lower resolution for processing, and only used the 7.2 MP data in dashcam mode. Otherwise, it is simply way too much data to process in real time. (I suppose pedantically, they could use subsampled images for object detection, and higher resolution data for analyzing certain objects of interest, but even then, that's a lot of data.)
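
A rough sketch of that arithmetic, using the figures assumed above (HW3 at roughly 4x EyeQ5's operations per second, about 6x the pixels for the 7 MP sensor versus the current cameras, and the claimed 200+ FPS per camera on HW3); treating the current cameras as roughly 1.2 MP is my assumption:

# Back-of-the-envelope throughput comparison under the assumptions stated above.
hw3_fps_per_camera = 200   # claimed HW3 throughput per camera (from the post)
hw3_vs_eyeq5_ops = 4       # HW3 assumed to do ~4x the operations per second of EyeQ5
pixel_ratio = 6            # ~7 MP sensor vs the current ~1.2 MP cameras (assumption)

# Same per-pixel workload on EyeQ5: a quarter of the compute, six times the pixels per frame.
eyeq5_fps_full_res = hw3_fps_per_camera / (hw3_vs_eyeq5_ops * pixel_ratio)
print(f"EyeQ5 full-resolution estimate: ~{eyeq5_fps_full_res:.1f} FPS per camera")  # ~8.3 FPS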
 
I don't think that it's at all obvious that changing the input to a neural network will work all that easily. There's a large body of research around fooling neural networks with pretty small changes to images that in most cases humans don't notice, e.g. mistaking a turtle for a rifle.

I think the tweaking of the images to fool the NN has to be somewhat contrived. I don't think random noise or other normal variances cause the problem.

I could be wrong though, if someone else has more info on this.
 
It's my understanding that the cameras' output is downsampled anyway before being fed to the neural network rather than used at full resolution, in which case more pixels won't help except perhaps to decrease noise through downsampling.

I never understood why AP needs three different zoom level cameras pointing forward.

You could accomplish the same thing with a sensor that has way more pixels and uses the widest lens.
If the NN wants to concentrate on a narrow field of view, just use the middle pixels of the high-res sensor.
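
A minimal sketch of that idea, assuming a hypothetical high-resolution frame held in a NumPy array; cropping the central region emulates the narrow camera's field of view:

import numpy as np

def center_crop(frame: np.ndarray, crop_fraction: float = 1/3) -> np.ndarray:
    """Return the central region of a frame, emulating a narrower field of view."""
    h, w = frame.shape[:2]
    ch, cw = int(h * crop_fraction), int(w * crop_fraction)
    top, left = (h - ch) // 2, (w - cw) // 2
    return frame[top:top + ch, left:left + cw]

# Hypothetical 7 MP-class frame (rows, cols, RGB); the crop keeps only the middle pixels.
wide_frame = np.zeros((1744, 4256, 3), dtype=np.uint8)
narrow_view = center_crop(wide_frame)
print(narrow_view.shape)  # (581, 1418, 3)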
 
Adding 6x more pixels to represent the same visual angle doesn't necessarily give you any more useful information than you already have, but it does mean you need (at least) 6x the processing power to process it. The folks analyzing HW3 have concluded that it will do somewhere in the neighborhood of four times as many operations per second as what MobileEye plans to deliver in EyeQ5 in 2020. That means the MobileEye design can process one frame of data for every 24 frames that the Tesla design can process. The Tesla design is said to be able to handle over 200 FPS per camera with HW3. That means the MobileEye design, assuming similarly complex self-driving software, would only be able to process 8–10 FPS at full resolution, which would be wholly inadequate for real-world self driving.

Assuming those numbers are correct, then IMO, the only way that a 7.2 MP camera has a prayer of being usable for self-driving with their hardware would be if they sub-sampled it down to a much lower resolution for processing, and only used the 7.2 MP data in dashcam mode. Otherwise, it is simply way too much data to process in real time. (I suppose pedantically, they could use subsampled images for object detection, and higher resolution data for analyzing certain objects of interest, but even then, that's a lot of data.)

I think what you forget is that MobilEye’s vision networks have so far been much more efficient than Tesla’s perhaps due to differences in their general approach. While I have no idea if and how MobilEye plans to make use of their added resolution I think simply comparing fps based on operations per second will not yield you a correct comparison.

It is amazing how well even the old EyeQ3 does for its limited processing power for example compared to AP2 including traffic sign detection. It seems MobilEye is using a combination of techniques to reach this efficiency while Tesla is brute-forcing it with deep-learned NNs that seem to require considerably more power for similar results.
 
I never understood why AP needs three different zoom level cameras pointing forward.

You could accomplish the same thing with a sensor that has way more pixels and uses the widest lens.
If the NN wants to concentrate on a narrow field of view, just use the middle pixels of the high-res sensor.

Optical zoom via lenses is much less noisy than the digital variety, at least on my phone. Not sure though if that is the real or only reason to go trifocal.
 
IMO, the only reason to use RADAR at all is for parking, and even then, it's just a fallback. For the most part, you should be able to get all the data you need from combining pairs of cameras with depth mapping.

Other manufacturers, e.g. the 2019 Audi A8, have radar in all 4 corners, and the rear ones would certainly be useful for reliably detecting traffic overtaking at 300 km/h on the German Autobahn in rain and spray. If rumours around here are correct, the Model S facelift with AP2 included rear corner radars in the original design, but these were sadly yanked after parting ways with MobilEye.

"The folks analyzing HW3 have concluded that it will do somewhere in the neighborhood of four times as many operations per second as what MobileEye plans to deliver in EyeQ5 in 2020."

Some folks may well have reached that conclusion based on sparse raw numbers and optimistic supposition but I find it impossible to believe Tesla's HW3 will actually outperform EyeQ5 at all. If it turns out to do equally well I will be delighted, though only time will tell.
 
I never understood why AP needs three different zoom level cameras pointing forward.


Bugs. One camera can get occluded pretty easily. Two cameras can get halfway occluded, which means that the car would lose the stereo vision that is necessary for proper object distance detection. So three is actually the minimum number of cameras that can safely be used for a self-driving system.
 
"The folks analyzing HW3 have concluded that it will do somewhere in the neighborhood of four times as many operations per second as what MobileEye plans to deliver in EyeQ5 in 2020."

Some folks may well have reached that conclusion based on sparse raw numbers and optimistic supposition but I find it impossible to believe Tesla's HW3 will actually outperform EyeQ5 at all. If it turns out to do equally well I will be delighted, though only time will tell.

If you mean hardware performance, I would be shocked if Tesla's HW3 weren't dramatically faster. MobileEye's EyeQ5 is only about a fourth as fast as Google's third-generation TPUs, and a sixth as fast as NVIDIA's top-end GPUs from last year. MobileEye is way behind the curve speed-wise.

This is not to say that they won't be able to pull off better overall system performance through a more optimal neural network architecture, of course.

I think what you forget is that MobilEye’s vision networks have so far been much more efficient than Tesla’s perhaps due to differences in their general approach. While I have no idea if and how MobilEye plans to make use of their added resolution I think simply comparing fps based on operations per second will not yield you a correct comparison.

It is amazing how well even the old EyeQ3 does for its limited processing power for example compared to AP2 including traffic sign detection. It seems MobilEye is using a combination of techniques to reach this efficiency while Tesla is brute-forcing it with deep-learned NNs that seem to require considerably more power for similar results.

That's certainly possible. Then again, there's also a very real possibility that their approach will result in an infinitely growing pile of edge cases that eventually result in scrapping the whole design.

Consider, for example, the problem of object detection in the road. You either create a more general NN that recognizes what is and is not road or you create a set of smaller NNs that recognize cars, people, and dogs. What happens when these two approaches encounter a gorilla? The more general NN says, "That's not the road". The more complex set of independent NNs says, "That's not a car, a person, or a dog."
 
Consider, for example, the problem of object detection in the road. You either create a more general NN that recognizes what is and is not road or you create a set of smaller NNs that recognize cars, people, and dogs. What happens when these two approaches encounter a gorilla? The more general NN says, "That's not the road". The more complex set of independent NNs says, "That's not a car, a person, or a dog."

It is my understanding that this is already solved by EyeQ4, and in addition by the use of Lidar as redundancy, so compute power for vision is probably not an issue. MobilEye is simply very efficient at vision. Power demands for driving policy seem a much more fruitful debate, as that is more of an open question even for MobilEye.
 
Bugs. One camera can get occluded pretty easily. Two cameras can get halfway occluded, which means that the car would lose the stereo vision that is necessary for proper object distance detection. So three is actually the minimum number of cameras that can safely be used for a self-driving system.

You don't need stereo vision for that. The traditional way is to use SLAM.
  • Main Forward Camera: Max distance 150m with 50° field of view
  • Narrow Forward Camera: Max distance 250m with 35° field of view
  • Wide Forward Camera: Max distance 60m with 150° field of view

[Image: Tesla second-generation Autopilot sensor suite diagram]
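
As a rough illustration of what those fields of view mean for angular resolution (assuming, hypothetically, a 1280-pixel-wide sensor for each camera; the actual sensor width isn't stated here):

# Approximate horizontal pixels per degree for each forward camera,
# assuming a hypothetical 1280-pixel-wide sensor and ignoring lens distortion.
sensor_width_px = 1280
cameras = {"main": 50, "narrow": 35, "wide": 150}  # horizontal field of view in degrees

for name, fov_deg in cameras.items():
    print(f"{name:>6}: {sensor_width_px / fov_deg:.1f} px/deg")
# narrow ~36.6 px/deg, main ~25.6, wide ~8.5: the longer the camera's reach,
# the more pixels it packs into each degree of the scene.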
 
You don't need stereo vision for that. The traditional way is to use SLAM.

Traditional? For self-driving? AFAIK, as of two years ago, monocular SLAM in self-driving cars was in the early research stages. I suspect we'll see it in ten years, at the earliest.

Binocular imaging (from two cameras) can get you an instant depth map with remarkably little effort. Monocular SLAM (from a single camera), because it requires inter-frame comparison, is significantly more complex. You have to start with motion vector compensation and rotation compensation to adjust for bumps in the road between two frames of video taken a fraction of a second apart and then apply a much more complex algorithm for depth mapping.

The approach is attractive because of the simpler camera hardware requirements, but in practice, it has higher latency, and requires a lot more CPU power, several times as much framebuffer capacity, etc. It is orders of magnitude more complex, and when you're dealing with a neural network that is already exceeding the limits of hardware just to stay within the lanes, asking it to do inter-frame depth mapping on top of that would be nuts.
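
For reference, a minimal sketch of the standard stereo-disparity relation being alluded to here (depth = focal length x baseline / disparity), with made-up numbers for the focal length and baseline:

def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Classic pinhole stereo relation: depth = focal_length * baseline / disparity."""
    return focal_px * baseline_m / disparity_px

# Hypothetical rig: 1000 px focal length, 20 cm baseline.
for d in (40, 10, 2):  # disparity in pixels
    print(f"disparity {d:>2} px -> depth {depth_from_disparity(1000, 0.2, d):.0f} m")
# 40 px -> 5 m, 10 px -> 20 m, 2 px -> 100 m: nearby objects shift a lot between
# the two views, distant ones barely at all.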
 
Traditional? For self-driving? AFAIK, as of two years ago, monocular SLAM in self-driving cars was in the early research stages. I suspect we'll see it in ten years, at the earliest.

SLAM is a technique that stems from the mid-80s and early 90s. It's well known for "ages" in computer vision.
There's also SfM (structure from motion) and bundle adjustment. And deep learning, of course.

Binocular imaging (from two cameras) can get you an instant depth map with remarkably little effort. Monocular SLAM (from a single camera), because it requires inter-frame comparison, is significantly more complex. You have to start with motion vector compensation and rotation compensation to adjust for bumps in the road between two frames of video taken a fraction of a second apart and then apply a much more complex algorithm for depth mapping.
They have 3 different front cameras to begin with, to cope with different focal lengths etc. Then they would have to double the number of cameras. You also have a minimum distance depending on what stereo camera setup you have.

The approach is attractive because of the simpler camera hardware requirements, but in practice, it has higher latency, and requires a lot more CPU power, several times as much framebuffer capacity, etc. It is orders of magnitude more complex, and when you're dealing with a neural network that is already exceeding the limits of hardware just to stay within the lanes, asking it to do inter-frame depth mapping on top of that would be nuts.

If the approach were that attractive you would have seen it everywhere. Converting stereo vision to a point cloud is not trivial; you have FPGAs delivering 10-40 FPS (Karmin2 – Nerian's 3D Stereo Camera), and the cameras have to be synchronized as well. It will be accurate only within a limited distance. Sure, you get all the 3D information at that exact frame, which is important for stills and shooting movies, but that matters less for computer vision, since you have a bit of give in the frame rate to react to something. It takes time to build the 3D models anyway.
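
A rough sketch of why stereo accuracy is limited to short range: under the usual first-order error model, depth uncertainty grows with the square of the distance (dZ ~ Z^2 * disparity error / (focal length * baseline)); the rig numbers below are hypothetical:

def depth_error(depth_m: float, focal_px: float, baseline_m: float, disparity_err_px: float = 0.5) -> float:
    """First-order stereo depth uncertainty: dZ ~ Z^2 * d_err / (f * B)."""
    return depth_m ** 2 * disparity_err_px / (focal_px * baseline_m)

# Hypothetical rig: 1000 px focal length, 20 cm baseline, half-pixel matching error.
for z in (10, 50, 150):
    print(f"at {z:>3} m: +/- {depth_error(z, 1000, 0.2):.1f} m")
# roughly 0.25 m of error at 10 m, 6 m at 50 m, and 56 m at 150 m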
 
Just because they are going with them doesn't mean that higher resolution is actually needed. IMO, it is at least as likely that Sony gave MobilEye a good deal. :)

Adding 6x more pixels to represent the same visual angle doesn't necessarily give you any more useful information than you already have, but it does mean you need (at least) 6x the processing power to process it.

When you are driving 80-90 MPH and trying to see a small object like a tire on the road 150-200 m away, it 100% matters. You need to know precisely what it is so you can respond to it accurately. It's vital that you don't overreact or underreact. The difference between seeing a blur and seeing an actual clear image is the reduction of false positives and false negatives. A low-res camera will be a false-positive galore at very far distances. This is why a higher-res camera is desired.
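
As a rough illustration with purely hypothetical numbers (a 0.7 m-wide tire viewed through a 35-degree narrow camera, comparing a roughly 1.2 MP-class sensor width against a roughly 7 MP-class one):

import math

def pixels_on_target(target_width_m, distance_m, fov_deg, sensor_width_px):
    """Approximate horizontal pixels a target subtends at a given distance."""
    angle_deg = math.degrees(2 * math.atan(target_width_m / (2 * distance_m)))
    return angle_deg * sensor_width_px / fov_deg

for px_width in (1280, 3840):  # hypothetical low-res vs high-res sensor widths
    px = pixels_on_target(0.7, 200, 35, px_width)
    print(f"{px_width}-px-wide sensor: tire at 200 m covers ~{px:.0f} px")
# ~7 px on the 1280-wide sensor vs ~22 px on the 3840-wide one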

The folks analyzing HW3 have concluded that it will do somewhere in the neighborhood of four times as many operations per second as what MobileEye plans to deliver in EyeQ5 in 2020. That means the MobileEye design can process one frame of data for every 24 frames that the Tesla design can process. The Tesla design is said to be able to handle over 200 FPS per camera with HW3. That means the MobileEye design, assuming similarly complex self-driving software, would only be able to process 8–10 FPS at full resolution, which would be wholly inadequate for real-world self driving.

When you say "the folks," are you quoting me? Did you see the part where the EyeQ4, which has 2.5 TOPS at 3 watts, supports Level 3 and Level 4 self-driving, while Tesla struggles to get sign recognition, lane keeping and adaptive cruise control working on a 10 TOPS chip?

They still can't match a six-year-old EyeQ3 chip in number of features.
EyeQ4, on the other hand, is simply not a fair comparison, because what it runs is 1000x more complex than the networks running on AP2 right now, yet it has 4x less compute power.

EyeQ4 can process up to 12 camera inputs at 30 frames per second.
That shows you technological ingenuity. Here's what Mobileye says about their chip tech.

The Mobileye-Intel approach is contrary to industry common practice in the field, which is to over-subscribe the computing needs during R&D (i.e., “give me infinite computing power for development”) and then later try to optimize to reduce costs and power consumption. We, on the other hand, are executing a more effective strategy by under-subscribing the computing needs so that we maintain our focus on developing the most efficient algorithms for the sensing state, driving policy and vehicle control.

Mobileye's fleet for example runs on four EyeQ4. Each chip is doing something different.

  1. One EyeQ4 for sensing.
  2. Another EyeQ4 for driving policy.
  3. Another EyeQ4 for their RSS, which approves or rejects outputs from the driving-policy EyeQ4.
  4. Another EyeQ4 for fail-operational backup.

By the summer of this year, when EyeQ5 (10 watts) is ready, they will keep these functions separated for safety reasons, even though they'll have 10x the computing power in EyeQ5. So they will use 3x EyeQ5 plus an optional fail-operational board with a single EyeQ5, but that fail-operational chip will run everything (sensing, driving policy, RSS and vehicle control).

The EyeQ5 will be overkill, with ridiculously low utilization. Amnon once said one of the double-lane-merge algorithms they were testing took 1% of the EyeQ4's processing capacity. That's how efficient their algorithms and chips are.
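
A quick back-of-the-envelope on what those EyeQ4 figures imply per frame, taking the quoted 2.5 TOPS, 12 cameras and 30 FPS at face value:

# Rough per-frame compute budget implied by the EyeQ4 figures quoted above.
eyeq4_tops = 2.5   # trillion operations per second (quoted)
cameras = 12       # camera inputs (quoted)
fps = 30           # frames per second per camera (quoted)

ops_per_frame = eyeq4_tops * 1e12 / (cameras * fps)
print(f"~{ops_per_frame / 1e9:.0f} billion ops available per camera frame")  # ~7 billion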
 
When you are driving 80-90 MPH and trying to see a small object like a tire on the road 150-200 m away, it 100% matters.
Informative post. But for cameras, speed doesn't decrease resolution. Even a camera with the best resolution, put to the test on a bumpy road going fast, will suffer from the same problem to the same degree. That has more to do with the light entering the camera. More frames per second would fix that, not more resolution.
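
For a sense of scale, with hypothetical numbers (90 MPH relative speed and two different exposure times): the blur is set by how far the scene moves during the exposure, not by the pixel count.

# Motion blur depends on relative speed and exposure time, not on sensor resolution.
speed_mph = 90
speed_m_s = speed_mph * 0.44704          # convert to metres per second (~40 m/s)

for exposure_ms in (10, 2):              # hypothetical exposure times
    blur_m = speed_m_s * exposure_ms / 1000
    print(f"{exposure_ms} ms exposure at {speed_mph} mph -> ~{blur_m * 100:.0f} cm of blur")
# shorter exposures (easier at higher frame rates) shrink the blur; more pixels do not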
 
SLAM is a technique that stems from the mid-80s and early 90s. It's well known for "ages" in computer vision.

The technique, yes. But real-time monocular SLAM has never been used in anything even approaching self-driving cars by anyone. The technique is still very experimental for that purpose, and it also relies on things like having a huge database of recognized objects to guess how rigid objects are, which makes it presumptively unsafe in the context of self-driving cars that have to make the right decision every time, instantly.

Doing it that way would be beyond crazy.


They have 3 different front cameras to begin with, to cope with different focal lengths etc. Then they would have to double the number of cameras.

As a counterexample, consider the dual-camera setup on an iPhone. That uses two cameras with different focal lengths to generate a depth map every time you take a picture. There's no requirement that stereoscopic cameras have similar focal lengths.


If the approach were that attractive you would have seen it everywhere. Converting stereo vision to a point cloud is not trivial; you have FPGAs delivering 10-40 FPS (Karmin2 – Nerian's 3D Stereo Camera), and the cameras have to be synchronized as well. It will be accurate only within a limited distance. Sure, you get all the 3D information at that exact frame, which is important for stills and shooting movies, but that matters less for computer vision, since you have a bit of give in the frame rate to react to something. It takes time to build the 3D models anyway.

It is, indeed, computationally expensive, but remember that you don't need a complete, precise map of the entire scene — only a rough approximation, and only in areas of interest (read "things that are not obviously the road surface"). Besides, there's not necessarily a need to create a point cloud or build a model. The whole point of using a massive neural network, as I understand it, is that it can learn that the relationship between where objects appear within different cameras' views gives an indication of distance, without needing to actually compute the depth for each pixel (which would be way more detail than is needed anyway).

If that's wrong, then I'm pretty sure Tesla's FSD plan is pretty much doomed.