Rumor: HW4 can support up to 13 cameras

FSD 10.9 release notes:
"Upgraded generalized static object network to use 10-bit photon count streams rather than 8-bit ISP tonemapped images by adding 10-bit inference support in the AI compiler stack. Improved overall recall by 3.9% and precision by 1.7%."
 
HDMI used to be uncompressed. It has now introduced compression (called DSC):
I get it. But you made it sound like the fact that HDMI 2.1 allows compression is a problem. It's not; it's optional, and you have 48 Gbps of bandwidth available uncompressed.

DSC is optional, and where supported is automatic, but only as required — if a system can send a given format uncompressed, it will. The caveat for interoperability is that all devices in line from Source to Sink need to support DSC in order for it to work.
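To put rough numbers on the "only as required" point, here is a back-of-envelope sketch (my own, not from the HDMI spec or this thread) of when DSC would even be needed against the nominal 48 Gbps link. It ignores blanking intervals and FRL coding overhead, so treat the figures as ballpark only.

```python
# Ballpark check of which formats exceed HDMI 2.1's nominal 48 Gbps without DSC.
# Ignores blanking and FRL coding overhead, so real limits are somewhat tighter.

def uncompressed_gbps(width, height, fps, bits_per_channel, channels=3):
    """Raw pixel payload in Gbit/s for a fully sampled RGB/4:4:4 signal."""
    return width * height * fps * bits_per_channel * channels / 1e9

HDMI21_NOMINAL_GBPS = 48  # headline figure; usable payload is a bit lower

for name, w, h, fps, bpc in [("4K60 10-bit", 3840, 2160, 60, 10),
                             ("4K120 10-bit", 3840, 2160, 120, 10),
                             ("8K60 10-bit", 7680, 4320, 60, 10)]:
    gbps = uncompressed_gbps(w, h, fps, bpc)
    print(f"{name}: {gbps:.1f} Gbps uncompressed, DSC needed: {gbps > HDMI21_NOMINAL_GBPS}")
```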
 
The 10-bit photon-count stream in those release notes is quite different from a 30-bit RGB (or similar) display stream and from arguments about colorimetry, frame rate and resolution. The raw data has wide dynamic range precisely because it is raw data, hence the need for more bits in the luminance channel.

In fact, if you stop and think about it, the NN has very little need for color data at all .. basically traffic signals, car stop lights, and a few lane markings/signs. Pretty much all you need is a few bits saying "this is red" etc .. you certainly don't need to worry about "this is a slightly more russet red than the pixel next to it," which only matters for the aesthetic appeal of photos. All you really need for the NNs is a few dozen or so color "bands" and detailed luminance data.
 
There's a problem with binning: if you reduce colors to a few bands, then adjacent color values that fall in different bands get marked as different, while distant colors at opposite ends of the same band get marked as the same. A slight shift turns into a big jump; for instance, a color shift of two can take a pair of points from opposite ends of the same bin (similar) to two bins apart (significantly different).
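A toy illustration of that boundary effect (arbitrary band width, nothing to do with Tesla's actual networks):

```python
# Hard-bin an 8-bit hue-like value into 8 bands of 32 and show the jump at band edges.
BAND_WIDTH = 32

def band(v):
    return v // BAND_WIDTH

# Two values at opposite ends of the same band: labelled identical.
print(band(32), band(63))   # -> 1 1
# Shift each end outward by just 2 and they land two bands apart.
print(band(30), band(65))   # -> 0 2
```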

The original AP2 camera sensors used an RCCC filter array (three grey/clear photosites plus one red per tile); blue was added later, making it R,C,C,B. So half the pixels are luminance (theoretically doubling the stated resolution or sensitivity for greyscale).
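For reference, counting the clear photosites in one repeating 2x2 tile of each pattern backs up that fraction (the tile layout here is just the conventional RCCC/RCCB description, not a claim about Tesla's exact sensor configuration):

```python
# Fraction of unfiltered ("clear") photosites per 2x2 colour-filter tile.
patterns = {
    "RCCC": ["R", "C", "C", "C"],  # 3 clear + 1 red
    "RCCB": ["R", "C", "C", "B"],  # 2 clear + red + blue
}
for name, tile in patterns.items():
    print(f"{name}: {tile.count('C') / len(tile):.0%} of photosites are clear (luminance)")
```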
 
But ultimately it IS binned .. the car has to decide if it's looking at a red, yellow or green traffic signal. Same for other color-driven aspects of driving. As for big jumps .. that's the point, really. The car really wants to know "what is the probability that this is a red light vs a green light" etc, which is essentially weighted binning.
 
Sure, if all you ever use color for is three versions of lights.
 
[Image: FJFg-8IXsAEKOcL]


These have been driving around Palo Alto.

Appears to be an OnSemi eval kit:
[Image: MARS-configuration-features-new.png]


Taking a wild guess, the sensor being tested is the second-generation successor to the HW3 AR0136AT cameras, which would be the AR0233AT.
 
As I said, color is basically used for traffic signals, some lane markings and signage, and (possibly in the future) emergency vehicle identification. In fact, there is a good reason you DON'T need much more color info to drive .. color-blind humans!
If the color-blind argument is used, then traffic light color might also not be needed, as some people cannot distinguish those colors. Instead, it may be the position of the light that matters: the top light is always red (in the case of horizontal signals, the leftmost is red), etc.
 
HW4 will almost certainly use RCCB pixels. There is just so much gained in terms of extra luma information, at essentially zero real-life cost from losing some chroma information. The only reason they might go RGB is if they are using virtual side mirrors and want the colourspace to be a bit more lifelike.
 
With 3 front cameras they could conceivably do both color arrays. Since Mobileye didn't find much value in the medium-wide camera and eliminated it, I suspect that would be the best candidate for a full color array. Of course, it could also go full B/W for maximum night vision.
 
I should note this is not how raw video comes out of the image sensor. A 4K image sensor only has 3840 (h) x 2160 (v) pixels, not 3 channels of that many. The 3 color channels are generated after demosaicing (which can be done by the image signal processor in the computer; in HW3 there is an ISP on the chip that does it). That means 4K60p at 10-bit only requires about 5 Gbps over the coax cable.

Oh, yeah. I forgot that camera vendors claim 4K resolution when they really only have O(8M) subpixels. It's kind of a lie, in that each color channel effectively has only ~1080p resolution, but whatever. 😂
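For what it's worth, the ~5 Gbps figure checks out if you assume one 10-bit sample per photosite before demosaicing and ignore CSI-2/FPD-Link framing overhead:

```python
# Raw (pre-demosaic) vs demosaiced RGB bandwidth for 4K60 at 10 bits per sample.
def gbps(width, height, fps, bits, channels):
    return width * height * fps * bits * channels / 1e9

w, h, fps, bits = 3840, 2160, 60, 10
print(f"raw, 1 sample/pixel: {gbps(w, h, fps, bits, 1):.1f} Gbps")   # ~5.0
print(f"demosaiced RGB (3x): {gbps(w, h, fps, bits, 3):.1f} Gbps")   # ~14.9
```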


In reality the NNs don't need that much resolution from the cameras either (Tesla's current cameras aren't even FHD). Maybe the center camera can have a 4K sensor for digital zoom purposes, but it'll likely be used in a crop mode; most image sensors today have readout modes that can output a crop with no additional processing, which means you don't need to send the full 4K signal through the wire (just the cropped one).

I would be cautious about saying something that broad. The reason for using higher resolution is to detect moving objects sooner, when they're occupying a smaller area of the sensor. You're not going to want to cycle between a wide-angle view and a bunch of digital crops based on what the computer thinks is interesting, because that would increase your overall latency for detection by a large amount.


Note, Tesla does not use SDI; it uses MIPI CSI-2 (the common standard image sensors use), serialized over FPD-Link III (commonly used in the automotive industry for backup cameras), which can be transmitted over a single coax cable:
But with the exception of the rear camera, AFAIK, Tesla uses a twisted pair for each camera. If I'm wrong, and they're using coax, then great. But if they have to replace it anyway, fiber is a lot cheaper than coax. RG-11/U rated for 12 GHz operation (e.g. for 12G-SDI) costs almost $3 per foot. It is also really big stuff, with a four-inch bend radius.

Fiber optic cabling costs about 25 cents per foot, likely with more like a one inch bend radius.

One of these things can easily be routed around a car window. I'll let you guess which. 🙂
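Using the per-foot prices above and a made-up run length per camera (purely illustrative, not a measured Tesla harness figure), the cost gap is roughly an order of magnitude:

```python
# Rough cable-cost comparison using the prices quoted in this post.
COAX_PER_FT  = 3.00   # RG-11/U rated for 12 GHz (figure from this post)
FIBER_PER_FT = 0.25   # generic fiber (figure from this post)
RUN_FT       = 15     # assumed run per camera; illustrative only
CAMERAS      = 8      # Tesla's current camera count

for name, per_ft in (("coax", COAX_PER_FT), ("fiber", FIBER_PER_FT)):
    print(f"{name}: ~${per_ft * RUN_FT * CAMERAS:.2f} for {CAMERAS} cameras")
```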


I imagine for higher resolutions, if they stick with CSI and FPD-Link, they'll just add more cable pairs and it'll still be drastically cheaper than using fiber. MIPI is cognizant of supporting 8K or higher resolutions, and has an 18-wire design (6 trios) that delivers 34 Gbps, which allows for 50MP at 10-bit, more than good enough for foreseeable applications:
https://www.mipi.org/sites/default/files/MIPI_CSI-2_Specification_Brief.pdf

Have you ever tried to route a bundle of 18 wires? It's hard enough to tightly route an Ethernet cable that contains only 8 wires, and you're talking about more than double that many, not including power. Times four cameras. Somehow that just doesn't seem very practical compared with running one (or even four) optical fibers.

On the cable end, if they stick with 4 Gbps FPD-Link III, you can squeeze in 10-bit 4K48p (or maybe a slightly lower frame rate or resolution) on a single cable. If you double up the cables/controllers you get 8 Gbps, which will give you 10-bit 4K60p. Three cables gives you 12 Gbps, which gets you past 10-bit 6K60p (6144x3160). And so on.

That might be just barely good enough, yes. And that works with the existing twisted pair wiring, assuming the existing wiring is good enough. So it might be possible to do this with just a camera swap. Nice. But if they have to upgrade the wiring, IMO, it makes a lot more sense to switch to fiber than to switch to a bundle of 18 wires that are each just waiting to fail. Remember, the more wires, the more potential points of failure, the harder it is to route the bundle around corners, etc.
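The frame-rate ceilings in that breakdown line up if you assume one 10-bit sample per pixel and ignore serializer/protocol overhead. A quick sketch (the 50 MP case just uses an arbitrary 10000x5000 split):

```python
# Max raw frame rate for a given link budget, one 10-bit sample per pixel,
# ignoring FPD-Link/CSI-2 protocol overhead (real numbers land a bit lower).
def max_fps(link_gbps, width, height, bits_per_pixel=10):
    return link_gbps * 1e9 / (width * height * bits_per_pixel)

print(f"1x FPD-Link III,  4 Gbps, 4K:    {max_fps(4, 3840, 2160):.0f} fps")   # ~48
print(f"2x FPD-Link III,  8 Gbps, 4K:    {max_fps(8, 3840, 2160):.0f} fps")   # ~96
print(f"3x FPD-Link III, 12 Gbps, 6K:    {max_fps(12, 6144, 3160):.0f} fps")  # ~62
print(f"CSI-2 (6 trios), 34 Gbps, 50 MP: {max_fps(34, 10000, 5000):.0f} fps") # ~68
```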
 
On what are you basing these claims? Why do you need 10 bits? And why 60 fps and 4K? Pipeline latency has very little to do with frame rate, and much more to do with resolution and bit depth (for which you provide no information as to why these are needed).
The 4K part was based on the general assumption that Tesla will probably move to 4K sensors.

Tesla moved to 10-bit depth on their cameras because 8 bits per pixel doesn't give adequate contrast ratios outdoors in real-world conditions. The more dynamic range the camera produces, the more likely you are to be able to recognize a pedestrian moving in a shadowed area on a bright, sunny day. For comparison purposes, my DSLR shoots at 14-bit depth, which allows for a decent amount of shadow recovery, and the difference is, frankly, huge. When I suggested 10 bits per pixel, that was more a minimum. If they're replacing the cameras anyway, I'd expect them to go for at least 12 bits per pixel; even cell phone cameras can do that these days, and that 4x improvement in contrast ratio over 10 bpp matters a great deal.
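The "4x" figure is just the ratio of code values, 2^12 / 2^10; as a rough proxy (ignoring sensor noise, which is the real limiter), each extra bit doubles the number of distinguishable levels:

```python
# Distinguishable code values per bit depth, and the 12-bit vs 10-bit ratio.
for bits in (8, 10, 12, 14):
    print(f"{bits}-bit: {2**bits:>6} levels")
print(f"12-bit vs 10-bit: {2**12 // 2**10}x more levels")
```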

The absolute minimum latency for any image processing pipeline is the amount of time it takes to get the image into the computer in the first place. And anything that involves motion detection requires a minimum of two frames. So the minimum latency required for detecting that something interesting is happening is half as long if the camera is running at 60 fps as it is at 30 fps. This, of course, ignores processing time, but to the extent that resolution, bit depth, etc. affect processing time, that can be compensated for by throwing more hardware at the problem, whereas nothing can reduce the latency impact caused by the delay between frames other than running at a higher frame rate.
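Putting numbers on that floor: with a two-frame minimum for motion, the best case scales directly with the frame interval, no matter how much compute you add.

```python
# Two-frame detection floor as a function of camera frame rate.
for fps in (30, 60):
    interval_ms = 1000 / fps
    print(f"{fps} fps: {interval_ms:.1f} ms per frame, "
          f"two-frame minimum ~{2 * interval_ms:.1f} ms")
```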
 
What do you mean by "zoom"?

Does anyone actually try to zoom in on far-away objects? The eye obviously changes focus ...
Tesla has three cameras in the front: a wide-field camera, a medium-field camera, and a narrow-field camera. The wide-field camera is probably next to useless because of how low the resulting angular resolution is (a wide field of view spread over a low-resolution sensor). The narrow-field camera (a.k.a. the "zoomed" camera) gives a narrower field of view, for spotting small things farther away, and is potentially extremely useful, as is the medium-field camera.

The difference between the current medium-field camera (50-degree field of view) and narrow-field camera (35-degree field of view) means that the narrow-field camera has less than half again more angular resolution than the medium-field camera. Moving from the existing 1280x960 cameras to a single 4K camera (3840 x 2160) would basically *triple* the effective resolution, which could potentially make the separate narrow-field camera unnecessary, assuming that the current camera is within a factor of two of being good enough.

So my guess would be that they will remove *both* the narrow-field *and* the wide-field camera, and instead have three medium-field cameras, with one aimed somewhat left, one aimed somewhat right, and the third aimed in the center. With a 50-degree field of view per camera, you would then end up with roughly the same horizontal coverage as the existing wide-field camera's 150-degree field of view, but with the angular resolution of the narrow-field camera across that entire area. You would lose the wide-field camera's extra coverage vertically, but unless the self-driving stack is trying to avoid cruise missiles, that probably isn't a problem. 😂
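A quick pixels-per-degree check of those ratios, horizontal axis only and ignoring lens distortion (the resolutions and fields of view are the figures quoted above; the 4K-behind-a-50-degree-lens case is hypothetical):

```python
# Horizontal angular resolution for the current cameras vs a hypothetical 4K unit.
def px_per_deg(h_pixels, fov_deg):
    return h_pixels / fov_deg

medium = px_per_deg(1280, 50)   # current medium-field camera
narrow = px_per_deg(1280, 35)   # current narrow-field camera
four_k = px_per_deg(3840, 50)   # hypothetical 4K sensor behind a 50-degree lens

print(f"medium 1280 @ 50 deg: {medium:.1f} px/deg")
print(f"narrow 1280 @ 35 deg: {narrow:.1f} px/deg ({narrow / medium:.2f}x medium)")
print(f"4K     3840 @ 50 deg: {four_k:.1f} px/deg ({four_k / medium:.2f}x medium)")
print(f"three 50-degree cameras cover about {3 * 50} degrees of the 150-degree wide view")
```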