HW2.5 capabilities

There is only one NN at the moment, and we can clearly observe that it's the one being run all the time. So given the absence of any other neural nets on the ape at this time, this one must be the one, right?

only-one.jpg
 
I've been putting my life in the hands of an intern since I got the car in March... frustrating

But Karpathy tweeted a long time ago: "Driving around PA with a Ludicrous mode Model X, testing a new Autopilot build. I see it will take a while before this gets old." It gives me a bit of hope, if he meant AP...

And also "interns welcome!" Help!

Thanks, you guys, for dissecting the system.
 
The current resolution makes sense if the current network is intended to be pretty much a drop-in replacement for the old Mobileye unit, which uses lower-resolution cameras.

Outside of this network, is the code largely still the same for AP1 and AP2? That would suggest that this may only be a temporary measure.

It could also explain the delay in things like rain-sensing ability. Are those features being developed in the new build? If this is true, then all the reasoning by analogy based on the progress we've seen so far with AP2 could be way off.
 
btw, there's another interesting thing I noticed. Apparently there are no plans to include the interior camera with HW2.5 in S/X cars.

When the model3 bits in ape2.5 were introduced in 17.18.xx, the new interior camera was added to the code (named selfie). Then in 17.32 they split the hw2, hw2.5 (and, I assume, hw2.5 for model 3) build artifacts into separate images, and all references to the selfie camera disappeared at the same time.

I am sure this is not because they suddenly decided to get rid of the selfie camera.
 
It's a bit hard to know what it does, but it was never accessing any cameras before. Everything camera-related was in the vision task.

Apparently driver monitor states are somewhat cryptic:
Code:
DetectedState
NotDetectedState
StrikeOutState
VisualWarningState

This sounds like the current states for Autopilot using steering wheel torque: I feel torque, I don't feel torque, the driver failed three audio prompts in an hour (or forced the car over 90 mph), and I'm in the process of prompting the driver to hold the wheel.
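
Purely to make those strings easier to reason about, here's a toy Python sketch of how the four states could map onto the torque-nag behavior described above. The state names come from the code dump; the transition logic is entirely made up.
Code:
from enum import Enum, auto

class DriverMonitorState(Enum):
    DETECTED = auto()        # DetectedState: driver input (wheel torque) sensed
    NOT_DETECTED = auto()    # NotDetectedState: no driver input sensed
    VISUAL_WARNING = auto()  # VisualWarningState: nagging the driver to hold the wheel
    STRIKE_OUT = auto()      # StrikeOutState: warnings ignored, Autopilot locked out

def next_state(state, torque_felt, ignored_prompts, speed_mph):
    # Hypothetical transition function; not taken from the firmware.
    if ignored_prompts >= 3 or speed_mph > 90:
        return DriverMonitorState.STRIKE_OUT
    if torque_felt:
        return DriverMonitorState.DETECTED
    if state is DriverMonitorState.NOT_DETECTED:
        return DriverMonitorState.VISUAL_WARNING
    return DriverMonitorState.NOT_DETECTED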
 
Apparently 104x160 is adequate to meet the objectives of the AP2 40.1 application.
Just for fun, I downsampled some frames grabbed from some of the HW2 cameras under different conditions (source) to 104x160. It kind of gives you a sense of what level of detail we're talking about. These are full size (try zooming in 500x)...

Main camera, night time:
Main (104_160).jpg


Narrow camera, highway:
Narrow.jpg


Wide (fisheye) camera, rainy:
Fisheye.jpg


Rear-view camera:
RV.jpg
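
If you want to reproduce this yourself, here's a minimal Pillow sketch of the downsampling (the filename is a placeholder, and I'm assuming 104 rows by 160 columns):
Code:
from PIL import Image

# Downsample a full-resolution frame grab to 104x160 (rows x columns).
frame = Image.open("main_camera_frame.png")                 # placeholder filename
small = frame.resize((160, 104), resample=Image.BILINEAR)   # PIL wants (width, height)
small.save("main_camera_104x160.png")

# Blow it back up with nearest-neighbor to eyeball the detail loss at full size.
preview = small.resize(frame.size, resample=Image.NEAREST)
preview.save("main_camera_104x160_preview.png")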


I mean is this what the AP2.5 ECU is actually dealing with? Why on earth do they have such high-res cameras and camera sensors installed? Despite @jimmy_d's excellent posts, my brain doesn't seem to comprehend this. Instead, my brain screams the question: Wouldn't the reasonable thing be to exploit every single pixel from the new camera sensors? Shouldn't we expect an on-board computer that chews through all of it, even if it throws much of it away after doing its thing? Why downsample before Vision gets a chance to look over it?

And how about this Tesla job ad description:
  • You will work on the Camera software pipeline running on the target product platform, to deliver high resolution images at high framerate to a range of consuming devices (CPU, GPU, hardware compressors and image processors)

Just asking. Hoping for intelligent answers...
 
Just a thought -- this is very possibly why street signs are still not being read and interpreted. There are nowhere near enough pixels to pull meaningfully accurate data off the street sign at that resolution...
 
Why on earth do they have such high-res cameras and camera sensors installed? Why downsample before Vision gets a chance to look over it?

Likely because they are fairly standard sensors across a wide range of use cases. They might have a specific filter on them for automotive use, but they're likely used for a lot more applications. The particular sensor picked for AP2 is likely used in thousands of different applications where you simply pick from 3 or 4 different filter types (Mono, Bayer, etc).

There also isn't really a downside to the extra resolution. It used to be that you wanted large pixels to get good sensitivity, and that's still true to some degree. But modern sensors are pretty sensitive and offer pixel binning, where the sensor bins 4 pixels together so they act kinda like one large pixel. That significantly increases sensitivity at the cost of resolution.

It's also likely that the input size into the NN will change over time, and as the NVidia SoC/GPU gets upgraded. But it's not very likely the sensors will ever get upgraded. So if I were the engineer, I'd want to use the best sensor I could, even if I wasn't using all the pixels.
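
Binning is easy to picture in code. A rough numpy sketch of 2x2 binning (real sensors do this in the analog domain or on-chip, which is where the sensitivity gain comes from; this just shows the resolution trade-off):
Code:
import numpy as np

def bin_2x2(raw):
    # Average each 2x2 block of pixels into one, quartering the resolution.
    h, w = raw.shape
    h, w = h - h % 2, w - w % 2                       # drop an odd edge row/column if any
    blocks = raw[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return blocks.mean(axis=(1, 3))

# e.g. a hypothetical 960x1280 raw frame comes out as 480x640 after binning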
 
Just a thought -- this is very possibly why street signs are still not being read and interpreted...

This is another good point. Or at least it leads to a good point. :)

If I were designing a system, I would have high-resolution camera(s) whose image data was downsized to go into an NN that looked for objects/lanes/etc. One output of that NN would be a crop region for the street sign. But that crop region wouldn't be used to crop the downsized image; it would be used to crop the original image.

That way I'd get a higher-resolution image of just what was needed to read the sign. It would still be fairly small, since it's only a crop of the original image, but it would be really tiny if it were a crop of the downsized image that went into the NN.
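
A rough sketch of that two-resolution idea (every name and shape here is made up for illustration, nothing from the actual Tesla code):
Code:
import numpy as np

def crop_full_res(full_frame, box_small, small_shape):
    # Map a detection box from the downsized image back onto the original frame.
    #   full_frame:  the original high-resolution frame, shape (H, W, 3)
    #   box_small:   (x0, y0, x1, y1) in downsized-image pixel coordinates
    #   small_shape: (h, w) of the downsized image the NN actually saw
    H, W = full_frame.shape[:2]
    h, w = small_shape
    sx, sy = W / w, H / h                             # scale factors back to full resolution
    x0, y0, x1, y1 = box_small
    x0, x1 = int(x0 * sx), int(np.ceil(x1 * sx))
    y0, y1 = int(y0 * sy), int(np.ceil(y1 * sy))
    return full_frame[y0:y1, x0:x1]                   # far more pixels than the small-image crop

# e.g. a 10x14-pixel "sign" box found in a 104x160 input maps to roughly a
# 92x112-pixel crop of a hypothetical 960x1280 frame -- enough to try to read the text.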
 
Yeah, but isn't the concern really whether the NN actually gets enough data in the first place to understand it is a stop sign? Or a bicyclist. Or which way the pedestrian over there is looking/heading. Or whether those are raindrops we need to automatically wipe away...

Yeah, it's all a trade-off of speed/capacity versus accuracy.

Of figuring out how small the downsampled image can be before accuracy falls off.

Of all the demos and examples I've run, the one linked below is probably the most sensitive to resolution. I don't know if it has any applications within a self-driving car. It could be used to detect if someone was giving the car the middle finger, though. Or it could be used to predict where a person was walking.

GitHub - CMU-Perceptual-Computing-Lab/openpose: OpenPose: Real-Time Multi-Person Keypoint Detection Library for Body, Face, and Hands

To get it to work on my Jetson TX2 with any reasonable frame rate required a fairly small image size.

It's a fun example since it will work on Windows/Linux/etc. I think its only requirement is a modern NVidia graphics card.
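
If you want to play with the resolution-versus-accuracy question without installing OpenPose, here's a rough Python sketch using OpenCV's stock HOG person detector as a stand-in (much cruder than a keypoint network, and the image path is a placeholder, but the fall-off as you shrink the input is the same basic effect):
Code:
import time
import cv2

# Stock OpenCV pedestrian detector as a lightweight stand-in for a real network.
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

frame = cv2.imread("street_scene.jpg")          # placeholder image path
orig_h, orig_w = frame.shape[:2]

for width in (1280, 960, 640, 320):             # sweep the downsample sizes
    height = int(orig_h * width / orig_w)
    small = cv2.resize(frame, (width, height))
    t0 = time.time()
    rects, weights = hog.detectMultiScale(small, winStride=(8, 8))
    ms = (time.time() - t0) * 1000
    print(f"{width}x{height}: {len(rects)} people found in {ms:.0f} ms")

# Detections fall away sharply at the smaller sizes, once people are only a few
# dozen pixels tall -- the same wall a pose network hits on small inputs.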