HW2.5 capabilities

Ok, I wonder if somebody just got ahead of themselves here and forgot to exclude the extra NNs or what, but so far nothing looks majorly different in what's being done with the cameras. Sadly, they removed the explicit logging of which camera does what.

Of course there's no calibration data for any of those other cameras either (there's a very visible warning about that).

I still need to actually drive the car to see if anything changes; the map messages have certainly changed a bit now too.

They just wanted to get jimmy_d's thoughts on the new Neural Nets before fully committing to them. :)

I should add in all fairness to Tesla that I don't have much issue with a GoogLeNet based network being used. The great thing about GoogLeNet's Inception architecture is that it performs well under strict constraints on memory and computational budgets. Plus, the tools NVidia provides tend to be limited in what they support acceleration-wise. It was likely picked for having less risk associated with it, especially since it was likely going to be replaced.
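For anyone who hasn't looked at GoogLeNet, here is roughly what an Inception block boils down to. This is a generic PyTorch sketch using the channel sizes of the original paper's "3a" block, not Tesla's actual layers: several parallel branches are concatenated along the channel dimension, and the cheap 1x1 "reduction" convs in front of the 3x3/5x5 branches are what keep the compute budget small.

Code:
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    def __init__(self, c_in, c1, c3_red, c3, c5_red, c5, c_pool):
        super().__init__()
        # four parallel branches, all preserving the spatial size
        self.b1 = nn.Conv2d(c_in, c1, 1)
        self.b3 = nn.Sequential(nn.Conv2d(c_in, c3_red, 1), nn.ReLU(),
                                nn.Conv2d(c3_red, c3, 3, padding=1))
        self.b5 = nn.Sequential(nn.Conv2d(c_in, c5_red, 1), nn.ReLU(),
                                nn.Conv2d(c5_red, c5, 5, padding=2))
        self.bp = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(c_in, c_pool, 1))

    def forward(self, x):
        # concatenate branch outputs along the channel dimension
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.bp(x)], dim=1)

# channel sizes of the original paper's 3a block: 192 channels in, 256 out
block = InceptionBlock(192, 64, 96, 128, 16, 32, 32)
out = block(torch.rand(1, 192, 28, 28))   # -> shape (1, 256, 28, 28)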

The NN might seem a bit intern-ish to some high-level research person, but there are probably lots of reasons they went with what they did.

I'm less disappointed by the network choice/architecture, and more disappointed by the lack of progress on the vision front. Heck, the cars don't even have a side camera view. Why can't that automatically come on when the turn signal is engaged? Just simple stuff.
 
Let me offer my limited thoughts on this matter, for once; perhaps I may be able to contribute something beyond bathroom humor.

I mean, is this what the AP2.5 ECU is actually dealing with? Why on earth do they have such high-res cameras and camera sensors installed? Despite @jimmy_d's excellent posts, my brain doesn't seem to comprehend this. Instead, my brain screams the question: wouldn't the reasonable thing to do be to exploit every single pixel from the new camera sensors? Shouldn't we expect an on-board computer that chews through all of it, even if it would spit much of it away after doing its thing? Why downsample before Vision gets a chance to look it over?

Just asking. Hoping for intelligent answers...

This is a great question, and it makes me think of an analogy with autism. The issue with autism is not the number of neurons in the brain or its computing power; it's almost the opposite. It's the lack of a well-timed pruning process that allows the brain to be efficient at parsing data and then assigning salience. In other words, if the brain can't eliminate any of the noise from the perceived environment, then picking up on social cues becomes almost impossible. Without a massive pruning process, it is too difficult to know which noise to eliminate.

Perhaps this is why a lower number of pixels is so important: it makes the data easier to parse, since adding more pixels likely complicates the task exponentially.

Your rear-view data stream is included in your main visual data stream to your brain, but the brain segments it off and extracts meaningful data from it, knowing that "that little blob of vision is separate." Often it's more efficient to bundle multiple data sources into software as a single data stream, even though they're technically different streams, just so that you're copying/operating on the data once instead of having the overhead of processing 8 different cameras. You could naively scale and crop the other cameras and then slap them onto the side of the main camera.

There is some interesting research on how compound-eye animals see. A fly is not creating a perfect spherical stitched image in its brain the way our brain synthesizes two eyes into one. How objects move between compound-eye views is, in and of itself, interesting to their vision systems.
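To make the "slap them onto the side" idea concrete, here is a naive sketch of that kind of bundling. This is purely illustrative (numpy/OpenCV, all names and sizes made up, certainly not how Tesla actually does it): scale the secondary cameras down and tile them next to the main frame, so downstream code copies and operates on a single image instead of juggling eight streams.

Code:
import numpy as np
import cv2  # assumed available, just for the resizing

def bundle_frames(main, others, side_w=160):
    """main: HxWx3 uint8 frame; others: non-empty list of HxWx3 frames from the other cameras."""
    h = main.shape[0]
    tile_h = h // len(others)
    # scale each secondary camera into a small tile
    tiles = [cv2.resize(f, (side_w, tile_h)) for f in others]
    side = np.vstack(tiles)
    # pad with black if the tiles don't fill the full height
    if side.shape[0] < h:
        side = np.vstack([side, np.zeros((h - side.shape[0], side_w, 3), np.uint8)])
    # one combined frame: the main image with the other views tiled along its side
    return np.hstack([main, side])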

This is a very interesting post IMHO, and reminds me of @lunitiks explanation of AP2 versus AP1 (the alien picture versus the cyclops cat).

However, the eyes of a fly are not akin to having multiple cameras. Without actually knowing much about entomology, my guess is that the fly's eyes would be more like two primitive eyes with kaleidoscopic lenses.

My guess is that the eye of the fly is superior from an evolutionary perspective because it is better at peripheral vision without much specificity at all, kind of the opposite of an eagle's eye, which can see crazy detail. I suspect that the challenge between EAP and FSD is going to be making really good fly's eyes for EAP... and to do FSD, we'd need to mimic a primitive human brain, which is infinitely more complex.

So I'm not certain whether @im.thatoneguy's description of human vision accurately reflects what is actually happening in the human brain with respect to how the brain assigns meaning to different visual fields, though his software data-stream comment is very insightful for someone like me without a programming background. Since I'm more of a brain nerd, I go back to the fact that the human brain must be trained to see. There have been some interesting clinical observations of patients in the rare instances where certain forms of congenital blindness have been reversed, i.e. situations where a person achieves vision for the first time. Their brain can't recognize anything at all except bright chaos for a very long time.
 
Just for fun I downsampled some frames grabbed from some of the HW2 cameras under different conditions (source) to 104x160. It kind of gives you a sense of what level of detail we're talking about. These are full size (try zooming in 500x)...

Main camera, night time:
View attachment 256146

Narrow camera, highway:
View attachment 256147

Wide (fisheye) camera, rainy:
View attachment 256148

Rear-view camera:
View attachment 256149
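If you want to try this yourself, a Pillow snippet along these lines is all it takes. The filename is a placeholder, and I'm assuming "104x160" means 104 pixels tall by 160 wide, which matches a landscape camera frame:

Code:
from PIL import Image

img = Image.open("main_camera_night.png")        # placeholder filename
small = img.resize((160, 104), Image.BILINEAR)   # resize() takes (width, height)
small.save("main_camera_night_104x160.png")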

I mean, is this what the AP2.5 ECU is actually dealing with? Why on earth do they have such high-res cameras and camera sensors installed? Despite @jimmy_d's excellent posts, my brain doesn't seem to comprehend this. Instead, my brain screams the question: wouldn't the reasonable thing to do be to exploit every single pixel from the new camera sensors? Shouldn't we expect an on-board computer that chews through all of it, even if it would spit much of it away after doing its thing? Why downsample before Vision gets a chance to look it over?

And how about this Tesla job ad description:


Just asking. Hoping for intelligent answers...
I can think of some fairly obvious use cases:
1) Cropping. This is useful for calibration and also as a way to do digital zoom. You can have networks that only work on 104x160 inputs: the network identifies an interesting area, crops in, and processes it again at the same size. For example, the network recognizes something as a sign in the 104x160 picture, and then it crops in and processes another 104x160 picture of just the sign.
2) Pixel averaging for better low-light sensitivity (as noted by @S4WRXTTCS; see the sketch at the end of this post). It used to be that you wanted larger pixels for better low-light sensitivity, but sensor layer thicknesses have been reduced so drastically that you don't really gain much of an advantage from going with a lower-resolution sensor with larger pixels.
3) Higher quality imaging pipeline and better sharpness. The higher resolution imaging chips usually provide a higher data rate pipeline, and downsampling usually results in better sharpness than shooting at the native resolution. For the same reason, it's practically always better to record in 4K and downsample to 1080p than to record 1080p directly.

However, do note that at 1.2MP the sensor isn't really that high resolution anyway. The lowest you could go is VGA, which is 0.3MP.
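Here is the sketch referenced in point 2, just to illustrate the idea (plain numpy, purely illustrative, nothing from the actual firmware): averaging 2x2 blocks of pixels trades resolution for lower noise, since each output pixel is the mean of four noisy samples, which for independent noise roughly halves the noise standard deviation.

Code:
import numpy as np

def bin2x2(frame):
    """frame: HxWx3 array with even H and W; returns an (H/2)x(W/2)x3 float array of 2x2 block averages."""
    h, w, c = frame.shape
    # group pixels into 2x2 blocks and average within each block
    return frame.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))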
 
Interesting ... Tesla’s Autopilot is under attack by Apple co-founder Steve Wozniak, billionaire shorter, and media
In an interview with CNBC, Wozniak made some confusing comments about Tesla’s “self-driving” claims relating to Autopilot:

“Sometimes Tesla’s are dangerous because of what they call ‘autopilot,’” says Wozniak. “You get thinking, oh, ‘It is easy, I can reach over and not look for a few seconds,’ and that is the second your car drifts over the line,” he says, adding that it is “easy to make mistakes, especially certain weather conditions and whatnot.” The Apple co-founder added:

“Tesla has in people’s mind that they have cars that will just drive themselves totally and it is so far from the truth so they have deceived us. […] Driving my Tesla, over and over and over there are unusual situations on any road anywhere and every single human being alive — dumb or smart — would be able to get through it and the Tesla can’t”

Of course, Tesla doesn’t actually say that the current software in its vehicles results in autonomous driving. The company still asks its drivers to always stay alert and to monitor their vehicles at all times. Wozniak’s issue with Autopilot seems to be that it could let people think that it is autonomous, though it’s not clear why.
 
Can you move this to a different thread? Although I believe I saw this news already posted elsewhere, in this thread: Happy Birthday AP 2.0

Not sure why you posted it here. We have enough off-topic posts here. The thread just got back on track with some new information; let's keep it that way.
 
Awesome!! Thank you for sharing. Also, how are you able to see the network specification?? It's blowing my mind that this is a hardly modified GoogLeNet network.

A friend gave me a copy. I also recently received copies of earlier versions, so I'm going to take them apart to see how the network has been changing over time.

The bottom 21 layers of the network stack are 100% GoogLeNet, with only a single difference. Because GLN outputs a label, it doesn't need to preserve position information, so GLN drives the frames all the way down to 1/32 of full size (with 1024 channels in 5b) before using linear+softmax to generate the label. But 40.1 only goes down to 1/16 size, even though they keep 5a and 5b with the original GLN kernel depth. Because Tesla is re-expanding the frames and generating output frames that retain position data, they probably need to avoid getting too small. This makes layers 5a and 5b 4x as compute intensive in 40.1 as they are in GLN (2 Gops per frame).
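Quick back-of-envelope on that 4x figure, in case it isn't obvious. Convolution cost scales with the number of output positions, so halving the downsampling factor doubles both spatial dimensions and quadruples the work for the same kernels (the full-frame resolution below is just an assumed placeholder):

Code:
# cost of a conv layer is proportional to the number of spatial positions it runs over
def relative_cost(downsample_factor, full_h, full_w):
    return (full_h // downsample_factor) * (full_w // downsample_factor)

full_h, full_w = 416, 640   # assumed input size, purely for illustration
print(relative_cost(16, full_h, full_w) / relative_cost(32, full_h, full_w))   # -> 4.0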

Have been thinking about why they would go with frames as output instead of coordinates or labels or something easier to process at the back end, and it struck me that they could be training the bottom 21 layers unsupervised by building Inception as an autoencoder. Am going to see if I can find any examples of other people doing this; maybe the output deconvolution layers are 'borrowed' from another project too.
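For anyone unfamiliar with the idea, this is the general shape of "building Inception as an autoencoder", massively simplified. It's a generic PyTorch toy with plain convs standing in for the inception stack, nothing to do with Tesla's actual code: an encoder shrinks the frame, a deconvolution decoder re-expands it, and the training target is the input frame itself, so no labels are needed.

Code:
import torch
import torch.nn as nn

class TinyAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        # stand-in for the GoogLeNet-style encoder stack (shrinks the frame)
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # deconvolution ("transposed conv") decoder that re-expands the frames
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# unsupervised training step: reconstruct the camera frame itself
model = TinyAutoencoder()
frame = torch.rand(1, 3, 104, 160)   # fake 104x160 RGB frame
loss = nn.functional.mse_loss(model(frame), frame)
loss.backward()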
 
They just wanted to get jimmy_d's thoughts on the new Neural Nets before fully committing to them. :)

I should add in all fairness to Tesla that I don't have much issue with a GoogLeNet based network being used.

Yeah, I might have overstressed the 'everybody uses this, even college classes' bit. Of course, the reason it's popular is that it's a really good network architecture. As you say, really efficient: the inception blocks that were invented for it are something like a 5x improvement over regular CNNs in terms of compute needs. It's slower to train, but that's a good tradeoff for an embedded application. Google's dev team did heroic work on this network design, so why fix it if it isn't broken?
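To put rough numbers on that compute saving, here is the multiply-add count for a single 5x5 branch of an early GoogLeNet block (illustrative figures only, not a whole-network comparison): the 1x1 reduction in front of the 5x5 conv is where most of the win comes from.

Code:
# multiply-adds of a conv layer: positions x input channels x output channels x kernel area
def conv_macs(h, w, c_in, c_out, k):
    return h * w * c_in * c_out * k * k

h, w = 28, 28
plain   = conv_macs(h, w, 192, 32, 5)                              # 5x5 conv applied directly
reduced = conv_macs(h, w, 192, 16, 1) + conv_macs(h, w, 16, 32, 5) # 1x1 reduction, then 5x5
print(plain / reduced)   # roughly 10x fewer multiply-adds for this branch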
 
I hope the brain and vision theory musings aren't seen as off topic... it's some fascinating stuff regarding the implications for how they must try to solve EAP and FSD, but it could be too much of a digression to stay in the thread.
 
The wide zoom one is called.... tada! fisheye_autowiper ;)

I knew it. I knew it, I knew it, I knew it. Tesla: I knew it. Woohoo. Verygreen made my day. CASE CLOSED!
 
Sorry about that. I get all emotional about that rain-sensing stuff. But seriously, let's take a minute and think about how awesome this actually is. AFAIK, Tesla will be the first *ever* to implement vision-based rain-sensing wipers on production cars. If they get it right, the wiper functionality may become superior to everything that's out there. Totally blows my mind. Small pleasures, small pleasures.

Alright, don't mind me. Please proceed.
 
What are these map messages??
It used to complain that it couldn't open an sqlite db that indeed wasn't there. This firmware comes with a db schema, so I imagined it would create the db. Alas, it does not (not on my car, anyway). But the message in the logs has now changed to something about contacting the maps task.
 
Can you see anything more related to it? Do they collect data which is uploaded together with manual wiper actions performed by the driver for comparison/tuning perhaps?
I doubt it. Next rain is forecast on Saturday, so I might try it out then.

Meanwhile I am fishing for evidence that these NNs are used at all, but the method I was using in 17.40 and prior suddenly no longer works, so I am improvising a more complicated one. Sigh.
 
Sorry about that. I get all emotional about that rain-sensing stuff. But seriously, let's take a minute and think about how awesome this actually is. AFAIK, Tesla will be the first *ever* to implement vision-based rain-sensing wipers on production cars. If they get it right, the wiper functionality may become superior to everything that's out there. Totally blows my mind. Small pleasures, small pleasures.

Alright, don't mind me. Please proceed.

More likely... it sort of works, and we get 400 posts whining about why they didn't go with a normal rain sensor. Slowly it improves to the point where it's pretty good, and everything goes silent.