FSD AP improvements in upcoming v11 from Lex Fridman interview


Cosmacelf

Well-Known Member
Supporting Member
Mar 6, 2013
12,699
46,797
San Diego
The recent Lex Fridman interview of Elon had some interesting tidbits about the next major FSD beta version (https://www.youtube.com/watch?v=DxREm3s1scA).

First, something simple. Tesla's AI inference chip (each car has two of them) has an image signal processor that processes each image frame from the eight cameras, and the resulting processed image is what the car's neural net then sees, and more importantly, what it was trained on. This processing is similar to what any digital camera does. The unprocessed image looks like a raw image file (for the photo buffs) and isn't very useful to a human, since a human really can't see much in it.

But that initial processing takes a huge 13 milliseconds. For a system that runs at a 27ms frame rate, that's like half your time budget. Moreover, the processed image has a lot less data in it than the raw image (which is basically photon counts). In particular, the raw image would allow a computer to see much better in very low light situations than a processed image would.
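
To put some rough numbers on that (the 12-bit raw depth and 8-bit processed depth below are my own illustrative assumptions, not confirmed specs):

```python
# Back-of-envelope numbers behind the ISP-bypass argument.
# Assumptions (mine, for illustration only): raw sensor data is ~12-bit
# photon counts per photosite; the ISP output is 8 bits per channel.

frame_period_ms = 27.0   # the stated frame-rate budget
isp_time_ms = 13.0       # ISP time for all 8 cameras combined

print(f"ISP share of the frame budget: {isp_time_ms / frame_period_ms:.0%}")  # ~48%

raw_levels = 2 ** 12        # distinct intensity levels in a 12-bit raw value
processed_levels = 2 ** 8   # levels per channel after tone mapping
print(f"Intensity levels per pixel: raw {raw_levels} vs processed {processed_levels}")
```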

So, Tesla is bypassing the image processor entirely in V11 and the neural net will work directly on the raw, photon-count images. Doing this means Tesla will have to completely retrain its neural nets from scratch since the input is so different. This is yet another example of "the best part is no part" thinking and should offer huge improvements, since they will now have more accurate input and more time for other kinds of processing. BTW, the top left of this image of the Tesla inference chip is what they are bypassing. It isn't a complete waste since they still need the processed images for Sentry Mode and whatnot, but it won't be part of the critical FSD time loop.

[Image: Tesla FSD inference chip]

v11 will also push more of what the C code does into the neural net. Currently, the neural net outputs what Elon called "a giant bag of points" that is labeled. Then C code turns that into vector space, which is a 3D representation of the world outside the car. V11 will expand the neural net so that it produces the vector space itself, leaving the C code with only the planning and driving portions. Presumably this will be both more accurate and faster.
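
As a toy illustration of that shift (the names, shapes, and clustering logic below are made up for clarity; this is not Tesla's actual code):

```python
import numpy as np

# Old arrangement (conceptually): the network emits a labeled "bag of points",
# and hand-written post-processing (C code in the real system) turns it into
# vector-space objects.
def postprocess_bag_of_points(points, labels):
    """Crude stand-in: group labeled 3D points into per-object boxes."""
    objects = []
    for label in np.unique(labels):
        pts = points[labels == label]
        objects.append({
            "label": int(label),
            "center": pts.mean(axis=0),                   # rough object position
            "extent": pts.max(axis=0) - pts.min(axis=0),  # rough object size
        })
    return objects

# New arrangement (conceptually): a bigger network head regresses those
# vector-space objects directly, so the remaining C code only plans and drives.

# Toy usage of the old path:
points = np.random.rand(200, 3) * 50.0        # fake 3D points, in meters
labels = np.random.randint(0, 3, size=200)    # fake class labels
print(postprocess_bag_of_points(points, labels)[0])
```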

Not all the neural nets in the car use the surround video pipeline yet. Some still process perception camera by camera, so that’s getting addressed as well.
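
In rough code terms, the difference looks something like this (purely illustrative function names and shapes, and the time/history dimension is omitted):

```python
import numpy as np

def extract_features(frame):
    """Stand-in for a per-camera backbone; here it's just a channel mean."""
    return frame.mean(axis=-1)

def per_camera_perception(frames):
    # Older style: each camera is processed on its own, and the results have
    # to be stitched together and deduplicated afterwards.
    return [extract_features(f) for f in frames]

def surround_perception(frames):
    # Surround style: features from all 8 cameras are fused into one shared
    # representation before anything is detected.
    return np.stack([extract_features(f) for f in frames]).sum(axis=0)

frames = [np.random.rand(96, 128, 3) for _ in range(8)]  # 8 fake camera frames
print(len(per_camera_perception(frames)), surround_perception(frames).shape)
```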

In the end, lines of code will actually drop with this release.

Other interesting things: they have their own custom C compiler that generates machine code for the specific CPUs and GPUs on the AI chip, and all the hardcore, time-sensitive code is written in C.

So, that’s the info Elon told us from the interview. As I wrote in my last deep dive about FSD (Layman's Explanation of Tesla AI Day), Tesla has a lot of optimizations it can still do, and this is an example of a couple of them.
 
Don’t worry about people fussing about whether you have posted this in the correct thread … this is a brilliant plain English explanation of that part of the interview. Keep posting wherever and whenever you have more insights to share!
 
Doing this means Tesla will have to completely retrain its neural nets from scratch since the input is so different.
This is the most interesting part. How long to retrain? Did they already collect raw pictures from the cars? Do they have to label all again? Or do they already have the training set ready to go? It sounds like the n'th rewrite...

And we all know raw files from our DSLRs are giant compared to any compressed format. Do the cars have enough bandwidth for this, and enough compute power? Do they plan on compressing the raw data?
 
This is the most interesting part. How long to retrain? Did they already collect raw pictures from the cars? Do they have to label all again? Or do they already have the training set ready to go? It sounds like the n'th rewrite...

Great questions. Presumably you can go backwards from processed images and recreate a raw image. It won't be 100% the same, but it might be enough for training. Going forward they can grab raw images (and probably have been for a while).
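
For what it's worth, "going backwards" from processed images is a known trick in the imaging literature (sometimes called unprocessing); here's a very rough sketch of the idea, with made-up constants and none of the real tone-mapping details:

```python
import numpy as np

def approx_unprocess(rgb8, gamma=2.2, white_balance=(2.0, 1.0, 1.6)):
    """Very rough inverse-ISP sketch: undo gamma and white balance, then
    re-mosaic to an RGGB Bayer pattern. All constants are illustrative."""
    img = (rgb8.astype(np.float64) / 255.0) ** gamma   # undo the gamma curve
    img = img / np.array(white_balance)                # undo white-balance gains
    h, w, _ = img.shape
    bayer = np.zeros((h, w))
    bayer[0::2, 0::2] = img[0::2, 0::2, 0]   # R
    bayer[0::2, 1::2] = img[0::2, 1::2, 1]   # G
    bayer[1::2, 0::2] = img[1::2, 0::2, 1]   # G
    bayer[1::2, 1::2] = img[1::2, 1::2, 2]   # B
    return (bayer * 4095).astype(np.uint16)  # pretend 12-bit photon counts

fake_processed = np.random.randint(0, 256, size=(64, 64, 3), dtype=np.uint8)
print(approx_unprocess(fake_processed).dtype, approx_unprocess(fake_processed).max())
```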

And we all know raw files from our DSLRs are giant compared to any compressed format. Do the cars have enough bandwidth for this, and enough compute power? Do they plan on compressing the raw data?

Note that for transmission purposes, you can still compress raw images. But yes, there is a lot of devil in the details that Elon didn’t tell us!
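
For example, even plain lossless compression shrinks raw-ish data noticeably; a toy check (nothing Tesla-specific, and real scenes with spatial structure compress better than this synthetic noise):

```python
import zlib
import numpy as np

# Fake 12-bit raw frame stored in 16-bit words. Poisson noise around a mean
# of 200 "photons" is a pessimistic stand-in for real image data.
raw = np.random.poisson(200, size=(960, 1280)).clip(0, 4095).astype(np.uint16)
packed = raw.tobytes()
compressed = zlib.compress(packed, level=6)
print(f"{len(packed) / 1e6:.2f} MB -> {len(compressed) / 1e6:.2f} MB, lossless")
```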
 
I got the distinct impression from Elon in the Lex Fridman interview that we can expect v11 to be "worse" before things get better. As I said in another thread, I was quite surprised because I thought a lot of this (e.g., the 8-camera surround video and neural networks for vector-space generation) was already done in the current "pure vision" stack.
 
I hate pinned threads. They're mostly useless, stale info and come across like spam to me. The good stuff is below. I prefer non-consolidated threads with accurate subjects.
I prefer well-kept and up-to-date pinned threads that people actually use, like the market thread in the investors forum.

Just look at all the FSD Beta threads - several of them for each release.

In this case, the whole is more than the sum of its parts.

PS: Yes, they do need to be kept updated and cleaned up, like the one below. I use it quite a bit (well, I used to, back when I followed deliveries more closely).

 
@EVNow, the reason I didn't use your pinned thread is that most pinned threads contain stale info (like @Terminator857 said). Yours may not, but how would I know that, since the rest of TMC isn't like that? Also, your pinned thread has two subjects, one of which was anticipation, so it wasn't relevant. That's why I never even looked at it. To each his own, but now you know why.
 
Small clarification: the 13ms is for all 8 cameras combined, something like ~1.5ms per camera, FYI. Elon makes it sound like it's per camera at first, but his next couple of sentences sort of clarify it. I'm also not entirely swallowing his great night vision statement, because he's not accounting for sensor noise. It's not like the sensors produce nothing when no photons are hitting them. I'm not saying it's not better than expected, just that he doesn't fill in the details.
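
To put a number on the noise concern: photon arrival is itself Poisson-distributed, so SNR falls off quickly at low light even before read noise is added (rough illustrative figures, not Tesla sensor specs):

```python
import math

# Photon shot noise: collecting N photons gives noise ~ sqrt(N), so the best
# possible SNR is ~ sqrt(N). Read noise (in electrons) hurts dim pixels more.
read_noise_e = 3.0  # illustrative read noise, electrons RMS

for photons in (10000, 1000, 100, 10):
    snr = photons / math.sqrt(photons + read_noise_e ** 2)
    print(f"{photons:>6} photons -> SNR ~ {snr:.1f}")
```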
 
Small clarification: the 13ms is for all 8 cameras combined, something like ~1.5ms per camera, FYI. Elon makes it sound like it's per camera at first, but his next couple of sentences sort of clarify it. I'm also not entirely swallowing his great night vision statement, because he's not accounting for sensor noise. It's not like the sensors produce nothing when no photons are hitting them. I'm not saying it's not better than expected, just that he doesn't fill in the details.

Quite right on the 13ms being combined. And yes, sensor noise is a thing. But neural nets can mitigate that in rather interesting ways. Past knowledge allows the neural net to be more confident about a noisy image. For instance, if a car off in the distance passes under a streetlight and then is in shadows as it comes towards you, the neural net can merge the earlier confident prediction of a car with the later noisy images and be fairly confident that the noisy block is still a car (pretend the car has no headlights, or that you're looking at a bicyclist).
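
Here's a toy version of that "carry confidence forward through noisy frames" idea (a simple recursive blend I made up for illustration; not how Tesla actually does it):

```python
# Toy temporal fusion: blend the previous belief that "there's a car there"
# with each new frame's detection score, trusting clear frames more than
# dark, noisy ones.

def update_belief(prior, frame_score, frame_quality):
    """Weighted blend of the prior belief and the current frame's score."""
    return (1.0 - frame_quality) * prior + frame_quality * frame_score

belief = 0.0
# (score, quality): car clearly seen under the streetlight, then noisy shadows.
observations = [(0.95, 0.9), (0.97, 0.9), (0.30, 0.1), (0.25, 0.1), (0.35, 0.1)]
for t, (score, quality) in enumerate(observations, start=1):
    belief = update_belief(belief, score, quality)
    print(f"frame {t}: score {score:.2f}, quality {quality:.1f} -> belief {belief:.2f}")
```

The belief stays high through the noisy frames because the earlier, clear sightings carry most of the weight.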

Little side story that happened to me. I was fumbling in the dark trying to find the keyhole for my key. All I could see was typical sensor noise, nothing distinct. By accident, my key fell into the keyhole and at that moment, all of a sudden I could “see” the key and keyhole. My brain used the extra information of the fact that there must be a key and keyhole where my hand was to disambiguate my vision sensor noise, and voila I could literally see the keyhole whereas 1 second before I could not.
 
Great, sounds like 20 steps back, one step forward to me.
I think that's the nature of "foundational rewrites" in a machine learning system. The idea is that the performance after multiple iterations of v11 will be better than what could be achieved through additional iterations of v10.x. I don't expect to see a 10.9 at this point. If they are already saying that v11 is "fundamentally" different and requires retraining with "8-camera surround video," then what would be the point in putting out another version on the now defunct vision stack?
 
I think that's the nature of "foundational rewrites" in a machine learning system. The idea is that the performance after multiple iterations of v11 will be better than what could be achieved through additional iterations of v10.x. I don't expect to see a 10.9 at this point. If they are already saying that v11 is "fundamentally" different and requires retraining with "8-camera surround video," then what would be the point in putting out another version on the now defunct vision stack?

That's what happened with normal FSD: the rewrite started and we got a year of no updates. I hope they are further along this time, and that they don't put everyone on that task. We could still use some updates on the current builds.
 
And we all know raw files from our DSLRs are giant compared to any compressed format. Do the cars have enough bandwidth for this, and enough compute power? Do they plan on compressing the raw data?

I just realized I didn’t do a great job of explaining this above. The current neural net does not process a compressed image (like a jpeg image), it works on the uncompressed output of the image signal processor. So processing raw would maybe add 1 or 2 more bits per channel (RGB) to process.
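
To illustrate why raw doesn't have to mean DSLR-sized files inside the car (the 1280x960 resolution and 12-bit depth below are my assumptions, not actual camera specs, and it depends on how the Bayer mosaic is fed to the net):

```python
# Per-frame data for one camera under assumed numbers (not actual specs):
# 1280x960 sensor, 12-bit raw Bayer (one value per photosite) versus the
# ISP's 8-bit-per-channel RGB output.

w, h = 1280, 960
raw_bytes = w * h * 12 / 8   # one 12-bit sample per pixel
rgb_bytes = w * h * 3        # three 8-bit channels per pixel
print(f"raw Bayer frame: {raw_bytes / 1e6:.2f} MB")
print(f"ISP RGB frame:   {rgb_bytes / 1e6:.2f} MB")
```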
 
That's what happened with normal FSD: the rewrite started and we got a year of no updates. I hope they are further along this time, and that they don't put everyone on that task. We could still use some updates on the current builds.
Sounds like they are very close to what will be v11 already. However, as far as I can tell, they are still working to finish implementing the overall 8-camera vision system and hydranets that Karpathy talked about like two or three years ago. I have always been skeptical of how far FSD can go since I started driving on EAP/NOA in 2018, but I am still surprised that these "foundational rewrites" are taking so long. And now, yet again, Elon is predicting they won't be done until the end of the year (2022), which makes me think they still aren't close to "complete," whatever that means. At some point I imagine they will accept the limitations of the current sensor suite, pop a bottle of champagne and call it "done," and then move on to HW4 or HW5 with new sensors that will be the base of the L5/Robotaxi that Elon has been promising. Maybe by the end of 2023? ;)
 
Sounds like they are very close to what will be v11 already. However, as far as I can tell, they are still working to finish implementing the overall 8-camera vision system and hydranets that Karpathy talked about like two or three years ago. I have always been skeptical of how far FSD can go since I started driving on EAP/NOA in 2018, but I am still surprised that these "foundational rewrites" are taking so long. And now, yet again, Elon is predicting they won't be done until the end of the year (2022), which makes me think they still aren't close to "complete," whatever that means. At some point I imagine they will accept the limitations of the current sensor suite, pop a bottle of champagne and call it "done," and then move on to HW4 or HW5 with new sensors that will be the base of the L5/Robotaxi that Elon has been promising. Maybe by the end of 2023? ;)

Yes. Unlike Elon, I always thought it was going to take a long time. I based this on my knowledge of what neural nets, especially the kind Tesla is using, are capable of. They will get there, but it may take another Tesla AI chip version and, most likely, a better camera suite. The current system has a hard time seeing 90 degrees left and right for, really, any kind of intersection.
 
I just realized I didn’t do a great job of explaining this above. The current neural net does not process a compressed image (like a jpeg image), it works on the uncompressed output of the image signal processor. So processing raw would maybe add 1 or 2 more bits per channel (RGB) to process.
Ok, I am confused now. You wrote:
First, something simple. Tesla's AI inference chip (each car has two of them) has an image signal processor that processes each image frame from the eight cameras, and the resulting processed image is what the car's neural net then sees, and more importantly, what it was trained on. This processing is similar to what any digital camera does. The unprocessed image looks like a raw image file (for the photo buffs) and isn't very useful to a human, since a human really can't see much in it.

But that initial processing takes a huge 13 milliseconds. For a system that runs at a 27ms frame rate, that's like half your time budget. Moreover, the processed image has a lot less data in it than the raw image (which is basically photon counts). In particular, the raw image would allow a computer to see much better in very low light situations than a processed image would.
To me, this sounds very much like compressing the raw image data before NN interpretation.
So with this new architecture, what really happens? The "photon to NN" line sounds mostly like some salesman mumbo jumbo.

Does a decent technical explanation exist somewhere?