Decoding FSD Beta 9.2 release notes

Terminator857 · Aug 15, 2021

https://twitter.com/x/status/1426934048136499209

> Clear-to-go boost through turns on minor-to-major roads (plan to expand to all roads in v9.3)
Accelerates faster during turns.

> Improved peek behavior where we are smarter about when to go around the lead vehicle by reasoning about the causes for lead vehicle being slow.
When is peak? When it is needed most? Right before a decision is needed to be made?

> v1 of the multi-modal predictions for where other vehicles (are) expected to drive. Partial implementation
Multi-model means multiple modes. FSD will have predictions for the different possibilities of what a car will do. For example: 70% chance it will continue, 20% chance it will turn, 10% chance it will stop. Multi-model may also mean that the algorithm used depends upon the situation, for example whether the car is near an intersection.

> New lanes network with 50k more clips (almost double) from the new auto-labeling pipeline.
More lanes mapped? Is this a right turn lane, a bike lane, etc...

> New VRU velocity model with 12% improvement to velocity and better VRU clear-to-go performance.
VRU = vulnerable road user. Pedestrian, bike, escooter, etc... Thanks to Daniel in SD for that.

> Model trained with "Quantization-aware-training" (QAT), an improved technique to mitigate int8 quantization.
int8 quantization is used during machine learning training. Initially floating point was used, but to conserve memory a switch was made to int8. This introduces rounding errors. QAT is a technique that takes the rounding error into account during training, and provides better overall performance.
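To make the rounding error concrete, here is a minimal sketch of "fake quantization", the round trip through int8 that quantization-aware training typically simulates in the forward pass so the model learns weights that survive the rounding. The weight values and the symmetric per-tensor scale are assumptions chosen for illustration; nothing here comes from Tesla's code.

```python
def quantize_int8(x, scale):
    """Map a float to the nearest int8 code, clamped to [-128, 127]."""
    q = round(x / scale)
    return max(-128, min(127, q))


def dequantize(q, scale):
    """Map an int8 code back to the float value it represents."""
    return q * scale


def fake_quantize(x, scale):
    """Round-trip a float through int8, exposing the rounding error."""
    return dequantize(quantize_int8(x, scale), scale)


# Hypothetical weights, for illustration only.
weights = [0.013, -0.402, 0.250, 0.999, -1.270]

# Symmetric per-tensor scale (an assumed scheme): the largest magnitude maps to 127.
scale = max(abs(w) for w in weights) / 127

roundtrip = [fake_quantize(w, scale) for w in weights]
errors = [abs(w - r) for w, r in zip(weights, roundtrip)]

for w, r, e in zip(weights, roundtrip, errors):
    print(f"{w:+.3f} -> {r:+.3f} (error {e:.4f})")
```

For values inside the representable range the error is at most half a quantization step (`scale / 2`); values outside the range are clamped and can be off by much more.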

> Enabled inter-soc synchronous compute scheduling between vision and vector space processes.
They have a new task scheduler that is synchronous based. In other words, discrete time blocks are allocated for computer vision and vector computation.

> Planner in the loop is happening in v10.
The planner decides what FSD is going to do. "In the loop" suggests it will also be given a discrete amount of time to compute, as opposed to an unbounded amount of time in which it is interrupted as needed by the task/process scheduler.

> Shadow mode for new crossing/merging targets network which will help improve VRU control.
There is a new algorithm being tested for predicting VRUs (pedestrians, bikes, e-scooters, etc.) that are crossing the car's path or merging into it.

Bladerskb · Aug 15, 2021

wow they are in early development of their prediction model, they are further behind than i thought.

Terminator857 · Aug 15, 2021

Bladerskb said:
wow they are in early development of their prediction model, they are further behind than i thought.

Can you give us your thoughts on what the notes mean? Especially about the ones with question marks above. Thanks!

diplomat33 · Aug 15, 2021

Terminator857 said:
> Improved peek behavior where we are smarter about when to go around the lead vehicle by reasoning about the causes for lead vehicle being slow.
When is peak? When it is needed most? Right before a decision is needed to be made?

I am thinking peek behavior refers to figuring out the best time for initiating a passing maneuver. So if the car decides that yes, it needs to pass a slow moving lead car, it then needs to decide when it is the best time to start passing the car.

Terminator857 said:
> New lanes network with 50k more clips (almost double) from the new auto-labeling pipeline.
More lanes mapped? Is this a right turn lane, a bike lane, etc...

Yeah. Tesla is using more clips for auto-labeling. More data will make the NN better at detecting different types of lanes.

Terminator857 · Aug 15, 2021

As Bladerskb suggests, multi-modal may also mean that ML is being used in addition to hard coded algorithms.

Enginerd · Aug 15, 2021

I think "multi-modal" (not model) refers to the different modes of transportation sharing the roads: cars, motorcycles, bicycles, busses, semis, pedestrians, etc.

diplomat33 · Aug 15, 2021

The new prediction and planning NN will require compute power. Will the current FSD computer be good enough?

EVWatcher · Aug 15, 2021

Terminator857 said:
> Improved peek behavior where we are smarter about when to go around the lead vehicle by reasoning about the causes for lead vehicle being slow.
When is peak? When it is needed most? Right before a decision is needed to be made?

Peek !== Peak

I'm guessing they added more code so the car will steer to the left or right a bit to try and peek ahead to see what's going on further down the road.

> New lanes network with 50k more clips (almost double) from the new auto-labeling pipeline.
More lanes mapped? Is this a right turn lane, a bike lane, etc...

I think network refers to a new NN, likely meaning they have a new network to replace an older one, trained with double the data. The most significant aspect of this sentence to me is the reference to the auto-labeling pipeline, which I believe means DOJO.

enzo · Aug 15, 2021

Bladerskb said:
wow they are in early development of their prediction model, they are further behind than i thought.

You can't infer that from this. There could have been years of sub-1.0 versioning, or they restarted versioning after going vision-only.

ModalYYJ · Aug 15, 2021

Good summary of the first FSD Beta 9.2 drives here: First Drives: Tesla FSD Beta 9.2 and Release Notes [VIDEOS] - TeslaNorth.com

ZeApelido · Aug 15, 2021

Bladerskb said:
wow they are in early development of their prediction model, they are further behind than i thought.

Is the glass half-full or half-empty?

Like I said before, you can't really develop the best prediction / planning algorithms if your perception algorithm is crap.

Maybe, now, finally, their perception algorithm is progressing to "better than crappy". Maybe.

It may not take them 5 years to catch up on prediction and planning though.

Maybe.

Bladerskb · Aug 15, 2021

Enginerd said:
I think "multi-modal" (not model) refers to the different modes of transportation sharing the roads: cars, motorcycles, bicycles, busses, semis, pedestrians, etc.

Not quite. It basically means predicting multiple possible future trajectories for an agent, in contrast to a single trajectory. Ex: Vehicles are multi-modal because they can do multiple things: go straight, turn left, turn right, etc.
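A minimal sketch of what a multi-modal prediction could look like as data: one agent, several candidate trajectories, each with a probability. The 70/20/10 split is the example from the first post; the class name, the waypoints, and the helper functions are hypothetical and not taken from any Tesla software.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TrajectoryMode:
    label: str                            # e.g. "continue", "turn", "stop"
    probability: float                    # likelihood assigned to this mode
    waypoints: List[Tuple[float, float]]  # (x, y) in metres, made-up values


def normalise(modes):
    """Rescale the probabilities so they sum to 1."""
    total = sum(m.probability for m in modes)
    return [TrajectoryMode(m.label, m.probability / total, m.waypoints) for m in modes]


def most_likely(modes):
    """A single-trajectory predictor would keep only this mode."""
    return max(modes, key=lambda m: m.probability)


def modes_to_plan_for(modes, min_probability):
    """A planner can hedge against every mode above some threshold."""
    return [m for m in modes if m.probability >= min_probability]


predicted = normalise([
    TrajectoryMode("continue", 0.70, [(0.0, 5.0), (0.0, 10.0)]),
    TrajectoryMode("turn", 0.20, [(1.0, 5.0), (4.0, 8.0)]),
    TrajectoryMode("stop", 0.10, [(0.0, 2.0), (0.0, 2.0)]),
])

print(most_likely(predicted).label)
print([m.label for m in modes_to_plan_for(predicted, 0.15)])
```

The difference from a single-trajectory predictor is that the less likely modes are kept, so a planner can still leave room for the 20% case instead of committing to the 70% one.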

qdeathstar · Aug 15, 2021

wow, this is good news. Missed the rollout date by a few days. Overall, i consider that a win.

linux-works · Aug 16, 2021

Bladerskb said:
wow they are in early development of their prediction model, they are further behind than i thought.

> Enabled inter-soc synchronous compute scheduling between vision and vector space processes.
They have a new task scheduler that is synchronous based. In other words, discrete time blocks are allocated for computer vision and vector computation.

yeah, that's a redo of something pretty fundamental. that should have been tested and validated at the POC level, really early.

the stuff about 8bit int and floating point, that also worries me. a lot.

what the heck is going on? are you guys just stumbling in the dark? sure seems like it.

linux-works · Aug 16, 2021

diplomat33 said:
The new prediction and planning NN will require compute power. Will the current FSD computer be good enough?

my hint take-away is that they are now trying to 'fit' things into their compute model.

if you convert FP to int, that's because of speed. your stuff is too slow. its not about memory. the mem diff between 1 byte and a float (32bit) or even a double (64 bit) is not that much, unless we're talking about most of ram being used for this set of structs.

but speed is the thing that comes to mind, when I see a dev going from fp to int.

that has all kinds of issues and should not be done lightly. its not just round-off, but also timing diffs. and that ALL the tests have to be re-validated (not just re-run but re-validated).

pretty big oops, here.

emmz0r · Aug 16, 2021

it's even int8 which is a byte.

linux-works · Aug 16, 2021

emmz0r said:
it's even int8 which is a byte.

and since its signed (if we take that literally) then its not a full 0-255 range, but its 127/128 plus or minus, with sign. so the range of absolute value is half, with signed data types vs unsigned.
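For reference, a small sketch of the ranges and storage sizes being discussed, using only the Python standard library. The parameter count is a made-up round number for illustration, not a figure for any real network.

```python
import struct

# Value ranges of 8-bit integers.
INT8_MIN, INT8_MAX = -128, 127    # signed
UINT8_MIN, UINT8_MAX = 0, 255     # unsigned


def fits_int8(value):
    """True if an integer can be stored in a signed 8-bit slot."""
    return INT8_MIN <= value <= INT8_MAX


# Storage size in bytes per value ("<" selects the standard, platform-independent sizes).
sizes = {
    "int8": struct.calcsize("<b"),
    "float32": struct.calcsize("<f"),
    "float64": struct.calcsize("<d"),
}

# Hypothetical parameter count, chosen only to make the arithmetic easy to read.
num_params = 10_000_000
megabytes = {name: num_params * size / 1_000_000 for name, size in sizes.items()}

print(sizes)
print(megabytes)
```

Both signed and unsigned 8-bit types hold 256 distinct values; the signed type spends half of them on negatives. Per value, int8 takes a quarter of the storage of a 32-bit float.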

its definitely a speed-up when you do this, but .... it makes me wonder WHY they did this. this is huge. like when y2k happened and we had to look at all the code that made assumptions about data width and re-test it with some unit tests that may not actually have been written.

you dont just change data types like that toward the end without a really good reason. it adds a lot of risk and needs lots more time to validate, until you get confidence that the new data model works as well as the old (just faster).

I wish them well on this. it would not be something I would want to do unless there was no other choice. or, its paying technical debt, now, instead of a more costly fix, later on.

MP3Mike · Aug 16, 2021

linux-works said:
my hint take-away is that they are now trying to 'fit' things into their compute model.

My take-away is that you don't have any idea what is going on.

linux-works said:
if you convert FP to int, that's because of speed. your stuff is too slow. its not about memory. the mem diff between 1 byte and a float (32bit) or even a double (64 bit) is not that much, unless we're talking about most of ram being used for this set of structs.

Nothing in those release notes says anything about changing a data type.

emmz0r · Aug 16, 2021

From what I read it's the training that has changed to account for int8 (which has been there for I don't know how long), maybe verygreen knows.
The OP also says this

Improving INT8 Accuracy Using Quantization Aware Training and the NVIDIA TAO Toolkit | NVIDIA Technical Blog (developer.nvidia.com)

mark95476 · Aug 16, 2021

Yup. Tesla's NPU/inferencing engine was designed around int8.

It's laughable what that Linux guy posts. S/he's pretending and doing very badly since it's so obvious.

Hint: It's digital therefore it's quantized.

MP3Mike said:
My take-away is that you don't have any idea what is going on.

Nothing in those release notes say anything about changing a data type.
