Welcome to Tesla Motors Club

Investor Engineering Discussions

Limited L4 was reached by the likes of Waymo and Cruise, and reportedly both their compute and sensor requirements are much higher than Tesla's. Furthermore, if you consider their approach deficient, that means the superior approach is going to be even more compute- and/or sensor-intensive (to an unknown degree, obviously).
I don't know that that follows.
Doing a task inefficiently with the wrong tools takes more effort than doing it efficiently with the right ones. Time will show which is the case.
 
Would you explain this more?

Are you saying that Tesla is packing the compute budget with test code that might not even be controlling the car in order to make faster development progress by exploiting excess compute capacity that would otherwise go to waste, and Green may be misinterpreting that as FSD Beta itself maxing out the FSD computer?

Is that essentially like running some neural nets in shadow mode even with FSD actively running?

Does there even exist any way to conclusively determine whether this is happening without inside information from Tesla's AI team?
Yes: use the spare compute and other viable, non-maxed subsystems when you can, and have a policy that terminates those jobs when/if they're needed by higher-priority threads/calls/requests.
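That policy can be sketched in a few lines. A minimal Python sketch with hypothetical names (nothing here is Tesla's actual scheduler):

```python
import threading

class SpareComputeTask:
    """Best-effort background job (e.g. a shadow-mode NN) that runs only
    while spare capacity exists, and stops the moment a higher-priority
    consumer claims the resources back."""

    def __init__(self):
        self._preempted = threading.Event()
        self.iterations = 0

    def preempt(self):
        # Called by the high-priority path when it needs the compute back.
        self._preempted.set()

    def run(self, work_items):
        for _item in work_items:
            if self._preempted.is_set():
                return "terminated"  # yield the resources immediately
            self.iterations += 1     # stand-in for real shadow-mode work
        return "completed"

task = SpareComputeTask()
result = task.run(range(100))
print(result, task.iterations)
```

In a real system the preempt signal would come from another thread or the scheduler itself; the flag here just demonstrates the terminate-on-demand policy.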

We know they run shadow code, as they've called that out recently. How much, or whether it's in every build, is unknown...

Also, unless @verygreen has been able to determine what code is doing what, we simply don't know what code is doing what. What code is part of the critical real-time controls stack and which isn't. **IF** we did, then we could determine how many spare cycles are available at any given time. From that, we could build a trend, build over build, to see if they are running or have run out of space.

And, it is totally possible that they can recover resources build over build. We did that several times with the tiny micro with HW1.
 
Also, unless @verygreen has been able to determine what code is doing what, we simply don't know what code is doing what. What code is part of the critical real-time controls stack and which isn't. **IF** we did, then we could determine how many spare cycles are available at any given time. From that, we could build a trend, build over build, to see if they are running or have run out of space.

Pretty sure he does know that, and has frequently and specifically called out which code is being used actively and which is just doing something passively. Recalling off the top of my head, he has remarked on things like if/when radar was still being actively used as an input in the code, when the system was running but not actively using some of the city streets code, and when specific NNs were sending their output to other systems and when they were not. I'm sure there are many other examples I'm not thinking of at the moment too.

Hopefully he'll be nice enough to clarify further here, though I hope the previous clarifications at least show you that he does appear to know (and how he knows) at least some of the stuff that simply SSHing in was not able to tell you earlier when you mentioned doing it.
 
You don't need to look at individual cores, as all cores are node-local. You need to log into both nodes A and B to see that they are splitting the load (check the AP task schedule; it outlines which nodes get scheduled for what core and what frames, and pay attention to "Extended compute request" and how failure at it is now a fatal error). Also, lately they got some NNs so "slow" that they must run them on both nodes to get a semblance of performance, it looks like (see the "crossturbo" NNs added in recent releases).
Also, don't look at the CPU load; it's the TPU load and latencies that are the important metrics. You can check those in the collected statistics for every task that's part of the current scheduling plan (there are different plans for different modes of operation), and you can also check overall processing latencies in the SystemExecutionMetrics output.


This is no longer true. As of the 2022.20.x releases they split this, so now every trigger fires on both nodes, but only part of the information comes from any individual node (they no longer run video collection from all cameras on node A only, and they don't store some other info there anymore, probably because they are starting to run out of memory there and those video buffers are sizeable).
I'm not actively probing, I just remember how it used to be setup.

Also, I think you meant TRIP instead of TPU. If they were using a TPU I'd most likely still be at Google ;)

If you are able to crack individually signed packages, then you should be able to glean what each is doing. But I'm not sure if you are going to that level.
 
Doing a task inefficiently with the wrong tools takes more effort than doing it efficiently with the right ones
While this is probably true, today we have people who are able to achieve a goal (Waymo and Cruise) with some tools (right or wrong? That's debatable, but they are actually achieving the goals!) and others doing grandstanding and posturing but not actually achieving the goals. Not even in (staged) marketing videos, not even in the official promotional materials. Could this change in the future? That's of course feasible, but given what we know today, the only reason you'd think the tables would turn is faith.

I'm not actively probing, I just remember how it used to be setup.
Well, your data is just stale then? There was a time when node A was doing everything and node B was 100% idle. This changed at the end of 2019, when they actually enabled some beginnings of failover in 2019.40.50.1, and I tweeted about it.
And then they ran out of compute on node A, the extended compute became mandatory, and you could see Tesla scrambling to get their cross-node IPC in order; they were having some visible delays at the time, as I commented back then.
And today they spill more and more stuff to node B, and node A is so taxed they even removed a bunch of telemetry collection from there and shifted it to node B.

So it's a moving target.

Also, I think you meant TRIP instead of TPU. If they were using a TPU I'd most likely still be at Google ;)
I guess I should have said NPU ;) TRIP is the internal name.
If you are able to crack individually signed packages then you should be able to glean what each is doing
I am not sure what you mean by "packages" here. The ape firmware is signed as a whole, and that's it; the individual NNs are not signed, and you can even run your own. The biggest problem with that is that there's no public compiler and the instruction set is not documented, so you have to do a lot of guessing (but it was done for a simple demonstration with some trivial computations).

Also, unless @verygreen has been able to determine what code is doing what, we simply don't know what code is doing what
The code is actually well named, so of course it's trivial to see what code is doing what.
 
TBH I don't put much faith in Douma at this point; I thought I'd mentioned this earlier? I've also seen him tell us how lidar wasn't needed but radar was actually really important for bad weather, then a few weeks later, because Tesla announced they were dropping radar, tell us how radar isn't needed and dropping it made total sense.

OK, I'm not acting like I'm an expert, but I have worked as both a machine learning engineer and a signal processing engineer, all the way from building cloud-based deep learning models to getting DSP algorithms to work in fixed point in C firmware on an ARM chip. (That is to say, more relevant experience than basically anyone in those terrible TMC AI threads or reddit/selfdrivingcars.)

Whenever James talks about a topic I know about, he is on point 95% of the time. The people trying to point to his experience, publications, etc. simply lack any relevant knowledge themselves, or else they would directly try to break down his arguments instead of hand-waving. His ability to assess areas where he may not have experience (robotics) is impressive.

Has the “because” part of this statement actually been confirmed, or is that conjecture? I don’t know much about what they’re doing, but maybe they’re either testing multiple versions, or they haven’t bothered to optimize the compute budget since it’s not currently necessary, given that human supervision provides the necessary redundancy.

My understanding is that neural nets can be bloated at first and then unnecessary computations leaned out after the architecture has been proven to work well. If so, Tesla may simply be exploiting the fact that they have the extra compute budget instead of wasting time on premature optimization. This might be what James Douma is trying to say.

Yes. We have to understand that neural net compression is an entire field of study. Tesla acqui-hired a company with this expertise a few years ago. You train a neural net with the primary objective of getting the best accuracy possible, then you spend a lot of time whittling it down to a much smaller size without much loss in accuracy. That is not a simple task. For instance, you may try to convert layers of 32-bit floats to 16- or 8-bit ints, or you may try to remove nodes with very small weights, and so on. To get a sense of magnitudes, I looked up an NN compression review paper; the expectation would be compression to 1/10th or even 1/100th of the original size.
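As a toy illustration of two of those techniques (magnitude pruning and int8 quantization), here is a numpy sketch. The 90% sparsity target and the byte-counting storage model are my own illustrative assumptions, not anything from Tesla's pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(0, 1, size=(256, 256)).astype(np.float32)

# 1) Magnitude pruning: zero out the smallest 90% of weights.
threshold = np.quantile(np.abs(weights), 0.90)
pruned = np.where(np.abs(weights) >= threshold, weights, 0.0)

# 2) Linear quantization of the survivors down to int8.
scale = np.abs(pruned).max() / 127.0
quantized = np.round(pruned / scale).astype(np.int8)

# Dense fp32 storage vs. sparse int8 storage (1-byte value + 4-byte index).
dense_bytes = weights.size * 4
nonzero = int(np.count_nonzero(quantized))
sparse_bytes = nonzero * (1 + 4)
print(f"compression ~{dense_bytes / sparse_bytes:.0f}x")
```

Real pipelines retrain after pruning to recover accuracy, and pick per-layer bit widths; this only shows where the 10x-100x size reductions come from.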

But imagine all of the modules of Tesla's FSD stack and how often they are improved on. You aren't wasting too much time on compression if you have the compute space. My guess is they are compressing some of them enough to even fit on HW3.0, but they aren't fully optimized for sure. There's no point at this stage.

Bigger picture, that doesn't tell us either way whether HW3.0 is sufficient for FSD. And personally I don't find it that interesting, because they are already moving on to HW4. Will that be enough? If HW4 increases compute 4x, that obviously adds some leeway from current levels, but we can't say 4x as much. Doubling the resolution of the images will 4x the compute of at least some of the layers of some of the networks processing the images, but it probably won't propagate throughout the entire stack. Maybe it leaves 2x left, which is still significant.
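The 4x claim for the image-processing layers follows directly from how convolution cost scales with spatial resolution; a quick sketch (the frame sizes are illustrative, not Tesla's actual camera resolutions):

```python
def conv_flops(h, w, c_in, c_out, k=3):
    """Multiply-accumulates for one stride-1, same-padding conv layer."""
    return h * w * c_in * c_out * k * k

base = conv_flops(960, 1280, 3, 64)    # hypothetical current frame size
hires = conv_flops(1920, 2560, 3, 64)  # doubled in each dimension
print(hires / base)                    # -> 4.0
```

Later layers that operate on pooled or fixed-size features don't scale with input resolution this way, which is why the 4x doesn't propagate through the whole stack.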

The biggest issue with compute is the one I've highlighted since I chatted on TMC about it 4-5 years ago: what compute will be sufficient for their camera-based perception algorithms? These are essentially feeding in 4D info (3D space from cameras, plus time) to generate things like the depth map / occupancy network. It is extremely impressive what they have been able to do with the compute they have. However, the voxel resolution is not detailed enough for FSD; it's simply too coarse. The resolution of the voxels probably has to increase 2x-4x in each dimension. How much more compute will that take?

Keep in mind there are going to be more advances in computer vision deep learning techniques that will allow better performance for the same model size. Waymo employees and others thought it impossible to do any real depth mapping from cameras because the processing power required (to mimic the human visual cortex) would be insane. But they didn't know deep CNNs and then transformers were coming. So I do expect Tesla to improve depth estimate resolution on constrained HW.

The other area where I probably disagree with TMCers: I fully expect Tesla to eventually incorporate both some sort of improved offline map information and added sensors. These are common-sense guesses based on experience in Bayesian sensor fusion. Tesla doesn't need HD maps, but having a prior about which lane to get into before encountering an intersection makes for safer driving (just like humans; we don't drive through every intersection like it's our first time). And adding radar or lidar will improve object detection / depth mapping, especially in inclement weather. Maybe not necessary for FSD with decent accuracy, but it will improve accuracy. There's no need for it to block rapid development of the perception stack right now. But eventually.
 
Bigger picture, that doesn't tell us either way whether HW3.0 is sufficient for FSD. And personally I don't find it that interesting because they are already moving on to HW4. Will that be enough?
Agree with the other parts of your post and this part I take as an affront to all that is good in the world.

Not really, but it is fun to think about this objective aspect of NNs: the compute necessary to solve an NN problem today becomes an order of magnitude smaller in some smallish timeframe.

So, in a sense, I agree with that part of your supposition as well. When, not if, Tesla marches to five to seven 9's, they'll do it with some amount of compute. Over time, that amount will be whittled down to a fraction as they find better ways to accomplish the same or better with less compute. That is the way!

Similar to DeepMind solving Go with AlphaGo, then doing it far more efficiently with AlphaGo Zero.
 
Then it would be observable/understandable what NNs are running on each node. Is this what you are referring to?
You don't even need any access to the compute to see this; there's a clearly defined scheduling plan from which you'll learn what task (well-named stuff) is scheduled to run where, how often, and so on. Since NNs are deterministic in runtime, they also include the NPU runtime (in cycles) for every NN, so you can add them up if you want.
 
You don't even need any access to the compute to see this; there's a clearly defined scheduling plan from which you'll learn what task (well-named stuff) is scheduled to run where, how often, and so on. Since NNs are deterministic in runtime, they also include the NPU runtime (in cycles) for every NN, so you can add them up if you want.
Can you delineate them?
 
Can you delineate them?
The NNs, or the scheduling/execution plans? What are your next steps? (Note there are over a hundred NNs last I checked.)

The (current list of) cpu tasks are: "ACTIVE_SAFETY","ARBITER","BACK_UP_CAMERA","BEV_GRAPH","BRIDGE","CAMERA","CAN_RX","CAN_TX","CITY_STREETS_BEHAVIOR","CLIP_ARCHIVE","CLIP_LOGGER_API","CLIP_LOGGER_HELPER","CLIP_LOGGER_REMOTE_REQUEST","CLIP_LOGGER","COMPRESSOR","CONTROLLER","DASH_CAM","DETERMINATOR","DRIVABLE_SPACE_TRACKER","DRIVER_MONITOR","FACTORY_CAMERA_CALIBRATION","FIELD_CALIBRATION","FLEET_CONFIG","GEO_REGION","GPS","HTTP_SERVER","HW_MONITOR","IMU","INERTIATOR","LANE_CHANGE_BEHAVIOR","LEGACY_PERCEPTION","LOCALIZER","MAP_MANAGER_MCU_COMMUNICATIONS","MAP_MANAGER","METRICS_API","METRICS_DEV","METRICS","MISSION_PLANNER","PARKING_BEHAVIOR","PERCEPTION","PERFORMANCE_COUNTER_MONITOR","POSITIONING_ENGINE","PT_TRACKER","RADAR","RAIN_LIGHT_SENSING","REPLAY","ROAD_ESTIMATOR","RTDV_COMPRESSOR","SCHEDULER","SLAM_BA","SLAM","SNAPSHOT","SNAPSHOT_TRIGGER_CLIENT","STATE_MACHINE","STAY_IN_LANE_BEHAVIOR","TELEMETRY_PACKAGER","TELEMETRY","TEMPERATURE_MONITOR","TEXT_LOG","UBX_LOG","UI_SERVER","ULTRASONICS","VISION","VISION_VISUALIZER","WATCHDOG","X1_CLIENT","DV_INSPECTOR"

Not all tasks run in all modes, and some tasks are not even present in prod firmwares (like replay). Vision runs the NNs with its own schedules for them, which depend on modes. Some of these run on the A node, some on B, and some on both (e.g., clip logger is on both because that's the task that compresses video for snapshots).
The tasks are typically attached to pipelines like DYNAMIC_WORLD, STATIC_WORLD, LEGACY_HIGHWAY, ...; the running pipelines are selected based on operational mode.

The camera selection modes are: "MAIN_NARROW_FISHEYE", "CONTEXTUAL_LEFT_LANE_CHANGE", "CONTEXTUAL_RIGHT_LANE_CHANGE", "CONTEXTUAL_HIGH_CURVATURE", "LOW_POWER_MODE", "ALL_CAMERA_ROTATION", "CONTEXTUAL_LOW_SPEED", "CONTEXTUAL_OFF_HIGHWAY", "CONTEXTUAL_SUMMON", "ALL_CAMERAS", "CITY_STREETS". These also depend on the current operational mode. There used to be far fewer such modes, but as compute got more and more strained they had to stop running all the stuff all the time and only run what they absolutely needed. (I first noticed an immediate impact of this in 2020, when they stopped doing drivable space detection on highways above a certain speed. "Optimization!" you might say; "now it doesn't know if it's safe to jerk right/left" I would say.)

For every selection mode there is a list of actual NNs to be executed (out of the 111 present in the release I am looking at), with examples like AUTOWIPER, CITY_EDGES_MAIN, HYDRANET_STOPS_FISHEYE, ...

Every NN gets additional flags, like which camera it gets fed and how many frames (as a divisor, so you can get every frame, every second frame, every third frame, and so on). And that's just scratching the surface.
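A minimal sketch of how such a frame-divisor plan selects work per frame; the camera assignments and divisor values below are made up for illustration (only the NN names come from the post above):

```python
# Hypothetical scheduling-plan entries: (task name, camera, frame divisor).
PLAN = [
    ("AUTOWIPER",              "fisheye", 6),  # every 6th frame
    ("CITY_EDGES_MAIN",        "main",    1),  # every frame
    ("HYDRANET_STOPS_FISHEYE", "fisheye", 2),  # every 2nd frame
]

def tasks_for_frame(frame_idx, plan=PLAN):
    """Return the NN tasks scheduled to execute on a given frame index."""
    return [name for name, _cam, div in plan if frame_idx % div == 0]

print(tasks_for_frame(0))  # all three fire on frame 0
print(tasks_for_frame(3))  # only the every-frame task
```

Summing each selected NN's per-invocation NPU cycles over a window of frames is then all it takes to add up the load, as described above.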
 
FYI, I'm going through a DIY solar project right now (after exhaustively talking with vendors, PG&E's stupidity, and Tesla straight up cancelling my solar install) and having your car as a backup energy battery IS A BIG DEAL. PERIOD.

Ford CEO surprised by F-150 Lightning backup power popularity. It's a game changer

I'm trying to buy LFP battery systems in China via Alibaba right now, and sizing them / determining future potential use is a pain.
Oh, do continue...
 
One thing I just noticed, and I'm not sure if it has been widely reported, but the Tesla NACS standard goes to 1000 V and 1 MW. There appear to be some minor connector mechanical changes going from 500 V to 1000 V, but it states that both connectors are compatible.


In the past I always assumed the Semi would have a different connector. I think some of the reporting on the MW chargers at Pepsi or Reno shows a connector compatible with one version of the MCS standard.

Based on the news the other night about the CT and Semi both having a 1000 V architecture, it looks to me like they are planning to stick with the same charging connector for everything in the North American market. I can't imagine they would change the connector on the CT, as it would cause a lot of disruption with the Superchargers.
 
One thing I just noticed, and I'm not sure if it has been widely reported, but the Tesla NACS standard goes to 1000 V and 1 MW. There appear to be some minor connector mechanical changes going from 500 V to 1000 V, but it states that both connectors are compatible.


In the past I always assumed the Semi would have a different connector. I think some of the reporting on the MW chargers at Pepsi or Reno shows a connector compatible with one version of the MCS standard.

Based on the news the other night about the CT and Semi both having a 1000 V architecture, it looks to me like they are planning to stick with the same charging connector for everything in the North American market. I can't imagine they would change the connector on the CT, as it would cause a lot of disruption with the Superchargers.
Yeah. NACS makes sense for the CT, but 1 MW is not enough to hit Tesla's 70% SOC in 30 minutes spec for the Semi unless the pack were only about 714 kWh (0.7 × 714 ≈ 500 kWh delivered in half an hour) and it charged at full power the whole time. Even then, there is no room to grow.
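The arithmetic behind that, as a quick sketch:

```python
def required_avg_power_kw(pack_kwh, soc_fraction=0.70, minutes=30):
    """Average charge power needed to add soc_fraction of a pack in `minutes`."""
    return pack_kwh * soc_fraction / (minutes / 60)

# Largest pack that can gain 70% in 30 minutes at a 1 MW (1000 kW) cap:
max_pack_kwh = 1000 / 0.70 * 0.5
print(round(max_pack_kwh))         # -> 714
print(required_avg_power_kw(900))  # a 900 kWh pack would need ~1260 kW
```

And since real packs taper well before full power is held for 30 minutes straight, the practical ceiling is lower still.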
 
Yeah. NACS makes sense for the CT, but 1 MW is not enough to hit Tesla's 70% SOC in 30 minutes spec for the Semi unless the pack were only about 714 kWh and it charged at full power the whole time. Even then, there is no room to grow.
OK, this makes sense. So likely MCS for the Semi, since as I see it that goes to 3 MW, and NACS for the CT, as 1 MW should be plenty of overhead, though I doubt they will get anywhere near actual 1 MW charging.

Is CCS really limited to 350 kW? This is going to give Tesla a huge advantage with the CT. This may be the trigger that gets an OEM looking at using NACS.
 
OK, this makes sense. So likely MCS for the Semi, since as I see it that goes to 3 MW, and NACS for the CT, as 1 MW should be plenty of overhead, though I doubt they will get anywhere near actual 1 MW charging.

Is CCS really limited to 350 kW? This is going to give Tesla a huge advantage with the CT. This may be the trigger that gets an OEM looking at using NACS.
I'm not too sure on the official CCS specs; it seems to be 350 kW. However, at least one company has a compatible liquid-cooled connector good for 500 kW (500 A, 1000 V).
 
Is CCS really limited to 350 kW? This is going to give Tesla a huge advantage with the CT. This may be the trigger that gets an OEM looking at using NACS.
The CCS connector is rated 1000 V, 500 A. Vendors can go beyond this; e.g., I've seen reports of a Model 3/Y (briefly, before taper kicks in) pulling >500 A in Europe. And charging stations can provide less power, of course. EA stations are limited to 350 kW nameplate, but some can do a bit more. Here's a Hummer EV charging at 363 kW.
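A quick P = V * I check on the figures in this exchange:

```python
def connector_power_kw(volts, amps):
    """Peak DC power through a connector at the given voltage and current."""
    return volts * amps / 1000

print(connector_power_kw(1000, 500))   # CCS connector ceiling -> 500.0 kW
print(connector_power_kw(1000, 1000))  # hitting 1 MW at 1000 V needs 1000 A
```

So the oft-quoted 350 kW is a station (dispenser) limit, not the connector's electrical ceiling.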