
Tesla autopilot HW3


Very interesting video indeed.
This guy says that HW3 actually performs better on the exact same route. He says it feels like as in a game, where HW3 is "max settings" and HW2 is "medium settings" when it comes to quality.

I think Elon told a white lie when he said it performs worse than or the same as HW2. It has to at least run at a better rate...

Wow, does that dude have a homicidal death wish or something?!? Using AP in its current version on a two-lane road without a median, with semis zooming by in opposing traffic and several construction zones with workers present.

WTF?!?
 

Haha, he's crazy, taking one for the team :rolleyes:
The craziest thing is that the UK has roads like that. We're talking 50 mph speed limits in those areas... :eek:
 
I think Elon told a white lie when he said it performs worse than or the same as HW2. It has to at least run at a better rate...

It's also possible that the statement was literally true on Firmware 2019.7.105 when the first AP3 cars were delivered around the time of Autonomy Day - but ceased to be true as the firmware developments continued through the last few months...
 
So preferably starting before FSD is rolled out which is end of this year. Boy, they are in a rush then :)

That would, unfortunately, be normal for Tesla. And once they start, the parts will probably go quickly into a 10 month backorder. :(

Doing retrofits costs money and requires scarce resources at Service Centers, even if the H/W 2.5 -> 3.0 retrofit only takes one labor hour. They'll still have to drop in new firmware, so it may need to be in a Service Center taking space for half a day.

And then there's the H/W 2 -> 3 retrofit, which (presuming it's possible) is going to be several labor hours, maybe a day. Fortunately only 5K to 15K S & X owners have paid for FSD and therefore deserve to get updated, but that's still 5K to 15K cars at (say) 6 labor hours each.
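To put a number on that, here is a quick back-of-envelope sketch. The car counts and the 6 hours per car are just the guesses from the post above, not Tesla figures, and `retrofit_labor_hours` is a made-up helper name.

```python
# Back-of-envelope retrofit workload.
# Inputs are the poster's guesses (5K to 15K cars, ~6 labor hours each),
# not data from Tesla.
def retrofit_labor_hours(cars, hours_per_car=6):
    """Total labor hours to retrofit `cars` vehicles."""
    return cars * hours_per_car

low = retrofit_labor_hours(5_000)
high = retrofit_labor_hours(15_000)
print(f"{low:,} to {high:,} labor hours")
```

So the guess works out to somewhere between 30,000 and 90,000 Service Center labor hours for the H/W 2 cars alone.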
 
Yes, you have to wonder what's up with Britain and the homicidal death wish ;)

I have maybe about 1 mile of driving that would resemble that. I wouldn't be able to do that for 30 minutes!
 
I think they can do this using mobile techs. They will set up a few mobile techs in each city who do just that (they could be less experienced too) every day for a few months.

My impression is they are waiting to stock up on boards (maybe a new revision of the board too?) before mass updates.
 

If they planned a little they would do north first with mobile techs before the snow sets in :p
 
Whew, I need a drink to soothe my nerves after watching that freaky video!!! Anxiety attack LOL!

I think there is a Tesla Diplomat who has several water towers full of Kool-Aid he would be glad to share with you :D:cool:


Makes me wonder if they'll start rolling out the retrofits super gradually during otherwise routine service visits. "We fixed your sunroof, the squeaky suspension, the door handle not extending....andohyeaputinHW3foryou... Also rotated your tires!"

They could probably get a couple hundred done before anyone noticed and the flood of calls starts. I can't imagine there's more than 30 thousand or so waiting on the retrofit. Honestly, probably only a few thousand diehards like us are on top of the FSD news; the rest won't even know they are doing them until Tesla sends out an email. They could probably get the most vocal people done first and then let the others happen organically during service visits.
 
Of course they could, and it would likely be the easiest and cheapest way to do things.

But they'll keep improving the hardware, so people who get it later will get "better" stuff. People who unknowingly got it early will whine and complain endlessly that Tesla gave them something inferior for no reason. You know that's what will happen.
 

They probably won't make any real changes (other than possibly cost reductions) until HW4. So do you want self-driving hardware in late 2019 / early 2020, or do you want self-driving hardware in 2024 / 2025?
 

If they thought they'd need any more hardware or sensors to achieve FSD, we would have seen it in April.

Instead, they introduced the FSD computer they've been telling us they'd need to reach FSD for nearly three years now - with as far as we know exactly the same sensor suite they've been installing for two years.

That's not coincidence. I think it means they truly believe those sensors will be enough. They could be wrong, but I think it's clear that after three years of modelling and studying and experimenting they are confident it's enough. I don't know enough to tell them they are wrong.

I don't think there's any chance that new sensors will show up before the HW4 computer that's a couple years away.
 
I think it means they truly believe those sensors will be enough.
Who said anything about sensors? I said "better". You know, some slightly faster chip on the board from a different supplier. Some other small change to reduce energy usage. Tesla keeps making things better in small ways. People will whine because they got a Rev. 3A board rather than a Rev. 3C. And pity the poor bastards who got a 2J in May.

Yes, I'm just speculating. None of it matters. Tesla will make the software switch at some point. We'll all know about it because everybody out there with HW3 will see a step change. And then everybody who didn't get their HW3 upgrade will start demanding it yesterday.
 
For the past few months Tesla has been slowly sharing details of its upcoming “Hardware 3” (HW3) changes soon to be introduced into its S/X/3 lineup. Tesla has stated that cars will begin to be built with the new computer sometime in the first half of 2019, and they have said that this is a simple computer upgrade, with all vehicle sensors (radar, ultrasonics, cameras) staying the same.

Today we have some information about what HW3 actually will (and won’t) be:

What do we know about Tesla’s upcoming HW3? We actually know quite a bit now thanks to Tesla’s latest firmware. The codename of the new HW3 computer is “TURBO”.

Hardware:

We believe the new hardware is based on Samsung Exynos 7xxx SoC, based on the existence of ARM A72 cores (this would not be a super new SoC, as the Exynos SoC is about an Oct 2015 vintage). HW3 CPU cores are clocked at 1.6GHz, with a MALI GPU at 250MHz and memory speed 533MHz.

HW3 architecture is similar to HW2.5 in that there are two separate compute nodes (called “sides”): the “A” side that does all the work and the “B” side that currently does not do anything.

Also, it appears there are some devices attached to this SoC. Obviously, there is some eMMC storage, but more importantly there’s a Tesla PCIe device named “TRIP” that works as the NN accelerator. The name might be an acronym for “Tensor <something> Inference Processor”. In fact, there are at least two such “TRIP” devices, and possibly two per “side”.

As of mid-December, this early firmware was still at a relatively early bring-up stage. No actual autopilot functionality appears to be included yet, with most of the code just copied over from the existing HW2.5 infrastructure. So far all the cameras seem to be the same.

It is running Linux kernel 4.14 outside of the usual BuildRoot 2 environment.

In reviewing the firmware, we find descriptions of quite a few HW3 board revisions already (8 of them actually), and the hardware for Model 3 and S/X comes in separate versions too (understandably).

The “TRIP” device obviously is the most interesting one. A special firmware that encompasses binary NN (neural net) data is loaded there and then eventually queried by the car vision code. The device runs at 400MHz. Both “TRIP” devices currently load the same NNs, but possibly only a subset is executed on each?

The Exynos SoC is of 2015 vintage, and Peter Bannon said on the Q2 2018 earnings call, “three years ago when I joined Tesla we did a survey of all of the solutions” (i.e. the 2nd half of 2015). Does this suggest that the current HW2/HW2.5 NVIDIA autopilot units were always viewed as a stop-gap, and hence that the perceived lack of computation power everybody was accusing Tesla of at the time of the AP2 release was not viewed as important by Tesla?

Software:

In reviewing the binaries in this new firmware, @DamianXVI was able to work out a pretty good idea of what the “TRIP” coprocessor does on HW3 (he has an outstanding ability to look at and interpret binary data!):

The “TRIP” software seems to be a straight list of instructions aligned to 32 bytes (256 bits). Programs operate on two types of memory, one for input/output and one for working memory. The former is likely system DRAM and the latter internal SRAM.
Memory operations include data loading, weight loading, and writing output. Program operations are pipelined with data loads and computations interleaved and weight fetching happening well upstream from the instructions that actually use those weights. Weights seem to be compressed from the observation that they get copied to an internal region that is substantially larger than the source region with decompression/unpacking happening as part of the weight loading operation. Intermediate results are kept in working memory with only final results being output to shared memory.
Weights are loaded from shared memory into working memory and maintained in a reserved slot which is referenced by number in processing instructions. Individual processing instructions reference input, output, and weights in working memory. Some processing instructions do not reference weights and these seem to be pooling operations.
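To make that flow a bit more concrete, here is a toy model of the kind of program described above. To be clear, this is only an illustrative sketch: the class, the method names, the run-length "decompression" and the multiply-plus-ReLU "layer" are all invented placeholders, not the real TRIP instruction set or weight format, which are not known.

```python
from dataclasses import dataclass, field

@dataclass
class ToyTrip:
    """Hypothetical stand-in for the data flow described above (not the real ISA)."""
    shared: dict = field(default_factory=dict)        # plays the role of system DRAM
    working: dict = field(default_factory=dict)       # plays the role of internal SRAM
    weight_slots: dict = field(default_factory=dict)  # reserved slots, referenced by number

    def load_data(self, src, dst):
        # Copy input from shared memory into working memory.
        self.working[dst] = list(self.shared[src])

    def load_weights(self, src, slot):
        # Weights expand on load: toy run-length pairs (value, count)
        # become a longer list, mimicking "unpacked region is larger".
        packed = self.shared[src]
        self.weight_slots[slot] = [v for v, n in packed for _ in range(n)]

    def compute(self, src, dst, slot):
        # Placeholder "layer": elementwise multiply followed by ReLU.
        x, w = self.working[src], self.weight_slots[slot]
        self.working[dst] = [max(0, a * b) for a, b in zip(x, w)]

    def pool(self, src, dst):
        # Weight-free operation: max over adjacent pairs.
        x = self.working[src]
        self.working[dst] = [max(x[i:i + 2]) for i in range(0, len(x), 2)]

    def store(self, src, dst):
        # Only the final result goes back to shared memory.
        self.shared[dst] = list(self.working[src])

trip = ToyTrip(shared={"image": [1, -2, 3, 4], "w0": [(2, 3), (-1, 1)]})
trip.load_weights("w0", slot=0)       # fetched well before it is used
trip.load_data("image", "act0")
trip.compute("act0", "act1", slot=0)  # references input, output and a weight slot
trip.pool("act1", "act2")             # no weight reference
trip.store("act2", "result")
print(trip.shared["result"])
```

The intermediate activations (`act0`, `act1`, `act2`) never leave working memory, which is the property the analysis above points to.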

@DamianXVI created graphical visualizations of this data flow for some of the networks observed in the binaries. This is not a visualization of the network architecture; it is a visualization of instructions and their data dependencies. In these visualizations, green boxes are data loads/stores. White boxes are weight loads. Blue boxes are computation instructions with weights; red and orange boxes are computation blocks without weights. Black links show output/input overlap between associated processing operations. Blue links connect associated weight data. These visualizations represent a rough and cursory understanding of the data flow. Obviously, it is likely that many links are missing and some might be wrong. Regardless, you can see the complexity being introduced with these networks.

View attachment 366018 View attachment 366019 View attachment 366020 View attachment 366021

What is very interesting is that @DamianXVI concluded that these visualizations look like GoogLeNet. At the outset, he did not set out to see if Tesla’s architecture was similar to GoogLeNet; he hadn’t even seen GoogLeNet before, but as he assembled the visualization the similarities appeared.

After understanding the new hardware and NN architecture a bit, we then asked @jimmy_d to comment and here’s what he has to say:

“Damian’s analysis describes exactly what you’d want in an NN processor. A small number of operations that distill the essence of processing a neural network: load input from shared memory/ load weights from shared memory / process a layer and save results to on-chip memory / process the next layer … / write the output to shared memory. It does the maximum amount of work in hardware but leaves enough flexibility to efficiently execute any kind of neural network.

And thanks to Damian’s heroic file format analysis I was able to take a look at some neural network dataflow diagrams and make some estimates of what the associated HW3 networks are doing. Unfortunately, I didn’t find anything to get excited about. The networks I looked at are probably a HW3 compatible port of the networks that are currently running on HW2.

What I see is a set of networks that are somewhat refined compared to earlier versions, but basically the same inputs and outputs and small enough that they can run on the GPU in HW2. So still no further sightings of “AKNET_V9”: the unified, multi frame, camera agnostic architecture that I got a glimpse of last year. Karpathy mentioned on the previous earnings call that Tesla already had bigger networks with better performance that require HW3 to run. What I’ve seen so far in this new HW3 firmware is not those networks.

What we know about the HW3 NN processor right now is pretty limited. Apparently there are two “TRIP” units which seem to be organized as big matrix multipliers with integrated accumulators, nonlinear operators, and substantial integrated memory for storing layer activations. Additionally it looks like weight decompression is implemented in hardware. This is what I get from looking at the primitives in the dataflow and considering what it would take to implement them in hardware. Two big unknowns at the moment are the matrix multiplier size and the onboard memory size. That, plus the DRAM I/O bus width, would let us estimate the performance envelope. We can do a rough estimate as follows:

Damian’s analysis shows a preference for 256 byte block sizes in the load/store instructions. If the matrix multiplier input bus is that width then it suggests that the multiplier is 256xN in size. There are certain architectural advantages to being approximately square, so let’s assume 256x256 for the multiplier size and that it operates at one operation per clock at @verygreen’s identified clock rate of 400MHz. That gives us 26TMACs per second, which is 52Tops per second (a MAC is one multiply and one add, which equals two operations). So one TRIP would give us 52Tops and two of them would give us 104Tops. This is assuming perfect utilization. Actual utilization is unlikely to be higher than 95% and probably closer to 75%. Still, it’s a formidable amount of processing for neural network applications. Let’s go with 75% utilization, which gives us 40Tops per TRIP or 80Tops total.

As a point of reference - Google’s TPU V1, which is the one that Google uses to actually *run* neural networks (the other versions are optimized for training) is very similar to the specs I’ve outlined above. From Google’s published data on that part we can tell that the estimates above are reasonable - probably even conservative. Google’s part is 700MHz and benchmarks at 92Tops peak in actual use processing convolutional neural networks. That is the same kind of neural network used by Tesla in autopilot. One likely difference is going to be onboard memory - Google’s TPU has 27MB but Tesla would likely want a lot more than that because they want to run much heavier layers than the ones that the TPU was optimized for. I’d guess they need at least 75MB to run AKNET_V9. All my estimates assume they have budgeted enough onboard SRAM to avoid having to dump intermediate results back to DRAM - which is probably a safe bet.

With that performance level, the HW3 neural nets that I see in this firmware could be run at 1000 frames per second (all cameras simultaneously). This is massive overkill. There’s little reason to run much faster than 40fps for a driving application. The previously noted AKNET_V9 “monster” neural network requires something like 600 billion MACs to process one frame. So a single “TRIP”, using the estimated performance above, could run AKNET_V9 at 66 frames per second. This is closer to the sort of performance that would make sense and AKNET_V9 would be about the size of network one would expect to see running on the TRIP given the above assumptions.”
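For anyone who wants to redo the arithmetic in the quoted estimate, here is a small sketch. The 256x256 array, 400MHz clock, 75% utilization and 600 billion MACs per frame are the assumptions from the quote, not confirmed specs. The 256x256 array at 700MHz for Google's TPU v1 is from Google's published description. The helper names `tops` and `fps` are made up for this example.

```python
def tops(rows, cols, clock_hz, utilization=1.0):
    """Tera-operations/s for a rows x cols MAC array doing one MAC per cell per clock."""
    macs_per_second = rows * cols * clock_hz * utilization
    return 2 * macs_per_second / 1e12  # 1 MAC = 1 multiply + 1 add = 2 operations

def fps(tops_available, macs_per_frame):
    """Frames per second for a network needing `macs_per_frame` MACs per frame."""
    macs_per_second = tops_available * 1e12 / 2  # convert operations back to MACs
    return macs_per_second / macs_per_frame

# Assumed TRIP figures from the quoted estimate.
trip_peak = tops(256, 256, 400e6)
trip_75 = tops(256, 256, 400e6, utilization=0.75)

# Sanity check against Google's TPU v1 (256x256 MACs at 700MHz).
tpu_v1_peak = tops(256, 256, 700e6)

# AKNET_V9 estimate from the quote: ~600 billion MACs per frame.
AKNET_V9_MACS = 600e9
one_trip_fps = fps(40, AKNET_V9_MACS)  # 40 Tops = one TRIP at ~75%
two_trip_fps = fps(80, AKNET_V9_MACS)  # 80 Tops = both TRIPs at ~75%

print(f"TRIP peak: {trip_peak:.1f} Tops, at 75%: {trip_75:.1f} Tops")
print(f"TPU v1 peak: {tpu_v1_peak:.1f} Tops")
print(f"AKNET_V9: {one_trip_fps:.0f} fps on one TRIP, {two_trip_fps:.0f} fps on two")
```

The Tops numbers line up with the quote (about 52 peak and 39 at 75% per TRIP, about 92 for the TPU). One thing to note: if a MAC counts as two operations, 40 Tops is 20 TMACs per second, which gives roughly 33 fps for one TRIP on a 600 billion MAC network. The 66 fps figure matches both TRIPs together, or dividing Tops by MACs directly.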
Great writeup. Would you be willing to do an executive summary or a non-technical abstract?