Version 2022.20.18 toxic?

Late in the afternoon of 10 Oct, I got a notification on my smartphone that an update was available for my Jun 2020 MY LR, so I immediately started the installation. I didn't drive the car until the next afternoon, when I noticed that none of its cameras seemed to work anymore; all camera views were rendered as solid black. And since the car thought it was in pitch-black surroundings, it turned on its high-beam headlights (at about two o'clock on a sunny afternoon). Its GPS position-sensing was also inoperative; it kept displaying the "own location" triangle as if I were still parked in my garage, nearly all the way to the destination I needed to reach, twenty miles away.

Very fortuitously, the cameras and GPS started working again in time to be of use for the last few blocks to my unfamiliar destination, but all active driver-assist features remained disabled, and the car displayed a "Software update required/Schedule service appointment" notice. That evening, I set up a service appointment and waited to see if whoever triages the service requests would contact me with advice.

No advice has appeared yet, but I did get another notification for an available software update: the very same 2022.20.18 one as a couple of days ago. With considerable relief that Tesla was being so quick with a correction, I started the installation of the freshly received update. And I have to say, Tesla does seem to have fixed the problem of those pesky camera views coming back on. This time around, the camera views seem to be staying black. And the GPS has gone back to a rock-steady depiction of my garage's location.
 
Followup: I brought the car into the service center as per appointment, expecting that the fix would be very quick, since the symptoms seemed so well correlated with firmware changes. Surprisingly, though, the service desk clerk said that remote diagnostics had been run, and had indicated that the car's computer needed replacing (!). Picked the car up the next day, and all the systems seem functional (though I haven't tried FSD yet). Can't complain about the speed of the repair, or the fully-covered-by-warranty cost, but the fact that the car's infotainment/nav computer failed after only two years of operation is very disturbing. I'm used to (and expect) computer equipment running error-free for decades. Is this another case of accelerated wearout of FLASH memory by excessive logfile traffic or something?
 
Ahem. Let me introduce you to the concept of FIT rate. It stands for Failures In Time: the number of failures in 10^9 hours of operation. A single resistor is considered to have a FIT rate of 1; so, on average, you’d expect a single resistor to last 10^9 hours.

But, across 100,000 systems, each with 1,000 resistors, you’d expect 10^9/(10^3 * 10^5) = one failure every 10 hours!

Integrated circuits have individual FITs of around 10 for the simple parts and up to 50 or more, depending mainly on how many pins they have and how hot one runs them. A typical complex board, with all its parts including Warm Power Modules, has a FIT of around 1500 to 4500. This includes things like laptops.

Still, that’s not so bad: 1e9/4500 is 222,222 hours, or one failure on average every 25 years or so. But Tesla is close to making a million cars or more a year. At a FIT of 4500, a million cars running around the clock works out to 1e6 * 4500/1e9 = 4.5 failures per hour, so across the world, a few of these computers fail every hour.
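
For anyone who wants to poke at the arithmetic above, here's a rough Python sketch. The function names are made up for illustration, and it assumes a constant failure rate and equipment that runs around the clock, which a parked car's computer may or may not do.

```python
# FIT counts failures per 10^9 device-hours of operation.
HOURS_PER_FIT_WINDOW = 1e9
HOURS_PER_YEAR = 24 * 365.25


def mtbf_hours(fit):
    """Mean time between failures, in hours, for one unit with the given FIT."""
    return HOURS_PER_FIT_WINDOW / fit


def fleet_failures_per_hour(fit, units):
    """Expected failures per hour across `units` identical units running nonstop."""
    return fit * units / HOURS_PER_FIT_WINDOW


if __name__ == "__main__":
    # 100,000 systems with 1,000 resistors each, FIT of 1 per resistor.
    resistors = 1_000 * 100_000
    print(1 / fleet_failures_per_hour(1, resistors))  # hours between failures

    # One complex board at the high end of the 1500 to 4500 range.
    print(mtbf_hours(4500))                   # hours
    print(mtbf_hours(4500) / HOURS_PER_YEAR)  # years

    # A million such boards in the field.
    print(fleet_failures_per_hour(4500, 1_000_000))
```

The numbers scale linearly, so halving the FIT or the fleet size halves the failures per hour.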

In the telecom world the FITs and the methods of calculating them are standardized. Manufacturers seeking to sell into these markets are pretty much required to publish these numbers, partly so buyers can do oranges & oranges comparisons between very different designs in big systems, and partly to determine how many spare circuit boards to keep in stock locally and at depots that serve an area. For Tesla, these kinds of calculations drive, for example, how many of which spare parts are on hand at local SCs.

Fun fact: for two packs in a redundant 1+1 protection scheme, the chance that both fail in the same hour is roughly FIT_of_a_pack^2/(1e9 * 1e9), which is tiny.
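
A hedged Python sketch of the 1+1 arithmetic. The first function is just the squared per-hour failure probability from the expression above. The second uses the textbook approximation for a repairable redundant pair (rate of roughly 2 * lambda^2 * MTTR), which is not from this thread; the function names and the 24-hour repair time in the example are made-up assumptions.

```python
def both_fail_same_hour(fit):
    """Chance that two independent packs both fail within the same hour."""
    per_hour = fit / 1e9  # per-hour failure probability of one pack
    return per_hour ** 2


def redundant_pair_fit(fit, mttr_hours):
    """Approximate FIT of a 1+1 pair when a failed pack is replaced in mttr_hours.

    Assumes repair is much faster than failure, so the pair only goes down
    if the second pack dies while the first is awaiting replacement.
    """
    lam = fit / 1e9  # failures per hour for one pack
    pair_rate = 2 * lam ** 2 * mttr_hours
    return pair_rate * 1e9


if __name__ == "__main__":
    print(both_fail_same_hour(4500))
    # Hypothetical: failed pack swapped within 24 hours.
    print(redundant_pair_fit(4500, mttr_hours=24))
```

Under those assumptions a pair of 4500-FIT packs comes out at under 1 FIT, which is why telecom gear leans on redundancy so heavily.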

Point is: a failure doesn’t mean that an unreliable part got replaced (although that’s certainly possible), just that your car drew the "I Lost" ticket in the lottery.

In fact, exactly like pulling a new lightbulb out of a case of them, plugging it in, and watching it go, Poof!
 
Excellent explanation of failure rate calculations, Tronguy.
As far as life expectancy goes, most electronic components have a failure rate that, when plotted against time, looks like the profile of a bathtub: high at time zero, steeply declining to nearly zero, then staying flat for a long time, then steeply rising again as the part nears end-of-life. (Wikipedia has a nice page on it: Bathtub Curve.)
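
One common way to sketch that shape is to add up three failure-rate terms: a falling one for early failures, a flat one for random failures, and a rising one for wear-out. The Python below does that with Weibull hazard terms. Every parameter in it is a made-up number chosen only to give a bathtub shape, not data for any real part.

```python
def weibull_hazard(t, shape, scale):
    """Weibull failure rate at time t (t > 0): falls if shape < 1, rises if shape > 1."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def bathtub_hazard(t_hours):
    """Illustrative failure rate (failures per hour) at age t_hours > 0."""
    infant = weibull_hazard(t_hours, shape=0.5, scale=2_000_000.0)  # early failures
    random_rate = 4.5e-6                                            # flat bottom
    wearout = weibull_hazard(t_hours, shape=5.0, scale=200_000.0)   # end-of-life
    return infant + random_rate + wearout


if __name__ == "__main__":
    for age in (10, 1_000, 50_000, 150_000, 250_000):
        print(age, bathtub_hazard(age))
```

Printing the rate at a few ages shows it starting high, settling near the flat term, and climbing again late in life.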
 
Yep, fully aware of all that.

Early failures are sometimes termed "infant mortality". Manufacturers often try to flush these out by baking the equipment (burn-in). There's an equation for reaction rates called the "Arrhenius equation": the higher the temperature, the faster the reaction rate, and the dependence is exponential.
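
To put a rough number on the baking idea, here's a minimal Python sketch of the Arrhenius acceleration factor, i.e. how much faster a thermally driven failure mechanism runs at the oven temperature than at the normal operating temperature. The 0.7 eV activation energy and the two temperatures are assumptions picked for illustration; real values depend on the failure mechanism.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV per kelvin


def acceleration_factor(use_temp_c, stress_temp_c, activation_energy_ev=0.7):
    """Arrhenius acceleration factor between a use and a stress temperature.

    Temperatures are in degrees Celsius; the default activation energy
    is an assumed, illustrative value.
    """
    use_k = use_temp_c + 273.15
    stress_k = stress_temp_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (1 / use_k - 1 / stress_k)
    return math.exp(exponent)


if __name__ == "__main__":
    # Hypothetical: equipment that normally sits at 55 C, baked at 125 C.
    print(acceleration_factor(55, 125))
```

With those assumed numbers, an hour in the oven ages the part by roughly as much as a few days of normal running, which is why a short bake can shake out weak units.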

Silicon turns out to be an interesting case when it comes to failures. Early failures are often attributed to soldering issues; a poorly soldered joint may work well enough to get past early testing, but corrosion (because there's oxygen in the air, natch) eventually catches up with joints that aren't soldered down right.

Wear-out is a bit more interesting with silicon. First off, there's transistors that were built in the 50's as an experiment and were started running back then. They're still running in their university labs. Wear-out? What wear-out?

On the other hand: if one looks at the dimensions of the (typically) aluminum conductors on highfalutin dense integrated circuits, the traces follow the surface of the silicon and glass that make up the IC. The silicon and glass are etched as part of the process of making an IC and, microscopically, go up and down. Like cliffs. In fact, some 100 nm-wide trace can go over an internal edge on the silicon, and it's kind of like watching a sidewalk take a sharp drop down the side of a six-story building!

The problem arises because, at the edges, the aluminum gets really thin. Like, several atoms thick. Electrons flying along as a gas in this material ram into the individual aluminum atoms and can, actually, shift them. Typically, away from the edge. This is called metal migration (also known as electromigration) 😁 . Migrate enough Al atoms, the conductor gets really small in cross-section, the resistance goes up, there's more heat, the atoms move around even more, and, ta-da! One gets an open. And a non-functioning chip.

Worst thing about the above is that it's dependent upon manufacturing variations. Etch a bit more, etch a bit less, registration of the photo resist isn't as precise as one wants, and so on: Two dies on the same wafer, one might work, one might not, and it might take hours or years to find out. And now you know why process engineers in silicon foundries earn their princely salaries.

Once one gets away from the true infant mortality problems (bad solder joints and the like) and before the (years, or decades, or longer) wear-out mechanisms, things like metal migration show up in a big population of components randomly over time; the usual probability distribution function assumed is Poisson.
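
Under that Poisson assumption (a constant failure rate), the odds for a single unit are easy to sketch in Python. The 4500 FIT figure is the high end of the board range mentioned earlier in the thread; two years of nonstop operation is an assumption, and the function names are made up.

```python
import math


def poisson_pmf(k, expected_failures):
    """Probability of exactly k failures when expected_failures are expected."""
    return math.exp(-expected_failures) * expected_failures ** k / math.factorial(k)


def prob_at_least_one_failure(fit, hours):
    """Chance that one unit with the given FIT fails at least once in `hours`."""
    expected_failures = fit * hours / 1e9
    return 1 - poisson_pmf(0, expected_failures)


if __name__ == "__main__":
    two_years = 2 * 365 * 24  # hours, assuming the unit never powers down
    print(prob_at_least_one_failure(4500, two_years))
```

With those inputs the chance of a failure inside two years comes out at several percent, so a dead computer at the two-year mark is unlucky but not freakish.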

Lots of good, clean, engineering fun.
 