
Memory Chip Failures

I'd have to say that, generally, flash endurance has not been improving; it has been getting worse. I've been a memory test engineer since 1979 and have had some limited experience with NAND flash. Over years of semiconductor process development, the storage available in each individual device has expanded by roughly three orders of magnitude (MB -> GB). Some of this comes from the memory cells physically getting smaller, and some comes from multi-level cell (MLC) storage techniques. Both reduce the number of times (cycles) you can erase and write each cell.

The design that offers the longest endurance is single-level cell (SLC). Most flash memories today use one of the multi-level designs, because people are usually more interested in storing more data or getting the lowest cost per bit than in how many times they can rewrite the device. SLC designs can typically reach 100,000 cycles, while some of the newer MLC designs are rated in the hundreds of cycles. Some flash manufacturers offer "high endurance" or "dashcam" memories that either use SLC or have a built-in memory controller with a better wear-leveling algorithm or a more robust error correction system.
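To put those cycle ratings in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes ideal wear leveling (every cell wears evenly) and a constant daily write volume, neither of which holds exactly in practice. The function name, the 8 GB capacity, and the 10 GB/day write load are hypothetical values picked for illustration; only the 100,000-cycle and hundreds-of-cycles ratings come from the post above.

```python
def estimated_lifetime_years(capacity_gb, pe_cycles, writes_gb_per_day,
                             write_amplification=1.0):
    """Rough years until the rated erase/write cycles are used up.

    Assumes ideal wear leveling, so the total data that can be written
    is capacity * cycles, reduced by the write amplification factor.
    """
    if capacity_gb <= 0 or pe_cycles <= 0:
        raise ValueError("capacity and cycle rating must be positive")
    if writes_gb_per_day <= 0 or write_amplification <= 0:
        raise ValueError("write load and amplification must be positive")
    total_writable_gb = capacity_gb * pe_cycles / write_amplification
    days = total_writable_gb / writes_gb_per_day
    return days / 365.0


if __name__ == "__main__":
    # Hypothetical 8 GB part written at 10 GB/day (illustrative numbers only).
    for label, cycles in [("SLC, 100,000 cycles", 100_000),
                          ("multi-level, 500 cycles", 500)]:
        years = estimated_lifetime_years(8, cycles, 10)
        print(f"{label}: about {years:.1f} years")
```

Under these assumptions the lifetime scales linearly with the cycle rating, so a part rated for hundreds of cycles wears out hundreds of times sooner than an SLC part of the same size under the same write load.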

Are you doing failure analysis or counterfeit testing? I write software for a curve tracer system used for both.

When I did my first flash memory design in the late 80s, the advertised mean time between failures for flash memory was about 1000 writes. They can take a lot more than that today. I think an individual cell is usually rated for at least 100,000 writes.

I tried to find the SSD stress test I had in mind, but couldn't in a quick search. The testers found that the Intel drives had the best error correction and failed most gracefully as the drive limit approached, though all drives gave plenty of warning before failing completely.

Phil thinks the logging isn't necessary, and if that's so, I don't know why they keep it on. But we don't really know; it could be used for something.

I could see wanting it turned on in company-owned test cars, for some beta testers, and possibly when there is a reported customer problem they can't figure out, but that could easily be done with a firmware switch that turns the logging on and off. Most of the time it should be off.
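As a minimal sketch of the kind of switch I mean (written in Python for readability, not actual firmware, and not how Tesla's software is structured): the class name, the list standing in for flash storage, and the default-off behavior are all assumptions for illustration.

```python
class SwitchableLog:
    """Log sink guarded by an on/off flag. Off by default."""

    def __init__(self, sink, enabled=False):
        self._sink = sink        # stand-in for the flash-backed log store
        self.enabled = enabled   # the "firmware switch"

    def write(self, message):
        """Store the message only when logging is enabled.

        Returns True if the message was written, False if it was dropped.
        """
        if not self.enabled:
            return False
        self._sink.append(message)
        return True


if __name__ == "__main__":
    flash = []                        # hypothetical storage, just a list here
    log = SwitchableLog(flash)
    log.write("kernel: routine event")    # dropped, switch is off
    log.enabled = True                    # e.g. turned on for a test car
    log.write("kernel: event under investigation")
    print(len(flash), "entries written")
```

The point is that when the switch is off, nothing reaches the storage at all, so normal driving would not consume erase/write cycles on logging.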

I have been talking to the Portland SC for over a year about the mirror unfold problem (the mirrors unfold on their own if the car is parked in the garage and not driven for 48 hours). I did mention this in my last e-mail. I didn't get any comment back, but hopefully it will be turned off in an upcoming update. It shouldn't take the programmers long to put in a switch. I do that sort of thing for debugging purposes all the time and it rarely takes more than half an hour.
 
@wdolson I was under the impression that the car software logged to a completely different place than the kernel logging. Agreed, sending the kernel log somewhere else or shutting it off isn't rocket science.

From what I gleaned from the video linked above, there are two logs stored in two places. The logs that service is likely to check go to a removable memory card. The OS logs, which are of limited or no use to anyone trying to diagnose issues with the car's operation, are written to the soldered-in flash memory. Phil said he routinely turns logging off on MCUs he works on and it causes no issues.
 
Are you doing failure analysis or counterfeit testing? I write software for a curve tracer system used for both.

When I did my first flash memory design in the late 80s, the advertised mean time between failures for flash memory was about 1000 writes. They can take a lot more than that today. I think an individual cell is usually rated for at least 100,000 writes.
The high figures might be true for SLC, but they get worse with MLC, TLC and QLC.

See these, for example:
Samsung SSD 840: Testing the Endurance of TLC NAND
What is SSD write cycle? - Definition from WhatIs.com (not sure how reputable this is)
The Crucial P1 1TB SSD Review: The Other Consumer QLC SSD (QLC)
The Samsung 860 QVO (1TB, 4TB) SSD Review: First Consumer SATA QLC (QLC)

I tried to find the SSD stress test I had in mind, but couldn't in a quick search. The testers found that the Intel drives had the best error correction and failed most gracefully as the drive limit approached, though all drives gave plenty of warning before failing completely.
There have been several torture tests like The SSD Endurance Experiment: They're all dead.
 
Are you doing failure analysis or counterfeit testing?
Neither. When I was writing test programs for NAND flash (mid '90s IIRC), it was to do basic functionality checks. I never got too involved with flash; my focus was on SRAM and DRAM. As for using curve tracers to do failure analysis or counterfeit checks, that was what the QA department would do.
 
The high figures might be true for SLC, but they get worse with MLC, TLC and QLC.

See these, for example:
Samsung SSD 840: Testing the Endurance of TLC NAND
What is SSD write cycle? - Definition from WhatIs.com (not sure how reputable this is)
The Crucial P1 1TB SSD Review: The Other Consumer QLC SSD (QLC)
The Samsung 860 QVO (1TB, 4TB) SSD Review: First Consumer SATA QLC (QLC)


There have been several torture tests like The SSD Endurance Experiment: They're all dead.

That last one was the article I was searching for.

Neither. When I was writing test programs for NAND flash (mid '90s IIRC), it was to do basic functionality checks. I never got too involved with flash; my focus was on SRAM and DRAM. As for using curve tracers to do failure analysis or counterfeit checks, that was what the QA department would do.

The stuff I work on is used by the QA people.