Memory Chip Failures

lolder · May 28, 2019

Can Tesla fix this with a software update?

Tozz · May 28, 2019

lolder said:
Can Tesla fix this with a software update?

The broken eMMC cannot be fixed. Once it's dead it's dead. It needs to be replaced.
Tesla can however disable the logging (or tune it down) with a software update. But they choose not to do that for reasons unknown to me.

Msjulie · May 28, 2019

It is odd that Tesla has not limited the kernel logging or sent it to /dev/null, since it appears they do nothing with it...

mspohr · May 28, 2019

Msjulie said:
It is odd that Tesla has not limited the kernel logging or sent it to /dev/null, since it appears they do nothing with it...

Phil thinks it's not necessary and if so, I don't know why they keep it on. But we don't really know. It could be used.

wdolson · May 28, 2019

RayK said:
I'd have to say that, generally, overall flash endurance has not been improving; it has been getting worse. I've been a memory test engineer since 1979 and have had some small experience with NAND flash. Over the years of semiconductor process development, the amount of storage available in each individual device has expanded by roughly three orders of magnitude (MB -> GB). Some of this is due to the memory cells physically getting smaller, and some of it is due to multi-level cell (MLC) storage techniques. Both reduce the number of times (cycles) you can erase and write each cell. One design offers longer endurance than the others: single-level cell (SLC). Most flash memories today use one of the multi-level designs, because people are usually more interested in storing more data or getting the lowest cost per bit than in how many times they can rewrite it. SLC designs can typically reach 100,000 cycles, while some of the newer MLC designs are rated in the hundreds of cycles. Some flash memory manufacturers offer "high endurance" or "dashcam" memories, which either use SLC or have a built-in memory controller with a better wear-leveling algorithm or a more robust error correction system.

Are you doing failure analysis or counterfeit testing? I write software for a curve tracer system used for both.

When I did my first flash memory design in the late 80s, the advertised mean time between failures for flash memory was about 1000 writes. They can take a lot more than that today. I think an individual cell is usually rated for at least 100,000 writes.
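To put those cycle ratings in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes perfect wear leveling, and the capacity and daily write volume are made-up placeholders, not measurements from any car. Only the 1,000 and 100,000 cycle figures come from the post above.

```python
def estimated_lifetime_years(capacity_gb, rated_cycles, daily_writes_gb,
                             write_amplification=1.0):
    """Rough years until the rated erase/write cycles are used up.

    Assumes ideal wear leveling, so every cell sees the same number of cycles.
    write_amplification > 1 models the controller writing more than the host asked for.
    """
    if daily_writes_gb <= 0:
        raise ValueError("daily_writes_gb must be positive")
    # Total data the part can absorb before every cell hits its rated cycle count.
    total_writable_gb = capacity_gb * rated_cycles / write_amplification
    return total_writable_gb / daily_writes_gb / 365.0


if __name__ == "__main__":
    CAPACITY_GB = 8        # hypothetical part size
    DAILY_WRITES_GB = 10   # hypothetical logging volume
    for cycles in (1_000, 100_000):  # the two ratings mentioned above
        years = estimated_lifetime_years(CAPACITY_GB, cycles, DAILY_WRITES_GB)
        print(f"{cycles:>7} cycles: about {years:.1f} years")
```

The point is only that lifetime scales linearly with the cycle rating and inversely with how much gets written per day, which is why constant logging matters far more on a low-endurance part.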

I tried to find the SSD stress test I was thinking of, but couldn't in a quick search. It found that the Intel drives had the best error correction and failed most gracefully as they approached their endurance limit, though all of the drives gave plenty of warning before failing completely.

mspohr said:
Phil thinks it's not necessary and if so, I don't know why they keep it on. But we don't really know. It could be used.

I could see wanting it turned on in the company owned test cars, for some beta testers, and possibly if there is a reported customer problem they can't figure out, but that could easily be done with a firmware switch that turns the logging on and off. Most of the time it should be off.

I have been talking to the Portland SC for more than a year about the mirror unfold problem (the mirrors unfold on their own if the car is parked in the garage and not driven for 48 hours). I mentioned the logging in my last e-mail. I didn't get any comment back, but hopefully it will be turned off in an upcoming update. It shouldn't take the programmers long to put in a switch. I do that sort of thing for debugging purposes all the time, and it rarely takes more than half an hour.
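For illustration, a debug switch of that kind can be very small. This is a generic Python sketch, not Tesla's firmware or kernel configuration; the logger name `vehicle.debug` and the `DEBUG_LOGGING` environment variable are made up for the example.

```python
import logging
import os


def configure_debug_logging(enabled: bool) -> logging.Logger:
    """Turn a (hypothetical) debug logger fully on or fully off."""
    logger = logging.getLogger("vehicle.debug")  # hypothetical logger name
    logger.handlers.clear()
    logger.propagate = False  # keep records away from the root logger
    if enabled:
        logger.addHandler(logging.StreamHandler())
        logger.setLevel(logging.DEBUG)
    else:
        # NullHandler discards records; the high level stops them being created at all.
        logger.addHandler(logging.NullHandler())
        logger.setLevel(logging.CRITICAL + 1)
    return logger


def switch_from_env(var: str = "DEBUG_LOGGING") -> bool:
    """Read the on/off switch from a (hypothetical) environment variable."""
    return os.environ.get(var, "0") == "1"


if __name__ == "__main__":
    log = configure_debug_logging(switch_from_env())
    log.debug("only emitted when the switch is on")
```

Defaulting the switch to off means nothing is written unless someone deliberately turns it on for a test car or a specific customer problem.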

Msjulie · May 28, 2019

@wdolson I was under the impression that the car software logged to a completely different place than the kernel logging. Agreed, sending the kernel log somewhere else or shutting it off isn't rocket science.

wdolson · May 28, 2019

Msjulie said:
@wdolson I was under the impression that the car software logged to a completely different place than the kernel logging. Agreed, sending the kernel log somewhere else or shutting it off isn't rocket science.

From what I gleaned from the video linked above, there are two logs stored in two places. The logs that service is likely to check go to a removable memory card. The OS logs, which are of limited or no use to anyone trying to diagnose issues with the car's operation, are written to the soldered-in flash memory. Phil said he routinely turns logging off on MCUs he works on and it causes no issues.

cwerdna · May 28, 2019

wdolson said:
Are you doing failure analysis or counterfeit testing? I write software for a curve tracer system used for both.

When I did my first flash memory design in the late 80s, the advertised mean time between failures for flash memory was about 1000 writes. They can take a lot more than that today. I think an individual cell is usually rated for at least 100,000 writes.

The high figures might be true for SLC, but they get worse with MLC, TLC and QLC.

See these, for example:
Samsung SSD 840: Testing the Endurance of TLC NAND
What is SSD write cycle? - Definition from WhatIs.com (not sure how reputable this is)
The Crucial P1 1TB SSD Review: The Other Consumer QLC SSD (QLC)
The Samsung 860 QVO (1TB, 4TB) SSD Review: First Consumer SATA QLC (QLC)

wdolson said:
I tried to find the SSD stress test I was thinking of, but couldn't in a quick search. It found that the Intel drives had the best error correction and failed most gracefully as they approached their endurance limit, though all of the drives gave plenty of warning before failing completely.

There have been several torture tests like The SSD Endurance Experiment: They're all dead.

RayK · May 28, 2019

wdolson said:
Are you doing failure analysis or counterfeit testing?

Neither. When I was writing test programs for NAND flash (mid '90s IIRC), it was to do basic functionality checks. I never got too involved with flash; my focus was on SRAM and DRAM. As for using curve tracers for failure analysis or counterfeit checks, that was what the QA department would do.

wdolson · May 29, 2019

cwerdna said:
The high figures might be true for SLC, but they get worse with MLC, TLC and QLC.

See these, for example:
Samsung SSD 840: Testing the Endurance of TLC NAND
What is SSD write cycle? - Definition from WhatIs.com (not sure how reputable this is)
The Crucial P1 1TB SSD Review: The Other Consumer QLC SSD (QLC)
The Samsung 860 QVO (1TB, 4TB) SSD Review: First Consumer SATA QLC (QLC)

There have been several torture tests like The SSD Endurance Experiment: They're all dead.

That last one was the article I was searching for.

RayK said:
Neither. When I was writing test programs for NAND flash (mid '90s IIRC), it was to do basic functionality checks. I never got too involved with flash; my focus was on SRAM and DRAM. As for using curve tracers for failure analysis or counterfeit checks, that was what the QA department would do.

The stuff I work on is used by the QA people.
