Welcome to Tesla Motors Club

Tesla infotainment system upgradeable from MCU1 to MCU2

What makes you think you know better than Tesla how relevant the data is? They clearly want to keep that kernel alive and well, so there is probably quite a lot of relevant stuff running in it. Even with the latest software version, all kernel logging is on while the car is moving. It now shuts off while the car is parked, though.

I highly doubt service centers are going to be looking into kernel logs. I suppose engineering might want to look at that on a production vehicle on a rare occasion, but they could probably do with just turning on this logging when they need it.

Furthermore, since they have a removable SD card that they could use for this logging, it stands to reason they didn't think this through very carefully.
 
With 2019.32.12.7 they added a way for them to dynamically adjust the logging level. As others have said, they do this when the car is sleeping/etc automatically, but presumably they can also now set more verbose logging on specific vehicles as needed.

One other important change they made is that they are now storing a historical log of the emmc health (instead of just periodically writing it to the syslog). Ironically, this log is stored on... the emmc. I’d love to see their fleet-wide data on emmc aging. Mine is about 50% “worn” already, and it has only been ~2 years since they replaced it.
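As a rough illustration of what "50% worn after ~2 years" implies, here is a minimal sketch of a linear extrapolation. It assumes the write rate stays constant, which the logging changes described above may well alter, and reaching 100% of the rated wear is not the same thing as failure. The function name is hypothetical.

```python
def years_until_rated_wear(percent_worn: float, years_in_service: float) -> float:
    """Linearly extrapolate the years left until wear reaches 100% of rated life.

    Assumption: wear accumulates at a constant rate, which is a simplification.
    """
    if not 0 < percent_worn <= 100:
        raise ValueError("percent_worn must be in the range (0, 100]")
    if years_in_service <= 0:
        raise ValueError("years_in_service must be positive")
    wear_per_year = percent_worn / years_in_service  # percent of rated life per year
    return (100 - percent_worn) / wear_per_year


# Figures from the post above: about 50% worn roughly 2 years after replacement.
print(years_until_rated_wear(50, 2))  # 2.0
```

At that pace the replacement part would hit its rated wear about four years after it was installed.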
 
I have no words. This is where I would normally joke about whether Tesla is looking for a systems engineer, but I give up. Stack Overflow offers better filesystem tweaking advice.

Actually quite typical. Many large software development organizations are great at algorithms but really suck at understanding a modern OS. This lack of understanding almost always translates to humongous resource waste (doing stuff that really does not need doing) and poor performance.
 
Evidence that HW3 is not compatible with MCU1 has been accumulating over the past few weeks (example). If so, this could be a huge motivator for Tesla to make an MCU2-equivalent retrofit available to MCU1 cars, as there have been at least as many promises from Musk that anyone who purchased FSD gets an HW3 upgrade included as there have been that MCU1 could be upgraded.
 
mcu1 is perfectly compatible with hw3. It's so compatible that mcu1 firmware contains hw3 firmware, similar to how it contains hw2.0 and hw2.5 (and how mcu2 contains hw2.0 even though this combination was never produced in practice).
 
I suspect they'll fix the incompatibility long before they start doing MCU2 retrofits.
 
Anyone heard of a "squash" error? I recently asked service to check my MCU for signs that it might be failing soon after seeing some graphical glitches and having a black screen three times in the last few weeks when entering the car (reboot resolved). Tesla said that they look for "squash" errors as an early failure indicator and there were none in my case. Instead, they remoted into my car and deleted "an excessive amount of stored data" which appears to have resolved my issues.
 
it's "squashfs" and yes, that's what they look for as a sign of emmc failure though emmc failure might manifest in some other ways too.
 
So are you saying service told you the emmc in your 2013 mcu is nowhere close to failing? Is that the original mcu?
 
Good thing they are writing out the kernel ring buffer!!!

Sigh, this reminds me of a past life where squashfs errors meant scratched CD-ROMs...


As an aside, though, it seems ridiculous that squashfs errors can be caused by NAND wear. It's a read-only filesystem, which means you can write it out once and verify it at update time, and NAND dynamic wear leveling is not supposed to relocate already written blocks to less reliable locations.

This really feels like it's more than simple NAND wear. Sure, it might be one of the contributing factors, but I suspect either the emmc part they chose has a bad wear leveling implementation, or something else entirely is at play (heat, TRIM bugs, improper power/rail sequencing).
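The "write it out and verify it at update time" idea can be sketched as recording a digest of the read-only image when it is written and comparing against it later. This is only an illustration under assumed names (`image_digest`, `image_is_intact`, and the temporary file standing in for an image are all hypothetical). It says nothing about how Tesla's updater actually works, and it only tells you the data was readable at the moment of the check.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def image_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def image_is_intact(path: str, expected_digest: str) -> bool:
    """Re-read the image and compare it with the digest recorded at write time."""
    return image_digest(path) == expected_digest


# Demo with a temporary file standing in for a read-only filesystem image.
with tempfile.TemporaryDirectory() as tmp:
    image = os.path.join(tmp, "image.bin")
    Path(image).write_bytes(b"read-only filesystem contents" * 1000)
    recorded = image_digest(image)           # recorded right after the update
    print(image_is_intact(image, recorded))  # True

    # Simulate a single flipped bit in the stored image.
    data = bytearray(Path(image).read_bytes())
    data[10] ^= 0x01
    Path(image).write_bytes(bytes(data))
    print(image_is_intact(image, recorded))  # False
```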
 
Wear levelling IS supposed to relocate blocks, but it's more than that. NAND cells naturally lose electrons over time, THAT is why you need to relocate. And the more wear you have, the faster they lose them.

So a fresh write will often work with no problems, but you can no longer read it tomorrow (this is also temperature dependent). NAND cells have a "we guarantee that you can still read it after X time at Y temperature" warranty, and once degradation dips below that you get data corruption even on unmodified data because it's not moved around in time. For an example, see here: SSDs can lose data in as little as 7 days without power - ExtremeTech.

Also, older MCU1s from pre-ap2 cars overheat like crazy in operation (the whole mcu does), which certainly contributes to problems.
 

I am aware. I also implemented the read disturb mitigation algorithm back in the days when I worked on this stuff. And yes, both extreme temperature and aging NAND do contribute to needing to rewrite such data, but it still surprises me that wear alone is causing read-only data that was once readable to suddenly become unreadable. My gut tells me that the storage controller firmware wasn’t all that great, which frankly is probably par for the course for that vendor :(.

Do the diagnostics on the eMMC part show the amount of wear that would result in losing data this spontaneously? I’ve seen your business card tweet, so I think neither of us is at liberty to give typical numbers, but a yes or no answer would suffice and I’ll believe you :D
 
The Hynix EMMC in MCU1s is rated for 3000 NAND erase cycles (at least that's what Tesla uses as the baseline). Recently they started to gather these usage statistics, which revealed the undocumented command used to read them, so rooted users now actually know what their overwrite count is. E.g. on my car it's at 989 right now.

I saw one report of a dead emmc at ~4k erase cycles but it's really dependent on a bunch of stuff as you know.

The diagnostics don't really know when data would be lost this suddenly, because that depends on other factors as well. What they have is a rated erase count at which the cells still retain electrons marginally well enough under some benchmark conditions; as you go further into erases, the data deteriorates.
Wear levelling ensures that all cells are pretty much uniformly damaged, which in particular means it's the read-only data that's more likely to turn up wrong, because it's not being overwritten/refreshed as often as the worn NAND cells need.

Paradoxically, I think this means that once your EMMC is worn enough, decreasing the write rate is counterproductive because read-only data is not being migrated around often enough to keep it readable.
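To put the erase counts quoted in this post in perspective, here is a minimal sketch that expresses a count as a fraction of the 3000-cycle baseline. The function and constant names are hypothetical, and the result says nothing about when a given part will actually fail, since that depends on temperature and the other factors discussed above.

```python
RATED_ERASE_CYCLES = 3000  # baseline quoted in the post above


def wear_fraction(erase_count: int, rated_cycles: int = RATED_ERASE_CYCLES) -> float:
    """Return the erase count as a fraction of the rated cycles (can exceed 1.0)."""
    if rated_cycles <= 0:
        raise ValueError("rated_cycles must be positive")
    if erase_count < 0:
        raise ValueError("erase_count must not be negative")
    return erase_count / rated_cycles


print(f"{wear_fraction(989):.1%}")   # 33.0%  (the count reported above)
print(f"{wear_fraction(4000):.1%}")  # 133.3% (roughly the dead-emmc report)
```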
 
When was your car built, and what is your mileage, so I can compare with mine?