Linux is not a RTOS and should not be running Tesla controls

Cosmacelf

Mar 6, 2013
So today I hopped into my Tesla and started fiddling with the nav display. It couldn't keep up very well with my screen presses, and would momentarily go into a screen locked state - during these times, the AC and fan turned off. This happened about four times where the AC/fan would turn off for a few seconds at a time. In a hot car on a hot day, this was very annoying. Not only was the nav system fritzing out, but the climate control system was being affected by it.

I attribute this to the Tesla running Linux as the OS for the car displays and controls. I can only guess that it has a regular virtual memory system (since the symptoms seemed consistent with pages being swapped in and out of memory).

While such an OS is fine for a general-purpose desktop or server PC, it most certainly is not a good choice for a car control system. A car must be able to walk and chew gum at the same time. If there are critical things the OS must do to keep the climate control system happy (like checking the temperature sensors and giving fan speed instructions), those processes must run at a very high priority and, in particular, cannot be interrupted by page swaps. Indeed, there shouldn't even BE page swaps. The car computer should have an adequate amount of ECC memory to never need a VM system. You aren't running Microsoft Word on it, after all.

An RTOS (like QNX) would be able to deal with fast interrupts from users and the car sensors without dropping important functions.

Whenever I see software bugs in the car, they usually appear to be symptoms of using Linux instead of a robust RTOS. I am worried that Tesla will never be able to fully get the car computer/control system working without bugs unless the underlying architecture gets a rework. People asking for all sorts of fancy features may be waiting a long time if Tesla can't get basic functionality to work (like having a climate control system that isn't affected by user input presses in a completely unrelated function).
 
The car's real time systems do not run on Linux. They run on dedicated systems. Only the two nVidia display screens run on Linux. In my experience its response time is adequate.

If you're having trouble with your touchscreen then perhaps you should try rebooting it...
 
You don't need a RTOS to control an A/C, unless the general CPU is literally controlling the current flow using a software-based PWM implementation. (Which I can't imagine). I've implemented a PID controller quite successfully on Windows, which is the antithesis of a RTOS.
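For illustration, here is a minimal sketch of a discrete PID loop of the kind mentioned above. The class name, gains, and interface are assumptions made up for this example and say nothing about Tesla's actual software. The point it tries to show: if the integral and derivative terms are scaled by the measured time since the last sample, the loop tolerates the scheduling jitter of a non-real-time OS.

```python
class PID:
    """Minimal discrete PID controller (illustrative; gains are hypothetical)."""

    def __init__(self, kp: float, ki: float, kd: float) -> None:
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None  # no derivative term until a second sample exists

    def update(self, setpoint: float, measured: float, dt: float) -> float:
        """Return the control output for a sample taken dt seconds after the last one."""
        if dt <= 0:
            raise ValueError("dt must be positive")
        error = setpoint - measured
        # Scale by the *measured* dt so a late sample is weighted correctly.
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```

A caller would pass the elapsed time from a monotonic clock (for example `time.monotonic()`) as `dt` on each iteration, rather than assuming a fixed period.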

More than likely the A/C compressor and fan control is offloaded to some dedicated PWM chips (if not a full-blown controller subsystem), and the O/S just controls the PWM rate via SPI or I2C (or some higher level bus). This should not be time sensitive: if the O/S doesn't update it for a few seconds, the A/C and fan should keep running at the same speed until the next update arrives.

The fact that it stops means that either the O/S sent a specific stop signal or the subsystem went into a failsafe. Either would be a bug, and neither has anything to do with O/S architecture.
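The behaviour argued for here can be sketched as a tiny model of a dedicated fan controller. Everything in it (the `FanController` name, the timeout, the failsafe speed) is hypothetical and only illustrates the idea: the controller keeps driving the last commanded speed while the host is briefly silent, and falls back to a failsafe only after a long silence.

```python
class FanController:
    """Hypothetical dedicated fan controller that holds its last commanded speed."""

    def __init__(self, failsafe_after_s: float, failsafe_speed: int) -> None:
        self.failsafe_after_s = failsafe_after_s  # how long the host may stay silent
        self.failsafe_speed = failsafe_speed      # speed used once the host is presumed dead
        self.speed = 0
        self.last_update_s = 0.0

    def command(self, speed: int, now_s: float) -> None:
        """Called when the host OS sends a new speed, e.g. over a bus."""
        self.speed = speed
        self.last_update_s = now_s

    def output(self, now_s: float) -> int:
        """Speed actually driven at time now_s (seconds on the controller's clock)."""
        if now_s - self.last_update_s > self.failsafe_after_s:
            return self.failsafe_speed  # host silent for too long
        return self.speed  # a brief host stall changes nothing
```

With a timeout of, say, 30 seconds, a UI freeze of a few seconds would never reach the fan. Seeing the fan stop during such a freeze would then point to an explicit stop command or an overly eager failsafe.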

Real-time is not the way to go for the general purpose O/S. For starters we want 3rd parties to be able to create apps... :).


You aren't running Microsoft Word on it, after all.

Why not? I use outlook.office365.com all the time, and it works quite well. I'm pretty sure word.office365.com can also be made to work with a little effort.
 
Yes, we are all guessing what the underlying software architecture is here. Nonetheless, when the NAV UI is freezing/has delayed screen redraws and generally freaks out (I'm sure you've experienced this), this should in NO WAY affect the climate control. Why the heck would the Linux OS send a specific stop signal to the AC subsystem? Why would anything in the NAV system affect the climate control system?

My guess is that Linux OS is actually controlling the climate control system at the level of reading temperature sensor(s), and sending a fan speed number to the AC fan subsystem which is no doubt run on some dedicated processor of its own. But I am guessing based on observed results that fan speed is set by the Linux OS. My guess is that the Linux OS missed temperature sensor readings due to its non-realtime OS, and the software assumed the temperature was 0, causing it to turn the fan speed to 0. If I'm right, this could be mitigated by more robust software (if you read off-scale sensor inputs, use the last known good reading), but that is a band aid. A better approach is an OS where interrupts and sensor readings are never missed.
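The "use the last known good reading" mitigation could look roughly like the sketch below. The class name, the valid range, and the initial value are assumptions for illustration. Note that a missed reading is represented as `None` rather than as a default number such as 0, so it can never be mistaken for a real temperature.

```python
from typing import Optional


class SensorFilter:
    """Keeps the last plausible reading and ignores missing or off-scale ones."""

    def __init__(self, low: float, high: float, initial: float) -> None:
        self.low = low            # lowest value the sensor can plausibly report
        self.high = high          # highest value the sensor can plausibly report
        self.last_good = initial  # used until the first valid reading arrives

    def read(self, raw: Optional[float]) -> float:
        """Return raw if it is present and in range, else the last good value."""
        if raw is not None and self.low <= raw <= self.high:
            self.last_good = raw
        return self.last_good


cabin = SensorFilter(low=-40.0, high=85.0, initial=22.0)
for sample in (30.5, None, 999.0, 29.8):
    print(cabin.read(sample))
```

A real system would presumably also count consecutive bad readings and raise a fault after some limit, rather than trusting a stale value forever.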

As far as third party apps in the car are concerned, that is a BAD idea. The car is buggy enough, thank you, without unknown effects from CPU-hogging third party apps. Crashing your iPhone is one thing, but not having AC on a hot day is quite another.

And my point about not running Office Word is that large programs like Word consume huge amounts of RAM. They really do need to be run in an OS that has virtual memory support. The problem with VM is paging. The computer literally freezes for a period of time while pages are saved/restored from disk. That is no way to run a car computer where users are expecting the lights to turn on NOW when the light button is pressed while driving. An inconsistently responding car control system is horrible.
 
It's software. Yeah, it's running Linux. That doesn't mean that critical applications are paging their brains out. People are running mission-critical storage controller software on top of Linux/FreeBSD with much tighter response time requirements than any car A/C sensor system and those systems work just fine. Trust me, you can make paging a non-issue if you know what you're doing and are willing to lock the system down (which I'm sure Tesla is).
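One common way to make paging a non-issue for a critical process on Linux is to pin its memory with the `mlockall` system call. The sketch below calls it through `ctypes`. The helper names are made up for this example, the `MCL_*` constants are the values used on x86 and ARM Linux (other architectures differ), and nothing here is known to be what Tesla does. The call usually needs elevated privileges or a raised `RLIMIT_MEMLOCK`, so the helper reports failure instead of raising.

```python
import ctypes
import os

# Values from <sys/mman.h> on x86 and ARM Linux; an assumption for other platforms.
MCL_CURRENT = 1  # lock pages that are mapped right now
MCL_FUTURE = 2   # also lock pages mapped later


def _libc():
    """Load the C library of the running process, or return None if unavailable."""
    try:
        return ctypes.CDLL(None, use_errno=True)
    except (OSError, TypeError):
        return None


def pin_process_memory() -> bool:
    """Ask the kernel to keep all of this process's pages resident in RAM."""
    libc = _libc()
    if libc is None or not hasattr(libc, "mlockall"):
        return False
    if libc.mlockall(MCL_CURRENT | MCL_FUTURE) != 0:
        print("mlockall failed:", os.strerror(ctypes.get_errno()))
        return False
    return True


def unpin_process_memory() -> bool:
    """Undo pin_process_memory()."""
    libc = _libc()
    if libc is None or not hasattr(libc, "munlockall"):
        return False
    return libc.munlockall() == 0


if __name__ == "__main__":
    print("memory pinned:", pin_process_memory())
```

A locked-down embedded system can also simply be configured with no swap device at all, in which case anonymous memory is never written out in the first place.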

But it's still software and software's got bugs.

If the touchscreen flakes out on you, reboot it or report the problem to Ownership so they can grab the logs and other diagnostic info, then reboot it.
 
This is exactly what I heard too (that the subsystems run independently on something else; I don't remember which article I read it in, though). There is no need for an RTOS like QNX if this is the case.
 
Nevertheless, I've had the same experience as the OP. At least four times I've had to reboot the screen as the AC cycled on and off (in Florida). If this is controlled via a dedicated subsystem, then why is the bug linked to the Nav system refresh?
 
I switched all my production servers from QNX (which is costly as it requires purchasing a license) to Linux. These servers handle gigabits of real time data per second with microsecond latency (television broadcast servers and real time videoconferencing streams, for both of which responsiveness and latency are critical). Trust me when I say that Linux is a hell of a great operating system and far more than sufficient to run a petty little Tesla display.
 
Temperature changes do not happen on anything like the short timescales on which paging happens. If the OS misses a temperature reading, it shouldn't immediately turn off the AC.

PC games run on a paging O/S and have vastly tighter response time requirements than anything in the Model S. (Save the drivetrain, but no way that's running on Linux.) Similarly, you don't see glitches in a mouse cursor while an O/S is paging.

There's a perfectly good way of implementing A/C control on Linux / Windows / iOS / whatever... It's just not perfectly good yet.


Coming to your related complaint - that the screen locks up while you're pressing buttons. There's no reason for it to do that either. I seriously doubt this is due to paging. More than likely it means the developer took a lock on a UI thread (waiting for e.g. something from the network). That shouldn't ever be done in theory, but it commonly is since it's simpler. Async programming is not for sissies.
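To make the point concrete, here is a minimal `asyncio` sketch of the non-blocking alternative. The function names and timings are invented for the example: `fetch_route` stands in for a slow network request, and `ui_loop` stands in for the code that redraws the screen. Because the fetch runs as a separate task, the UI loop keeps producing frames while the request is still pending instead of freezing until it returns.

```python
import asyncio


async def fetch_route(delay_s: float) -> str:
    """Stand-in for a slow network request (hypothetical)."""
    await asyncio.sleep(delay_s)
    return "route ready"


async def ui_loop(frames: int, frame_s: float, log: list) -> None:
    """Stand-in for the UI: 'draws' a frame, then yields to other tasks."""
    for i in range(frames):
        log.append(f"frame {i}")
        await asyncio.sleep(frame_s)


async def main() -> list:
    log: list = []
    # Start the request in the background instead of waiting on it here.
    fetch = asyncio.create_task(fetch_route(0.05))
    await ui_loop(frames=5, frame_s=0.01, log=log)  # keeps drawing meanwhile
    log.append(await fetch)  # pick up the result once the UI work is done
    return log


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The blocking version would call the network code directly inside the UI loop, and no frame could be drawn until it returned, which matches the "screen locked" symptom described in the first post.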
 
More than likely it means the developer took a lock on a UI thread (waiting for e.g. something from the network). That shouldn't ever be done in theory, but it commonly is since it's simpler. Async programming is not for sissies.

Yes, this is also possible. I uploaded a video of this bug in action. It is, unfortunately, very reproducible. I am running v4.4 (1.31.11). It took me about 2 minutes of pressing buttons on the nav screen to make the bug shut down the AC. I cut the boring parts out of the video. I managed to halt the AC twice, so you'll see it twice here. The background white noise is the AC fan; it goes silent when the fan turns off. My AC settings are all auto except for non-recirc air. I just set the temperature down to 64 to get it to run all the time.

http://www.youtube.com/watch?v=3WWdkQTPGE4

So how do I report this bug without having to go through a clueless non-software support person? Is there an email I can use?
 
And you know the Model S has no swap because?

Because:
a. I work on embedded systems with Linux. Your assertions are all wrong.
b. Tesla engineering have proven many times they're not stupid. No one in their right mind does swap to flash. EVER.

What you've described (lag) is common on embedded (ARM) processors that are anemic CPUs (think phone/tablet CPUs) trying to run stuff in the background.
 
My car's systems have been flawless for months now. I don't remember the last time I rebooted anything. More than likely, something else is going on -- perhaps specific to your car, or you have a firmware mismatch going on ... or perhaps a version of hardware that's got issues, bad hardware, etc. It's not Linux at fault. Bad software is bad software on any platform. It may be that different platforms have different advantages, sure -- all trade-offs.
I'd definitely get someone on top of your issues. Contact your local service manager. Email ownership (they're not stupid) -- give them a link to the video.