Just a small point: a machine can read lips 5 times better than a professional human lip-reader. Now, are you so sure of your assertion?
Just because you have never seen a system that can interpret some gesture, etc. doesn't mean that it really can't.
You are good at it because..? Because experience taught you, not because you are brilliant. When you were first in the driver's seat, I'm sure you had problems detecting when some car was probably going to turn left, or wasn't going to stop at the stop sign, or similar. But guess what? We are good at picking up subtle context, and so after some years you start getting these signals, and now you are very good at it.
Guess what? If you feed TBs of data to a deep learning machine, the machine will get the same "feeling" that you have.
The reason nobody has actually done it is that for now it's irrelevant: you first need to learn "where the hell do I need to drive" before you need to learn the subtle rules of the road.
The same goes for lip reading.
They simply fed it a lot of movies with subtitles and it was good to go, and again, in the same way they were able to "resync" the audio with the video, a not-so-easy thing to do.
The point is, everything we learn from experience, a machine can learn too if the machine can see what we see.
And they definitely see what we see (and well beyond). I would say that the car could send you a visual alert: "jerk coming up behind you!" (maybe it wouldn't say it aloud...)
Don't be too sure of what a machine can and cannot do. The question is:
Does it have enough computing capability to do so? And will they teach the car to do so? And when will they do it? And again, is it really necessary? When will we have enough data for it?
I would say that "can it" is more or less solved or going to be solved.