I have a few thoughts on this. I'm an experienced programmer who has a history of solving things that most people deem hard or impossible. Naturally I have a great interest in Tesla's FSD project. I'm thinking of writing an article on Tesla's FSD soon.
Now there are many ways of doing this, but I will summarize the two most common.
1. End to end: a deep neural network that goes straight from sensor inputs to driver controls. With a huge amount of training data this solution will seemingly get pretty far pretty fast. But it is basically a big black box. Maybe you get good reliability one day with enough tweaking, but iterating on the software further is practically impossible for now. It will probably be possible in the future, once such a network generalises in a human-intelligence manner, but in 2017 it is not feasible as anything other than an experiment.
Which leaves us at option 2:
2. Small specialized neural networks, mainly for object recognition. But heavily assisted by regular logic/algorithms. This has the following advantages:
- Parts can be debugged, replaced, iterated and tested separately.
- The driving behaviour will be very predictable.
Now, one question might be: how do you account, with regular logic, for billions of different situations that the car might never have seen before? I will answer that soon, because the solution is rather simple (though extensive).
Let's break this problem into a few chunks.
First let's start with the basic part:
- Where can the car drive? Now this is a complex question, but let's start really simple by asking: Where is it physically possible to drive? This is one of the hardest chunks to solve, and I believe Tesla has been working on that for a long time.
This consists of using the cameras, textures and geometry lines, assisted by neural networks, to map out where in your surroundings it is physically possible to drive. I have a few ideas about how that can be implemented, but it is a lot of work and requires lots of data and testing.
Now that you know where it is possible to drive and have mapped this area into a 3D model, you have already reduced the problem of FSD a little. You now know where you can't drive, which is usually most of your surrounding area.
The next step is classifying this drivable area into multiple categories/groups. This is another complex operation, and each category requires a separate set of NNs, helped by algorithms, to be safely recognized:
- Preferred
- Unpreferred
- Low speed only
- Illegal
- Etc.
Generally a preferred drivable area is your lane, bordered by lane markings and unpreferred areas. Later I'm going to write a document explaining in more detail how these categories can be mapped.
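To make the categories a bit more concrete, here is a minimal sketch of how they could feed a planner. The cost numbers, the 20 km/h cap and the names `Area`, `path_cost` and `speed_limit` are all made up for illustration; this is not how Tesla does it.

```python
from enum import Enum


class Area(Enum):
    # (planning cost per cell, speed cap in km/h or None).
    # All numbers are hypothetical placeholders.
    PREFERRED = (1.0, None)
    UNPREFERRED = (5.0, None)
    LOW_SPEED_ONLY = (10.0, 20.0)
    ILLEGAL = (100.0, None)

    @property
    def cost(self):
        return self.value[0]

    @property
    def speed_cap_kmh(self):
        return self.value[1]


def path_cost(cells):
    """Total cost of a candidate path given the category of each cell it crosses."""
    return sum(cell.cost for cell in cells)


def speed_limit(cells, road_limit_kmh):
    """Lowest speed cap along the path, never above the posted road limit."""
    caps = [c.speed_cap_kmh for c in cells if c.speed_cap_kmh is not None]
    return min([road_limit_kmh] + caps)


own_lane = [Area.PREFERRED, Area.PREFERRED, Area.PREFERRED]
over_grass = [Area.PREFERRED, Area.LOW_SPEED_ONLY, Area.PREFERRED]
print(path_cost(own_lane), path_cost(over_grass))
print(speed_limit(over_grass, 80.0))
```

The point of using a cost instead of a hard ban is that an illegal or low-speed area stays available as a last resort, it just loses against every normal path.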
The next step is objects. The 3D model that was built for mapping out the drivable area is retained, modified and extended for as long as the car has not moved out of this area.
Objects come and go, but generic obstructions in the drivable area must be recognized as objects rather than terrain. This is also a really hard part, but fortunately we have now mentioned all three hard parts. Unfortunately you cannot safely drive much at all before you have solved these. Or you can cheat and buy an expensive device that does most of this 3D mapping for you (Lidar) and skip most of the hard camera work described so far.
Now we have this:
- An overview of the drivable areas.
- We know where the objects are and what space they occupy.
- Using the past few seconds of data we also know the velocity of these objects, if they have any.
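Getting a velocity out of the past few seconds of positions can be sketched like this. I'm assuming a least-squares line fit over timestamped positions, which is just one simple choice (a real tracker would more likely use some kind of filter). The function name and the `(t, x, y)` track format are my own.

```python
def estimate_velocity(track):
    """Estimate (vx, vy) in m/s from a list of (t, x, y) samples.

    Fits a straight line through the samples with least squares,
    so noise in a single frame matters less than with a plain
    last-minus-first difference.
    """
    n = len(track)
    if n < 2:
        raise ValueError("need at least two samples")
    t_mean = sum(t for t, _, _ in track) / n
    x_mean = sum(x for _, x, _ in track) / n
    y_mean = sum(y for _, _, y in track) / n
    denom = sum((t - t_mean) ** 2 for t, _, _ in track)
    if denom == 0:
        raise ValueError("all samples have the same timestamp")
    vx = sum((t - t_mean) * (x - x_mean) for t, x, _ in track) / denom
    vy = sum((t - t_mean) * (y - y_mean) for t, _, y in track) / denom
    return vx, vy


# An object seen three times over one second, moving along x.
samples = [(0.0, 0.0, 0.0), (0.5, 5.0, 0.0), (1.0, 10.0, 0.0)]
vx, vy = estimate_velocity(samples)
print(vx, vy)
```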
The next step is predicting where these objects will be in the future, and the uncertainty in that prediction. A moving object that we cannot identify has a big uncertainty, because we don't know how it moves. If all objects were left unclassified, the car would not be able to drive at anything but an extremely slow and cautious pace.
So we're interested in classifying these objects as specifically as possible. There are fairly common techniques for recognizing these:
- Car objects
- Human objects
- Children
- Bike objects
- Similar...
Now take these objects and feed them into their own movement predictors. A car next to your lane is likely to continue following its lane. There is no need to assume it might swerve into your lane unless it is signalling with its turn lights or is drifting close to your lane. Cars in front of you on the highway are predicted to still be ahead in a few seconds, because they're cars and they have a velocity. A human is likely to continue walking the same way, but has a bigger uncertainty. Children have a big uncertainty in their predicted movements.
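A very rough sketch of per-class movement predictors: the object continues at its current velocity, and the uncertainty is a circle around the predicted position that grows at a class-dependent rate. The growth rates and the names `TrackedObject`, `UNCERTAINTY_GROWTH` and `predict` are assumptions for illustration only. A real predictor would also use lanes, turn signals and so on.

```python
from dataclasses import dataclass

# Hypothetical growth of the uncertainty radius in metres per second
# of prediction time. Unknown objects get the largest value.
UNCERTAINTY_GROWTH = {
    "car": 0.5,
    "bike": 1.0,
    "human": 1.5,
    "child": 3.0,
    "unknown": 5.0,
}


@dataclass
class TrackedObject:
    kind: str
    x: float   # metres
    y: float   # metres
    vx: float  # metres per second
    vy: float  # metres per second


def predict(obj, t):
    """Return (x, y, uncertainty_radius) for the object t seconds from now."""
    growth = UNCERTAINTY_GROWTH.get(obj.kind, UNCERTAINTY_GROWTH["unknown"])
    return (obj.x + obj.vx * t, obj.y + obj.vy * t, growth * t)


car = TrackedObject("car", 0.0, 3.5, 25.0, 0.0)
child = TrackedObject("child", 20.0, 5.0, 0.0, 0.0)
print(predict(car, 2.0))
print(predict(child, 2.0))
```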
Now calculate your own vehicle's preferred path based on the drivable area as far as the cameras can see, plus signs and markings. Every area you can't see is flagged as uncertain. Do the future prediction for the other objects and look for intersects and potential intersects (paths that only cross inside the uncertainty). Now group the intersects into a few categories and process them. Adapt your own planned path to a low-risk solution that avoids all certain intersects and keeps the risk acceptable for uncertain intersects.
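The intersect search can be sketched as stepping through time and comparing the car's planned position with each object's predicted position and uncertainty circle. The two categories "certain" and "uncertain", the 1.5 m `ego_radius` and the function name `classify_intersect` are assumptions of mine, just to show the idea.

```python
import math


def classify_intersect(ego_path, obj_prediction, ego_radius=1.5):
    """Classify how a planned path interacts with one predicted object.

    ego_path: list of (t, x, y) samples of our own planned path.
    obj_prediction: function t -> (x, y, uncertainty_radius).
    Returns "certain" if the object's predicted centre comes within
    ego_radius of us, "uncertain" if only its uncertainty circle
    does, and "none" otherwise.
    """
    result = "none"
    for t, ex, ey in ego_path:
        ox, oy, radius = obj_prediction(t)
        distance = math.hypot(ex - ox, ey - oy)
        if distance <= ego_radius:
            return "certain"
        if distance <= ego_radius + radius:
            result = "uncertain"
    return result


# We drive along the x axis at 10 m/s, sampled every 0.5 s for 3 s.
ego_path = [(i * 0.5, 10.0 * i * 0.5, 0.0) for i in range(7)]

# A pedestrian walking straight across our path at 2 m/s.
crossing = lambda t: (20.0, -4.0 + 2.0 * t, 1.0 * t)
# A child standing 5 m from our path, with a fast-growing uncertainty.
child = lambda t: (20.0, 5.0, 2.0 * t)
# Something standing 10 m from our path.
far_away = lambda t: (20.0, 10.0, 0.5 * t)

print(classify_intersect(ego_path, crossing))
print(classify_intersect(ego_path, child))
print(classify_intersect(ego_path, far_away))
```

The planner would then reject any path with a "certain" result and trade speed against risk for the "uncertain" ones.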
Example of adaptations:
1. You're on a highway. Several vehicles approach from the right on an entry lane. Because they're moving in a lane that ends, the path predictor will plan their paths so that they intersect with your planned path. Every drivable path is considered, and either reducing speed or changing to another lane (a less preferred area) is chosen as the action that avoids the intersect.
2. Children play by the road. The path predictor models the uncertainty as a big circle that grows with the child's max speed of N. Driving speed is reduced so that the stopping distance is less than the distance to the edge of the child's worst-case movement circle. The possible future intersect is reduced to a minimum.
3. A man jumps out in front of the car while there is oncoming traffic. Every preferred drivable area results in a head-on crash or in killing the man. Choose the illegal path or the grass outside the road (unpreferred terrain, slow speed only) as the most likely successful outcome.
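The speed reduction in example 2 can be worked out with a bit of physics. This is a sketch under my own assumptions: constant braking deceleration, a fixed reaction time, and a child that runs straight towards the car's path at its max speed N for the whole time it takes to stop. The default numbers and the name `max_safe_speed` are made up.

```python
import math


def max_safe_speed(distance_m, child_speed, decel=6.0, reaction_time=0.5):
    """Highest speed (m/s) at which we can still stop before the
    child's worst-case movement circle reaches us.

    Condition: stopping distance + child movement <= distance, where
      stopping distance = v * reaction_time + v**2 / (2 * decel)
      child movement    = child_speed * (reaction_time + v / decel)
    Solving the resulting quadratic for v gives the formula below.
    """
    slack = distance_m - child_speed * reaction_time
    if slack <= 0:
        return 0.0  # the child could reach us before we even react
    b = reaction_time + child_speed / decel
    return decel * (-b + math.sqrt(b * b + 2.0 * slack / decel))


v = max_safe_speed(distance_m=20.0, child_speed=3.0)
print(round(v * 3.6, 1), "km/h")
```

With these made-up numbers a child 20 m away limits the car to roughly 36 km/h, and the limit drops quickly as the distance shrinks.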
Now the last step is creating artificial objects in your scene that do not actually exist. An engine works out every possible place to put an object in an unseen area of your scene and predicts its path in the same manner as for a real object.
This is, for example, how the car works out that the opposite lane is a dangerous place to be before a turn: a potential car could be placed right around the turn, approaching at 80 km/h, which causes an uncertain intersect. Result: don't pass another car unless you can see that it's clear.
Or that a building corner should be passed slowly because someone might be walking around the corner, which also causes an uncertain intersect. Result: the car slows down to walking speed, OR chooses a different path further from the wall at a higher speed.
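The passing example can be sketched as a simple check: put a phantom car right where the visible road ends, let it approach at 80 km/h, and see whether the pass can be completed before the gap closes. The pass duration, the 20 m margin and the name `overtake_is_clear` are assumptions for illustration.

```python
KMH_TO_MS = 1.0 / 3.6


def overtake_is_clear(sight_distance_m, ego_speed_kmh, pass_duration_s,
                      phantom_speed_kmh=80.0, margin_m=20.0):
    """True if we can finish the pass before a phantom oncoming car,
    placed at the edge of what we can see, gets too close.

    Both cars close the gap together, so the visible distance must
    cover the closing speed times the pass duration, plus a margin.
    """
    closing_speed = (ego_speed_kmh + phantom_speed_kmh) * KMH_TO_MS
    needed = closing_speed * pass_duration_s + margin_m
    return sight_distance_m >= needed


# 150 m of visible road before a turn, passing at 90 km/h for 8 s.
print(overtake_is_clear(150.0, 90.0, 8.0))
# A long straight with 500 m of visible road.
print(overtake_is_clear(500.0, 90.0, 8.0))
```

The same check works for the building corner if you swap the phantom car for a phantom pedestrian with a walking speed.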
This post got quite long, and I've not gone into all the details. But my conclusion is that self-driving in a safe manner that can actually handle every possible situation is not unthinkable. In fact I have yet to think of a situation this FSD design would not handle. Tesla chose the hard path by going all vision, but this might actually work out!