I'm not sure what "counterfactual" driving is exactly. I should probably re-listen to the podcast, which was excellent.
There is nothing like owning a Tesla for a few months to make a person really think about driving.
Driving is:
1. Controlling the car. As far as I can see, Tesla (and other carmakers) have this solved. It is not as if the auto controls fail to control the car: for example, it is not as if you set the speed to 50 mph and the car wobbles between 50 and 70 mph.
2. Controlling the speed. Same as 1.
3. Keeping the car oriented in (i) its lane, or (ii) to the extent there is not a clear lane, on the correct side of the road. Teslas are fantastic at (i), and I have seen improvements in (ii) since I bought the car in June. This is the first area where you need perception: the car has to recognize the lane at least as well as, if not better than, two human eyes. Too many people, from what I read, really discount how far along Teslas are on (i) and (ii). Item (ii) may not be perfect yet, but it seems obvious to me that if you can solve (i) you can solve (ii). I don't see how the naysayers can argue this won't happen when it is actually happening.
4. Avoiding moving cars. Regardless of whether you are in a lane or not, you need to avoid other cars. Again, in the context of cars going in the same direction on a road, Tesla may be better than humans at this already. It's only a feeling, because without a deep dive into the data to separate accidents generally from accidents caused by failing to get out of the way of another moving car, it's not provable. But given NAP's ability to change lanes, which involves waiting for the lane to be open, it's hard to argue that Tesla has not solved this as well. Hotz argues in the podcast that Tesla only has "one kind of lane change," meaning that Teslas do not do the type of lane change where you speed up to get into the lane ahead. That's true, but it raises the question of whether that type of lane change is even needed, notwithstanding the fact that humans do it all the time, sometimes very aggressively. I believe it may be needed where there is simply so much traffic that waiting is not practical, or is even unsafe. Right now, with the combination of front cameras and radar, it seems to me that Teslas are far, far better than humans at not rear-ending the car in front of them.
I would note that the last software release also shows cars coming the other way -- this is how left and right turns can be done. Tesla already can identify the lane/road. A successful left turn involves that, plus cars coming the other way. It also involves calculating safe distances.
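The "calculating safe distances" part of an unprotected left turn reduces to a time-gap comparison: will the oncoming car arrive before I can clear the intersection? A minimal sketch of that idea, where the function, thresholds, and units are all my own illustrative assumptions and not anything Tesla publishes:

```python
# Hypothetical gap-acceptance check for an unprotected left turn.
# The time-to-clear and safety-margin values are made-up assumptions.

def is_left_turn_safe(oncoming_distance_m: float,
                      oncoming_speed_mps: float,
                      turn_clear_time_s: float = 4.0,
                      safety_margin_s: float = 1.5) -> bool:
    """True if the oncoming car arrives later than the time needed
    to clear the intersection plus a safety margin."""
    if oncoming_speed_mps <= 0:
        return True  # oncoming car is stopped or moving away
    time_to_arrival_s = oncoming_distance_m / oncoming_speed_mps
    return time_to_arrival_s > turn_clear_time_s + safety_margin_s

# A car 100 m away at 20 m/s (~45 mph) arrives in 5.0 s, inside the
# 5.5 s window, so the turn should wait; at 150 m it arrives in 7.5 s.
print(is_left_turn_safe(100, 20))  # False
print(is_left_turn_safe(150, 20))  # True
```

The point is only that the decision is arithmetic once perception supplies the oncoming car's distance and speed; the hard part is the perception, not the math.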
5. Avoiding stationary objects, including people. This is why the release of Smart Summon was such a big deal. Teslas already have automatic emergency braking, but who would actually "test" it? I have noticed in the last software release a great improvement in warning when the car ahead is slowing down and I am not slowing with it. Smart Summon, though, attempts to avoid objects and people on its own, and does so. It does it very slowly, but why would anyone assume it's impossible to do it better? The cameras are better positioned than human eye level, and, of course, there are more of them. The ability to identify and avoid objects is now officially a feature.
6. Processing objects, such as stop lights, stop signs, etc. This is next. Teslas may be able to identify, for example, a temporary stop sign placed between lanes when the traffic light is out of order. But other than the unreleased developer software, the current software does not attempt to distinguish a "sign" from, say, a trash can. In addition, the type of object (cone v. newspaper) dictates the response.
Five of the six features are already released.
So I guess I don't know whether "counterfactual" driving is an actual feature. If you can do all six things above with the flawlessness of a computer, do you need the same amount of anticipation that humans have? Or do we only need that anticipation because we do not have the level of perception of eight cameras plus the sensors?