This is a massively different problem. In your example, all the system is doing is computing the approach velocity, and yes, you can get this (more or less) directly from various measurements (Doppler shift or frame-by-frame distance deltas). But computing the lateral velocity of an object is a totally different class of problem. And guess what? It requires solving basically the same hard set of problems you face when doing it from camera data: object identification, object placement, object persistence, and velocity projection. These are all NN problems and are not "solved" by the sensors any more than the cameras "know" that they are seeing a car.
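To make that distinction concrete, here's a minimal Python sketch (every name and number in it is mine and purely illustrative, not any vendor's API): closing speed falls out of two range samples with simple arithmetic, while lateral speed presupposes you've already detected the object and matched it to itself across frames.

```python
# Illustrative sketch only -- all names and values are assumptions.

def radial_velocity(range_t0_m, range_t1_m, dt_s):
    """Approach (closing) speed comes straight from two range
    measurements or a single Doppler return; no object
    understanding is required."""
    return (range_t1_m - range_t0_m) / dt_s  # negative = closing

def lateral_velocity(track_t0, track_t1, dt_s):
    """Lateral speed needs the *same object* localized in two frames,
    which presupposes detection, cross-frame association (persistence),
    and a position estimate -- i.e. the hard perception problems."""
    dx = track_t1["x_m"] - track_t0["x_m"]
    dy = track_t1["y_m"] - track_t0["y_m"]
    return (dx ** 2 + dy ** 2) ** 0.5 / dt_s

# Made-up numbers for illustration:
print(radial_velocity(50.0, 48.5, 0.1))      # -15.0 m/s, i.e. closing

car_frame0 = {"x_m": 2.0, "y_m": 40.0}       # object position at t0
car_frame1 = {"x_m": 2.4, "y_m": 40.1}       # *same* object at t1 -- but knowing
print(lateral_velocity(car_frame0, car_frame1, 0.1))  # it's the same object is the NN's job
```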
And if you go down that path, the extra data you are getting from the lidar/radar becomes less critical, because solving for lateral motion (for which the approach data is more or less useless) also solves for approach motion. And if you are doing that in the NNs, then what is the lidar giving you? That, ultimately, is Tesla's logic, or at least that's my reading of it.
So why are others still using all these extra sensors? Well, in Waymo's case, I suspect the answer is rather pragmatic. They were an early entrant in the autonomous car space, and at that time the hardware cost (in dollars, power draw, and physical bulk) of the NN/CPU compute needed was prohibitive. The radar/lidar data, however, could be interpreted more directly to get some basic driving tasks done. My guess is that Waymo has gone so far down this path that they are more or less stuck with it, unless they undertake a total rewrite of their stack.