Neural Networks

My guess is it needs to be accurate to within about a foot to avoid hitting curbs and jersey barriers.
There are plenty of videos and accounts of autopilot hitting things that it wouldn't if it had an accurate 3D map. Curbs, jersey barriers, trucks, off the top of my head.

Yup. Older software, often older hardware. Not sure how many conclusions about the current state of the Tesla art we should draw from that.
 
The question is if it's an "accurate 3D" map.
Yes. To be clear, there is a lot of research being done on visual SLAM, and methods based on deep convolutional neural nets look promising. But there are still a number of unsolved problems.

Here's a video showing a deep learning based visual SLAM applied to the image dataset from the KITTI vision benchmark (skip to 1:20 for the visualization):
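For anyone curious what the non-learned baseline looks like, below is a minimal sketch of classical monocular visual odometry with OpenCV (feature matching plus essential-matrix decomposition); the deep-learning SLAM in the video effectively replaces these hand-engineered steps with learned ones. The intrinsics and parameters are placeholders, not the actual KITTI calibration.

```python
# Minimal sketch of classical monocular visual odometry (the hand-engineered
# counterpart of a learned SLAM front end). Camera intrinsics are placeholders.
import cv2
import numpy as np

K = np.array([[718.8, 0.0, 607.2],   # focal length / principal point in pixels
              [0.0, 718.8, 185.2],
              [0.0, 0.0, 1.0]])

orb = cv2.ORB_create(2000)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

def relative_pose(prev_gray, curr_gray):
    """Estimate rotation R and unit-scale translation t between two frames."""
    kp1, des1 = orb.detectAndCompute(prev_gray, None)
    kp2, des2 = orb.detectAndCompute(curr_gray, None)
    matches = matcher.match(des1, des2)
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
    # The essential matrix encodes the camera motion between the two views.
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                   prob=0.999, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
    return R, t  # t is only known up to scale with a single camera
```

Chaining these frame-to-frame poses is what produces a trajectory like the one in the video; with one camera the translation scale stays ambiguous, which is one of the classic difficulties.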

 
Yes. To be clear, there is a lot of research being done on visual SLAM, and methods based on deep convolutional neural nets look promising. But there are still a number of unsolved problems.

Here's a video showing a deep learning based visual SLAM applied to the image dataset from the KITTI vision benchmark (skip to 1:20 for the visualization):


Doesn't seem that bad, and it's working under handicaps that Tesla doesn't have: it has to guess the movement between frames, which the car could supply to the neural network from the inertial sensors and GPS, and it's restricted to a single camera, while Tesla obviously has several overlapping cameras in the important-to-map areas.
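And supplying that motion really is cheap. Here is a rough dead-reckoning sketch of the frame-to-frame motion the car could hand the network as a prior; a real system would fuse this with GPS in a filter, and the names and numbers are purely illustrative.

```python
# Rough dead-reckoning sketch: propagate 2D pose from wheel speed + gyro yaw rate.
# A real system would fuse this with GPS in a Kalman filter; values are made up.
import math

def propagate_pose(x, y, heading_rad, speed_mps, yaw_rate_rps, dt):
    heading_rad += yaw_rate_rps * dt
    x += speed_mps * math.cos(heading_rad) * dt
    y += speed_mps * math.sin(heading_rad) * dt
    return x, y, heading_rad

# e.g. ~36 ms between camera frames at ~28 m/s on a gentle curve:
print(propagate_pose(0.0, 0.0, 0.0, 28.0, 0.05, 0.036))
```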
 
Haha. I have exactly the opposite experience. If a car is changing into my lane TACC doesn't respond until they're nearly half way into the lane. I don't understand why it's so hard to predict that someone who is signaling and already a third of the way into my lane is going to soon be entirely in my lane. They should put that case into the neural net before they worry about people who might be about to change lanes.
Oh mine does that as well. It brakes for cars in adjacent lanes that aren't moving into my lane, but then doesn't brake for ones that are actually moving into my lane until the last second with a "HOLY SHMIT THERE'S A CAR THERE NOW" kind of response.
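A crude heuristic would already catch the obvious case described above; something like this, with purely illustrative thresholds (obviously not how Tesla's stack actually decides):

```python
# Purely illustrative cut-in heuristic (made-up thresholds, not Tesla's logic):
# a car that is signaling, already overlapping our lane, and still drifting
# toward us will very likely complete the lane change.
def likely_cut_in(lane_overlap_frac, lateral_speed_mps, turn_signal_on):
    if turn_signal_on and lane_overlap_frac > 0.3:
        return True
    return lane_overlap_frac > 0.2 and lateral_speed_mps > 0.3

# The "signaling and a third of the way into my lane" case from above:
print(likely_cut_in(lane_overlap_frac=0.34, lateral_speed_mps=0.5, turn_signal_on=True))  # True
```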
 
Oh mine does that as well. It brakes for cars in adjacent lanes that aren't moving into my lane, but then doesn't brake for ones that are actually moving into my lane until the last second with a "HOLY SHMIT THERE'S A CAR THERE NOW" kind of response.

That's the way my AP1 car always worked - very disturbing and disruptive when someone cut in. My new AP3 car handles it much more smoothly.
 
Yes. To be clear, there is a lot of research being done on visual SLAM, and methods based on deep convolutional neural nets look promising. But there are still a number of unsolved problems.

Here's a video showing a deep learning based visual SLAM applied to the image dataset from the KITTI vision benchmark (skip to 1:20 for the visualization):

There's no denying that there's a lot of promising research on the subject. I do wonder how well these methods work when there are moving objects in the scene. Anyway, there's a big difference between research and having something with the accuracy necessary for FSD.
Oh mine does that as well. It brakes for cars in adjacent lanes that aren't moving into my lane, but then doesn't brake for ones that are actually moving into my lane until the last second with a "HOLY SHMIT THERE'S A CAR THERE NOW" kind of response.
In Tesla's defense there are many human drivers that do this too. I don't like riding with them though. haha.
 
These things will improve over time. It's part of Tesla's machine learning process. In the beginning, when the neural nets are not very good, the system will be really poor, but as Tesla feeds more data into the machine and improves the neural nets, it gets better and better. I am confident that, at some point, AP will get really good at intersections.
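A toy illustration of why more fleet data helps (nothing Tesla-specific, numbers made up): even for estimating a single quantity, the error shrinks as clips accumulate.

```python
# Toy illustration of "more fleet data -> better estimates"; values are made up.
import numpy as np

rng = np.random.default_rng(0)
true_value = 12.0  # pretend this is the "right" answer the data converges on

for n_clips in (10, 1_000, 100_000):
    samples = true_value + rng.normal(0.0, 2.0, size=n_clips)
    estimate = samples.mean()
    print(f"{n_clips:>7} clips -> estimate {estimate:6.2f} (error {abs(estimate - true_value):.3f})")
```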



I think part of the "obsession" with LIDAR is that it is a very direct way of solving the perception part and therefore can give you a rudimentary self-driving system pretty quickly without needing to do all that messy and hard vision learning stuff. It's why we see these start-ups that want to jump into the self-driving business, basically slap some LIDAR on the roof of their car and start testing right away.

But I do agree with your quote. As I see it, the two approaches (LIDAR vs. vision) solve the perception problem differently but meet at the same end point. In the end, regardless of which perception solution you use, you still need to write driving rules to tell your car what to do with that perception data. LIDAR certainly has its advantages and can be useful for solving the perception part of self-driving. But LIDAR is not some self-driving magic pill. It does nothing to figure out what driving rules you need. That's something you still have to figure out on your own, regardless of whether you used LIDAR or not.
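To make the "you still need driving rules" point concrete: whichever sensor produced it, perception hands the planner something like an obstacle list, and the part below still has to be written either way. This is a schematic sketch with invented types and thresholds, not anyone's actual policy code.

```python
# Schematic sketch: whatever sensor produced the obstacle list, the driving
# rules still have to be written. Types, fields and thresholds are invented.
from dataclasses import dataclass
from typing import List

@dataclass
class Obstacle:
    kind: str                 # "car", "pedestrian", ...
    distance_m: float         # along our planned path
    closing_speed_mps: float  # positive = getting closer

def choose_action(obstacles: List[Obstacle]) -> str:
    for obs in obstacles:
        ttc = obs.distance_m / max(obs.closing_speed_mps, 0.1)  # time to collision
        if ttc < 2.0:
            return "brake"
        if obs.kind == "pedestrian" and obs.distance_m < 30.0:
            return "slow"
    return "maintain_speed"

print(choose_action([Obstacle("car", 25.0, 15.0)]))  # "brake"
```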


Aptiv Lyft cars have a trifocal camera
 
At 2:19:40, while they're spinning the 3D constructed map from their 6 seconds of camera-only data, he explicitly says they're applying the same techniques to a slightly sparser version in the car.

Personally I do not consider that a ”3D map”. That is a point cloud.

A 3D map is a computerized view of the world that places all obstacles and markings in a computer object format that identifies them (which can then be visualized if need be, but of course the computer only looks at them as data objects).

Below is what Waymo creates of the world very reliably; judging by the spinning and disappearing cars and lane markings on the Tesla IC, it is something Tesla has a very hard time doing reliably.

Once you have a reliable 3D view of the world in the car, you can then train an NN to drive against that, but without a reliable 3D view you are bound to have to write a lot of rules for safety because you can’t quite trust what the computer says is or isn’t there.

[Image: Waymo's object-level 3D view of the world]
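The difference is easy to see in code: a point cloud is just untyped coordinates, while an object-level map holds typed entries a planner or NN can reason about; turning one into the other (cluster, classify, track) is the hard part. A minimal sketch with invented types and a toy grid clustering, not from any particular stack:

```python
# Minimal sketch of the distinction: a point cloud is untyped coordinates; an
# object-level 3D map holds typed entries a planner (or NN) can reason about.
# The toy grid clustering stands in for real clustering/classification/tracking.
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]  # a "point cloud" is just a list of these

@dataclass
class MapObject:
    label: str                        # "vehicle", "curb", ... (needs a classifier)
    center: Point
    extent: Tuple[float, float, float]

def to_objects(points: List[Point], cell_m: float = 2.0) -> List[MapObject]:
    buckets: Dict[Tuple[int, int], List[Point]] = {}
    for p in points:
        buckets.setdefault((round(p[0] / cell_m), round(p[1] / cell_m)), []).append(p)
    objects = []
    for cluster in buckets.values():
        xs, ys, zs = zip(*cluster)
        center = (sum(xs) / len(xs), sum(ys) / len(ys), sum(zs) / len(zs))
        extent = (max(xs) - min(xs), max(ys) - min(ys), max(zs) - min(zs))
        objects.append(MapObject(label="unknown", center=center, extent=extent))
    return objects
```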
 
Personally I do not consider that a ”3D map”. That is a point cloud.

A 3D map is a computerized view of the world that places all obstacles and markings in a computer object format that identifies them (which can then be visualized if need be, but of course the computer only looks at them as data objects).
This discussion came about because people were comparing Lidar to computer vision. Lidar produces a point cloud as output, while a camera gives you a series of flat 2D images. In both cases the goal is to obtain a certain understanding of the world around you, but you start from different representations. The point cloud has the unique advantage that you will never miss an obstacle, because depth information is directly available in the point cloud representation, while the process of obtaining depth information from 2D images is complex, and today's methods lack accuracy and have relatively high error rates.
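Even the "easy" stereo case shows why getting depth from 2D images is harder than reading it out of a point cloud: you first have to find the matching pixel, and a small matching error becomes a large depth error at range. A small sketch with made-up camera numbers:

```python
# Made-up camera numbers; the point is only that depth from images needs a
# correspondence first, and small matching errors become big depth errors at range.
def stereo_depth_m(disparity_px, focal_px=1000.0, baseline_m=0.3):
    """Depth from stereo disparity: Z = f * B / d (lidar measures Z directly)."""
    return focal_px * baseline_m / disparity_px

for d in (30.0, 10.0, 3.0):
    print(f"disparity {d:4.0f} px -> {stereo_depth_m(d):6.1f} m "
          f"({stereo_depth_m(d - 1.0):6.1f} m if the match is off by one pixel)")
```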
 
@Eno Deb Sure. And I agree the Lidar point cloud is great for assessing distances and saying ”nothing is there”. Usually that is a stepping stone and the goal of great computer vision and sensor fusion is to generate a digital representation of the world with identifiable objects, which can then be acted upon (by an NN or heuristics).
 
The Aptiv cars actually have a trifocal camera.

They *have* a front facing camera, but it is just in shadow mode for their Las Vegas fleet. The camera data is not used for driving the car. They are capturing data from the camera and tagging it in real time from the Lidar & Radar & GPS in order to train a NN at a later date. The PR people were very clear in explaining to me that the camera is not currently used for driving at all. The large screen that shows what the computer is seeing was very clearly only showing lidar point cloud and nothing else. I urge everyone to go ride in one, don't believe some random guy on the internet.

I'm sure there are many people on this forum who live in Las Vegas who can confirm.
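That shadow-mode setup is a fairly standard way to bootstrap a camera net: project the lidar/radar detections into the camera image and use them as automatic labels for later training. A rough sketch of the projection step, with invented calibration numbers (not Aptiv's actual pipeline):

```python
# Rough sketch of lidar-supervised auto-labelling (not Aptiv's actual pipeline):
# project a lidar detection into the camera image and store it as a training label.
# Intrinsics are invented; a real setup also needs calibrated extrinsics.
import numpy as np

K = np.array([[1000.0, 0.0, 960.0],
              [0.0, 1000.0, 540.0],
              [0.0, 0.0, 1.0]])

def project_to_image(point_cam):
    """Pinhole projection of a 3D point already expressed in the camera frame."""
    u, v, w = K @ np.asarray(point_cam, dtype=float)
    return u / w, v / w

def auto_label(lidar_detections):
    """lidar_detections: list of (class_name, 3D centre in the camera frame)."""
    return [{"class": cls, "uv": project_to_image(center)} for cls, center in lidar_detections]

print(auto_label([("vehicle", (2.0, 0.0, 20.0))]))  # pixel location of the labelled car
```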
 
Aptiv Lyft cars have a trifocal camera

I think you meant to reply to @Inside post below instead of my post:

Not true. While in Las Vegas last month I rode in a self-driving Lyft car (made by Aptiv). It uses only lidar & radar. No cameras, no ultrasonics. I took 6 different autonomous rides, in 6 different cars, during my week out there. It drove perfectly, 100 times better than any Tesla. It handled jaywalkers, merging, red lights, left turns, being cut off, etc. I suggest everyone who is interested in this sort of thing should try it next time you are in Vegas. Just use your normal Lyft app.

p.s. They can get away with no cameras because Las Vegas is a "smart city" and the traffic light status (red light, green light) is computerized and the car can poll the status remotely. I seem to recall being told that most signal lights out there also have a radio transmitter that broadcasts the signal's ID# as well as its current status. The cars use high-def map data and know where all the limit lines are and where all the lane lines are, and which lanes are turn-only lanes, etc.
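In other words, the signal handling is V2I: the car consumes a signal-phase message keyed by the signal's ID instead of classifying the light from pixels. Here's a sketch of what consuming such a message might look like; the field names are invented (real deployments use standardized SPaT messages):

```python
# Invented field names; real deployments use standardized SPaT (signal phase and
# timing) messages. The point: the car trusts infrastructure data, not pixels.
from dataclasses import dataclass
import time

@dataclass
class SignalPhaseMessage:
    signal_id: str           # the broadcast ID# of the traffic signal
    phase: str               # "red", "yellow" or "green"
    seconds_remaining: float
    timestamp: float         # when the message was received

def should_stop(msg: SignalPhaseMessage, eta_to_limit_line_s: float,
                max_age_s: float = 1.0) -> bool:
    """Stop unless we hold a fresh green that outlasts our arrival at the limit line."""
    if time.time() - msg.timestamp > max_age_s:
        return True  # stale data: fall back to the conservative choice
    return not (msg.phase == "green" and msg.seconds_remaining > eta_to_limit_line_s)
```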
 
They *have* a front facing camera, but it is just in shadow mode for their Las Vegas fleet. The camera data is not used for driving the car. They are capturing data from the camera and tagging it in real time from the Lidar & Radar & GPS in order to train a NN at a later date. The PR people were very clear in explaining to me that the camera is not currently used for driving at all. The large screen that shows what the computer is seeing was very clearly only showing lidar point cloud and nothing else. I urge everyone to go ride in one, don't believe some random guy on the internet.

I'm sure there are many people on this forum who live in Las Vegas who can confirm.
I did actually ride in one of their cars at CES; that's how I know they have a camera. ;) The safety driver wasn't a technical person, so he couldn't give me substantiated technical information beyond a few soundbites he had been trained to give, but I honestly doubt that they rely solely on V2I for things like traffic light status.