
Experts ONLY: comment on whether vision+forward radar is sufficient for Level 5?

For starters, SAE autonomous driving grades need to be stated and understood up front:
SAE J3016 Automated Driving Standards
Level: 5
Name: Full Automation
Narrative Definition: the full-time performance by an automated driving system of all aspects of the dynamic driving task under all roadway and environmental conditions that can be managed by a human driver.
Execution of Steering and Acceleration/Deceleration: System
Monitoring of Driving Environment: System
Fallback Performance of Dynamic Driving Task: System
System Capability (Driving Modes): All Driving Modes

Definition of driving mode: Driving mode is a type of driving scenario with characteristic dynamic driving task requirements (e.g., expressway merging, high speed cruising, low speed traffic jam, closed-campus operations, etc.).

The above are all SAE definitions, verbatim.

In short, L5 requires full 360-degree situational awareness without requiring human intervention. For example, lane changes cannot be executed without keeping track of cars, motorcycles, bicycles, or pedestrians on the rear flanks. Front visibility alone is not sufficient.

In my opinion, up to L3 is quite straightforward to implement. AP2 should be able to accomplish it. The jump from L3 to L4 is very tricky, not just technologically but in terms of regulatory oversight. L4 to L5 is even harder.
Also, sorry, on first read I didn't see the last paragraph where you gave your opinion; I only saw the definition of L5.
 
...vision ...should be good enough for software, with enough resolution and processing power. Of course, it may be easier with more input sources.

Of course, this leaves open the questions of how much code it will take, how long it will take to develop, exactly how much processing power is needed to run that code, and whether that is reasonable for a car without using too much energy. All of this remains unanswered. If your question meant to imply considering these factors, I don't think anyone can say yet. If your question was simply whether or not it's technically possible regardless of those factors, then I would say yes.

Yes, I was hoping some experts would take those factors into consideration. It sounds like you're saying processing power and code are the limiting factors - am I right? If so, how does adding even more sensors lessen that load? Is the argument that radar signatures and lidar data are easier to interpret than visual data, therefore requiring less computing power/simpler neural networks, and are thus easier to achieve in the short term?
 
Please understand what he truly expects: a rational discussion of something he doesn't understand, in a domain loose enough that any response that doesn't suit his agenda allows for a full tantrum reminiscent of DJ Trump. Cue Mr. A. Jackson, please...
It's called a screen/sort. Yes - when looking for info in a field one doesn't understand, one uses imperfect signaling devices (such as degrees/employment field) to make a quick judgement on who to listen to. It's imperfect - invariably you throw out good sources of information, but you also likely/hopefully throw out more bad sources than good ones when time is limited. Sorry if it upsets you. I have no agenda - I just want to hear what actual domain experts think because the forum is thick with uninformed opinions (including mine).
 
... many years working with machine vision systems.

my 2c

wrong question!

It's not what the systems "see".
It's what they miss.

Human drivers process the visual field only, no LIDAR, no RADAR.
So in theory this is possible to replicate with machine vision alone; this, I believe, is primarily Tesla's thinking.
LIDAR and RADAR both suffer from noise, and LIDAR in particular is compromised in anything but excellent visibility, though both can measure over considerable distances more accurately than a camera typically can.

However, though machine vision can be more accurate and reliable in some conditions, having sufficient AI to apply context as a human would to interpret a situation is some way off.

The problem however is this:

Human drivers kill thousands of other humans every year. This has become accepted as a (somewhat unpalatable) norm.

Automation kills one single human (even if said human was not acting according to the rules provided) but quietly reduces the accident rate by 40%, thereby statistically saving more lives than the one sadly lost. Result: hysteria.

Likely outcome: Tesla will continue to make phenomenal strides in real-world application of Autopilot. Most probably, with current technology, L5 FSD will be restricted to highways only, and to good visibility, due to the reduced scenarios for accidents, with lower levels applicable elsewhere. It will still meet Musk's claim though.
 
AI expert here, though more in the natural language processing area, but I can extrapolate to Autopilot.

Short answer - Not possible.

Long answer -
You need 4 ingredients to make the L5 autopilot pizza:
- a) 360-degree situational awareness
- b) Fleet learning
- c) Data crunching, both in real time (like your head does) and learned (like your head does)
- d) A computer and a power source to support all this.

360-degree situational awareness
Compare it with your head: you have stereoscopic vision, but it's mounted on an axis (your neck). The AP cameras are not, and radar does not have enough resolution.
Cameras are vastly inferior to eyes, except that we can make cameras good in a single dimension.
Military drone cameras cost millions, and they get around the whole problem with brute force (very clear lenses, huge apertures, massive CCDs), but then they produce so much data that a car with a 90 kWh battery can't run a computer powerful enough to process it all in real time.
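
To put some rough numbers on the data-volume point, here is a back-of-envelope sketch. The camera count, resolution, colour depth, and frame rate are assumed purely for illustration and are not Tesla's actual specifications:

```python
# Back-of-envelope estimate of raw camera data rate for a multi-camera car.
# All numbers below are illustrative assumptions, not Tesla's actual specs.

num_cameras = 8            # assumed camera count
width, height = 1280, 960  # assumed per-camera resolution (pixels)
bytes_per_pixel = 3        # assumed 24-bit colour
fps = 30                   # assumed frame rate

bytes_per_second = num_cameras * width * height * bytes_per_pixel * fps
print(f"Raw video stream: {bytes_per_second / 1e9:.2f} GB/s "
      f"({bytes_per_second * 8 / 1e9:.1f} Gbit/s)")
```

Even with these modest assumptions, that is on the order of a gigabyte of raw pixels per second that has to be processed continuously, before any learning happens at all.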

Fleet Learning
I don't think they are taking advantage of their 'high resolution maps' yet, given how the system behaves so far, and that is the 'learning' bit that your head is so good at and Tesla just isn't. And I don't think they will be able to take advantage of the high-res maps either, not to the extent they'd like you to believe, simply because the car has neither enough data storage, nor enough computational power, nor a power source to support that kind of computation. Basically, that "fleet learning" thing is oversold well beyond what Tesla can realistically ever deliver. However, there's no way to measure its success ;-) so they can get away with the bluff. Story of corporate America: the judge (you) is dumber than the criminals (Tesla). Anyway, what they CAN do is improve their algorithms based on data. I don't think they are doing that to a great extent yet either, i.e. they are not considering every car's data. That may actually not be necessary even for what they are trying to achieve for now.

What they cannot do is this: you drive 15 minutes before me and swerve around a pothole, and my Tesla magically learns from your experience without a programmer in freakmont writing a line of code. BS! Not happening.

Data crunching, both in real time (like your head does) and learned (like your head does)
Data crunching in real time: oh crap, that bonehead just swerved into my lane 'cuz he was texting and driving.
Learned: it's Friday night, better be extra careful of Honda Civics with Coke-can exhausts and underbody lights.

With AI, we could extrapolate missing bits of info, but this won't happen, for a few reasons:
-- Human beings do not possess the ability to write software that complex, not to that level, on that hardware, on a mobile platform.
-- Neither do we have the necessary hardware to process all that in a small enough, power-efficient enough package to mount on a car.
-- Your head is a computer with stereoscopic forward vision, but it has a ridiculous amount of computational power and far superior algorithms to what Tesla is writing (okay, that was below the belt, snicker).
-- Your head cannot do L5 under all situations either. Can you drive in fog? In a downpour? In a snowstorm?

Now, with AP2 hardware, can we do it? Sorry, nope!
We have the necessary 'cameras' and 'sensors', but not the necessary software, or the hardware to run that software.
However, we can get pretty damn close, for 90% of the time.
Tesla is taking the logical path here: using Linux/GCC/graphics-card acceleration repurposed for AP computation. Basically they are throwing brute-force computation at the problem and reacting as fast as they can to give you the illusion of FSD, which is good enough for the majority of situations.

A computer and a power source to support all this.
So the 'learned' portion is some bonehead in freakmont learning for the computer and writing out code. Real-time learning replacing that bonehead - well, it's possible, one day. But we need vastly superior hardware to what we have today. If we had it, countries wouldn't be racing each other to build the best supercomputer possible, while not really explaining what the hell they use them for.

Summary:
With AP2 and the current hardware, Tesla will be able to give you the illusion of FSD that will actually work in 80-90% of situations, which is plenty good and fairly impressive. However, it's not as sci-fi as Elon is trying to sell you. And yes, it's a shame that no other auto manufacturer can figure this out. PS: This is the internet, so I guess I pulled off being an expert pretty well, no?

Wow thank you so much - most detailed commentary on the problem/situation I've yet read. I now have even more questions - will ask you some tonight after work here on this thread.
 
I have a BS in Computer Science and currently work on the implementation (inference/forward pass) side with deep neural networks for computer vision.

That out of the way, yes, of course. Even vision alone is demonstrably sufficient.

If the bar is being able to safely handle driving in all environments that a human can, then you have been using an L5 sensor suite since you drove your first car: one that relies almost entirely on just two cameras (albeit with the ability to swivel something like 135 degrees in each direction).

The trick here isn't the sensors. Tesla's AP2 sensor suite has better coverage and tracking than you do, by far. The hard part is being able to train a network to use those inputs to perform at least as well as a human can, while still being able to process those inputs fast enough.
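
To make the "process those inputs fast enough" point concrete, here is a minimal sketch of how you might check whether a perception network fits within a per-frame time budget. The tiny stand-in network, input size, and 30 fps budget are assumptions for illustration, not anything Tesla actually runs:

```python
# Minimal sketch: does a vision network keep up with the camera frame rate?
# The network, input resolution, and frame rate are illustrative assumptions.
import time
import torch
import torch.nn as nn

model = nn.Sequential(                     # stand-in perception network
    nn.Conv2d(3, 16, 3, stride=2), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 10),                     # e.g. 10 hypothetical object classes
).eval()

frame = torch.randn(1, 3, 224, 224)        # one fake camera frame

with torch.no_grad():
    model(frame)                           # warm-up pass
    start = time.perf_counter()
    n = 50
    for _ in range(n):
        model(frame)
    per_frame_ms = (time.perf_counter() - start) / n * 1000

budget_ms = 1000 / 30                      # ~33 ms per frame at an assumed 30 fps
print(f"{per_frame_ms:.1f} ms per frame vs {budget_ms:.1f} ms budget")
```

A real perception stack is vastly larger than this toy, which is exactly why the accuracy-versus-latency trade-off is the hard part rather than the sensors themselves.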
 
Can you unpack this more?
To start with, I insist on the use of the SAE definitions from the PDF posted earlier. Any other basis will not work for purposes of discussion, because I refuse to argue over arbitrarily defined parameters of what L5 may be.

On that basis, look through the PDF I posted earlier. There are 4 SAE vectors:
1. Execution of Steering and Acceleration/ Deceleration
2. Monitoring of Driving Environment
3. Fallback Performance of Dynamic Driving Task
4. System Capability (Driving Modes)

L2 through L5 progressively move each of these from the Human Driver role to the System role. Item #1 is essentially Autosteer in Tesla language (which is why AP1 is an L2 system). Item #2 is significantly covered by the 8 cameras on AP2. *But* 'monitoring' and system fallback are hugely different roles, which means item #3 is a non-trivial jump from item #2, i.e. going from L3 to L4 is hard. There *will* be regulatory and liability oversight of a system-driven fallback approach. Item #4 (from some driving modes to all) is the hardest, which implies L4 to L5 will be a very hard move.

In my opinion, Elon's definition of FSD is essentially something between L3 and L4, skirting the point where he can technologically accomplish substantial autonomous capability, but not the complete 'lie in the back and sleep' autonomy that regulatory reform will not permit anytime soon.
 
I have a BS in Computer Science and currently work on the implementation (inference/forward pass) side with deep neural networks for computer vision. ...The trick here isn't the sensors. Tesla's AP2 sensor suite has better coverage and tracking than you do, by far. The hard part is being able to train a network to use those inputs to perform at least as well as a human can, while still being able to process those inputs fast enough.

Do you agree with @Sir Guacamolaf and @thegruf in their characterizations that training the neural network to successfully process the input from the cameras is years off? If I've misinterpreted your positions, @thegruf and @Sir Guacamolaf, please correct me.

And to all three of you - and any other experts here - does the addition of lidar and/or more radar sensors reduce the computation required?
 
Do you agree with @Sir Guacamolaf and @thegruf in their characterizations that training the neural network to successfully process the input from the cameras is years off? If I've misinterpreted your positions, @thegruf and @Sir Guacamolaf, please correct me.

And to all three of you - and any other experts here - does the addition of lidar and/or more radar sensors reduce the computation required?

No comment in that regard. It's impossible to even guess what kind of timetable they're on, unless you're part of Tesla's AP management or trying to read the tea leaves of what they say and do.

That said, I highly doubt the networks we have are anything close to what Tesla is developing towards FSD. They're clearly bringing in some of the vision networks (evidence of this is that 8.1 appears to use parallax between the two cameras, since it can identify stopped cars), but since they aren't using the side or back cameras at all, the networks that take them as input clearly aren't there (or aren't being used). More qualitatively, it *feels* like steering and TACC are rules-based currently, and I'd imagine those would need to move to an ML approach for FSD.

If every bit of technology they have towards FSD can be said to already be activated in our current cars, then yes, they're a long ways away. But that's almost certainly not the case, and it's tough to comment on the status of software we've never seen.
 
Missed the question about decreasing the computation required. Unfortunately, the best answer to that is: *shrug*. It all depends on what the network can learn. If the addition of lidar/radar allows a smaller network to achieve the same performance as a larger network without it, then yes, it'll likely shave off processing time. If not, then it won't.

Since the whole *thing* in machine learning is training a computer to do a task you're unable to fully define, the only real way to tell which of those is the case is empirically: by running a set of experiments and seeing what happens.
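
As an illustration of the kind of empirical comparison I mean, here is a toy sketch that trains the same classifier on camera-only features versus camera-plus-radar features and compares accuracy and parameter count. All the data, feature sizes, and model shapes are invented purely to show the experimental setup, not to reflect any real AP network:

```python
# Toy ablation: does adding a "radar" feature stream help a fixed-size network?
# Features, labels, and model sizes are synthetic and purely illustrative.
import torch
import torch.nn as nn

torch.manual_seed(0)
cam = torch.randn(2000, 64)                           # fake camera features
radar = torch.randn(2000, 8)                          # fake radar features
labels = (cam[:, 0] + 0.5 * radar[:, 0] > 0).long()   # fake "obstacle" label

def train_and_eval(features):
    train_x, test_x = features[:1600], features[1600:]
    train_y, test_y = labels[:1600], labels[1600:]
    model = nn.Sequential(nn.Linear(features.shape[1], 32), nn.ReLU(),
                          nn.Linear(32, 2))
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for _ in range(200):
        opt.zero_grad()
        nn.functional.cross_entropy(model(train_x), train_y).backward()
        opt.step()
    acc = (model(test_x).argmax(1) == test_y).float().mean().item()
    params = sum(p.numel() for p in model.parameters())
    return acc, params

for name, feats in [("camera only", cam),
                    ("camera + radar", torch.cat([cam, radar], dim=1))]:
    acc, params = train_and_eval(feats)
    print(f"{name}: accuracy {acc:.2f}, parameters {params}")
```

Whether the extra sensor buys you a smaller network (or just more inputs to process) is exactly the kind of question only an experiment like this can answer.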
 
Many safety systems work on a 2oo3 (two out of three) principle or similar.
The theory is that two systems have to concur to result in a response.

It could be argued that cameras alone are just one input. Many cameras could be considered many inputs (if they overlap), but these are consolidated to a single input through MVP.
Adding RADAR and/or LIDAR can, in theory, provide confirmation inputs to the system: the camera thinks it detects something, the radar agrees, the system responds.
To some extent, Tesla's now-massive Autopilot database can also be considered a third input, at least for location information. Fundamentally, knowing exactly where you are is critical to any decision to be made, threat to be assessed, or scenario to be interpreted.
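
For what it's worth, here is a toy sketch of the 2oo3 voting idea. The three boolean inputs (camera, radar, map prior) are placeholders for whatever confidence-thresholded detections a real system would produce:

```python
# Minimal sketch of a 2-out-of-3 (2oo3) vote over independent detectors.
# Inputs are placeholders for thresholded detector outputs in a real system.

def two_out_of_three(camera_detects: bool, radar_detects: bool,
                     map_expects_object: bool) -> bool:
    """Respond only if at least two of the three inputs agree."""
    votes = sum([camera_detects, radar_detects, map_expects_object])
    return votes >= 2

# Camera sees an obstacle and radar agrees, map says nothing -> respond.
print(two_out_of_three(True, True, False))   # True
# Camera alone flags something (possible false positive) -> no response.
print(two_out_of_three(True, False, False))  # False
```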

I think it is important not to exclude the latter when considering FSD. This is outside my specialist field, but clearly many in the industry see that Tesla have a big, potentially unassailable lead in this area.

Even for highway use, I still have a concern, with only forward-facing radar, over whether overtaking vehicles in adjacent lanes can be sufficiently characterised for speed by cameras alone; this is especially relevant during lane changes, of course. Had three radars been fitted, with two rear-facing (perhaps in the mirror housings), I could foresee an easier path to FSD.
 
...They're clearly bringing in some of the vision networks (evidence of this is that 8.1 appears to use parallax between the two cameras, since it can identify stopped cars)

I thought the improved detection of stationary/partially obstructing objects in path was down to the new active reconstruction mode of the radar signal with whitelisting that Tesla have developed?
 
I thought the improved detection of stationary/partially obstructing objects in path was down to the new active reconstruction mode of the radar signal with whitelisting that Tesla have developed?

Ah, possible. I hadn't seen any explanation of it and just suddenly saw that my AP2 car could pick up stationary objects it had never seen move before. And given that they activated a second forward-facing camera in the same release, parallax seems the most obvious explanation for that.
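
For anyone curious, here is a rough sketch of the parallax idea under a simple pinhole-camera model. The focal length and baseline values are made up for illustration and are not the actual AP2 camera geometry:

```python
# Rough sketch of stereo parallax: with two forward cameras, a pinhole model
# gives depth = focal_length * baseline / disparity. The focal length and
# baseline below are assumed values, not the real AP2 camera geometry.

def depth_from_disparity(disparity_px: float,
                         focal_length_px: float = 1000.0,   # assumed
                         baseline_m: float = 0.15) -> float:  # assumed
    """Estimate distance (m) to a point whose image shifts disparity_px between cameras."""
    return focal_length_px * baseline_m / disparity_px

# A stationary car whose image shifts 5 px between the two cameras:
print(f"{depth_from_disparity(5.0):.1f} m")   # ~30 m with the assumed values
```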
 
I meet one of your expert criteria but do not feel I am qualified to give an expert opinion. Your criteria are too easy.

If FSD is vision based, what happens if the vision is obscured with glare, snow, rain, mud, etc?

As humans, we squint our eyes or turn our heads. The cameras are in fixed locations without any ability to clean or shade themselves.

I'd be curious what the car will do if it is temporarily blinded.