Neural Networks

@verygreen questioned the PR videos coming from ME, saying that ME must be applying some type of filter/clean-up to their videos. But as @wk057 and AP1 confirm, there is no filter. The network just produces good outputs.


That video .. LOL

You have to play it at 0.25x to get the actual speed of the car
The pedestrians are not detected as obstacles
The car lunges into a parking space (0.19s) before snapping back into the lane

The priority for ME, pre-Intel, was to create good-looking visualisations so that they could get funding & maybe even sell products. When they couldn't do that using the output from their product, it looks like they just manipulated the video...

ghost.JPG
 
Here is an actual EyeQ4 visualization, taken from the BMW Emergency Stop video. It's not pretty. There is a ghost car sitting on the rear bumper, and the overtaking car disappears and fragments as it passes:

86pMwW.gif


Later, we see what happens when the BMW starts changing lanes and is passed by another car that deforms and loses position as it passes. Again not pretty, but at least the tailgater has gone now...

PZ0z22.gif


If this was any other vendor, we would say that the vision/radar fusion is a bit rubbish, but because EyeQ4 is L5-ready, the only possible explanation is that the overtaking car in the first GIF is actually falling to pieces.

This is because of the micro-wormholes which inconveniently materialised on the BMW test track as they were filming - the car disappears from the display every time it hits one. The second GIF shows the stretch effect of driving close to a micro-wormhole, without actually hitting it. Luckily that car didn't disintegrate too.
 
Very small global changes in the image (even clouds and thermal noise) will cause variations in the NN outputs for segmentation and bounding boxes, but the movement in the driver display is a lot larger than that.

Don't forget that they are projecting from 2d space into 3d space. When the bounding box moves that creates an angular difference which can move the vehicle quite far in 3d space. They are also trying to infer distance to target, which doubtlessly has very high uncertainty/noise, and small changes in that can lead to big changes in 3d space.

So in other words I disagree that the movement in the driver display is "a lot larger than" the jitter in the bounding boxes themselves. I think there's (a) likely a lot of jitter in bounding boxes and distance estimates for objects in the side cameras, which often are looking at a very close car and only seeing part of the car, and (b) even small jitter in 2d space causes large jumps in 3d space.
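
A back-of-the-envelope sketch of point (b), using assumed numbers (a focal length in pixels and a car height, not the actual system's values): if range is inferred from apparent bounding-box size, a couple of pixels of box jitter on a distant car already moves the estimate by metres.

```python
# Assumed illustration values, not Tesla's camera specs.
F_PX = 1000.0       # assumed focal length in pixels
CAR_HEIGHT_M = 1.5  # assumed real-world height of the detected car

def range_from_box(box_height_px):
    # Range inferred from apparent size: Z ~= f * H / h_px
    return F_PX * CAR_HEIGHT_M / box_height_px

for true_range in (10, 25, 50):
    h = F_PX * CAR_HEIGHT_M / true_range     # ideal box height in pixels
    jittered = range_from_box(h - 2)         # box reported just 2 px taller
    print(f"true {true_range:2d} m, 2 px box jitter -> {jittered:5.1f} m estimate")
```

At 50 m the ideal box is only ~30 px tall, so a 2 px wobble shifts the inferred position by several metres, which is exactly the kind of jumping around seen on the display.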

Edit: Also "smoothing" is not going to solve this problem without creating another problem of its own, namely reduced response time when vehicles are legitimately moving. You do sort of want a system like this to react quickly to vehicles that make sudden movements or are moving fast. What would solve this problem is, of course, lidar.
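
A minimal sketch of that trade-off, with an assumed filter constant: a plain exponential moving average damps the frame-to-frame jitter, but the same filter then needs many frames to follow a car that genuinely moves.

```python
# Toy exponential moving average; alpha is an assumed tuning value.
def ema(samples, alpha=0.2):
    state, out = samples[0], []
    for s in samples:
        state = alpha * s + (1 - alpha) * state
        out.append(round(state, 2))
    return out

measured = [20.0] * 10 + [15.0] * 10   # target genuinely jumps 5 m at frame 10
print(ema(measured)[10:15])            # [19.0, 18.2, 17.56, 17.05, 16.64] - still ~1.6 m behind after 5 frames
```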
 
On the topic of vehicles and objects being misidentified and jumping around, I think it might also be helpful for folks to recognize that even an imperceptible difference to us could result in wild differences for the neural net:
Attacking Machine Learning with Adversarial Examples

(I’m not saying this is a case of bad “adversarial” data, but just an illustration of how difficult a real world environment is to identify digitally using frame-by-frame pixel analysis from a video feed.)
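
For anyone curious how such perturbations are built, here is a hedged sketch of the fast gradient sign method, one standard recipe for crafting them, run against a toy randomly-initialised classifier rather than any real perception network:

```python
import torch
import torch.nn.functional as F

# Toy classifier and random "image" purely for illustration.
model = torch.nn.Sequential(torch.nn.Flatten(),
                            torch.nn.Linear(3 * 32 * 32, 10))
image = torch.rand(1, 3, 32, 32, requires_grad=True)
label = torch.tensor([3])

# Compute the gradient of the loss with respect to the input pixels.
loss = F.cross_entropy(model(image), label)
loss.backward()

# Nudge every pixel by at most 2/255 in the direction that increases the loss.
epsilon = 2 / 255
adversarial = (image + epsilon * image.grad.sign()).clamp(0, 1)
print((adversarial - image).abs().max())   # ~tensor(0.0078) - imperceptible per-pixel change
```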
 
You can't have binocular vision with a single camera. You can fudge the forward view with cropping, lens adjustments and such, since there are multiple forward cameras, but otherwise it's a single camera per view direction. The forward cameras all have different fields of view / focal lengths.

There are 3 forward cameras of varying FoVs. Wherever there is overlap, you can measure angle on the overlapping section, and triangulate distance (how binocular vision works). This is what they're doing on the Curiosity Rover with the pair of Mast Cams with 2 focal lengths.

The other cameras also have some overlap (check @verygreen's videos), but where they don't overlap, the NNs are still spitting out distance data (it doesn't seem too accurate). Jimmy_D noted the NNs get fed pairs of frames, likely milliseconds apart. He hypothesizes the NN can use this information to triangulate distance using parallax, since the car knows how much it should move in that space of time. This is the same way astronomers determine distance to moderately distant stars by measuring 6 months apart, when the Earth has traveled 2 astronomical units, and also the way my cat judges distance by weaving her head back and forth before jumping on top of the fridge from the counter.

I suspect this is why jitter seems worse when the car is stopped. With little to no lateral movement, the side cameras have a very small "denominator" in the triangulation calculation. So any imprecision there will appear magnified, throwing the distance further off. Functionally, it's probably less important to gauge distance when all parties involved are not moving.
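
A rough numbers-only sketch of that idea, under assumed values (focal length in pixels, frame spacing): for a target roughly broadside to a side camera, the car's own travel between the two frames acts as the stereo baseline B, and depth comes out as Z = f·B/d for a pixel disparity d. When the car is nearly stopped, B collapses and a single pixel of measurement error dominates.

```python
F_PX = 1000.0   # assumed focal length in pixels

def depth_m(baseline_m, disparity_px):
    # Two-frame triangulation: Z = f * B / d
    return F_PX * baseline_m / disparity_px

# At ~15 m/s, frames ~36 ms apart give B ~= 0.54 m; a 1 px disparity error
# barely matters for a target around 20 m away.
print(depth_m(0.54, 27.0), depth_m(0.54, 28.0))   # 20.0 vs ~19.3 m

# Nearly stopped, B might be ~0.02 m, so the true disparity for the same
# target is only ~1 px and the same 1 px error halves the estimate.
print(depth_m(0.02, 1.0), depth_m(0.02, 2.0))     # 20.0 vs 10.0 m
```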
 
Here is an actual EyeQ4 visualization, taken from the BMW Emergency Stop video. It's not pretty. There is a ghost car sitting on the rear bumper, and the overtaking car disappears and fragments as it passes:

86pMwW.gif


Later, we see what happens when the BMW starts changing lanes and is passed by another car that deforms and loses position as it passes. Again not pretty, but at least the tailgater has gone now...

PZ0z22.gif


If this was any other vendor, we would say that the vision/radar fusion is a bit rubbish, but because EyeQ4 is L5-ready, the only possible explanation is that the overtaking car in the first GIF is actually falling to pieces.

This is because of the micro-wormholes which inconveniently materialised on the BMW test track as they were filming - the car disappears from the display every time it hits one. The second GIF shows the stretch effect of driving close to a micro-wormhole, without actually hitting it. Luckily that car didn't disintegrate too.

That's not EyeQ4. Please stop spreading BS. That's using lidar, radar and one main forward camera.

Not only that, but the video itself is 8 years old. Again here you are embarrassing yourself.

 
That video .. LOL

You have to play it at 0.25x to get the actual speed of the car
The pedestrians are not detected as obstacles
The car lunges into a parking space (0.19s) before snapping back into the lane

The priority for ME, pre-Intel, was to create good-looking visualisations so that they could get funding & maybe even sell products. When they couldn't do that using the output from their product, it looks like they just manipulated the video...

View attachment 350778

Spreading false info after false info?
Even my 5-year-old niece would understand that the only visualization enabled there is the cars, and that the car isn't some SDC test car.
Like seriously bro?
 
First you said...

The BMW X5 currently has it, and so will the BMW 3 Series. It uses EyeQ4 with a trifocal camera.

Obviously EyeQ4 has a vision system capable of powering a Level 5 SDC. But automakers usually do the bare minimum, sometimes only enough to get their 5-star rating - Tesla with AP1, and Supercruise, being the exceptions. The Audi A8 might also be tagged onto that list when its L3 is turned on after an OTA update in Q1 2019 in Europe. Then there's Nissan's L3 in Japan next year, and then another big automaker with L2 using REM maps by the end of 2019.

So it's looking good for EyeQ4 OEM implementations.


But here's the breakdown of BMW features.


Driving Assistant Pro:

1. Steering & Lane keeping assist (anywhere)

2. Adaptive cruise control with automatic speed limit adjustment. (anywhere)

3. Automatic lane-change function using turn signals. (anywhere)

4. Unlimited hands-free driving (limited-access freeways only) under 37 mph, using a driver-monitoring camera à la Supercruise. (Coming in OTA update 12/2018) - Extended Traffic Jam Assistant

5. Emergency Stop Assistant - If you push a button (or pull the emergency brake handle), the car will change lanes by itself until it gets onto the shoulder and stops safely. If you are in the middle lane or leftmost lane, for example, it will change lanes until it reaches the shoulder. This is a pretty cool L3-ish feature.


But then you said...

That's not EyeQ4. Please stop spreading BS. That's using lidar, radar and one main forward camera.

Who is spreading BS again...?
 
Spreading false info after false info?
Even my 5-year-old niece would understand that the only visualization enabled there is the cars, and that the car isn't some SDC test car.
Like seriously bro?

Seriously bro. The video shows "obstacle detection" but fails to detect 2 pedestrians walking in the road. Even my 2-year-old niece knows that pedestrians are obstacles. That, plus the lunge into the empty parking space... it is a bit rubbish, isn't it?
 
Talking about Neural Nets as a critical part of FSD is like talking about optical character recognition as the key to teaching a computer to read. It's the absolute bare minimum you need and it's nowhere near the whole solution.
Look how far behind they are. After 2 years of working on their own solution, the big feature they launched was lane changes - on protected roads, under strict human supervision. The minimal feature they released is way too unreliable to be trusted on its own. This is practically the sum total of 2 years of progress since switching from Mobileye.

Moving to a chip that's 10x faster doesn't solve FSD, any more than the fact that your iPhone X is 100x faster than an iPhone 4 makes it capable of FSD. It's a minor implementation detail (and in any case, Tesla is basically just holding pace with NVIDIA's new chips).

Nothing Elon has promised around FSD has ever come to pass. It's always 2 years away.



I was super excited to read this stuff. Of course Elon always sounds confident, even when he's talking about really hard deadlines. And people have been wrong about self driving cars for a long time so there's plenty of precedent for being overconfident. Still, I'm really happy to hear this level of confidence about this level of capability becoming possible next year. That might mean we consumers get it a year later but it also says that five years away is probably really pessimistic.

I was thinking recently that, according to the 2018 Q2 conference call, Tesla has been driving HW3 for a while already: 6 months, maybe more. And before they had the NN chip prototypes they probably were building HW3-equivalent mockups for the cars to test big NNs, in addition to running big NNs in simulated driving - you need to do something like that just to inform what you put into the chip. So that might have been happening 2 years ago or longer. In other words, Tesla has had a good idea of what HW3 with a much bigger NN would be able to do for quite a while now. But they can't ship it until HW3 is available - which is a multi-year effort that won't come to fruition for another 6 months yet.

If they've been sitting on this knowledge for 2 years, and if the results look really good - well that could explain a lot of the statements and actions of the company WRT FSD.

There's also this:

Yeah, I mean ... you need a specialized inference engine. Like the Tesla hardware 3 Autopilot computer, that will start rolling into production early next year, is 10 times better than the next best system out there at the same price, volume and power consumption. And it’s really because it’s got a dedicated neural net chip. Which basically, it sounds complicated, but it’s really like a matrix multiplier with a local memory.

This description is a perfect match for the "TPUv1 style systolic matrix multiplier coprocessor" approach. I still think that's probably what's going into Tesla's NN chip.
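
For readers who haven't met the term, here is a toy cycle-by-cycle simulation of a systolic matrix multiplier. It uses an output-stationary dataflow for simplicity, so it's only a sketch of the general "matrix multiplier with local memory" idea, not Tesla's or Google's actual design: each processing element keeps one accumulator in local memory while operands stream through the grid.

```python
import numpy as np

def systolic_matmul(A, B):
    # PE(i, j) holds acc[i, j] in local memory. A-values stream in from the
    # left, B-values from the top, skewed so that A[i, k] and B[k, j] meet
    # at PE(i, j) on cycle t = i + j + k.
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    acc = np.zeros((M, N))                 # per-PE local accumulators
    for t in range(M + N + K - 2):         # total pipeline cycles
        for i in range(M):
            for j in range(N):
                k = t - i - j
                if 0 <= k < K:             # operands reach PE(i, j) this cycle
                    acc[i, j] += A[i, k] * B[k, j]
    return acc

A, B = np.random.rand(4, 6), np.random.rand(6, 3)
assert np.allclose(systolic_matmul(A, B), A @ B)
```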
 
There are 3 forward cameras of varying FoVs. Wherever there is overlap, you can measure angle on the overlapping section, and triangulate distance (how binocular vision works). This is what they're doing on the Curiosity Rover with the pair of Mast Cams with 2 focal lengths.

But for the 360 view (sides and back) they do not have this, only for the forward view. And fundamentally, cameras can only record the angle of light coming into the camera; without a lot of separation between the cameras the focal length difference matters somewhere between not at all and vanishingly little. And the amount of separation between Tesla's forward cameras is only good for very close-range stereo.

Temporal stereo (consecutive frames taken while the vehicle is moving) is their best option, but this has problems if the object you're trying to gauge distance to is also moving.
 
Talking about Neural Nets as a critical part of FSD is like talking about optical character recognition as the key to teaching a computer to read. It's the absolute bare minimum you need ....

Mobileye Bullish on Full Automation, but Pooh-Poohs Deep-Learning AI for Robocars

Can't tell if you are trying to bolster the Mobileye case? If so, I think you're doing it wrong.

Also, Mobileye hardware isn't facilitating the lane change or the driving algorithm. So making some sort of comparison of Tesla's progress versus Mobileye, with auto lane change being your litmus test, is kind of odd.
 
I'm assuming this is a newer system than the one in the 2017 BMW 5 series that didn't do too well in the IIHS tests.

They should have stuck to the roads used in the Mobileye PR demos.
But for the 360 view (sides and back) they do not have this, only for the forward view. And fundamentally, cameras can only record the angle of light coming into the camera; without a lot of separation between the cameras the focal length difference matters somewhere between not at all and vanishingly little. And the amount of separation between Tesla's forward cameras is only good for very close-range stereo.

Temporal stereo (consecutive frames taken while the vehicle is moving) is their best option, but this has problems if the object you're trying to gauge distance to is also moving.

Maybe they're using another CNN model for depth; you can get a decent depth map from just mono vision. E.g.
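
(The specific example linked there isn't reproduced here. Purely as a hedged structural sketch - the toy network below is made up for illustration, not any published model - a monocular depth CNN maps a single RGB frame to a dense per-pixel relative depth map:)

```python
import torch

class TinyMonoDepth(torch.nn.Module):
    # Hypothetical toy encoder-decoder, for illustration only.
    def __init__(self):
        super().__init__()
        self.encoder = torch.nn.Sequential(
            torch.nn.Conv2d(3, 32, 3, stride=2, padding=1), torch.nn.ReLU(),
            torch.nn.Conv2d(32, 64, 3, stride=2, padding=1), torch.nn.ReLU())
        self.decoder = torch.nn.Sequential(
            torch.nn.Upsample(scale_factor=4, mode="bilinear",
                              align_corners=False),
            torch.nn.Conv2d(64, 1, 3, padding=1))

    def forward(self, rgb):                      # rgb: (N, 3, H, W)
        return self.decoder(self.encoder(rgb))   # (N, 1, H, W) depth map

frame = torch.rand(1, 3, 96, 160)                # one camera frame
print(TinyMonoDepth()(frame).shape)              # torch.Size([1, 1, 96, 160])
```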
 
And fundamentally, cameras can only record the angle of light coming into the camera; without a lot of separation between the cameras the focal length difference matters somewhere between not at all and vanishingly little. And the amount of separation between Tesla's forward cameras is only good for very close-range stereo.

There's about 80mm of separation between the two farthest-apart forward-facing cameras, and 40mm between each adjacent pair. Human males average 65mm between their eyes, and from what I recall we can judge distance by parallax out to about 10 meters. Past that, it's mainly based on apparent size compared to known scale. Computer vision may be more precise; I don't know.

That said, focal length difference will only have an effect on resolution. My 12mm lens gives me the exact same field of view when cropped in as my 200mm telephoto lens. Field of view / perspective is based completely on position, not focal length. Focal length only determines how much of that total field of view is captured.
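
A quick sanity check on those baselines, with an assumed focal length of 1000 px (not the actual camera spec): disparity is d = f·B/Z, so at road distances an ~80 mm baseline only yields a few pixels of disparity, and a half-pixel matching error already produces a wide range band.

```python
F_PX = 1000.0   # assumed focal length in pixels

for B in (0.065, 0.08):              # human eye spacing vs. widest camera pair
    for Z in (5, 10, 20, 50):
        d = F_PX * B / Z             # ideal disparity in pixels
        lo = F_PX * B / (d + 0.5)    # range if disparity measured 0.5 px high
        hi = F_PX * B / (d - 0.5)    # ... or 0.5 px low
        print(f"B={B*1000:.0f} mm, Z={Z:2d} m: d={d:5.2f} px, "
              f"band {lo:5.1f}-{hi:5.1f} m")
```

At 50 m the 80 mm pair sees only ~1.6 px of disparity, so the half-pixel error band spans roughly 38-73 m, which matches the "very close-range stereo" point above.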
 
Mobileye Bullish on Full Automation, but Pooh-Poohs Deep-Learning AI for Robocars

Can't tell if you are trying to bolster the Mobileye case? If so, I think you're doing it wrong.

Also, Mobileye hardware isn't facilitating the lane change or the driving algorithm. So making some sort of comparison of Tesla's progress versus Mobileye, with auto lane change being your litmus test, is kind of odd.

That article clearly takes what Amnon said out of context. Amnon was talking about end-to-end NNs.

 
There are 3 forward cameras of varying FoVs. Wherever there is overlap, you can measure angle on the overlapping section, and triangulate distance (how binocular vision works). This is what they're doing on the Curiosity Rover with the pair of Mast Cams with 2 focal lengths.

The other cameras also have some overlap (check @verygreen's videos), but where they don't overlap, the NNs are still spitting out distance data (it doesn't seem too accurate). Jimmy_D noted the NNs get fed pairs of frames, likely milliseconds apart. He hypothesizes the NN can use this information to triangulate distance using parallax, since the car knows how much it should move in that space of time. This is the same way astronomers determine distance to moderately distant stars by measuring 6 months apart, when the Earth has traveled 2 astronomical units, and also the way my cat judges distance by weaving her head back and forth before jumping on top of the fridge from the counter.

I suspect this is why jitter seems worse when the car is stopped. With little to no lateral movement, the side cameras have a very small "denominator" in the triangulation calculation. So any imprecision there will appear magnified, throwing the distance further off. Functionally, it's probably less important to gauge distance when all parties involved are not moving.

I don't think you can use motion parallax, because you don't know at what speed the other object is moving. If you knew the other object's speed, you could use motion-induced parallax.

If the other car is, for instance, moving at the same speed in the same direction, there is no parallax.
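
A tiny numerical illustration of that, with assumed focal length, frame spacing and geometry: treat the car's own motion between two frames as the baseline and watch what happens to the target's bearing.

```python
import math

F_PX = 1000.0   # assumed focal length in pixels
DT = 0.036      # assumed time between the paired frames, in seconds

def pixel_shift(ego_speed, target_speed, lateral_m, ahead_m):
    # Bearing to a target offset laterally, before and after one frame
    # interval; temporal parallax only works if this bearing changes.
    b0 = math.atan2(lateral_m, ahead_m)
    b1 = math.atan2(lateral_m, ahead_m + (target_speed - ego_speed) * DT)
    return F_PX * (b1 - b0)          # small-angle bearing change in pixels

print(pixel_shift(15.0, 0.0, 3.0, 20.0))    # parked car: ~4 px shift, usable
print(pixel_shift(15.0, 15.0, 3.0, 20.0))   # car pacing us: 0.0 px, no parallax
```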