
Tesla begins HD map development (collecting short camera videos) as I predicted


Bladerskb

Senior Software Engineer
Just tested it out. You can pick the Mars, James Bond, and Whiteboard Easter eggs off the screen to activate them now. The Rainbow Road one gives a description of how to activate it.

Another new thing that I noticed this morning that I think is new with this update is "data sharing". It asks if Tesla can collect short video clips using the car's external cameras to learn how to recognize things like lane lines, street signs and traffic light positions. You have to checkmark "agree" twice to allow it to work.

This is exactly as I predicted, to the point that it needs its own thread.
It's funny how I have been right about everything I have said concerning Tesla's Autopilot, yet I have received unprecedented backlash.

EAP HW2 - my experience thus far... [DJ Harry] (only one camera active)

Enhanced Autopilot 2 may never happen before Full Self Driving is ready! (my accurate predicted timeline)

Whether it's the delay for parity, the number of cameras currently being used, or how neural networks/deep learning work and the way Tesla leverages them, all the way down to the requirement for high-definition maps that are accurate to 5-10 cm, unlike Tesla's current high-precision map, which uses GPS logging and is only accurate to a couple of meters.

FSD needs three types of data.

1) HD map data: what Mobileye is doing with REM. A map with exact lanes, lane markings, intersections, traffic lights, road signs, light poles, road barriers/edges, and landmarks.
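To make the idea concrete, here is a minimal sketch of the kind of record such a map could store per element. This is not any vendor's actual schema; every class and field name is hypothetical, purely for illustration.

```python
# Hypothetical sketch of an HD map element record (not a real schema).
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple

class ElementType(Enum):
    LANE_MARKING = "lane_marking"
    TRAFFIC_LIGHT = "traffic_light"
    ROAD_SIGN = "road_sign"
    LIGHT_POLE = "light_pole"
    ROAD_EDGE = "road_edge"
    LANDMARK = "landmark"

@dataclass
class MapElement:
    element_type: ElementType
    # Geometry as a polyline of (lat, lon, alt) points, nominally 5-10 cm accurate.
    geometry: List[Tuple[float, float, float]]
    # Free-form attributes, e.g. which lane a given traffic light controls.
    attributes: dict = field(default_factory=dict)

@dataclass
class MapTile:
    tile_id: str
    elements: List[MapElement]
```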

I have always maintained that the only facts we have are these:
Tesla fleet learning consists only of GPS logging and radar/GPS logging.
GPS-based maps are not suitable for anything above Level 2 autonomy.

All the way down to the fact that Tesla hadn't started their HD map development and didn't have, and couldn't process, the raw video data from Mobileye's AP1. I also said that when they started their HD map development, we would all know about it, either through an announcement from Elon Musk himself or through data uploads over Wi-Fi.

We know they are not processing any further data, because they have bragged about everything they have done. We know they collect GPS logs and radar logs.

When they start collecting data from cameras and mapping out every lane, traffic light, road sign, road marking, intersection, etc. in the world, we will know about it, because Elon won't hesitate to brag about it.

In fact, in my very first post on this forum I listed exactly how this video collection would happen and what it would be used for.
I have also said in the past that they will only need a short video clip, or even just a picture in some cases, and processed metadata in most cases. The short video clips are then annotated by a human and collated to be used in training the specific models.

If an interesting situation were to take place, or an interesting intersection or stretch of road were marked on the map for recording, all that needs to happen is for the car to record the last 30 seconds of the car's encounter and send it over to Tesla HQ.
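A rough sketch of how that could work on the car side, assuming a simple ring buffer of recent frames. The names, frame rate, and trigger condition are purely illustrative, not Tesla's actual implementation.

```python
# Illustrative sketch: keep the last 30 s of frames in a ring buffer and hand
# them to an uploader when an "interesting" event or map-flagged spot is hit.
import collections
import time

FPS = 10            # assumed camera frame rate for this sketch
CLIP_SECONDS = 30

class ClipRecorder:
    def __init__(self):
        # Fixed-size buffer: old frames fall off the back automatically.
        self.buffer = collections.deque(maxlen=FPS * CLIP_SECONDS)

    def on_frame(self, jpeg_bytes: bytes):
        self.buffer.append((time.time(), jpeg_bytes))

    def on_trigger(self, reason: str) -> dict:
        # Snapshot the last 30 s and tag it with why it was recorded,
        # e.g. "flagged_intersection" or "interesting_event".
        return {"reason": reason, "frames": list(self.buffer)}
```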

Tesla is doing two things. They are creating a 5-10 cm HD map for the car to localize in (requires only metadata processed by the on-board DNN model).

The HD map also includes the exact location of traffic lights. The way traffic lights work in SDCs is that the car looks for the exact position of a traffic light; it doesn't look at the entire picture. Since it knows from the HD map the exact position of the traffic light it wants to examine, it focuses on it (requires only metadata).

HD maps also include exactly which traffic light corresponds to which lane or road. At an intersection there could be 10 traffic lights facing you, and you need to know exactly which one you should be paying attention to in relation to where you want to go (*maybe requires video).
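To illustrate the "exact position" point: given a light's mapped 3D position and the car's pose, the car can predict the pixel region where the light should appear and run its classifier only there. Below is a simplified, hypothetical pinhole-projection sketch; it is not any production system's code.

```python
# Sketch: project a mapped traffic light into the image and return a small
# region of interest around it, instead of scanning the whole frame.
import numpy as np

def project_to_image(point_cam, fx, fy, cx, cy):
    """Pinhole projection of a 3D point in camera coords (x right, y down, z forward)."""
    x, y, z = point_cam
    if z <= 0:
        return None  # behind the camera, not visible
    return (fx * x / z + cx, fy * y / z + cy)

def roi_for_mapped_light(light_world, T_world_to_cam, intrinsics, pad_px=40):
    """Return a padded pixel box (u0, v0, u1, v1) where the mapped light should appear."""
    p = T_world_to_cam @ np.append(light_world, 1.0)   # 4x4 pose transform
    uv = project_to_image(p[:3], *intrinsics)
    if uv is None:
        return None
    u, v = uv
    return (u - pad_px, v - pad_px, u + pad_px, v + pad_px)
```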

Detection NN models are not perfect; you can get 80% accuracy easily, but to improve on that you need millions more pictures to train the network with. Improving a traffic light or stop sign detection model will require video clips/images. The system can also be set up to only take a picture at intersections that are not mapped yet, or at intersections that it fails to properly recognize.

When raw video clips (which are just a small number of picture frames) are uploaded, they are annotated by a human and collated.

Why would you need video clips and not just metadata?


For example, if your car were to come to a stop at an intersection with no car directly in front of it, the intersection hasn't been mapped, and the car doesn't detect a traffic light or stop sign, the car will take a picture on the assumption that its traffic sign/light detection model failed and there must be a stop light or stop sign somewhere. The picture gets sent back to HQ, where it is annotated by a human and collated.
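A sketch of that trigger logic, with made-up function and field names, just to show how little has to run on the car for this kind of hard-example collection:

```python
# Sketch: if the map says "no data here" and the detector finds no light/sign
# while the car is stopped near an apparent intersection, queue a snapshot
# for human annotation. All names are hypothetical.
def maybe_capture_hard_example(stopped: bool,
                               near_intersection: bool,
                               intersection_mapped: bool,
                               detections: list,
                               frame: bytes,
                               upload_queue: list):
    detector_found_control = any(
        d["label"] in ("traffic_light", "stop_sign") for d in detections
    )
    if stopped and near_intersection and not intersection_mapped \
            and not detector_found_control:
        # Assume the detector missed something; send the frame back for labeling.
        upload_queue.append({"frame": frame, "tag": "suspected_missed_detection"})
```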
 
Impressive example of quoting your own posts while patting yourself on the back. ;)

Lol, I mean each individual thread shows how much backlash I get from making these posts.
It does feel good to be vindicated. But it's not even about vindication, because I knew I was right all along; this is my career. I believe every Tesla customer should know exactly how Autopilot works and what to expect from it.

Plus I have a lot more quotes that I excluded, like the ones below. Let's just say that if I quoted everything I was right about, I would take up the entire first page. :p

Mobileye doesn't allow access to raw camera data.
Second of all, there is no other chip to actually process the visuals from the camera.
There is no secondary chip to do an additional machine learning process.
All Tesla has access to is converting the outputs from the EyeQ3 to actuator commands.
...their miles data only consists of literally GPS location.


Someone said I should make a thread, so I did.

Maybe you could start a new thread where you would list all your predictions and their materialization (I'm serious)
 
I got you something for your troubles...
[image: smart cookie]
 
Impressive example of quoting your own posts while patting yourself on the back. ;)
I for one appreciate it. I've been following your posts, and they have been the most informative regarding Tesla's presumed plans for automated driving. It also makes sense and gives me hope for the future that I didn't throw 3k and more down the drain...
 
@Bladerskb you have a very negative/aggressive tone in your posts, and they are seldom accompanied by external sources; I think that's what makes people suspicious. However, the more I read about this subject, the more I am starting to realize that you are often right in your (blunt) statements. So I keep reading your stuff. Not saying I agree with you, just that it's becoming more and more apparent that you're not 100% full of bs :D
 

Well, I try to post as many external sources as possible, but most people don't click on them and read them. Quoting them would sometimes get lost in the entire post, plus I want to keep the post as short as possible.

For example, detection of traffic lights.

http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/37259.pdf

By using a prior map, a perception system can anticipate and predict the locations of traffic lights and improve detection of the light state. The prior map also encodes the control semantics of the individual lights. Cars must deal with traffic lights. The two main tasks are detecting the traffic lights and understanding their control semantics. Our approach to solving these two tasks has been to automatically construct maps of the traffic light positions and orientations, and then to manually add control semantics to each light. We then use this map to allow an onboard perception system to anticipate when it should see and react to a traffic light, to improve the performance of the traffic light detector by predicting the precise location of the traffic light in the camera image, and to then determine whether a particular route through an intersection is allowed. Our system has been deployed on multiple cars, and has provided reliable and timely information about the state of the traffic lights during thousands of drives through intersections.

However, it is easy to drive a mapping car instrumented with cameras, GPS, IMU, lasers, etc., through intersections and collect precisely timestamped camera images. The traffic light positions can then be estimated by triangulating from multiple views. All that is needed is a large set of well-labeled images of the traffic lights. The accuracy of the traffic light map is coupled to the accuracy of the position estimates of the mapping car. In our case online position estimates of the mapping car can be refined by offline optimization methods [Thrun and Montemerlo, 2005] to yield position accuracy below 0.15 m, or with a similar accuracy onboard the car by localizing with a map constructed from the offline optimization.

The input to the automatic mapping system is a log file that includes camera images and the car’s precise pose (Figure 2 shows the mapping pipeline). Generally, traffic lights will only occur at intersections, so we use geo-spatial queries (through the Google Maps API) to discard images taken when no intersections are likely to be visible. Unfortunately, Google Maps only includes intersections and not whether there is a traffic light at the intersection, which would make this winnowing process even more precise. After winnowing the set of images to those that were taken while the car was approaching an intersection, we run a traffic light classifier over the entire image.
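As a rough illustration of two steps in that quoted pipeline (winnowing images to those near intersections, then triangulating a light's position from several labeled views), here is a simplified sketch. The intersection list and data layout are assumptions for illustration, not the paper's actual code.

```python
# Simplified sketch of two steps from the quoted paper: (1) keep only images
# logged near a known intersection, (2) estimate a light's 3D position as the
# least-squares intersection of viewing rays from several labeled images.
import math
import numpy as np

KNOWN_INTERSECTIONS = [(37.4220, -122.0841)]   # placeholder (lat, lon) list

def haversine_m(lat1, lon1, lat2, lon2):
    R = 6371000.0                               # Earth radius, meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * R * math.asin(math.sqrt(a))

def winnow(images, radius_m=100.0):
    """Drop images logged more than radius_m from every known intersection."""
    return [im for im in images
            if any(haversine_m(im["lat"], im["lon"], la, lo) <= radius_m
                   for la, lo in KNOWN_INTERSECTIONS)]

def triangulate_light(cam_centers, ray_dirs):
    """cam_centers: camera positions (3-vectors) in a common world frame.
    ray_dirs: direction vectors from each camera toward the labeled light."""
    A, b = np.zeros((3, 3)), np.zeros(3)
    for c, d in zip(cam_centers, ray_dirs):
        d = np.asarray(d, dtype=float)
        d = d / np.linalg.norm(d)
        P = np.eye(3) - np.outer(d, d)          # projector onto plane normal to d
        A += P
        b += P @ np.asarray(c, dtype=float)
    return np.linalg.solve(A, b)                # point closest to all the rays
```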

I quoted some relevant parts, but really, reading the entire thing will illuminate some of the points I was making.
I also excluded a lot of things that Tesla's new update is doing, to keep the post short.

Things like:

  • How far away are the traffic lights / intersection lines / stop signs (video clip frames)? Feed the frames, along with distance information (derived from the car's exact speed/IMU), into a deep neural network to develop a model that can predict the distance to traffic lights, stop signs, and intersection lines (needed for FSD); see the sketch after this list.
etc.
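As a sketch of how those distance labels could be produced automatically: once the car actually crosses the stop line (assume that moment is known, e.g. from its localized pose), the distance at any earlier frame is just the path length driven between that frame and the crossing, which you get by integrating speed over time. Field names below are made up; a real system would fuse IMU/odometry more carefully.

```python
# Sketch: generate distance-to-line training labels from the car's own motion.
def label_frames_with_distance(frames, t_cross):
    """frames: time-ordered dicts with 't' (seconds) and 'speed_mps'.
    t_cross: time the car crossed the stop/intersection line.
    Returns (frame, distance_to_line_m) pairs usable as training labels."""
    labeled = []
    for i, f in enumerate(frames):
        if f["t"] >= t_cross:
            break
        dist = 0.0
        prev = f
        # Trapezoidal integration of speed from this frame until the crossing.
        for g in frames[i + 1:]:
            t1 = min(g["t"], t_cross)
            dist += 0.5 * (prev["speed_mps"] + g["speed_mps"]) * (t1 - prev["t"])
            if g["t"] >= t_cross:
                break
            prev = g
        labeled.append((f, dist))
    return labeled
```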
 
This is exactly as I predicted, to the point that it needs its own thread.
It's funny how I have been right about everything I have said concerning Tesla's Autopilot, yet I have received unprecedented backlash.

It may not be what you are saying; it might be how you are saying it.
 
Tesla wants video clips to improve image recognition, in addition to being able to continue to collect road segment data.

You assume this is for HD mapping, but it could just as easily be for improving the DNN. Road segment data + disengagement + video can equally be used to tune the DNN.

No assumption here at all, and I already mentioned that they will use this to improve their detection networks and also create new ones.
Road segment data is an industry term; it is a component of HD mapping. Tesla, for example, had a high-precision map in which their RSD back then consisted only of GPS, then later radar (for whitelisting), and finally car-speed (for fleet-learned curvature) logging, as I have proved.

Tesla is mapping out every lane on Earth to guide self-driving cars
Upgrading Autopilot: Seeing the World in Radar

Now their RSD will include lane lines, road signs, traffic light positions, etc., just like Mobileye's.
The only reason you would want the position of traffic lights, for example, is that you want to build a map of traffic lights and add control semantics to them.
The video collection is necessary to improve and build the networks needed for HD mapping.

Road Experience Management™ (REM™) - Mobileye

The harvesting agents collect and transmit data about the driving path’s geometry and stationary landmarks around it. Mobileye’s real-time geometrical and semantic analysis, implemented in the harvesting agent, allows it to compress the map-relevant information – facilitating very small communication bandwidth (less than 10KB/km on average).

The relevant data is packed into small capsules called Road Segment Data (RSD) and sent to the cloud. The cloud server aggregates and reconciles the continuous stream of RSDs – a process resulting in a highly accurate and low TTRR map, called “Roadbook”.
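A toy sketch of that "small capsule" idea, with invented field names (Mobileye's actual RSD format is not public): the car packs compact geometry and landmark observations for a segment rather than raw video, which is how the bandwidth can stay in the few-KB-per-km range the quote mentions.

```python
# Sketch: pack a road segment's path and landmark observations into a small
# compressed capsule for upload, instead of sending video. Field names are
# illustrative only.
import json
import zlib

def pack_rsd(segment_id, path_polyline, landmarks):
    """path_polyline: list of (lat, lon) driven points, already downsampled.
    landmarks: list of dicts like {"type": "traffic_light", "lat": ..., "lon": ...}."""
    capsule = {
        "segment": segment_id,
        "path": [(round(lat, 7), round(lon, 7)) for lat, lon in path_polyline],
        "landmarks": landmarks,
    }
    # Compress the serialized capsule; the cloud side would aggregate many of
    # these from different cars and reconcile them into one map ("Roadbook").
    return zlib.compress(json.dumps(capsule).encode("utf-8"))
```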

Lastly, this has nothing to do with Autopilot disengagements.
 