If you have any interest in the CES 2016 presentation, here are the notes I made from the video. They make a lot more sense if you watch the video and follow along.
Tesla gen-1 Autopilot has “industry first” DNN, free-space, HPP (holistic path planning), AEB (fusion)
2016 will have 5 new launches with 52 new models of cars using Mobileye technology (not autonomous driving but simple aids)
- More free space detection functionality
- Second-generation camera-only ACC - full speed, industry first
- General object detection (industry first for mono)
- Animal detection - Volvo
- Traffic jam assist
Future
Two camps of thought on how to reach Level 5 autonomy - the driver completely out of the loop.
Three pillars of autonomous driving:
- Sensing: interpreting the surroundings with 360-degree awareness to build an “environmental model”
- Curves, barriers, guardrails, objects, other cars etc.
- Mapping: a questionable pillar, because humans don’t need maps to drive (6:28 in the video)
- What “map” means is not clear - navigation maps, high-definition maps (TomTom, HERE - different uses), Google’s high-resolution lidar maps
- Planning (driving policy): The answer to why a 16-year-old kid needs to take driving lessons - learning to negotiate a driving path in the presence of other cars - a “multi-agent game” with the other drivers on the road - some follow/bend/violate rules, some are aggressive, some are courteous, etc.
How to combine sensing & mapping - two main camps - and you must be committed once you are in a camp - you can’t cherry-pick ideas.
- Sensing - pros in the field know what each tech can do
- Cameras - highest density of information
- Image variability is big challenge (day, night, dusk, rain, dust etc.)
- Radars - thousands of samples per second - more robust to weather (can see through rain, fog, etc.)
- Lidars - hundreds of thousands of samples per second
- Can’t sense texture reliably but can sense 3D reliably
- Mapping - not as clear a consensus in the profession - you must think about localization - a highly detailed map doesn’t help if you don’t know how to find yourself in it.
- None - no localization
- Navigation - GPS, approximately 10 m
- HD-Maps (TomTom, HERE) - 10 cm
- Google 3D - centimeter-scale detail, ~0.5 gigabytes of data per kilometer, localizes to ~10 cm
- Localization - two camps: “somewhere” vs. “everywhere”
- Google, Baidu - “somewhere” with full capability
- 3D detailed, cm-scale, gigabytes of data, can use a low-resolution lidar sensor. It doesn’t have to be dense - after you record the map you can use the “principle of subtraction” to find moving objects (see the sketch after this list)
- So Google can drive cars with only a laser scanner, because they have very detailed maps
- Car industry - “everywhere” with partial capability
- Fusion between camera and radar
- Ultimate goal - “everywhere” with full capability
- Challenges to both camps
- Google
- geographic scalability
- Updates
- How to do it? Mobileye says “I don’t know”
- Car industry
- Stronger A.I. to get from partial to full autonomy
- Risky - we don’t know how long it will take to get to truly strong AI - 5 years or 50 years
- Therefore, settle for stronger AI in the meantime, and compensate for the lack of full AI by using a detailed map in conjunction with deep learning
- Higher-resolution maps
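The “principle of subtraction” mentioned above can be made concrete with a small sketch: once a static map is recorded, anything in a live scan that has no counterpart in the map is a candidate moving object. A minimal sketch under assumed data shapes and an assumed distance threshold - not Google’s or Mobileye’s actual pipeline:

```python
import numpy as np
from scipy.spatial import cKDTree

def subtract_map(live_scan, prior_map, threshold=0.3):
    """Return live-scan points farther than `threshold` meters from every
    point in the prior map - i.e., candidates for moving objects."""
    tree = cKDTree(prior_map)              # index the static map once
    dist, _ = tree.query(live_scan, k=1)   # nearest map point per scan point
    return live_scan[dist > threshold]

# Toy example: 100k static map points plus 1k "new" points in the scan.
prior_map = np.random.rand(100_000, 3) * 100.0
live_scan = np.vstack([prior_map[:9_000], np.random.rand(1_000, 3) * 100.0])
movers = subtract_map(live_scan, prior_map)
print(f"{len(movers)} scan points flagged as potentially moving")
```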
The rest of this talk is about using Higher Resolution Maps with Stronger A.I.
Idea: Simplify the creation of high-resolution maps by having one unit which interprets the scene and uses that ability to create and update high-resolution maps via crowd sourcing (the basis of the Volkswagen and GM agreements).
How to get stronger AI: 360-degree sensing, an environmental model, planning
- Generate a sparse 3D map - not a detailed one - use landmarks (signs, posts etc.)
- Dense 1D - dense information along the lanes - but full density is not needed in 3D.
- Crowd-sourced - only 10 kb/km - 5 orders of magnitude smaller than Google’s approach (see the arithmetic sketch after this list)
- Advantage of this data size - it can be transmitted by cars to the cloud and sent back to the cars as a map.
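A back-of-the-envelope check of the numbers above. The landmark record layout is a hypothetical illustration; only the ~10 kb/km and ~0.5 GB/km figures come from the talk:

```python
import math
import struct

# Hypothetical record: a 1-byte landmark type plus a 3D position.
record_bytes = struct.calcsize("<Bfff")       # 13 bytes per landmark

REM_KB_PER_KM = 10.0                          # sparse landmark map (talk figure)
GOOGLE_KB_PER_KM = 0.5 * 1024 * 1024          # ~0.5 GB/km of dense 3D data

ratio = GOOGLE_KB_PER_KM / REM_KB_PER_KM
print(f"dense map is ~{ratio:,.0f}x larger (~10^{math.log10(ratio):.1f})")
print(f"10 kb/km leaves room for ~{int(REM_KB_PER_KM * 1024 / record_bytes)} landmarks per km")
```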
Name: Road Experience Management (REM) - the rest of the talk covers how they build the stronger AI and these higher-resolution maps.
EyeQ3 - mono front-facing camera - today’s best technology
- Powerhouse of today’s ADAS (collision avoidance)
- Field of view: today 50 degrees, 2018 -> 75 degrees, 2019 -> 100 degrees
- Imager resolution: today 1.3 MP, 2019 -> 1.7 MP, 2020 -> 7.2 MP
- VERY high low-light sensitivity - better than consumer cameras
EyeQ4 - trifocal front-facing
- 3 optical paths: 150, 50, 25 degrees
- Enables highway autopilot in a safe manner
- 4 production launches 2017/18
- Typically with front radar, 4 corner radars, and in some cases a front (or back) lidar (Valeo Scala)
“Full Vision” 360 coverage: EyeQ4 or multiple EyeQ3s linked together
- Trifocal + 5 cameras: any segment of the field of view is covered by at least one camera
- Together with redundancy layers (radar/lidar for moving objects, REM for drivable paths), this will support Full Autonomous Driving
- Launches in 2017 with partial functionality - software will be updated over time
- Some initial launches will use multiple linked EyeQ3s instead of one EyeQ4
Problems with well-defined input/output relationships are ideal scenarios for machine learning - the data is labeled/annotated by humans. Examples: what is in a bounding box; where is the path delimiter of a road (curb, hedge, concrete barrier, lane marker, etc.). Give the computer lots and lots of examples and the neural networks go to work (a minimal sketch follows).
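As a minimal sketch of that supervised setup - human-labeled examples and a small network trained end to end - here is a toy version with random stand-in data. The architecture, label set, and hyperparameters are placeholders, not Mobileye’s networks:

```python
import torch
import torch.nn as nn

# Stand-in dataset: 64x64 patches, each labeled with one of 4 path-delimiter
# types (curb, hedge, concrete barrier, lane marker).
images = torch.randn(256, 3, 64, 64)
labels = torch.randint(0, 4, (256,))

model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 4),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):        # a real system trains on "lots and lots" of examples
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.3f}")
```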
What is special about deep learning at Mobileye?
- They started in 2012
- EyeQ3 launched 10/2015 with Tesla’s Autopilot and contains Mobileye deep learning algorithms, each trained end to end as a deep learning module:
- Object detection
- Environmental model - free space
- Path planning - holistic path planning
- Scene recognition
- Unique challenges
- Real Time constraints - high-res at 36 fps
- Input/output modeling, network architecture and utility functions need innovation
- Least interesting problem is simple object detection
- They most likely launched the industry’s first real-time embedded DNN in volume production - in any industry.
- Not connected to cloud - computations done real time in the car
- Not a garden variety network copied from some academic paper
Example: 3DVD (3D vehicle detection) - bounding boxes on each face of a vehicle, not just the rear end (as is done today).
- This is a difficult problem because the faces are not always visible
- This problem hasn’t been dealt with yet in any academic paper
- This capability is coming late 2017/early 2018 on one EyeQ4, and late 2016/early 2017 on a system running 3× EyeQ3s via one manufacturer.
Free Space through Pixel Labeling
- Uses context
- Launched on Tesla autopilot
- Boundaries of free space have category labels - 15 different ones
- Path delimiters of moving vs. stationary objects - and what types of objects they are within those two categories (a rough sketch follows this list)
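A rough sketch of what pixel labeling for free space could look like, assuming the network emits per-pixel class scores. The class count, class indices, and image size are illustrative assumptions, not Mobileye’s:

```python
import numpy as np

H, W, NUM_CLASSES = 120, 160, 16           # class 0 = free space, 1..15 = boundary types
scores = np.random.rand(H, W, NUM_CLASSES) # stand-in for per-pixel network output
labels = scores.argmax(axis=-1)            # per-pixel category

# For each image column, scan up from the bottom: the first non-free-space
# pixel marks the free-space boundary, and its label says what kind of
# delimiter it is (moving vs. stationary, and which type).
boundary_rows = np.zeros(W, dtype=int)
for col in range(W):
    blocked = np.nonzero(labels[::-1, col] != 0)[0]
    if blocked.size:
        boundary_rows[col] = H - 1 - blocked[0]
print(boundary_rows[:10])
```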
Path planning using holistic cues
- Launched in Tesla 2015
- Integral part of “lane detection” going forward in 2016 in all programs
- System fuses information from lead vehicle with holistic path planning
Driver Policy / Planning - how to plan the vehicle’s next actions
- Autonomous cars must learn to drive like humans
- Driving is a “multi-agent” game - behaviors must be learned, so FAD (full autonomous driving) should adopt “human-like” driving skills
- This is a technological problem, not an ethical problem
Sensing vs Planning
- Sensing: the present, single agent, perfectly predictable - input/output - if you have enough data you can learn the input-to-output mapping with a DNN
- Technology - deep supervised learning with multiple end-to-end modules
- Planning / driver policy - planning for the future, multi-agent, “what will happen if” reasoning, not perfectly predictable
- Technology - “reinforcement learning”
- One type of RL: Deep Q-learning (Google DeepMind) - but it is not suitable for driving. Why not? Other agents are not Markovian, the Q-function is not smooth (large Lipschitz constant), it is difficult to break down into separate modules, and training time is very long (see the sketch after this list)
- The talk is not technical, so he doesn’t go into the above bullet point. See the arXiv paper by Mobileye - on the website
- Example of a planning problem - a simulated host car (red) merging into a roundabout - after many iterations the car learns the best way to merge into traffic without upsetting other drivers.
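To make the Deep Q-learning critique above concrete, here is a minimal tabular Q-learning update. The update rule assumes the next state depends only on the current state and action - exactly the Markov property that other drivers violate. The states, actions, and reward are toy values, not Mobileye’s formulation:

```python
import random

ALPHA, GAMMA, EPS = 0.1, 0.9, 0.2
Q = {}  # (state, action) -> estimated value

def update(state, action, reward, next_state, actions):
    """One Q-learning step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    The max over next-state actions assumes the world is Markovian in (s, a)."""
    best_next = max(Q.get((next_state, a), 0.0) for a in actions)
    old = Q.get((state, action), 0.0)
    Q[(state, action)] = old + ALPHA * (reward + GAMMA * best_next - old)

def choose(state, actions):
    """Epsilon-greedy action selection."""
    if random.random() < EPS:
        return random.choice(actions)
    return max(actions, key=lambda a: Q.get((state, a), 0.0))

# Toy transition: merging when a gap is available was rewarded.
actions = ["yield", "merge"]
update("gap_ahead", "merge", 1.0, "in_lane", actions)
print(Q)
```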
MAPPING
Map definition
What should a map enable in the context of autonomous driving? Localization and finding the drivable paths, assuming no obstacles (obstacles are handled by the real-time sensing layer).
What are the map requirements to enable “everywhere autonomous driving”?
- Map updating must be a continuous process - near real time
- Process must be “crowd sourced”
- Small data - 10 kb per km
- Transmitting images or other raw data is out of the question
- Advanced processing must be done on-board
- Preferably do not introduce any dedicated hardware for this task - use the existing cameras.
- GPS for localization?
- Problem - ~10 m accuracy (inconsistent, and worse still in urban environments)
- DGPS, RTK - perhaps acceptable accuracy in open areas, not so for urban, city traffic
- SLAM - Simultaneous Localization and Mapping
- Idea - detect feature points in every image, each with a descriptor, and track these descriptors across subsequent images so you can localize yourself (see Wikipedia for more details). Many feature points per frame + ego-motion (a minimal sketch follows this list)
- 1 MB / meter, 1 GB / km per camera
- TomTom “RoadDNA”
- Only localization
- Compressed lidar - 25kb / km
- Not crowd-sourced - does not make the map; the HD-map is a separate engine.
- Mobileye’s idea: Road Experience Management
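Before moving on to REM, here is a minimal sketch of the SLAM descriptor idea from the list above: detect feature points, describe them, and match descriptors between consecutive frames (the matches would feed ego-motion estimation). It uses OpenCV’s ORB; the random frames stand in for real camera images:

```python
import cv2
import numpy as np

frame1 = np.random.randint(0, 255, (480, 640), dtype=np.uint8)
frame2 = np.roll(frame1, 5, axis=1)           # fake 5-pixel ego-motion

orb = cv2.ORB_create(nfeatures=500)           # detector + binary descriptor
kp1, des1 = orb.detectAndCompute(frame1, None)
kp2, des2 = orb.detectAndCompute(frame2, None)

if des1 is not None and des2 is not None:
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)       # tracked descriptors between frames
    print(f"{len(matches)} descriptor matches")
```

Storing hundreds of such descriptors for every frame is what drives SLAM maps toward the ~1 MB/m figure quoted above.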
Look for landmarks in the image - traffic signs, directional signs, general rectangular signs, lampposts, and reflectors; additional families of landmarks (e.g., dashed lines) will be added if needed.
In the absolute worst case - a boring Texas highway - you encounter a landmark every 100 m.
This is a per-landmark process, not a per-image process - the environmental model is used to find these landmarks (a localization sketch follows).
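A hedged sketch of that per-landmark localization step: given landmark positions detected by the camera (in the vehicle frame) and the same landmarks’ known map positions, estimate the vehicle’s map position as the translation that best aligns them. Correspondences and heading are assumed solved here, which simplifies the real problem considerably:

```python
import numpy as np

def localize(detected, map_positions):
    """Least-squares translation aligning landmarks detected in the vehicle
    frame with their known map positions; the mean offset is the vehicle's
    estimated map position."""
    return (map_positions - detected).mean(axis=0)

# Toy example: three landmarks (sign, pole, reflector) in 2D map coordinates.
map_pos = np.array([[105.0, 3.5], [130.0, -2.0], [160.0, 3.5]])
true_vehicle = np.array([100.0, 0.0])
detected = map_pos - true_vehicle + np.random.normal(0, 0.05, map_pos.shape)
print("estimated vehicle position:", localize(detected, map_pos))
```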
Volvo’s Drive Me project has 4 surround cameras and uses Nvidia to do the parking processing.
Cameras are for far range - not for near range. Mobileye is impressed with what Tesla has done for the near range with ultrasonic sensors.
Nvidia provides a “pre-trained” network - Mobileye says it is inadequate for auto industry production.
The Roadbook is built only from the forward-facing camera. It needs only additional software for the EyeQ chip and a means to communicate - existing 3G/LTE connectivity.
Who does Mobileye view as its real competitors:
- Classical - Bosch, Denso, Autoliv, Continental - companies with experience getting production awards, supplying both hardware and content. Maybe more. Not Nvidia.
Monetization
- Mobileye says it’s too early to say. 2016 launch with GM, so by 2018 they can talk about map services. Post-2020, things will be very interesting from a commercial POV. Mobileye thinks these maps will generate more income than regular GPS maps do today. Shared-mobility business models.
- Build infrastructure for future - the money will come.
Who owns the data?