
Waymo tests imitation learning for path planning

Waymo just published a blog post and paper describing how it tested imitation learning for path planning, something that Tesla has recently been reported to be exploring as well. Waymo trained a path planning neural network to handle a few driving tasks using 1) imitation of 1,440 hours of human driving and 2) simulation. The path planning network, ChauffeurNet, was able to stop for a stop sign, stop for a yellow light, nudge around a parked car, and follow other cars while maintaining the flow of traffic.

It’s important to understand this is not end-to-end learning (sometimes called “behaviour cloning”):

“In order to drive by imitating an expert, we created a deep recurrent neural network (RNN) named ChauffeurNet that is trained to emit a driving trajectory by observing a mid-level representation of the scene as an input. A mid-level representation does not directly use raw sensor data, thereby factoring out the perception task, and allows us to combine real and simulated data for easier transfer learning.”​

The “mid-level representations” are the metadata, like the bounding boxes, labels, and location and motion estimates you see in verygreen’s “What Autopilot sees” videos. The perception neural network does its thing, and then outputs abstract information (not raw pixel information) to the path planning neural network.
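To make that concrete, here is a rough sketch of what a mid-level representation might contain. The schema below is entirely hypothetical, not Waymo’s or Tesla’s actual format; ChauffeurNet actually renders this kind of information into top-down images, but the key point is the same: the planner sees abstracted objects, not pixels.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class TrackedObject:
    """One object output by the perception stack (hypothetical schema)."""
    object_type: str        # e.g. "vehicle", "pedestrian", "cyclist"
    bbox: tuple             # (x, y, width, height) in ego-centric metres
    heading_rad: float      # estimated orientation
    velocity_mps: tuple     # (vx, vy) motion estimate

@dataclass
class MidLevelScene:
    """Abstract scene description passed to the planner instead of raw pixels."""
    ego_speed_mps: float
    lane_geometry: List[tuple]    # polyline points for the current lane
    traffic_light_state: str      # e.g. "red", "yellow", "green", "none"
    objects: List[TrackedObject] = field(default_factory=list)

# The planner consumes MidLevelScene, never raw camera frames, which is why
# real and simulated data can be mixed for training.
scene = MidLevelScene(
    ego_speed_mps=12.0,
    lane_geometry=[(0.0, 0.0), (50.0, 0.5)],
    traffic_light_state="green",
    objects=[TrackedObject("vehicle", (20.0, 0.0, 4.5, 1.9), 0.0, (10.0, 0.0))],
)
```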

It’s also important to understand Waymo only used supervised learning, not reinforcement learning:

“Beyond our approach, extensive simulations of highly interactive or rare situations may be performed, accompanied by a tuning of the driving policy using reinforcement learning (RL). However, doing RL requires that we accurately model the real-world behavior of other agents in the environment, including other vehicles, pedestrians, and cyclists. For this reason, we focus on a purely supervised learning approach in the present work, keeping in mind that our model can be used to create naturally-behaving “smart-agents” for bootstrapping RL.”​
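For illustration, here is a minimal sketch of what “purely supervised” imitation means in practice: regress the planner’s output onto the human driver’s logged trajectory, with no environment rollouts and no reward signal. Nothing here is Waymo’s actual code (ChauffeurNet predicts rendered trajectories and uses several auxiliary losses); this is just the bare behaviour-cloning idea, with invented shapes.

```python
import torch
import torch.nn as nn

# Hypothetical shapes: each sample is a mid-level scene encoding plus the
# human driver's future trajectory (10 waypoints, x/y each).
SCENE_DIM, HORIZON = 256, 10

planner = nn.Sequential(
    nn.Linear(SCENE_DIM, 512), nn.ReLU(),
    nn.Linear(512, HORIZON * 2),   # predicted future (x, y) waypoints
)
optimizer = torch.optim.Adam(planner.parameters(), lr=1e-4)

def supervised_step(scene_features, expert_waypoints):
    """One behaviour-cloning step: no reward, no rollout, just 'match what
    the human did', which is why no model of other agents' reactions is
    needed."""
    pred = planner(scene_features).view(-1, HORIZON, 2)
    loss = nn.functional.mse_loss(pred, expert_waypoints)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

batch = torch.randn(32, SCENE_DIM)
expert = torch.randn(32, HORIZON, 2)
print(supervised_step(batch, expert))
```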

In the paper, Waymo says that ChauffeurNet performs worse than Waymo’s current path planning system, which uses a combination of machine learning and hand-coded rules. However, they also say that there is room to build on this approach:

“...the model is not yet fully competitive with motion planning approaches but we feel that this is a good step forward for machine learned driving models. There is room for improvement: comparing to end-to-end approaches, and investigating alternatives to imitation dropout are among them. But most importantly, we believe that augmenting the expert demonstrations with a thorough exploration of rare and difficult scenarios in simulation, perhaps within a reinforcement learning framework, will be the key to improving the performance of these models especially for highly interactive scenarios.”
If my interpretation of Amir Efrati’s reporting in The Information is correct, Tesla is testing (or perhaps even using in production today, who knows) an imitation learning approach to path planning. What’s most exciting there is Tesla has the capacity to collect so much more metadata (a.k.a. mid-level representations) on human driving. Even if Waymo’s average speed over those 1,440 hours was 90 mph (an unrealistically high estimate), that’s only a total of about 130,000 miles. Tesla has around 300,000 HW2 cars on the road. The average U.S. driver drives around 1,000 miles per month. So HW2 Teslas are probably driving around 300 million miles per month. This means Tesla has an ocean of data to collect from. It would be feasible for Tesla to collect billions of miles of data if that would be useful. Tesla could collect four orders of magnitude more data than Waymo used for ChauffeurNet.
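Here is that arithmetic spelled out (all inputs are the rough guesses from the paragraph above):

```python
# Back-of-envelope check of the figures above (all inputs are rough guesses).
waymo_hours = 1_440
waymo_avg_mph = 90                        # deliberately generous upper bound
waymo_miles = waymo_hours * waymo_avg_mph
print(f"Waymo upper bound: ~{waymo_miles:,} miles")           # ~129,600

hw2_cars = 300_000
miles_per_car_per_month = 1_000
fleet_miles_per_month = hw2_cars * miles_per_car_per_month
print(f"Tesla fleet: ~{fleet_miles_per_month:,} miles/month")  # 300,000,000

print(f"1 billion miles is {1_000_000_000 / waymo_miles:,.0f}x Waymo's total")
# ~7,716x, i.e. roughly four orders of magnitude
```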
 
Very limited anecdotal evidence suggests that HW2 cars might upload an average of 900 MB+ per month. In another thread, verygreen said a HW2 car might generate 8 MB of metadata per minute of driving, although there is a lot of variation depending on how many objects are nearby. Also, I think this is just a rough guess off the top of verygreen’s head and not a hard number calculated from file sizes. So take this with a grain of salt.

Let’s randomly guess that 1/3 of data uploaded — or 300 MB+ per month — is metadata (with the other 2/3 being sensor data). If we assume an average speed of 30 mph (0.5 miles per minute), and we assume an average of 10 MB of metadata is generated per minute, then 300 MB is 30 minutes’ worth of driving, or 15 miles’ worth. That would imply a total of 150,000 hours’ worth or 4.5 million miles’ worth of metadata is uploaded per month across all 300,000 HW2 vehicles. Annualized, that’s 1.8 million hours or 54 million miles — but of course annualizing these figures is misleading because the HW2 fleet is rapidly growing, and the rate at which it’s growing (i.e. the production rate) is also growing.

Any of these key variables could be way off — MB of metadata per mile, monthly MB uploaded per HW2 car, and percentage of uploaded MB that is metadata. It’s also possible that Tesla has so far been uploading very little or no standalone metadata. This is all just guesswork.
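Here is the whole estimate as a sketch, with every guessed variable in one place so it’s easy to see how sensitive the result is. (The 10 MB/minute figure is a rounded version of verygreen’s off-the-cuff 8 MB/minute.)

```python
# Every input here is a guess; change any of them and the output moves linearly.
mb_uploaded_per_car_per_month = 900
metadata_fraction = 1 / 3
mb_metadata_per_minute = 10        # rough figure; see caveats above
avg_speed_mph = 30
fleet_size = 300_000

metadata_mb = mb_uploaded_per_car_per_month * metadata_fraction  # 300 MB
minutes_per_car = metadata_mb / mb_metadata_per_minute           # 30 min
miles_per_car = minutes_per_car / 60 * avg_speed_mph             # 15 miles

fleet_hours_per_month = fleet_size * minutes_per_car / 60        # 150,000 h
fleet_miles_per_month = fleet_size * miles_per_car               # 4.5M miles
print(fleet_hours_per_month, fleet_miles_per_month)
```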

In September, a small investment firm called Worm Capital published a blog post about their visit to the Gigafactory. In the post, they say:

“According to Tesla, the company believes it can gather 1 billion miles of data per year from current drivers.”
Two problems: 1) this is second-hand info and 2) it’s not stated what kind of data will be uploaded. For instance, it could just be GPS data for the navigation system.

The big picture is, regardless of what Tesla is doing now, at some point in the future it can upload billions of miles of metadata that would be useful for training a path planning neural network. Since the accuracy of the metadata depends on the accuracy of the perception network, Tesla may want to wait until the perception network has reached a certain threshold of accuracy before it starts uploading significant metadata. The perception network reaching that threshold may, in turn, have to wait until after the release of Hardware 3, and possibly until after a lot more sensor data is collected and labelled.

Unlike any other company (so far), Tesla is in a position to try out this imitation learning approach to path planning with billions of miles of human driving data. A sufficiently human-like path planner trained using supervised learning with a combination of real world data and synthetic simulation data could also be used to populate a simulator with vehicles that exhibit human-like driving behaviour. As Waymo says, that might be the key to unlocking the power of reinforcement learning for path planning. Supervised imitation learning is therefore 1) intrinsically useful, in training a path planning network and 2) potentially also instrumentally useful, in enabling reinforcement learning.

I personally feel eager to see Tesla try both supervised imitation learning and reinforcement learning because that takes the hand-coding out of path planning, and makes path planning a learnable task. The limitation on path planning performance shifts from a) the human ability to articulate implicit knowledge as a set of explicit rules, expressed in a programming language, to b) the amount of training data that can be obtained, and the design of the network architecture and hyperparameters. (b) can be improved much, much faster than (a). As long as self-driving car progress depends on progress in hand-coding how to drive, who knows how slow or stagnant progress might be. But if progress depends on collecting training data and innovation in neural networks, then there is hope for a rapid takeoff.

If Tesla can get reinforcement learning in simulation working really well, then maybe they can quickly do trillions of simulated miles and go from subhuman to superhuman performance on path planning over the course of a few quarters or a few years. Maybe it’s not realistic to think that an AlphaGo-like trajectory of progress can be replicated with reinforcement learning in a robotics application like self-driving cars. Robots have to deal with sensor noise, an astronomically larger state space than Go, and uncertainty about how actuator commands will actually translate into real world physical events. However, there have been some recent promising proofs of concept from OpenAI and Google applying reinforcement learning to robotic hands and arms. So maybe reinforcement learning will work for path planning too.

Once a feature moves from simulation to test vehicles to the production HW3 fleet, maybe it will be possible to do reinforcement learning at scale in the real world too. The training signal can take the form of disengagements, aborts, and crashes, and perhaps other events like bug reports, collision warnings, and hard brakings. Just an idea.
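To sketch that idea (this is purely my speculation; the event names and weights below are invented, not anything Tesla has described): fleet events could be mapped to a scalar signal that a reinforcement learning algorithm could consume.

```python
# Hypothetical mapping from fleet events to a scalar training signal.
EVENT_PENALTIES = {
    "crash": -100.0,
    "disengagement": -10.0,
    "abort": -10.0,
    "collision_warning": -2.0,
    "hard_braking": -1.0,
    "bug_report": -0.5,
}

def episode_reward(events, miles_driven):
    """Score a driving segment: a small positive signal for uneventful
    miles, penalties for the events listed above."""
    penalty = sum(EVENT_PENALTIES.get(e, 0.0) for e in events)
    return 0.1 * miles_driven + penalty

print(episode_reward(["hard_braking"], miles_driven=12.0))  # 0.2
```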

 
Just a note. Metadata is normally defined as:
“a set of data that describes and gives information about other data”

Usually this means that a data file contains some metadata, for example file format, version, size, and date. After the metadata comes the data.
 

It’s metadata in the sense that it’s data about the sensor data. Rather than a file name, size, etc. it’s an object type, location, etc.

Perhaps it’s not exactly the right term but we need something shorter than “mid-level representations”.

Verygreen made this visualization of the Autopilot metadata:

 
@SandiaGrunt

The thing is, @strangecosmos seems to have only one thesis when it comes to the autonomous space: Tesla is on the right path. Tesla’s sole benefit in autonomous development — the one not shared by others, that is — is their ability to collect vast amounts of consumer data (they have another advantage in deployment).

Therefore his entire thesis and raison d’être in this space would fall apart if he seriously entertained the possibility that more data is not the key. He is not about to do that. I expect he will continue seeing everything through this prism and postulating that more data is the key until either proven right or wrong over time.

Mind you, @strangecosmos does admit he may be wrong. Kudos to him for that. But that does not change the fact that he only has this one thesis and will likely continue with it, because it is tied to a singular belief that Tesla got something right here that others did not. This belief system limits his ability to entertain other options seriously. A TSLA-related interest may also exist.

Being an amateur is not a sin. Most of us here are amateurs in the autonomous space. But I do find conversations with people who are eager to entertain multiple theories and avenues much more constructive.
 
@SandiaGrunt

You will never get through to him; he sees everything through Tesla-tinted glasses. Every new paper results in an article at SA, or a thread or post here, about how "Tesla could be doing/using this, and only they can do this; this is why they are X ahead."

The whole data collection / shadow mode story was debunked a long, long time ago. I have proven beyond a reasonable doubt that architecture and the structure of data are what drive NN innovation, not simply more data. I have also proven that the datasets being used are actually getting smaller, not larger, because of how fast architectures are advancing.

ChauffeurNet's conclusion was that end-to-end imitation learning won't work even with millions/billions of expert demonstrations. The paper asked:

we naturally asked ourselves the question: given that we had millions of miles of driving data (i.e., expert driving demonstrations), can we train a skilled driver using a purely supervised deep learning approach?

Yet Trent's conclusion is, "Great, if Tesla dumps a billion miles into it, they will get it to work."

It's like, what? Did you even read the paper? It clearly says "we presented our experience with what it took to get imitation learning to perform well in real-world driving. We found that key to its success is synthesizing interesting situations around the expert’s behavior." and "For this reason, simply having a large number of expert demonstrations to imitate is not enough."

The reason they got it to work is the amount of synthetic data they added: trajectory perturbations, synthesized collisions with objects, curbs, etc. They only needed a small percentage of their millions of miles of data. Adding 4x or 40x more data wouldn't get the job done, as they just proved.

Tesla even says that their users only get into a crash-like event about once every 3 million miles. That is clearly nowhere near what you need to get an even serviceable system. The amount of real data needed is actually very small compared to any number Trent actually tosses out.

In fact, for example, Google's Duplex (Google's DeepMind works with Waymo) was created from 24 hours of voice data. Using Trent's logic, it would take years of human voice data to get there. But no, that's not how it works. Google released their architecture and the associated paper, and dozens of researchers and companies have tried to replicate it and have failed.

How your data is organized, and your architecture, are way more important than the amount of data.

Google AI Blog: Google Duplex: An AI System for Accomplishing Real-World Tasks Over the Phone
 
Here are a few false statements:

The article literally says that imitation learning for path planning is not the right path. They said, "Therefore, the bar for a completely machine-learned system to replace the Waymo planner is incredibly high."

Waymo is saying that billions of miles of typical human driving data is useless

Waymo is literally saying that approach will not work

I think you are confusing your own opinion or inference with what Waymo actually said. Waymo simply said that ChauffeurNet underperforms their current hybrid path planning system, and suggested directions for future work. Waymo did not say imitation learning is “not the right path”, that “billions of miles of typical human driving data is useless” (those miles would include many rare, difficult scenarios and thousands of crashes and near-crashes), or that the imitation learning approach “will not work”. This is all just your own editorializing. It’s easy for anyone to read the blog post and the paper and to see these statements you are claiming Waymo made don’t appear anywhere. Your claims are just false.

Since you’re just making up statements that Waymo didn’t say and falsely attributing them to Waymo, I don’t think I need to engage with you any further. This is your first post on this forum, and it’s aggressive, rude, and it contains multiple false claims. This is a major red flag for a troll/sockpuppet/brigading account.

On the broader question of whether more training data improves neural network performance, those curious might find these two studies interesting:

Google AI Blog: Revisiting the Unreasonable Effectiveness of Data

Advancing state-of-the-art image recognition with deep learning on hashtags - Facebook Code

The rate at which performance increases slows the more data you add beyond a certain point (i.e. there are diminishing returns), but nonetheless performance continues to increase, and performance that beats the state of the art can be achieved just by adding more data — and data that is only weakly labelled by Instagram hashtags. A good number of hashtags for objects on Instagram (e.g. “#car”) are appended to images that don’t contain the object. It has not yet been tested how much performance might increase if this volume of images were properly labelled.

Of course, it isn’t an either/or thing. You can grow your training dataset and try to improve on your neural network architecture (and hyperparameters) simultaneously. I don’t see why it has to be just one or the other. You can do both.

In terms of reinforcement learning, there may be studies also, but an interesting anecdotal example is OpenAI’s Dota bots. The team describes significant improvements in the Dota bots’ ability to play literally overnight — because the bots were playing the equivalent of years’ worth of Dota while the team slept. A highly skilled human player who beat the bot today might get defeated tomorrow. So, in reinforcement learning too, more training data — at least sometimes — means better performance.

P.S. It would surprise me if anyone actually thought I’m an expert on AI — why wouldn’t I be working at Tesla or Google or Shopify or wherever instead of blogging and posting in forums? Even on major news sites, tech journalists who write about AI are never (at least, I don’t know a single example) experts on AI. They just sometimes talk to experts to learn what they need to learn to write their article, and sometimes get it wrong. The norm is that tech writers are not experts.

Also, I’ve only ever heard non-experts complain about other non-experts giving their opinion on AI... and then proceed to give their own opinion. This seems just to be a bad faith argument.

P.P.S. Those who are interested in good faith discussion: I would encourage you to put everyone who’s replied to this thread, except heltok, on your ignore list, and/or to check out Gradient Descent which is a moderated autonomous car forum.
 
A few expert opinions on the Waymo imitation learning paper.

From Bharath Ramsundar, who has a Stanford PhD in deep learning and who created DeepChem:

“This paper is a tour-de-force. Imitation learning shaped by auxiliary losses allows for reasonable control of a self driving car. If this were scaled in a deep RL engine, I can believe it could lead to robust driving in the next decade”​

From Oliver Cameron, CEO of the self-driving car startup Voyage:

“The conclusion from the ChauffeurNet team?​

"The model is not yet fully competitive with motion planning approaches but we feel that this is a good step forward for machine learned driving models."

[thumbs up emoji] This is impressive work from @Waymo. I think it goes to show just how much exploration there is still to do.”​
 
The rate at which performance increases slows the more data you add beyond a certain point (i.e. there are diminishing returns)

Oh, the sweet, delicious irony.

You literally posted an article that proves my point. The blue curve (including fine-tuning) is basically 2 ln(N).

If increasing the number of miles driven results in similar logarithmic improvements, that means that going from 10 million miles (Waymo's current tally) to 1 billion miles (Tesla's current tally) would increase performance along some metric from 2 ln(10 million) ≈ 32 to 2 ln(1 billion) ≈ 41, an improvement of about 28%.
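A quick check of that arithmetic (assuming the natural log, which reproduces the figures above):

```python
import math

perf = lambda n: 2 * math.log(n)    # natural log matches the ~32 / ~41 figures
waymo_miles, tesla_miles = 10_000_000, 1_000_000_000
print(perf(waymo_miles))                           # ~32.2
print(perf(tesla_miles))                           # ~41.4
print(perf(tesla_miles) / perf(waymo_miles) - 1)   # ~0.29, i.e. about 28%
```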


Now, let's look at the effect of novel improvements in architecture over time, in the winners of the ImageNet challenge over the last few years:

[Chart: winner results of the ImageNet Large Scale Visual Recognition Challenge (LSVRC) over recent years]


From AlexNet to GoogLeNet-v4, there's been an improvement of 529%. That's all due to improvements in architecture, because the ImageNet dataset is unchanged.

By my count, at least 90% of the posts / articles you've posted to seekingalpha, medium, and here have implicitly or explicitly depended upon the idea that Tesla is or will soon be the leader of the race to autonomy because they will or have collected a larger training set. This is not at all supported by any evidence.

Are you honest enough with yourself, and with your readers, to go back and delete all of your erroneous posts?
 
Here’s my thing with this. The issue for me isn’t so much that reinforcement learning can’t work for this, or even that more data can’t help. I think a reasonable interpretation certainly can be: yes, reinforcement learning can work for this, and more data can help as well. How well and how much are up for reasonable discussion, as shown. This is interesting research and I hope the studies continue.

The problem for me is the interpretation again that this will somehow help Tesla in particular in post #1.

Here we have Waymo, the leader in autonomous driving and part of the corporate entity that has brought us much, if not all, of the greatest deep-learning activity of recent times (DeepMind at Google), doing this impressive stuff. The leaders of autonomy and deep learning working together: Waymo, DeepMind, Google. Yes, as some quoted people said, this is impressive stuff. Kudos to Waymo and its corporate family.

But again somehow the interpretation turns to: this is exciting for... Tesla! A completely unrelated entity with a checkered history in NN development that is behind the above-mentioned leaders in both categories: demonstrated autonomy and demonstrated deep-learning results.

What?

It isn’t that I don’t kind of get the thesis. If you believe consumer data collection and consumer deployment are keys to autonomy, you can see Tesla having some advantages others lack. Sure. But that is a very particular benefit, and it would have to offset all the disadvantages Tesla also has in this race if Tesla is to overtake everyone else — which seems to be the continued suggestion. Overtaking would require not just matching but significantly exceeding the leaders’ speed of progress.

That is what I take issue with: the dismissal of all the disadvantages Tesla has, which are not properly weighed against realistic interpretations of whatever advantages it has. Again we see great progress (in terms of great research at least) from companies ahead of Tesla in this... and somehow this turns into a potential advantage for the trailing Tesla! Wow. This is the biggest problem with this thesis in my view.
 
From AlexNet to GoogLeNet-v4, there's been an improvement of 529%. That's all due to improvements in architecture, because the ImageNet dataset is unchanged.

By my count, at least 90% of the posts / articles you've posted to seekingalpha, medium, and here have implicitly or explicitly depended upon the idea that Tesla is or will soon be the leader of the race to autonomy because they will or have collected a larger training set. This is not at all supported by any evidence.

Are you honest enough with yourself, and with your readers, to go back and delete all of your erroneous posts?

Oh my goodness... :rolleyes: It’s been shown by Facebook that you can beat the state of the art performance on ImageNet just by throwing a ton of weakly labelled (i.e. largely mislabelled) images at an existing neural network architecture. Therefore the best overall performance will be attained by a neural network that has both 1) the best architecture and 2) the best training dataset (in terms of size, diversity, and labelling quality). I don’t think this is a controversial claim at all. I think most, if not all, experts take this for granted.

It’s important to emphasize that the Facebook study uses very weakly labelled data, not data labelled in the careful, manual way that Tesla or Waymo would label data. Just take a look at any hashtag on Instagram that corresponds to an ImageNet category and count how many out of the first 10 or 20 images actually contain the object.

It’s an open question how much incremental improvements like those shown by Facebook and Google matter in practical terms for self-driving cars. To answer this question with any confidence, we would need to benchmark the state of the art performance of neural networks against humans, which is something I’ve looked into and I just can’t find very much good research on it. One study in 2011 found that ConvNets were already better than humans at classifying cropped images of traffic signs, but I’m skeptical that the experimental conditions translate to the real world task of traffic sign recognition.

If neural networks today are within a few incremental improvements of human performance, then a few incremental improvements could have huge practical importance for self-driving cars. If not, then not. As I said, it’s an open question and I can’t find any persuasive hard data on this so far.

With reinforcement learning, the story is different than supervised learning. According to Waymo, the problem with using reinforcement learning for path planning is not eking out consecutive incremental improvements. The problem is populating a simulator with naturalistically behaving drivers so that reinforcement learning can be used in the first place. There may be other problems with using reinforcement learning in simulation for path planning, both known and yet to be discovered, but that’s the one Waymo highlights. Waymo says in their blog post and their paper that supervised imitation learning (as was used for ChauffeurNet) is a potential way to make naturalistically behaving simulated drivers. Therefore, it’s a potential way to make reinforcement learning in simulation work for path planning. It doesn’t sound to me like this is certain to work, but Waymo, Bharath Ramsundar, and Oliver Cameron all seem to think this is something worth trying.

If you can successfully get reinforcement learning working well in simulation, from what we’ve seen with, for instance, OpenAI’s Dota bots, you can get significantly improved performance literally overnight just by running the simulator for ~8 hours. As I said above, there are challenges with getting reinforcement learning to work for robotics applications, but we have seen some interesting proofs of concept and, who knows, maybe this will work for self-driving cars. Not even experts seem confident whether it will work, so I certainly am not, but as a layperson who is interested in self-driving cars and new ideas related to them, that sure sounds interesting to me. Again, “worth trying” is the operative phrase.
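A toy sketch of that bootstrapping idea (everything here is invented for illustration; `ImitationPolicy` stands in for a trained behaviour-cloning model):

```python
import random

class ImitationPolicy:
    """Stand-in for a behaviour-cloning model trained on human driving."""
    def predict(self, scene):
        # A real policy would output a trajectory; here, a dummy steering value.
        return random.uniform(-0.1, 0.1)

class SmartAgent:
    """Simulated road user driven by the imitation-learned policy, so that
    traffic in the simulator behaves in a human-like way."""
    def __init__(self, policy):
        self.policy = policy
    def act(self, scene):
        return self.policy.predict(scene)

def populate_simulator(n_agents=20):
    policy = ImitationPolicy()
    return [SmartAgent(policy) for _ in range(n_agents)]

# An RL driving policy would then train against these agents instead of
# against scripted, rule-based traffic.
agents = populate_simulator()
print(len(agents), agents[0].act(scene=None))
```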

By the way, what are your credentials in the field of deep learning, and what’s your proof that you have those credentials? I cited an expert with a Stanford PhD in deep learning who disagrees with you. Do you have better credentials than him? It seems like you might just be, like me, a non-expert with strong opinions...
 
It’s metadata in the sense that it’s data about the sensor data. Rather than a file name, size, etc. it’s an object type, location, etc.

Perhaps it’s not exactly the right term but we need something shorter than “mid-level representations”.

Verygreen made this visualization of the Autopilot metadata:


I see. Not sure if this has a formal term. We used to call it processed data and postprocessed data, where the latter is processed given that we know what happens in the future, which can be very useful for training but should be avoided for testing.
 
Well, the formal term in AI is “mid-level representations”, but that’s so long to type. :p Maybe abbreviate to MLR? But that acronym is already used by Major League Rugby...

Verygreen has (I think) been referring to it as “metadata” (unless I misunderstand) and that term stuck in my head. But it may be too misleading or confusing to be a good term. So mid-level representations is probably the best term to use, for the sake of clarity.
 
Therefore the best overall performance will be attained by a neural network that has both 1) the best architecture and 2) the best training dataset

Of course -- you're right, that's not controversial at all. My point is that Tesla can spend years grinding away at a larger and larger dataset and see only meager logarithmic improvements. Meanwhile, somebody else can have a stroke of genius that will immediately produce 5x or 10x improvements all at once. This has been the story of AI for decades. Google / DeepMind / Waymo has won ImageNet three of the last six years, yet you only cheer for the team whose strategy is just grinding out more and more miles...

One study in 2011 found that ConvNets were already better than humans at classifying cropped images of traffic signs

We're talking about path planning in this thread, not image classification. It is true that image classification is already "solved" to superhuman levels. Path planning? Nope.

It doesn’t sound to me like this is certain to work, but Waymo, Bharath Ramsundar, and Oliver Cameron all seem to think this is something worth trying.

Of course this work is valuable! It also happens to totally shame Tesla, as it is evidence that Tesla's approach is doomed.

By the way, what are your credentials in the field of deep learning, and what’s your proof that you have those credentials?

The sad state of social media is that it democratizes knowledge, but also reduces everyone to "just another idiot on Reddit," even the experts. Their voices are always drowned out by a sea of idiots. You would be surprised to find out who I am, but I cannot say; NDAs are a bitch. I guess you'll just have to judge my arguments by their merits.
 
Of course -- you're right, that's not controversial at all. My point is that Tesla can spend years grinding away at a larger and larger dataset and see only meager logarithmic improvements. Meanwhile, somebody else can have a stroke of genius that will immediately produce 5x or 10x improvements all at once. This has been the story of AI for decades. Google / DeepMind / Waymo has won ImageNet three of the last six years, yet you only cheer for the team whose strategy is just grinding out more and more miles...

This is not a purely technical argument, but a partially technical argument about talent acquisition, proprietary R&D vs. open research, and competitive advantage. You seem to be conflating perception (or, generously, inside knowledge) of the AI industry with technical expertise. What course teaches students about the competitive landscape of the AI industry? I’ll gladly take it.

By the way, I also do cheer for Waymo. As I’ve said a few times before, I just want fewer people to die in car crashes. Whatever company or strategy makes that happen I will fully cheer for.

As an investor, I might invest in Waymo if it were spun out as a separate stock. A big part of the reason I’m not already invested is just financial, not technical: Alphabet has a huge market cap that wouldn’t even double if Waymo hit a $500 billion valuation. Tesla, by contrast, would increase its market cap ~10x if that happened.

Also, there is the complication of Waymo’s relationship with car manufacturers. If Waymo robotaxis threaten to bankrupt automakers, will automakers be so eager to supply Waymo with the vehicles it needs? Will Waymo have to share a large portion of ride-hailing revenue to placate them? If Alphabet attempts to buy an automaker, will regulators approve the deal? Even if you were to assume that Waymo is ahead of Tesla on the technology, the investment case for Tesla might still be better for financial and business reasons.

We're talking about path planning in this thread, not image classification. It is true that image classification is already "solved" to superhuman levels. Path planning? Nope.

Then your argument from analogy about ImageNet may be inapt. How do we know whether there are diminishing returns with supervised imitation learning for path planning, and at what size does a dataset start to hit diminishing returns? Waymo used 1,440 hours, 30 million examples, and probably ~50,000 miles (assuming an average speed of 30-40 mph) for ChauffeurNet. What if diminishing returns aren’t reached until 1 billion miles? Moreover, what if — even with diminishing returns — human-like and human-level path planning can be achieved with 3 billion miles using current neural network architectures?

Then a company that can collect 3 billion miles of mid-level representation data can get to human-like, human-level path planning without any neural network architecture innovation. And a company that can’t collect 3 billion miles of data can’t. Maybe a stroke of genius will happen at Alphabet to increase training data efficiency 100x before Tesla collects 3 billion miles of data... but maybe it won’t. Which would you say is more likely?

Keep in mind the current HW2 Tesla fleet is driving at an annualized rate of 3.6 billion miles, and at the current production rate the fleet is set to double in size in 12 months. The production rate itself is planned to increase by 50% by the end of 2018, and to double by the end of 2019.

Of course this work is valuable! It also happens to totally shame Tesla, as it is evidence that Tesla's approach is doomed.

What exactly do you think Tesla’s approach to path planning is? How do you think it differs from ChauffeurNet? And why do you think that approach is doomed?
 
This is not a purely technical argument, but a partially technical argument about talent acquisition, proprietary R&D vs. open research, and competitive advantage.

To expand on this: it’s important to clarify that Tesla’s strategy isn’t “just” increasing training data. It seems like Tesla is also working on neural network architecture innovation. It’s a very reasonable argument that Alphabet is better than Tesla at neural network architecture innovation, but Tesla is a highly desirable company for tech workers, and it seems like it has some of its AI talent working on this too.

It’s not all or nothing; the question is what the marginal utility of Alphabet’s presumed lead in neural network architecture innovation is relative to Tesla. How much better at neural network architecture innovation is Alphabet versus the median $1 billion+ Bay Area or Seattle area tech company? Assuming Tesla is somewhere between Alphabet and the median due to Tesla’s high desirability among tech workers, how much better at neural network architecture innovation is Alphabet versus Tesla?

Is it reasonable to think, for example, a 100x improvement in training data efficiency could be achieved at Alphabet without Tesla (and other companies) being aware of similar work? It seems more common for neural network progress to happen in steps than to take a 100x leap all at once based purely on the work of one team or one company.

Also, how much of Alphabet’s best research is shared in academic papers versus kept secret? How much of the best research overall on neural network architectures is made public by companies, academic teams, and non-profits like OpenAI? Again, I think it’s reasonable to argue that Alphabet might develop, or might already have, secret neural network architectures that are better than the publicly known state of the art. The question is how much better. 100x?

There seems to be some amount of inevitable pure guesswork, pure speculation here. The nature of secret things is we don’t know about them. I don’t have an answer. But I think a better way to frame the question than “architectures vs. training data” (which is not the operative distinction) is “the marginal performance improvement of Alphabet’s putatively superior secret architectures vs. the marginal performance improvement of Tesla’s architectures caused by its much larger training dataset”.
 
I'm intentionally choosing not to respond to parts of your posts. Sorry. Some parts I simply can't comment on, others are failed attempts to mansplain how machine learning works, and others are awkward attempts to cajole me into agreeing with you that Tesla is just really super great after all. I'll pass.

By the way, I also do cheer for Waymo. As I’ve said a few times before, I just want fewer people to die in car crashes. Whatever company or strategy makes that happen I will fully cheer for.

I have never seen you cheer for any organization but Tesla. Is it Elon? Is it that the model names spell S3XY? Is it the frunk?

what size does a dataset start to hit diminishing returns

Diminishing returns are always present. You posted an article showing that diminishing returns were clearly evident at N = 10 million. Diminishing returns don't "start" at any particular N.

What exactly do you think Tesla’s approach to path planning is?

I don't know, but Karpathy has made videos describing Tesla's notion of "Software 2.0" as fully machine-learned. The Waymo article makes it clear that a fully machine-learned solution is not possible at this time, and that a great deal of hand-coded work went into their planner.

It seems like Tesla is also working on neural network architecture innovation.

jimmy_d says V9 is just a bigger Inception network, which is pretty underwhelming.

the marginal performance improvement of Alphabet’s putatively superior secret architectures vs. the marginal performance improvement of Tesla’s architectures caused by its much larger training dataset

The point I keep trying to make is that the "much larger training dataset" is really of limited use, permitting only boring logarithmic improvements in performance. I think you would grow as a tech enthusiast and writer if you were to talk about something else -- really anything else -- besides Tesla's "much larger training dataset."
 
I don't know, but Karpathy has made videos describing Tesla's notion of "Software 2.0" as fully machine-learned. The Waymo article makes it clear that a fully machine-learned solution is not possible at this time, and that a great deal of hand-coded work went into their planner.

Oliver Cameron, the CEO of Voyage, has a different interpretation. So does Bharath Ramsundar, who created DeepChem and has a Stanford PhD in deep learning. Why do you think these experts disagree with you? How does your expertise compare with theirs?

Waymo says in the paper that ChauffeurNet is inferior to their current hybrid system, but they do not say that supervised imitation learning can’t or won’t surpass their hybrid approach. Maybe your opinion is that supervised imitation learning can’t or won’t surpass Waymo’s current hybrid approach, but Waymo doesn’t say that and Bharath Ramsundar in particular sees this approach, if used to bootstrap reinforcement learning (an idea Waymo suggests), as a potentially viable path to robust autonomous driving. So, why do you think Bharath Ramsundar is wrong? Do you have a technical argument?

I’m actually interested in hearing the other side of this issue. What is it?

Diminishing returns are always present. You posted an article showing that diminishing returns were clearly evident at N = 10 million. Diminishing returns don't "start" at any particular N.

The point I keep trying to make is that the "much larger training dataset" is really of limited use, permitting only boring logarithmic improvements in performance. I think you would grow as a tech enthusiast and writer if you were to talk about something else -- really anything else -- besides Tesla's "much larger training dataset."

I don’t think you can look at data about ImageNet classification with training datasets beyond 10 million images and just assume that the exact same pattern will hold for path planning beyond the current largest known training dataset of ~50,000 miles. With image classification, if you were to look at the number of training images between 1 and 500,000, say, I bet you would see a different rate of performance increases per additional image than looking at the numbers between 10 million and 300 million.

With supervised imitation learning for path planning, we don’t yet know what the curve looks like. Say we go from a) 50,000 miles to 100,000 miles, and then from b) 100,000 miles to 150,000 miles. Will the increase in performance per additional mile of training data increase, diminish, or stay the same between (a) and (b)? If it diminishes, by how much? Since we don’t have any data on this, we can’t say.
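The only way I can see to settle this would be to measure it: train on nested subsets of the data and plot performance against dataset size. A sketch, with a stand-in for the (expensive) train-and-evaluate step:

```python
import math, random

def train_and_evaluate(n_miles):
    """Stand-in for the real, expensive experiment: train a planner on
    n_miles of demonstration data and return a performance score."""
    return 2 * math.log(n_miles) + random.gauss(0, 0.1)  # toy log curve

# Nested subsets: does performance per extra mile diminish, and how fast?
for n in (50_000, 100_000, 150_000, 1_000_000):
    print(f"{n:>9,} miles -> score {train_and_evaluate(n):.2f}")
```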

You are making the assertion that “only boring logarithmic improvements in performance” will come from increasing the training miles used for supervised imitation learning for path planning beyond the ~50,000 miles used for ChauffeurNet. But what is your evidence for this? You seem to be reasoning by analogy to what we see when we increase training datasets for image classification from 10 million images to 300 million images. But is there any evidence besides that?

The analogy to ImageNet is a non sequitur. You could equally make the same argument about any new application of supervised learning, no matter how little training data had been collected. This amounts to a universal assertion that no matter the task, and no matter the size of the existing training dataset, more training data will always only result in “boring logarithmic improvements in performance”. This is overextending the reach of Google’s result, which only applies to increasing the training dataset for image classifiers from 10 million to 300 million images.

Is your argument anything beyond an analogy to ImageNet results? Please tell me if I’m missing something deeper. I would love to find out if I’m wrong, but I need evidence to change my mind. If you can present me with evidence that changes my mind, you would be doing me a service.


jimmy_d also said:

“the more interesting aspect of the change to the camera interface is that camera frames are being processed in pairs. These two pairs are likely time-offset by some small delay - 10ms to 100ms I’d guess - allowing each processed camera input to see motion. Motion can give you depth, separate objects from the background, help identify objects, predict object trajectories, and provide information about the vehicle’s own motion. It's a pretty fundamental improvement to the basic perceptions of the system.”
And:

“Well, the V9 network appears to be camera agnostic. It can process the output from any camera on the car using the same weight file. ... I didn’t expect to see a camera agnostic network for a long time. It’s kind of shocking.”
And:

“The thing I find most impressive about the network I wrote about is the scale of resources needed to create it - that's the part of my post which strays into hyperbole. Second is the audaciousness of the architecture - it's ground breaking. The runtime resource requirement is also impressive, but it's the least shocking of the three. We know HW3 is coming and that it's a lot more powerful. This has to be because Tesla intends to run more resource intensive networks, so it's no surprise that the networks are getting more resource intensive.”
Somewhere, maybe it was on this forum or in his podcast interview, I remember Jimmy saying that one of Tesla’s innovations (was it camera agnosticism?) was worthy of being written up in a research paper.

I have never seen you cheer for any organization but Tesla.

I love everyone who is working on self-driving cars. I remember watching videos from the DARPA Urban Challenge around 10 years ago and feeling excited. Since then I have lost friends to car crashes. Competition is important for technological progress, and we can even watch the competition like a spectator sport. But ultimately there are lives at stake and it doesn’t matter who gets there first, as long as they get there as fast as possible. I would happily lose every dime I have invested in Tesla if it meant the advent of full autonomy at scale were accelerated by a month.

In another thread on this forum, I wrote:

“I am rooting for Waymo, Cruise, Mobileye, Apollo, Voyage, Aurora, Wayve, and whoever else is working on the problem. If any of them succeed, we all win. We can save lives, create economic growth, and make it way easier and more enjoyable to get around. Make it happen!”
I follow all of these companies and I think they’re all doing great work. I once sent flowers to an organization that I think is doing important research. I would happily send flowers to any of these companies.
 
Therefore his entire thesis and raison d’être in this space would fall apart if he seriously entertained the possibility that more data is not the key. He is not about to do that. I expect he will continue seeing everything through this prism and postulating that more data is the key until either proven right or wrong over time.

Mind you @strangecosmos does admit he may be wrong.

The problem is that he's already wrong. He takes statements out of context and tries to twist them to fit his viewpoints.
Then he ignores clear facts that simply disprove his theories, while using those same facts in other topics when it suits him.

Q3 2018 Vehicle Safety Report

1. Tesla themselves say they recorded 1 crash-like event per 1.92 million miles when humans were driving.
- That's a crash-like event (which might not even be a crash, since near misses are included) every ~2 million miles: 100 every 200 million miles, 1,000 every 2 billion miles, about 1,500 every 3 billion. (Trent says 3 billion miles would be what's needed and that there will be thousands of crashes and near misses. Yet Tesla's report says otherwise.)

2. Clearly Tesla doesn't have thousands of crashes in the miles ranges that Trent posted.

3. You need data for perturbations in proportions similar to the rest of your dataset, not "thousands" of examples, to train a NN model to avoid a particular event. Otherwise your model will be ridiculously overfitted. This is NN 101, but Trent seems out of his element here.

To remove a bias towards driving straight the training data includes a higher proportion of frames that represent road curves.



- NNs are all about having balanced data, in literally anything you want to accomplish. You can't have 30 million examples of straight driving and then 1,500 "crash-like event" examples. You need crash-like events in the millions, in proportion to the rest of your dataset. The Waymo and even the NVIDIA e2e papers clearly show that.

Training with data from only the human driver is not sufficient; the network must also learn how to recover from any mistakes, or the car will slowly drift off the road. The training data is therefore augmented with additional images that show the car in different shifts from the center of the lane and rotations from the direction of the road.

They accomplish this by having a left, center and right camera, with the left and right cameras used for perturbations to simulate the car being off-center, off the road, and over the line. This is similar to what Waymo had to do to avoid going off lane, going off road, and crashing.

we augment the data by adding artificial shifts and rotations to teach the network how to recover from a poor position or orientation. The magnitude of these perturbations is chosen randomly from a normal distribution. The distribution has zero mean, and the standard deviation is twice the standard deviation that we measured with human drivers.

The images for two specific off-center shifts can be obtained from the left and the right cameras. Additional shifts between the cameras and all rotations are simulated through viewpoint transformation of the image from the nearest camera. The steering label for the transformed images is quickly adjusted to one that correctly steers the vehicle back to the desired location and orientation in two seconds.

Figure 3 shows a block diagram of our training system. Images are fed into a CNN that then computes a proposed steering command. The proposed command is compared to the desired command for that image, and the weights of the CNN are adjusted to bring the CNN output closer to the desired output. The weight adjustment is accomplished using back propagation as implemented in the Torch 7 machine learning package.
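A sketch of the perturbation recipe those quotes describe (my reading of NVIDIA's DAVE-2 augmentation; the measured std value and the correction formula are simplified stand-ins, not the paper's actual numbers):

```python
import random

HUMAN_OFFSET_STD_M = 0.15   # hypothetical measured std of human lane offset
RECOVERY_TIME_S = 2.0

def perturbed_example(image_center, true_steering, speed_mps=15.0):
    """Create one augmented training example: simulate an off-centre
    viewpoint and adjust the steering label so the car would return to the
    lane centre in RECOVERY_TIME_S seconds."""
    # Zero-mean shift with twice the measured human std, per the quote above.
    shift_m = random.gauss(0.0, 2 * HUMAN_OFFSET_STD_M)
    # Stand-in: a real pipeline would warp the image via viewpoint
    # transformation, as if the camera had moved sideways by shift_m.
    shifted_image = image_center
    # Simplified corrective label: steer back in proportion to the offset.
    correction = -shift_m / (speed_mps * RECOVERY_TIME_S)
    return shifted_image, true_steering + correction

print(perturbed_example(image_center=None, true_steering=0.0))
```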

- Trent is an example of a person who feverishly tries to talk about something they don't grasp.

4. Waymo had millions of miles of data but only used a small percentage of it. If using more data would greatly improve performance, wouldn't Waymo have done that?

- Waymo used expert data consisting of 60 days of continual driving.
- This translates to 1,440 hours of driving (60 * 24 hours).
- Which translates to 57,600 miles of driving at an average speed of 40 mph (the average speed outside of highways).

NVIDIA, in their DAVE-2 system, sampled data at 10 FPS from 72 hours of driving, because a higher sampling rate would include highly similar data.

As of March 28, 2016, about 72 hours of driving data was collected. Our collected data is labeled with road type, weather condition, and the driver’s activity (staying in a lane, switching lanes, turning, and so forth). To train a CNN to do lane following, we simply select data where the driver is staying in a lane, and discard the rest. We then sample that video at 10 FPS because a higher sampling rate would include images that are highly similar, and thus not provide much additional useful information.

- If Waymo sampled data at 10 FPS they would end up with 51 million driving examples. If they sampled at 6 FPS, they would end up with 31 million driving examples, which matches what they actually ended up using (30 million). (See the quick arithmetic check below.)

If expert driving demos were the key, why use only 1,440 hours (57,600 miles) of them when they have access to tens of millions of miles?
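Quick check of that sampling arithmetic:

```python
hours = 1_440
seconds = hours * 3600
for fps in (10, 6):
    print(f"{fps} FPS -> {seconds * fps:,} examples")
# 10 FPS -> 51,840,000 examples
# 6 FPS  -> 31,104,000 examples (close to the ~30 million Waymo used)
```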

5. Trent claims Tesla is doing neural net architecture innovation, and listed processing time-offset image pairs as input and a camera-agnostic network. The problem is, neither of these is NN architecture innovation. Literally every network trained on ImageNet IS A CAMERA-AGNOSTIC NETWORK. This is what makes Trent's thesis so laughable: it's always based on erroneous premises. ImageNet contains images from thousands of different camera types, camera FOVs, camera angles, etc. Yet this is somehow new? Shows you how much of a shill the guy making these claims actually is.


End-to-End Deep Learning for Self-Driving Cars
 
The problem is that he's already wrong.

Be that as it may, my take is that @strangecosmos would consider a certain endgame as proof of being right or wrong. Maybe I give him too much credit, but I do believe that if someone starts selling an autonomous car before Tesla, he would concede.

The tricky part, of course, is that nobody will start selling an autonomous car outright. This will happen in stages, and stages can be judged through biased glasses.