
Is all neural networks really a good idea?

For instance, with fast, precise absolute distances (including on the sides of the car, from full-angle high-res lidar or radar), you could enforce a policy that prevents collisions and curbed rims in extreme situations while letting the net drive most of the time.
Good post. The excerpt I took out is the only part I somewhat differ on. I do think more elaborate sensors could be fused more easily than was the case with the heuristic-based system, but I don't think solving the curbing problem takes particularly fancy hardware.

First, I think that ongoing training can and will solve the present curbing issues. But it didn't have to happen, and I think the main original fault here is the deficit in low forward/side camera coverage.

A lot has been said about side-looking cameras farther forward, and about bumper cameras for parking. I believe both would be very helpful, and at least 3 years ago I was arguing for cameras in the headlights or otherwise at the corners. But today I'd say the single add-on with the most bang for the buck (no pun intended) would be a forward-looking camera element added to the side repeaters. In simple terms, extend the existing rearward view to form a continuous panoramic view that takes in the sides, the curbs, and the sight line along the front fender. This would do a great deal for the near field, for creeping into cross traffic, and for perceiving oncoming lanes that are obscured by oncoming left-turning traffic or stopped vehicles directly in front.

Regarding radar, maybe we will see that in the next generation (not to image curbs and near-field infrastructure, which it wouldn't be very good at, but as an extra layer of adversarial-traffic detection). Still, I think just filling in the camera POV gaps is the highest priority and probably the most cost-efficient.
 
  • Like
Reactions: DrChaos
A lot has been said about side-looking cameras farther forward, and about bumper cameras for parking. I believe both would be very helpful, and at least 3 years ago I was arguing for cameras in the headlights or otherwise at the corners. But today I'd say the single add-on with the most bang for the buck (no pun intended) would be a forward-looking camera element added to the side repeaters. In simple terms, extend the existing rearward view to form a continuous panoramic view that takes in the sides, the curbs, and the sight line along the front fender. This would do a great deal for the near field, for creeping into cross traffic, and for perceiving oncoming lanes that are obscured by oncoming left-turning traffic or stopped vehicles directly in front.
It's pretty shocking they haven't added more cameras. The usual position other makers use is outboard on the mirrors, which can give a great view in many directions, close to what you suggest. It's also useful for looking down for parking/curbs/children, and for human use as well.
 
  • Like
Reactions: JHCCAZ
It's pretty shocking they haven't added more cameras. The usual position other makers use is outboard on the mirrors, which can give a great view in many directions, close to what you suggest. It's also useful for looking down for parking/curbs/children, and for human use as well.
In a monolithic system, messing with the pixels would mess up the whole system. You'd have to retrain everything for each permutation of sensors, and that would mean collecting training data for every permutation.

I'm surprised that the trend hasn't been towards a separation of perception and control. Modularization. That would allow them to change the perception system in any way they want: different camera locations, more cameras, different sensors, etc. Perceive the world any way you want, then have the normalized output of the perception system drive the control system.

It solves the problem of changing sensors, the problem of collecting training data, and the problem of distinct control for vehicles of different sizes and different control characteristics. If I can collect enough Model 3 training data, then I should have enough normalized perception data to train the Cybertruck.

I suppose there would be problems with a kind of myopia when training on high-quality perception data and then trying to train the controls for a vehicle with lower-quality perception. The vehicle just wouldn't see all the stuff the controls were trained on, or wouldn't see it as well or as soon.
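
As a very rough sketch of the kind of interface I mean (every name and field here is my own invention, just to illustrate a normalized hand-off between the two halves):

```python
# Hypothetical perception/control split. The WorldState fields are
# invented for illustration; the point is that every sensor suite maps
# into the same normalized structure, and control trains only on that.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class WorldState:
    drivable_area: List[Tuple[float, float]]  # polygon in vehicle frame, meters
    obstacles: List[Tuple[float, float, float, float]]  # x, y, vx, vy per object
    curb_distance_m: float                     # nearest curb, absolute distance

class PerceptionModule:
    """One per sensor configuration; all emit the same WorldState."""
    def perceive(self, raw_frames) -> WorldState:
        raise NotImplementedError

class ControlModule:
    """Trained once on WorldState data pooled across the whole fleet."""
    def act(self, state: WorldState) -> dict:
        speed = 2.0 if state.curb_distance_m < 0.5 else 15.0  # toy policy
        return {"target_speed_mps": speed, "steer_rad": 0.0}
```

Model 3 clips and Cybertruck clips would then land in the same training pool, because the control side never sees raw pixels.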
 
In a monolithic system, messing with the pixels would mess up the whole system. You'd have to retrain everything for each permutation of sensors, and that would mean collecting training data for every permutation.

I'm surprised that the trend hasn't been towards a separation of perception and control. Modularization. That would allow them to change the perception system in any way they want: different camera locations, more cameras, different sensors, etc. Perceive the world any way you want, then have the normalized output of the perception system drive the control system.

I think the second is standard. It has the problem that the representations for perception were human-decided and not necessarily optimal for control, so if the human-decided perception dimensions are insufficient for good driving performance, the system will not learn to fix it. It's not really known what humans do, but a vector space of bounding boxes and labelled 'objects' is probably not it. The 10.x and 11.x systems did that, and they have big limitations.

It's by backpropagating policy errors into better perception representations, even if they are less externally interpretable, that one might eventually get better performance.
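
In PyTorch terms (toy networks, nothing to do with Tesla's actual stack), the difference is just whether the policy loss is allowed to reach the perception weights:

```python
# Toy contrast between a frozen human-defined interface and end-to-end
# training where policy errors reshape the perception representation.
import torch
import torch.nn as nn

perception = nn.Sequential(nn.Conv2d(3, 8, 5), nn.ReLU(),
                           nn.Flatten(), nn.LazyLinear(32))  # pixels -> features
policy = nn.Linear(32, 2)                                    # features -> steer/accel

frames = torch.randn(4, 3, 64, 64)   # stand-in camera batch
target = torch.zeros(4, 2)           # stand-in "good driver" actions

# Modular: perception output is treated as a fixed interface (detached).
loss_modular = nn.functional.mse_loss(policy(perception(frames).detach()), target)

# End-to-end: the same policy loss also improves perception itself.
loss_e2e = nn.functional.mse_loss(policy(perception(frames)), target)
loss_e2e.backward()   # gradients now reach the conv weights
```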

 
I think the second is standard. It has the problem that the representations for perception were human-decided and not necessarily optimal for control, so if the human-decided perception dimensions are insufficient for good driving performance, the system will not learn to fix it. It's not really known what humans do, but a vector space of bounding boxes and labelled 'objects' is probably not it. The 10.x and 11.x systems did that, and they have big limitations.
I wasn't suggesting the V11 perception system. The goal is to normalize the perception data, but as little as possible. I don't know where that line is, but finding it would be the point of the exercise. For example, a spherical surface with the images from the cameras projected onto it. Or line and area recognition on that spherical surface. And so on. Something that bridges the various perception configurations while giving the control system a uniform set of data to work from.
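
For the spherical-surface example, a minimal sketch (ideal pinhole model, made-up intrinsics) of mapping any camera's pixels onto one shared (azimuth, elevation) parameterization:

```python
# Project each camera's view rays onto a shared sphere in the vehicle
# frame. Any camera layout ends up as samples on the same (az, el) grid,
# which is what a normalized control input could consume.
import numpy as np

def pixel_rays(h, w, focal):
    """Unit view rays for an ideal pinhole camera, in the camera frame."""
    v, u = np.mgrid[0:h, 0:w]
    d = np.stack([(u - w / 2) / focal, (v - h / 2) / focal, np.ones((h, w))], -1)
    return d / np.linalg.norm(d, axis=-1, keepdims=True)

def to_sphere(rays, cam_to_vehicle):
    """Rotate rays into the vehicle frame, return (azimuth, elevation)."""
    r = rays @ cam_to_vehicle.T
    az = np.arctan2(r[..., 0], r[..., 2])
    el = np.arcsin(np.clip(r[..., 1], -1.0, 1.0))
    return az, el

az, el = to_sphere(pixel_rays(480, 640, focal=500.0), np.eye(3))
```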

It's by backpropagating policy errors into better perception representations, even if they are less externally interpretable, that one might eventually get better performance.
Having people understand the representation isn't the goal, only normalization. My examples above are limited by my ability to communicate. It could be a normalization that is itself defined by training. I'm not sure what sort of meta-operation that is, but it would allow the two halves of the system to talk to each other even though each perception system works from a different starting point. Like most here, I'm not well-versed in the particulars.
 
I wasn't suggesting the V11 perception system. The goal is to normalize the perception data, but as little as possible. I don't know where that line is, but finding it would be the point of the exercise. For example, a spherical surface with the images from the cameras projected onto it. Or line and area recognition on that spherical surface. And so on. Something that bridges the various perception configurations while giving the control system a uniform set of data to work from.
That may be possible, but the shared configuration would be the lowest common view of these, not the highest performing one. They must do something like this to some degree already, as Model 3 and Model Y training is shared with otherwise identical cameras but a higher forward camera position on the Y.

Supposedly the HW4 cameras' full resolution isn't being used yet.

Having people understand the representation isn't the goal, only normalization. My examples above are limited by my ability to communicate. It could be a normalization that is itself defined by training. I'm not sure what sort of meta-operation that is, but it would allow the two halves of the system to talk to each other even though each perception system works from a different starting point. Like most here, I'm not well-versed in the particulars.
If there is super-high-fidelity simulation of these differences (maybe not really possible), the usual approach is data augmentation, where the training set includes artificially modified examples of the same underlying phenomenon mapped to the same outcome, so the system learns to ignore those differences.
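
Roughly like this, where the transforms stand in for whatever sensor differences you can simulate (these particular ones are just placeholders):

```python
# Data augmentation sketch: emit sensor-variation copies of a clip while
# keeping the driving label fixed, so training learns to ignore the
# variation. The specific transforms are invented for illustration.
import numpy as np

def augment(frame, rng):
    yield frame                                            # original
    yield np.clip(frame * rng.uniform(0.6, 1.4), 0, 255)   # exposure shift
    yield np.roll(frame, rng.integers(-8, 9), axis=1)      # small mount offset

rng = np.random.default_rng(0)
clip, label = np.zeros((480, 640, 3)), {"steer": 0.01}
train_pairs = [(f, label) for f in augment(clip, rng)]     # same label each time
```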

In the end all these tricks will be lower performing than "doing the needful" and collecting as much high-quality real data from as many cars as possible. At some point the unsupervised model (which can use inexpensive data) may scale without needing these augmentation tricks because of the fleet size. Ideally they should be able to train even policy networks self-supervised (i.e. without human-annotated/curated labeling) by measuring stable, high-quality drivers when they drive with NAV active and follow the route, but are driving manually while the FSD data are collected.
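
The mining filter could be as simple as a predicate over clip metadata; all the field names and thresholds below are guesses on my part:

```python
# Hypothetical filter for self-supervised policy data: a human was
# driving manually, a NAV route was active, and they tracked it smoothly.
def is_candidate(clip: dict) -> bool:
    return (clip["nav_active"]                   # a route was being displayed
            and not clip["autopilot_engaged"]    # the human was driving
            and clip["route_deviation_m"] < 5.0  # they actually followed it
            and clip["max_jerk_mps3"] < 2.0)     # stable, smooth driving

fleet_clips: list = []                           # stand-in for the real pipeline
train_set = [c for c in fleet_clips if is_candidate(c)]
```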

Perhaps the latest wide push of the version will enable this for a greater population.
 
  • Informative
Reactions: JB47394
That may be possible, but the shared configuration would be the lowest common view of these, not the highest performing one. They must do something like this to some degree already, as Model 3 and Model Y training is shared with otherwise identical cameras but a higher forward camera position on the Y.

Supposedly the HW4 cameras' full resolution isn't being used yet.


If there is super-high-fidelity simulation of these differences (maybe not really possible), the usual approach is data augmentation, where the training set includes artificially modified examples of the same underlying phenomenon mapped to the same outcome, so the system learns to ignore those differences.

In the end all these tricks will be lower performing than "doing the needful" and collecting as much high-quality real data from as many cars as possible. At some point the unsupervised model (which can use inexpensive data) may scale without needing these augmentation tricks because of the fleet size. Ideally they should be able to train even policy networks self-supervised (i.e. without human-annotated/curated labeling) by measuring stable, high-quality drivers when they drive with NAV active and follow the route, but are driving manually while the FSD data are collected.

Perhaps the latest wide push of the version will enable this for a greater population.
I've always wondered if Tesla quietly uses safety score or similar to find the "good driver" cohorts. The other possibility is that good drivers stand out in the training set because good drivers all do very similar things, while bad drivers are bad in a wide variety of ways. The good drivers will all be clustered together, while the bad drivers will be scattered all over the place.
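
The clustering intuition is easy to demo with toy numbers (the features are invented): good drivers sit in a dense core, bad drivers are outliers in every direction.

```python
# Toy "good drivers cluster, bad drivers scatter" demo: keep the drivers
# whose behavior vector sits near the fleet's median.
import numpy as np

rng = np.random.default_rng(1)
# Columns: mean headway (s), hard-brake rate (/100 mi), speed delta (mph)
good = rng.normal([2.5, 0.1, 1.0], 0.2, size=(50, 3))   # tight cluster
bad = rng.normal([2.5, 0.1, 1.0], 2.0, size=(10, 3))    # scattered
drivers = np.vstack([good, bad])

dist = np.linalg.norm(drivers - np.median(drivers, axis=0), axis=1)
cohort = drivers[dist < np.percentile(dist, 70)]        # keep the dense core
```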
 
I've always wondered if Tesla quietly uses safety score or similar to find the "good driver" cohorts. The other possibility is that good drivers stand out in the training set because good drivers all do very similar things, while bad drivers are bad in a wide variety of ways. The good drivers will all be clustered together, while the bad drivers will be scattered all over the place.

With their existing/previous rule-based driving system, there is some sort of underlying optimization problem and scoring of preferred paths; it simulated these and had rules to choose/filter among them.

They could run that offline back in the lab (with higher fidelity as well) and find segments, and hence people, who scored well on this.
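
Something like replaying a planner-style cost function over logged segments and keeping the best; the cost terms here are generic motion-planning staples, not anything known about Tesla's:

```python
# Grade human-driven segments offline with a rule-based cost, then keep
# the lowest-cost (best-driven) fraction as training candidates.
def planner_cost(seg: dict) -> float:
    return (1.0 * seg["max_lateral_accel"]
            + 3.0 * seg["obstacle_gap_violations"]
            + 0.5 * seg["jerk_rms"]
            + 2.0 * seg["lane_center_error"])

def best_segments(segments, keep_fraction=0.2):
    ranked = sorted(segments, key=planner_cost)
    return ranked[: int(len(ranked) * keep_fraction)]
```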

Pretty soon the problem is not finding these clips, but finding enough diversity. The 'natural measure' (speaking mathematically) from a uniform selection of the existing fleet is going to have lots of easy California driving, which it already does fine at; the challenge is finding and properly training on the infinite variety of other stuff. The training set will have to have all the weird stuff heavily overrepresented compared to its natural occurrence probability.
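
In sampling terms, that's just weighting clips inversely to how common their scenario is (the tags are invented for the example):

```python
# Reweight the training distribution so rare scenarios are heavily
# overrepresented relative to their natural occurrence probability.
import random
from collections import Counter

clips = [{"tag": "ca_freeway"}] * 900 + [{"tag": "snow_roundabout"}] * 5
counts = Counter(c["tag"] for c in clips)
weights = [1.0 / counts[c["tag"]] for c in clips]   # rare tag => heavy weight

batch = random.choices(clips, weights=weights, k=64)
# Each tag now contributes equal total weight, so the batch is ~50/50
# despite the 180:1 imbalance in the raw fleet data.
```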

If they're able to fully close the loop and can effectively use self-supervised training and curate the distribution, they will have finally achieved something significant, and it will finally scale well with more data. They'll have better and better filters for pulling useful training data out of both policy and perception, a fleet size that will let them get enough of it, and, I hope, a training pipeline that can ingest huge datasets with little human labelling needed.

On the ChatGPT type of system, the amount of unlabelled data (basic text pretraining) is enormous compared to the expensive human-labelled data (for them it's the instruction tuning and the reinforcement-learning-specific examples which are hand-crafted by intentional thought and human authorship). The biggest discovery of the LLMs, the one which made them a big deal, is how well a relatively small amount of human-labelled training can leverage the huge dataset's previous training for good representations, and that they can do interesting things with such small instruction datasets. By the way, that was an unexpected empirical discovery, not something that was intentionally engineered. They really shouldn't be that smart, being quite low level and stupid down below.

I'm hoping the same would apply for driving. The first-phase training is on a giant set of reasonably well-driven human scenes harvested automatically as above; then on top of that comes a fairly small amount of 'instruction tuning': examples which make the car do what the nav controller says it should be doing to achieve a given goal (which the pre-training didn't have, as people were just doing their own thing). I assume this is the architecture, or where they're going.

I'm finally hopeful about FSD, as the giant quantity of raw data theoretically available could finally be useful in a scalable way where the limit is only training capacity. Right now they say "trained on 1 million video clips". Why not trained on a billion video clips? Perhaps 5,000 human-created instruction-tuning examples is something they could manage (i.e. a human verifies/programs in the exact relationship between the nav destination and the internal state, and selects the best preferred driving outcome), while taking advantage of a tremendous base dataset which can refine the representations into whatever is able to predict video scenes and naturally observed driving choices forward in time.
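
Schematically, the two-phase recipe would look something like this (toy tensors and sizes; the goal vector is zeroed during pretraining because the fleet data carries no explicit goal):

```python
# Phase 1: huge, cheap pretraining on observed human driving.
# Phase 2: small, curated, goal-conditioned "instruction tuning".
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(128 + 4, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(net.parameters())

def step(features, goal, action):
    loss = nn.functional.mse_loss(net(torch.cat([features, goal], dim=1)), action)
    opt.zero_grad(); loss.backward(); opt.step()

for _ in range(1000):   # stands in for ~1e9 uncurated fleet clips
    step(torch.randn(32, 128), torch.zeros(32, 4), torch.randn(32, 2))

for _ in range(50):     # stands in for ~5,000 hand-verified examples
    step(torch.randn(8, 128), torch.randn(8, 4), torch.zeros(8, 2))
```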
 
I've always wondered if Tesla quietly uses safety score or similar to find the "good driver" cohorts. The other possibility is that good drivers stand out in the training set because good drivers all do very similar things, while bad drivers are bad in a wide variety of ways. The good drivers will all be clustered together, while the bad drivers will be scattered all over the place.
I can certainly tell you one way Tesla uses the safety score. I got 12.3.3 yesterday and went for a 50-mile drive. It is much better than 12.3. However, it was accelerating hard toward a yellow traffic signal which turned red when we were about 20 feet from the stop line. It slammed on the brakes and slid to a stop, tires squealing and chirping, about 20 feet past the stop line. Fortunately there wasn't anyone behind me, so no harm, no foul. Of course the "hard stop" ding appeared on my safety score, and my estimated premium went up 168%.

I did get an adapter kit to insert an OBD connector in the console, in series with the CAN bus. I've been holding off putting it in because I was fearful of warranty problems if I installed it, but now the insurance is so far out of control that I will have to go ahead and install it so I can get reasonably priced insurance.
 
  • Like
Reactions: enemji
I was under the impression that any miles driven on AP/FSD don't count against your safety score. Is that no longer the case?

In any case, if you feel it about to slam on the brakes for no reason, you can simply feather the accelerator to cruise through the yellow light.
 
  • Like
Reactions: enemji