One of the probable changes in FSD progression is that v12 will address a range of cases that Tesla has pretty clearly avoided to date, or kept low on the priority list. This is both good and bad.
----------
TLDR: with v12, Tesla may have a breakthrough, but the very things that allow it to work surprisingly well with little human-generated code also reduce Tesla's ability to control how it decides to behave. Tesla's past ability to decide when and how much to emphasize certain scenarios, like school zones, may now be unavailable or much more difficult to exercise. It may prove necessary to provide the ML network with useful data beyond the navigation map info, giving it a chance to perform better in scenarios that Tesla can no longer easily bypass or forbid it from attempting.
----------
Consider, for example, the whole issue of stopping for school buses and of behavior in school zones. This has been discussed off and on (here is a 2022 thread), but my conclusion is that Tesla has avoided addressing it because of the unfavorable risk-to-reward ratio. If they attempt to deal with this complex problem and mostly, but not completely, succeed, it gives FSD drivers a false impression that they can probably let the car handle it - with particularly harsh consequences when it doesn't.
Of course, this false confidence problem exists in nearly all aspects of FSD operation. We could point to any number of other driving cases that are presumably on the to-do list, but school bus / school zone behavior is a particularly complex and sensitive example.
With the prior major versions of FSD, there was a clear opportunity for the Autopilot team to make choices, coding whatever level of partial effort or stopgap band-aids they deemed necessary to keep these issues on the shelf while they worked on the rest of the problem.
But if we take Elon and Ashok's explanations as a marker for the continuing nature of v12 end-to-end FSD, they won't be writing a module to recognize school buses or school zones at all - neither to improve that behavior nor to deliberately ignore it. It would seem that the training set will simply include such scenarios, and the learning network will try to mimic what it sees.
I think this could work reasonably well for the problem of school buses with flashing lights and flip-out stop signs. As noted in prior discussions, it's more challenging to deal with school zones that command altered behavior "When Children Are Present" or "When School Is in Session". To me, the latter suggests that the map data should include this information in a form that's available and understandable to the neural network. As a counter-argument, perhaps the system will essentially teach itself to read the signs and to conclude when school is in session from looking at school parking lots and traffic patterns.
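To make that concrete, here is a minimal sketch, in PyTorch, of how an end-to-end policy network could be conditioned on an auxiliary context vector alongside the camera features. Every dimension, layer, and field name here is my own invention for illustration - nothing below is Tesla's actual architecture:

```python
import torch
import torch.nn as nn

class ContextConditionedPolicy(nn.Module):
    """Toy end-to-end driving policy that conditions on auxiliary
    context (e.g. a school-in-session flag) alongside vision features.
    All dimensions and names are hypothetical, not Tesla's."""

    def __init__(self, vision_dim=512, context_dim=8, control_dim=3):
        super().__init__()
        # Stand-in for whatever vision backbone processes the cameras.
        self.vision_encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=4),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
            nn.Flatten(),
            nn.Linear(32 * 4 * 4, vision_dim),
        )
        # The policy head sees vision features *and* the context vector,
        # so a "school in session" input can change the predicted controls.
        self.policy_head = nn.Sequential(
            nn.Linear(vision_dim + context_dim, 256),
            nn.ReLU(),
            nn.Linear(256, control_dim),  # e.g. steer, accel, brake
        )

    def forward(self, camera, context):
        z = self.vision_encoder(camera)
        return self.policy_head(torch.cat([z, context], dim=-1))

# context might encode: [school_in_session, school_zone_on_map,
# time_of_day, is_school_day, ...] -- all hypothetical fields.
camera = torch.randn(1, 3, 128, 128)
context = torch.tensor([[1.0, 1.0, 0.33, 1.0, 0.0, 0.0, 0.0, 0.0]])
controls = ContextConditionedPolicy()(camera, context)
```

The point is simply that if the context vector is an input, the network has a chance to learn behavior that varies with it; if it isn't, no quantity of training clips can teach the difference between an empty school zone in July and the same zone on a Tuesday morning in October.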
Again, this is only an example. The question for v12 edge cases is whether simply providing an enormous quantity of training data, covering many past examples over many calendar dates, will prove sufficient - or whether Tesla needs to work on providing more inputs about the local community environment, giving the system at least a chance to draw the correct conclusions despite its lack of general real-world knowledge.
And even supposing Tesla were to make a developmental policy decision to deprioritize school zones for now, how could they do that?
By auto-curating the training clips to remove school-in-session scenarios? That seems counterproductive and possibly dangerous.
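Mechanically, such a filter would be trivial - assuming the clips even carry a tag like this, which is my invention for illustration:

```python
# Hypothetical curation filter: drop any clip tagged as school-in-session.
# The metadata field is invented for illustration.
def curate(clips):
    return [c for c in clips if not c.get("school_in_session", False)]

clips = [
    {"id": "a1", "school_in_session": True},
    {"id": "b2", "school_in_session": False},
]
training_set = curate(clips)   # keeps only clip "b2"
```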
By writing special-case code to bypass the issue and let the L2 driver handle it, like prior versions? That runs quite counter to the philosophy described, and becomes a slippery slope back to tens or hundreds of thousands of lines of code to handle special cases and operational guardrails.
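In miniature, that old pattern looks something like the sketch below - every name is hypothetical, and this is the pre-v12 style of guardrail, not anything from Tesla's actual code:

```python
# Sketch of the old-style guardrail pattern: hand-written special-case
# code overrides the learned policy and hands control back to the driver.
def request_driver_takeover():
    print("chime: take over now - school zone ahead")

def drive_step(policy, camera, context):
    if context.get("school_zone_active", False):
        request_driver_takeover()      # the hand-coded rule wins
        return None                    # no learned controls issued
    return policy(camera)              # otherwise defer to the network

# Toy usage with a stand-in policy:
controls = drive_step(lambda cam: ("steer", 0.0), camera=None,
                      context={"school_zone_active": True})
```

Each such rule is harmless on its own; accumulate enough of them and you're right back to the hand-coded stack that v12 was supposed to replace.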
I maintain my optimism for the v12 approach, but as I continue to think about it, I do think it needs more attention to the availability of information that humans ingest just by living in society: the weather report, the school schedule, awareness of scheduled or impromptu public events, and so on.
Most of us agree that it's too much to expect high-level AGI to emerge from under the glove box. I'm just saying that the system may need more sources of information to draw upon than just the nav route and the camera inputs. And I'm saying that I think Tesla will find it challenging to pick and choose scenarios to deprioritize in the meantime. They already noted the problem of the system learning human behavior at stop signs - behavior that displeases NHTSA and runs counter to what high school Drivers Ed videos teach.
In the coming months, I expect a number of tweets from Elon about wonderful and surprisingly good behavior emerging from the system training. The question is, how much unacceptably and surprisingly bad behavior could come about because the system doesn't know, and presently has no way of being told, information that human citizens take for granted? And further, in what ways could non-real-time, non-visual information be made available, enabling it to become more intelligent, even if not "generally" intelligent?
Finally, I note that any such information, if added to the FSD computer inputs during operation, must also be included in the training data around each clipped scenario - otherwise the system can't learn to associate it with the camera data and use the two in conjunction. Hopefully, Tesla's data and telemetry infrastructure is flexible enough to allow experimentation with these kinds of concepts. Ashok mentioned that they were looking at the system being able to take verbal suggestions or directives from the human operator/passenger. That makes me optimistic that they could extend the data and telemetry set to include other, yet-to-be-determined forms of information.
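In other words, the logged clip record and the runtime input schema have to evolve together. A toy sketch of what I mean, with every field invented for illustration:

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical clip record: the same auxiliary context the car would
# receive at inference time gets logged alongside the cameras and
# controls, so the network can learn the association during training.
@dataclass
class ClipRecord:
    clip_id: str
    camera_frames: str                  # path to the video segment
    driver_controls: str                # path to logged steering/pedals
    context: Dict[str, float] = field(default_factory=dict)

clip = ClipRecord(
    clip_id="example-001",
    camera_frames="clips/example-001.video",
    driver_controls="clips/example-001.controls",
    context={"school_in_session": 1.0, "public_event_nearby": 0.0},
)
```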