In other news: people keep being surprised that higher-resolution cameras produce better images.

None of which tells us what resolution is sufficient for actual self driving of course.
Yeah, computers see images totally differently:

And this is before the raw photon counts that will greatly improve this performance. And do people really struggle with driving at night on HW3?
 
So Trump will likely be made the favourite to win the presidency soon. There is no way Americans will elect Biden unless he does okay in a debate, which seems very unlikely. The West will likely step up in the short term, but it won't be enough. What deal will Trump look to get with Putin?
Even Biden might beat Trump again; Trump has been doing himself no favors with much of his party as time has gone on, and the court cases are piling up against him. Still, we may be entering undiscovered legal frontiers if he's convicted and the 14th Amendment isn't used to remove him from the ballots. In theory he could win while sitting in jail, and then what? Maybe the elected VP immediately gets a promotion, maybe Trump gets to skip out on jail (which, realistically, is unlikely to be anything but "house arrest" for him). Maybe the VP becomes acting President while a multi-year court battle rages, and if it outlasts his term, then what?
 
I was also impressed by Elon's demonstration of the alpha version of FSD V12. However, in hindsight, the success is no surprise considering Rich Sutton's famous blog post.

General approaches that just rely on increasing AI computing power (like FSD V12) have proven more successful than approaches relying on "good" rules (like FSD V11). I think the core of this phenomenon is a human bias. We are all human, and as humans we are proud of our unique way of thinking, so we want the computing system to do things the way we would. I think we humans underestimate the constraints of human thinking in complex systems, and we also underestimate the number of mistakes we make and how hard it is to find and fix them (i.e., to optimize the system).

The Bitter Lesson

Rich Sutton

March 13, 2019

The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin. The ultimate reason for this is Moore's law, or rather its generalization of continued exponentially falling cost per unit of computation. Most AI research has been conducted as if the computation available to the agent were constant (in which case leveraging human knowledge would be one of the only ways to improve performance) but, over a slightly longer time than a typical research project, massively more computation inevitably becomes available. Seeking an improvement that makes a difference in the shorter term, researchers seek to leverage their human knowledge of the domain, but the only thing that matters in the long run is the leveraging of computation. These two need not run counter to each other, but in practice they tend to. Time spent on one is time not spent on the other. There are psychological commitments to investment in one approach or the other. And the human-knowledge approach tends to complicate methods in ways that make them less suited to taking advantage of general methods leveraging computation. There were many examples of AI researchers' belated learning of this bitter lesson, and it is instructive to review some of the most prominent.

In computer chess, the methods that defeated the world champion, Kasparov, in 1997, were based on massive, deep search. At the time, this was looked upon with dismay by the majority of computer-chess researchers who had pursued methods that leveraged human understanding of the special structure of chess. When a simpler, search-based approach with special hardware and software proved vastly more effective, these human-knowledge-based chess researchers were not good losers. They said that "brute force" search may have won this time, but it was not a general strategy, and anyway it was not how people played chess. These researchers wanted methods based on human input to win and were disappointed when they did not.

A similar pattern of research progress was seen in computer Go, only delayed by a further 20 years. Enormous initial efforts went into avoiding search by taking advantage of human knowledge, or of the special features of the game, but all those efforts proved irrelevant, or worse, once search was applied effectively at scale. Also important was the use of learning by self play to learn a value function (as it was in many other games and even in chess, although learning did not play a big role in the 1997 program that first beat a world champion). Learning by self play, and learning in general, is like search in that it enables massive computation to be brought to bear. Search and learning are the two most important classes of techniques for utilizing massive amounts of computation in AI research. In computer Go, as in computer chess, researchers' initial effort was directed towards utilizing human understanding (so that less search was needed) and only much later was much greater success had by embracing search and learning.

In speech recognition, there was an early competition, sponsored by DARPA, in the 1970s. Entrants included a host of special methods that took advantage of human knowledge---knowledge of words, of phonemes, of the human vocal tract, etc. On the other side were newer methods that were more statistical in nature and did much more computation, based on hidden Markov models (HMMs). Again, the statistical methods won out over the human-knowledge-based methods. This led to a major change in all of natural language processing, gradually over decades, where statistics and computation came to dominate the field. The recent rise of deep learning in speech recognition is the most recent step in this consistent direction. Deep learning methods rely even less on human knowledge, and use even more computation, together with learning on huge training sets, to produce dramatically better speech recognition systems. As in the games, researchers always tried to make systems that worked the way the researchers thought their own minds worked---they tried to put that knowledge in their systems---but it proved ultimately counterproductive, and a colossal waste of researcher's time, when, through Moore's law, massive computation became available and a means was found to put it to good use.

In computer vision, there has been a similar pattern. Early methods conceived of vision as searching for edges, or generalized cylinders, or in terms of SIFT features. But today all this is discarded. Modern deep-learning neural networks use only the notions of convolution and certain kinds of invariances, and perform much better.

This is a big lesson. As a field, we still have not thoroughly learned it, as we are continuing to make the same kind of mistakes. To see this, and to effectively resist it, we have to understand the appeal of these mistakes. We have to learn the bitter lesson that building in how we think we think does not work in the long run. The bitter lesson is based on the historical observations that 1) AI researchers have often tried to build knowledge into their agents, 2) this always helps in the short term, and is personally satisfying to the researcher, but 3) in the long run it plateaus and even inhibits further progress, and 4) breakthrough progress eventually arrives by an opposing approach based on scaling computation by search and learning. The eventual success is tinged with bitterness, and often incompletely digested, because it is success over a favored, human-centric approach.

One thing that should be learned from the bitter lesson is the great power of general purpose methods, of methods that continue to scale with increased computation even as the available computation becomes very great. The two methods that seem to scale arbitrarily in this way are search and learning.

The second general point to be learned from the bitter lesson is that the actual contents of minds are tremendously, irredeemably complex; we should stop trying to find simple ways to think about the contents of minds, such as simple ways to think about space, objects, multiple agents, or symmetries. All these are part of the arbitrary, intrinsically-complex, outside world. They are not what should be built in, as their complexity is endless; instead we should build in only the meta-methods that can find and capture this arbitrary complexity. Essential to these methods is that they can find good approximations, but the search for them should be by our methods, not by us. We want AI agents that can discover like we can, not which contain what we have discovered. Building in our discoveries only makes it harder to see how the discovering process can be done.

In my opinion, Elon's drive in Palo Alto marks the beginning of a new era in human history. The coming new world has different characteristics than the old. While it took a long time to optimize FSD V11, it is not a given that the time needed to optimize FSD V12 will be comparable. We just have no idea, since analogies are missing for significant parts of the problem. As Tim Zaman wrote on X, Tesla is adding a 10,000-H100 cluster on Monday, a significant addition to Tesla's video training capacity. AI computing capabilities are now increasing fast.

Remember, FSD V12 is just one arrow in Tesla's quiver. The other obvious step is video training for human-like work with Optimus, and then with robots in different shapes like farming machines or construction machines. Computers winning at chess against humans was nice. Computers translating languages better than humans is nice. Computers summarizing text and voice or answering chat questions is nice. But robots able to conduct the majority of physical work in the long run, that is next level.

The new world will likely have a "the winner takes it all, or a few winners take it all" characteristic, since access to gigantic AI training systems is a prerequisite. Once that AI computing is available, the then-established players will be very strong and could prevent the entrance of new players, for example by using network effects.

Since we are in a serious climate crisis which is far from being solved, I welcome a possible world dominated by Tesla as an opportunity to weaken the progression of the climate crisis. I also feel fear regarding the disruptions the future will bring. But we certainly need change, so I will welcome the change.
 
Having had some time to process V12, I think there will be some issues with end2end.

The previous feature-engineered system had a lot of advantages, for example the visualization: indicating which car it was waiting for in blue, saying why it did something, chiming when a light turns green, etc. When you go black box you lose this; the model is not even told what a traffic light is, much less that it should visualize it. If you look at the display, it looks like they were showing cars and lanes in the visualization, indicating that it's not the full end2end output being displayed. Maybe they were running the V11 network in shadow mode and displaying that.
(attached screenshot of the on-screen visualization from the demo drive)


With V12 they predict how a good, alert driver would drive. Before, they predicted the environment and then used tree search to find the optimal solution, minimizing energy, jerk, risk, etc. The optimal solution would have a better cost function in most scenarios, but suffer from bugs in rare situations. Like this:

I guess for now the end2end performs better than the feature-engineered solution. But I think they will want to add search and optimization back into the solution. The end2end model now is like AlphaGo trained only on human games: very good at predicting how a good driver would drive in a given situation. They want to become AlphaZero, searching at very high depth and far outplaying even the experts.
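To make the "add search and optimization back" idea concrete, here is a toy Python sketch (purely illustrative, not Tesla's code) of the V11-style planning step: candidate trajectories, which in a V12-plus-search world could come from the learned policy, are re-ranked by a hand-written cost over energy, jerk, and risk. All function names, weights, and data shapes are made up.

```python
import numpy as np

def jerk_cost(traj):
    """traj: (T, 2) array of (speed m/s, steering rad) per timestep."""
    accel = np.diff(traj[:, 0])   # discrete acceleration
    jerk = np.diff(accel)         # discrete jerk
    return float(np.sum(jerk ** 2))

def energy_cost(traj):
    return 1e-3 * float(np.sum(traj[:, 0] ** 2))   # crude proxy for energy use

def risk_cost(traj, obstacle_dist):
    """obstacle_dist: (T,) predicted closest-obstacle distance per step."""
    return float(np.sum(np.exp(-obstacle_dist)))

def select_trajectory(candidates, obstacle_dist, w=(1.0, 1.0, 10.0)):
    """Pick the candidate trajectory with the lowest weighted total cost."""
    costs = [
        w[0] * energy_cost(c) + w[1] * jerk_cost(c) + w[2] * risk_cost(c, obstacle_dist)
        for c in candidates
    ]
    return candidates[int(np.argmin(costs))]

# Usage with dummy data: five candidate 20-step trajectories
# (here random; in practice they would come from a planner or learned policy).
rng = np.random.default_rng(0)
candidates = [rng.uniform([0.0, -0.3], [20.0, 0.3], size=(20, 2)) for _ in range(5)]
best = select_trajectory(candidates, obstacle_dist=rng.uniform(1.0, 30.0, size=20))
```

The hand-tuned weights `w` and cost terms are exactly where the "bugs in rare situations" tend to hide, which is what the end2end approach sidesteps.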

I might be wrong here; I'm excited for AI Day 2023 to learn how they actually implemented this. If anyone has any ideas on how to bring in the goodies from V11, please feel free to speculate!
 
I think people overdo the term 'black box' when it comes to neural networks. They really do not have to be. I've worked on custom-built NNs a bit. Really, a NN is just a data and processing structure, but how you implement it and how you label sections of it is entirely up to the developer. I would be extremely surprised to find out that the Tesla NN is just a big anonymous bunch of neurons and weights, without any attempt at organisation. Also, just because they are saying it's fully NN and AI doesn't mean it's single-step, or a single network. I expect there are still a bunch of separate networks with separate tasks; it's just that the decision-making for controls in the vehicle is now a NN instead of C++.
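For what it's worth, here is a minimal PyTorch-style sketch (entirely speculative, not Tesla's architecture) of that modular picture: separate perception heads with named, inspectable outputs, and a small planner network taking over the role the hand-written C++ control code used to play.

```python
# Speculative sketch of a modular "NN instead of C++" design; every name here
# is made up for illustration.
import torch
import torch.nn as nn

class PerceptionBackbone(nn.Module):
    def __init__(self):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Named heads: their outputs can be logged or visualized, so the
        # system is less of a "black box" than the term suggests.
        self.lane_head = nn.Linear(32, 8)      # e.g. lane-geometry features
        self.object_head = nn.Linear(32, 16)   # e.g. nearby-object features

    def forward(self, image):
        feat = self.cnn(image)
        return {"lanes": self.lane_head(feat), "objects": self.object_head(feat)}

class PlannerNet(nn.Module):
    """A small MLP standing in for the old hand-coded control logic."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(8 + 16, 64), nn.ReLU(), nn.Linear(64, 2))

    def forward(self, perception):
        x = torch.cat([perception["lanes"], perception["objects"]], dim=-1)
        return self.mlp(x)   # (steering, target speed)

backbone, planner = PerceptionBackbone(), PlannerNet()
controls = planner(backbone(torch.randn(1, 3, 128, 256)))   # dummy camera frame
```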
 
We just have no idea since analogies are missing for significant parts of the problem.
This is important. AI in highly rule-based domains (games, language, …) is making staggering advances. Agreed.

Advances in deeply chaotic systems not so staggering IMO.

Driving around over-sampled Palo Alto on a perfect day on perfect streets means squat, IMO. There is another world of potholes, trash, shadows, rain, manure, fog, blizzards, dust, ice, flying debris, farm equipment, and animals to add to this orderly little Barbie-world demo.

V11 and V12 need to be compared in all environments before conclusions are drawn. V12 could be something, or it could just be today's "fluffer bot". Just an opinion, YMMV.
 
PSA: as of 0300 or so, I am downloading 11.4.7
Given hardcore FSD skeptic Jenny's positive reaction to this week's FSD drives we took (she is far more jaundiced than I), and with the "Elon's Drive" as further background, I am more than just a little bit optimistic about what to expect.

“99 times bitten, 100 times shy” notwithstanding.
 
I think people overdo the term 'black box' when it comes to neural networks. They really do not have to be. ...
Black box is a common term used in modelling, signals and systems, system identification etc that has been adopted by machine learning. From wikipedia

In science, computing, and engineering, a black box is a system which can be viewed in terms of its inputs and outputs (or transfer characteristics), without any knowledge of its internal workings. Its implementation is "opaque" (black). The term can be used to refer to many inner workings, such as those of a transistor, an engine, an algorithm, the human brain, or an institution or government.

To analyze an open system with a typical "black box approach", only the behavior of the stimulus/response will be accounted for, to infer the (unknown) box. The usual representation of this black box system is a data flow diagram centered in the box.

The opposite of a black box is a system where the inner components or logic are available for inspection, which is most commonly referred to as a white box (sometimes also known as a "clear box" or a "glass box").

In neural networking or heuristic algorithms (computer terms generally used to describe 'learning' computers or 'AI simulations'), a black box is used to describe the constantly changing section of the program environment which cannot easily be tested by the programmers. This is also called a white box in the context that the program code can be seen, but the code is so complex that it is functionally equivalent to a black box.
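The distinction is easy to see in code. Here is a tiny, made-up example contrasting black-box probing (stimulus/response only) with white-box inspection of a small network's named internals:

```python
# Illustrative only; the model here is a throwaway toy.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))

# Black-box view: probe with inputs, observe outputs, ignore the internals.
probe = torch.randn(10, 4)
responses = model(probe)

# White-box view: the internals are right there, named and inspectable.
for name, param in model.named_parameters():
    print(name, tuple(param.shape))   # e.g. "0.weight (8, 4)"
```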

Elon said:

So I have not intervened once, and the drive has been smooth. And again, we're being repetitive about being repetitive. But we have not programmed in the concept of traffic lights. There's not, like, "this is a red light, this is a green light, and this is the traffic light position." We have that in the normal stack, but we do not have that in V12. This is just video training; like I said, nothing but neural nets. And yet it knows which light applies to it, and it stops at a red light and accelerates at a green light. Now, one of the very slightly funny challenges we've had is that, since the car is being trained on what humans do, humans almost never stop fully at a stop sign. So when they get to a stop sign, humans actually almost...


Here he is referring to the neural network running in the car. Offline, in their autolabel stack, they have tons of feature engineering, with neural networks finding traffic lights and stop signs, flagging where cars drove through a red light, where drivers failed to avoid potholes, etc., so they can select the right data to train the end2end network.
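As a hypothetical sketch of that data-curation step (the event tags and selection rules here are invented purely to illustrate filtering clips by offline-detected events before training):

```python
from dataclasses import dataclass, field

@dataclass
class Clip:
    clip_id: str
    events: set = field(default_factory=set)   # tags from an offline autolabel pass

def select_training_clips(clips):
    keep = []
    for clip in clips:
        if "ran_red_light" in clip.events or "hit_pothole" in clip.events:
            continue                  # bad demonstration, exclude
        if "stopped_at_red" in clip.events or "avoided_pothole" in clip.events:
            keep.append(clip)         # good demonstration, include
    return keep

clips = [
    Clip("a1", {"stopped_at_red"}),
    Clip("a2", {"ran_red_light"}),
    Clip("a3", {"avoided_pothole", "stopped_at_red"}),
]
print([c.clip_id for c in select_training_clips(clips)])   # ['a1', 'a3']
```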

But yeah, Tesla's end2end neural network is a combination of many architectures, each well defined by Tesla themselves but still differentiable. Something like this, but with steering angle (yaw) and velocity as outputs and some other changes to the architecture:
(attached example architecture diagram)
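In lieu of the diagram, here is a rough PyTorch sketch of the "video in, controls out" idea with steering angle and velocity as the outputs. It is an illustration under my own assumptions, not Tesla's network: a shared CNN encodes each frame, a GRU fuses the sequence, and a head emits the two control values, so the whole thing is differentiable end to end.

```python
import torch
import torch.nn as nn

class EndToEndDriver(nn.Module):
    def __init__(self, feat_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, feat_dim, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.temporal = nn.GRU(feat_dim, 128, batch_first=True)
        self.head = nn.Linear(128, 2)   # [steering angle (yaw), velocity]

    def forward(self, frames):
        # frames: (batch, time, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.encoder(frames.flatten(0, 1)).view(b, t, -1)
        hidden, _ = self.temporal(feats)
        return self.head(hidden[:, -1])  # controls for the latest frame

model = EndToEndDriver()
controls = model(torch.randn(2, 8, 3, 96, 160))   # dummy 8-frame clips
```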


(sorry for the OT, but it's the weekend and FSD V12 is a big deal)
 