Doggydogworld
Active Member
My point is an E2E system doesn't "know" anything. It doesn't know there's a bike or a car or a lane to the right or whatever. It's just a huge equation with a billion fixed coefficients. This language engine, for lack of a better word, simply makes stuff up.

I don't understand. The very purpose of this effort is to give people an idea of what the model is doing. The fact that it claimed to be doing something it wasn't is itself a way of letting us know there's a bug, either in the text generation system or in the driving system. If FSD explained what it was doing, we'd know right away why it was driving at a particular speed, why it refused to change lanes, or why it does all the other odd things that it does.
Here's a question: how do you train the language engine? You train NNs by feeding in a bunch of training data and tweaking the coefficients until you get close to the "right" output. With image recognition the right output is what a human curator says each training image represents. With a braking NN the right output could be the lowest constant g-force that stops at the stop sign.
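To make the "tweak the coefficients" part concrete, here's a toy version of that training loop: one made-up coefficient, made-up (speed in, braking force out) data, and plain gradient descent. It's just a sketch of the general idea, not anything resembling what Tesla actually runs:

```python
# Toy supervised training: nudge a coefficient until the model's
# output is close to the "right" output from the training data.
# A real E2E driving net has billions of coefficients, not one.

# Fake training data: (input, "right" output) pairs a curator might supply.
# Here the true relationship happens to be output = 0.5 * input.
data = [(x, 0.5 * x) for x in range(1, 11)]

w = 0.0      # the lone coefficient, starting from a bad guess
lr = 0.001   # learning rate: how hard we tweak per example

for _ in range(200):              # many passes over the data
    for x, target in data:
        pred = w * x              # the model's current output
        # Gradient of squared error (pred - target)^2 w.r.t. w,
        # stepped in the direction that shrinks the error.
        w -= lr * 2 * (pred - target) * x

print(round(w, 3))  # converges to ~0.5, matching the training data
```

The catch the thread is pointing at: this loop only ever learns to match whatever "right" outputs the curator supplied. If those labels are a human's guess about why the car did something, the net learns to reproduce the guess, not the actual decision process.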
What's the right output for this language NN? What a human curator says, as with image recognition? Probably. But the human curator is just guessing! So they trained the NN to guess like a human. It might make the rider feel better, but it doesn't represent the actual driving decisions.