Given that the system is apparently active, in that it shows the lanes on the display, I have to assume it's watching how I drive and comparing my behaviour with what it would have done if AP were actively in use. Much like the theory that REM sleep (or whichever type of sleep it is where you're paralyzed - I'm hardly an expert on sleep!) is a good time to have dreams that test scenarios - falling off a cliff, etc. - because you can't hurt yourself, I suspect AP is doing the same. Not able to act dangerously, but able to test scenarios in a dream state! It would be interesting to know how, or whether, AP's 'watching' differs from a driver's 'panic corrections' with respect to how the learning occurs. This is likely something Ohmman could comment on based on his experience?
There are a number of ways Tesla could be doing this, but first and foremost, it's learning from the driver. That's how these things work, and Elon has said as much too. So when you're driving, you're training. When AP is on, it's mostly just obeying the model. There is a slim chance that a reinforcement learning algorithm runs while AP is on, with penalties for things like crossing a lane line or the driver taking over without being prompted. However, the main training method is for drivers to control the vehicle while it collects environmental data.
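To make the reinforcement-learning possibility concrete, here's a minimal sketch of the kind of penalty-based scoring such an algorithm might use. Everything here is invented for illustration - the function name, the events tracked, and the weights are all assumptions, since Tesla has never published its training objective:

```python
# Hypothetical reward function for a shadow-mode RL setup (illustrative only;
# not Tesla's actual objective). Each timestep of driving gets scored, with
# heavy penalties for the events the forum post speculates about.

def shadow_reward(line_crossed: bool, driver_took_over: bool,
                  lane_center_offset_m: float) -> float:
    """Score one timestep of shadow-mode driving (higher is better)."""
    reward = 1.0                                # baseline for staying in lane
    reward -= abs(lane_center_offset_m) * 0.5   # mild penalty for drifting off-center
    if line_crossed:
        reward -= 5.0                           # crossing a lane line: large penalty
    if driver_took_over:
        reward -= 10.0                          # unprompted takeover: strongest negative signal
    return reward

print(shadow_reward(False, False, 0.1))   # clean driving, slightly off-center
print(shadow_reward(True, False, 0.0))    # crossed a line
print(shadow_reward(False, True, 0.0))    # driver grabbed the wheel
```

The key design idea is that an unprompted driver takeover is the most informative (and most penalized) event, since it directly marks a moment where the model's intended behaviour diverged from what a human considered safe.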
One common thought seems to be that the car is learning specific routes or specific locations based on an individual's driving. That's quite unlikely for a few reasons, the most obvious being that a model needs many iterations to adjust. More likely, if a driver is seeing a real difference on a route, the vehicle is learning how to react given a set of inputs which may be unique to that setting. The benefit is that the model can then generalize this to another location on the other side of the country/world where the inputs are similar. And drivers there are, in turn, helping the model learn your particular set of inputs. In other words, if AP is doing better on your particular drive, it's because it's doing a better job of generalizing to your route.
A simple example is the classic handwritten character recognition problem. If you have a specific way of writing a 4 which isn't classified properly by the existing model, sending it one or two of your 4s with a label ("this is a 4") isn't going to do much. However, if there are hundreds of people like you who write slightly similar 4s, and those are part of the labeled training set, it'll properly classify your 4 even without your examples. Compare that to a spot on your route where the lane line is wavy or worn away. Surely in the AP collection space, there are other locations like this. Every time a driver completes a trajectory through such a spot, the model gets more confident about that set of inputs. If you waited long enough, the car would handle it properly even if it had never seen that particular spot on your route.
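The handwritten-4 point can be shown with a toy classifier. This is a deliberately tiny sketch, not any real OCR pipeline: the three-number "stroke feature" vectors and the writer styles are made up, and a 1-nearest-neighbor rule stands in for whatever model is actually used. The point it demonstrates is exactly the one above: your unusual 4 gets classified correctly once *similar* 4s from other writers are in the training set, even though your exact sample never was.

```python
# Toy 1-nearest-neighbor classifier over invented "stroke feature" vectors.
# Illustrates generalization: other writers' similar examples fix your case.

def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def classify(sample, training_set):
    """Return the label of the nearest labeled training example."""
    return min(training_set, key=lambda ex: euclidean(sample, ex[0]))[1]

# Features (all made up): (open_top, loop_closure, stem_length)
sparse_training = [
    ((0.90, 0.10, 0.90), "4"), ((0.88, 0.12, 0.85), "4"),  # typical 4s
    ((0.20, 0.90, 0.85), "9"), ((0.25, 0.88, 0.90), "9"),  # typical 9s
]

# Other writers who close the top of their 4s, much like you do:
full_training = sparse_training + [
    ((0.82, 0.80, 0.88), "4"), ((0.78, 0.83, 0.92), "4"),
]

my_four = (0.80, 0.85, 0.90)  # your closed-top 4, never itself in training

print(classify(my_four, sparse_training))  # -> "9" (misread: looks more like a 9)
print(classify(my_four, full_training))    # -> "4" (others' similar 4s fix it)
```

With only "typical" examples, the nearest neighbor to your closed-top 4 is a 9, so it's misclassified; once a couple of similar 4s from other writers join the training set, the same unseen sample lands correctly, which is the claim about the worn-off lane line on your route.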