Welcome to Tesla Motors Club
Discuss Tesla's Model S, Model 3, Model X, Model Y, Cybertruck, Roadster and More.

General Discussion: 2018 Investor Roundtable

Status
Not open for further replies.
Yes, I know what a coefficient of determination is, thank you. Would you like to discuss multivariable calculus?

View attachment 289167

If you look at the last three days and exclude the one outlier way below the line (most likely a major QC lag), then R2 = 1 - 1 = 0

In English, that line explains *sugar*.

We need another best fit line that covers ONLY the period after the late-Feb production shutdown.
Sorry, you would have to explain to me what "R2 = 1 - 1 = 0" means. I haven't really looked at this level of math for a long time.
 
R^2 (I'm lazy)

If the line doesn't track the data at all, the sum of squares of residuals is as large as the total sum of squares, so R^2 = 1 - SSres/SStot comes out to 0.
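For anyone following along, the bookkeeping really is that simple. A minimal Python sketch with made-up numbers (not real VIN data):

```python
# R^2 = 1 - SS_res / SS_tot. A fit whose residual error is as large
# as the data's own variance around its mean scores ~0.
def r_squared(y, y_pred):
    mean_y = sum(y) / len(y)
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    ss_res = sum((v - p) ** 2 for v, p in zip(y, y_pred))
    return 1 - ss_res / ss_tot

y = [10, 12, 14, 16]        # observed values
good = [10, 12, 14, 16]     # a line through the points
flat = [13, 13, 13, 13]     # a flat line at the mean: explains nothing
print(r_squared(y, good))   # 1.0
print(r_squared(y, flat))   # 0.0
```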
Thanks, now I understand what you mean. Practically speaking, the linear fit line doesn't go through the points of the last 3 days at all, so it's not a valid fit for that population, and R^2 is 0.

I hope you would agree that trying to fit a time trend with 3 days of data is pointless.

Another thing to keep in mind is that the reason we didn't see VINs below the trend in the last 3 days could be that no color/wheel combinations with lower VINs happened to come up in those days, and a future color/wheel rotation could trigger a bunch of lower VINs being assigned.
 
A practical and easy way to check the validity of a linear fit is to artificially "mess up" the data slightly and see if the trend still holds. For example, I took 2-3 of the lowest VINs reported from 3/20-22 and moved them to 3/25, which made the trend look less "rampy". This basically simulates some randomness in when VIN assignments occur or get reported on Troy's sheet. The resulting linear fit slope drops from ~5k/wk to ~2.5k-3k/wk. The conclusion I draw from this exercise is that it's really difficult to claim 5k/wk with any accuracy, but it does look like it's 2k+/wk.
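That perturbation check sketches out to a few lines of Python. The VINs and day numbers below are invented stand-ins for the spreadsheet data; the refit-after-nudging idea is the point:

```python
# Refit the slope after nudging a couple of low VINs to a later day,
# and see how much the estimated rate moves.
def fit_slope(days, vins):
    """Ordinary least-squares slope of VIN vs. day."""
    n = len(days)
    mx = sum(days) / n
    my = sum(vins) / n
    num = sum((x - mx) * (y - my) for x, y in zip(days, vins))
    den = sum((x - mx) ** 2 for x in days)
    return num / den

days = [0, 0, 1, 1, 2, 2, 3, 3]
vins = [100, 900, 300, 1600, 500, 2300, 700, 3000]
base = fit_slope(days, vins)

# Nudge the lowest VIN on each of the first two days one day later,
# simulating late reporting.
perturbed_days = list(days)
perturbed_days[0] += 1   # VIN 100 reported a day late
perturbed_days[2] += 1   # VIN 300 reported a day late
moved = fit_slope(perturbed_days, vins)

print(f"slope before: {base:.0f}/day, after: {moved:.0f}/day")
```

If the slope collapses under small nudges like this, the fitted rate wasn't robust to begin with.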

Sure, but we know the data set is non-linear (in generation and, hopefully, in increase). VINs for any day are a non-uniform distribution bounded by 0 and the max number produced by Fremont. If we want to solve for the production rate, we need to weight the data to the high side or apply some other function to convert that spread into a representative value of cars produced. If that is not done, a least squares fit will get pulled more and more to the low side the longer production continues. For example, one VIN 1000 lower counters 100 VINs 100 higher.
Trend-wise, I like the VIN registration approach with some time lag and batching delay correction applied. There must be some time cap by which all VINs from a block have been produced.
 
Someone should plot the average VIN over time, not all the VINs over time. I started to do this, but my Excel crashes constantly and I don't have time to set this up in R.

At any point in time, there is wide variation in the VINs that are assigned due to a host of factors beyond the production rate (color batching, shipping, configs etc). If we want to understand production rate, we don't want to include any of this other variation, as it distorts our analysis.

Even graphing the average VIN over time is highly problematic for several reasons. Imagine it's week 1 of production and Tesla makes 1000 cars, so VINs 1-1000 are headed out the door for delivery. Here the average VIN at the end of the week is about 500, even though production was 1000, so you'll underestimate production by half.

Using the max VIN would be better, except Tesla assigns VINs non-sequentially, so a single batch of high VINs could really mess things up. So for a simple analysis, doing something like taking the average VIN for a date, and then adding 2x the standard deviation to that (roughly the 97.5th percentile) is about as good as possible with this data.
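For the curious, the per-day "average + 2×SD" proxy is only a few lines of Python. The dates and VINs below are invented, not the spreadsheet data:

```python
# Estimate the highest assigned VIN per day as mean + 2 * sample SD.
from statistics import mean, stdev

# date -> VINs reported that day (hypothetical numbers)
reports = {
    "03-20": [11200, 11900, 12400, 13100],
    "03-21": [11600, 12800, 13500],
    "03-22": [12100, 12900, 13600, 14200],
}

for date, vins in reports.items():
    est_max = mean(vins) + 2 * stdev(vins)  # stdev is the sample SD
    print(f"{date}: avg={mean(vins):.0f}, est. max VIN ~ {est_max:.0f}")
```

Note that `statistics.stdev` is the sample standard deviation, which is the right choice here since each day is a small sample of the VINs actually assigned.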

EDIT: Okay, I did it. Here is a plot of the average VIN + 2 x SD for each date, which is basically a smoothed line for the max VIN.
Screen Shot 2018-03-25 at 5.00.04 PM 1.png


The best fit is a polynomial trendline (R2 = 0.94). The good news is that the polynomial curves upward, so the production rate is increasing. This gives a production rate of about 1200-1300 cars per week over the last couple of weeks. I think that's a safe minimum.

You could make the argument that the data over the last week supports an even higher rate, but you'd be drawing on only a few data points, so while it's a real possibility, the statistical evidence isn't there yet.
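Since the trendline is a polynomial, the implied production rate is just its derivative. A small sketch with assumed coefficients (these are placeholders, not the actual fitted values):

```python
# Hypothetical quadratic trendline: vin(t) = a*t**2 + b*t + c,
# with t in days. The production rate is the derivative.
a, b, c = 2.0, 100.0, 1000.0   # assumed coefficients, not the real fit

def rate_per_day(t):
    # d/dt (a*t^2 + b*t + c) = 2*a*t + b
    return 2 * a * t + b

for day in (7, 14, 21):
    print(f"day {day}: ~{rate_per_day(day) * 7:.0f} VINs/week")
```

An upward-curving quadratic (a > 0) gives a rate that grows linearly with time, which is what "the production rate is increasing" means here.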
 
Non-statistician opinion:
I think R^2 is pretty pointless for this type of data. This is not a linear set/function being modeled; it's multiple Y values for the same X. The bigger the VIN spread, the worse the correlation will be. You can make lines that fit the error function better, but that is just curve-fitting the day's VIN distribution. As long as there are low-numbered outliers, the fit will always be less than the real rate, more so as time goes on. If we take the top 2-5 points for each day and fit those, it does tell us how fast the max VIN is rising.
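A sketch of that top-few-points idea in Python; the daily VIN lists are invented, and the helper just extracts the points you'd feed into any line fit:

```python
# Keep only the k highest reported VINs per day, dropping the low
# outliers that drag a least-squares fit down.
def top_k_points(daily_vins, k=3):
    """Flatten {day: [vins]} into (day, vin) pairs, keeping the k highest per day."""
    pts = []
    for day, vins in sorted(daily_vins.items()):
        for v in sorted(vins, reverse=True)[:k]:
            pts.append((day, v))
    return pts

daily = {
    0: [200, 900, 1400, 1500],   # low outliers like 200 get dropped
    1: [100, 1600, 1900, 2100],
    2: [700, 2400, 2600, 2800],
}
points = top_k_points(daily, k=2)
print(points)
# [(0, 1500), (0, 1400), (1, 2100), (1, 1900), (2, 2800), (2, 2600)]
```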
Sure, but we know the data set is non-linear (in generation and hopefully increase)....There must be some time cap when all VINs from a block have been produced.
Someone should plot the average VIN over time, not all the VINs over time....The best fit is a polynomial trendline (R2 = 0.94)....I think that's a safe minimum.

Yes and yes and yes. Math is awesome.
 
Someone should plot the average VIN over time, not all the VINs over time....Here is a plot of the average VIN + 2 x SD for each date, which is basically a smoothed line for the max VIN....I think that's a safe minimum.

Thank you for this. What happens if you focus only on the most recent weeks post the late-feb production shutdown?
 
Thanks, now I understand what you mean. Practically speaking, the linear fit line doesn't go through the points of the last 3 days at all....and future color/wheel rotation could trigger a bunch of lower VINs being assigned.

Although fitting a line to a time series is certainly an easy thing to do, broadly and generally speaking, time series data are a prime example of data that violate the basic assumptions of linear regression. The recent 3 days all being above the line is an example of how time series data can violate those assumptions (though not a guarantee of it). If you're so inclined, you can read about the assumption here: Homoscedasticity - Wikipedia

The primary problem with time series data is that, typically, the best predictor for tomorrow is today. Linear regression assumes the observations are independent of one another, and that clearly isn't true if yesterday is the best predictor for today.

The rule of thumb I was taught during my briefish encounter studying time series data is that 50 points in the series is sort of the minimum for getting good results. With daily grain data, 50 points is 7 to 10 weeks depending on whether weekend days are included, and that's not too bad to come by. But weekly grain data starts looking like a year. Doesn't mean that you can't start modeling and drawing conclusions earlier (and people do), but confidence goes down.
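One quick way to see the "today predicts tomorrow" problem is to compute the lag-1 autocorrelation of a rising series: a value near 1 means consecutive points are far from independent. A Python sketch with invented numbers:

```python
# Correlation between a series and itself shifted by one step.
from statistics import mean

def lag1_autocorr(series):
    x = series[:-1]   # today
    y = series[1:]    # tomorrow
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

rising = [100, 210, 290, 420, 500, 610, 700]
print(f"lag-1 autocorrelation: {lag1_autocorr(rising):.2f}")  # close to 1
```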
 
Visually tracing my finger across the last portion after the shutdown dip linearly shows up at about 16K for the next week. Considering it takes about 10 days to order parts for a Model S and build it (someone correct me), I'm thinking the March 20 Twitter Model3VINs post of "#Tesla registered 2,042 new #Model3 VINs. Highest VIN is 15885." is about right for my linear finger trace for end of quarter. If I'm exactly right (not that likely), the following March 23 "#Tesla registered 2,655 new #Model3 VINs. Highest VIN is 18540." would probably be in production about the end of March, plus or minus, with no commitment to which quarter, probably heavily on the April side of the line. Anything in The Machine could prove me wrong on its terms, not mine. (I'm fully aware they are fulfilling orders via logistics, not order sequencing; since I have no view into that, I'm not going to pretend to guess its effect after all variations are averaged over Canada & USA.)

Production vs. delivery? Whatever the average logistics time is. They'll probably push some California models with slightly lower logistics times, but not by much. What do we figure, about 2,000 fewer delivered than produced? That would include those still at stores or in logistics yards, at train terminals, and on slow-moving trucks across the whole continent, as well as the half week to week or so it takes to deliver within California. So 14K/16K is my guess for total delivered/produced to date by end of quarter. Am I far from the norm here?
If we assume 2k production in the last week of March, I think we can assume that those 2k won't be delivered, plus some from the previous week. My guess is a cumulative 12.5K produced, 10K delivered. 14/16k seems way too high to me since we're nowhere near 16k in VIN assignment, and we have 1 week to go.
 
If we assume 2k production in the last week of March, I think we can assume that those 2k won't be delivered, plus some from the previous week. My guess is a cumulative 12.5K produced, 10K delivered. 14/16k seems way too high to me since we're nowhere near 16k in VIN assignment, and we have 1 week to go.
Ah damn. Where did @dandurston get his data? This is from the spreadsheet reported by buyers:

Screen Shot 2018-03-25 at 8.39.13 PM.png

My finger trace on that shows up at 12,500 assigned, so 12,500 produced, 10,300 delivered. That's closer to what you wrote. This time I followed a less linear line, but one that followed the recent trend (a slight speedup, but not much).

The post you quoted is now a ghost in the machine ...
 
Thank you for this. What happens if you focus only on the most recent weeks post the late-feb production shutdown?
Here is March alone:
Screen Shot 2018-03-25 at 9.28.23 PM.png

That gives a rise of 5400 cars over 26 days, or 208 cars per day, or 1455 per week. It's still curving up slightly, so maybe 1400 at the start and 1500 currently.
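A two-line sanity check of that arithmetic (the rise and day count are read off the plot):

```python
# A rise of 5400 VINs over 26 days, converted to daily and weekly rates.
rise, days = 5400, 26
per_day = rise / days
per_week = per_day * 7
print(f"{per_day:.0f} cars/day, {per_week:.0f} cars/week")
```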

Ah damn. Where did @dandurston get his data? This is from the spreadsheet reported by buyers...
It's the same data as you posted, just with some manipulation as I described (I graphed the average + 2 x the standard deviation for each day, to approximate the highest VIN). Adding 2 x the standard deviation gives us an estimate of the highest VIN out there, regardless of whether it's been reported. For any given day, it's very unlikely that the highest VIN in existence is actually on the spreadsheet, since only a small percentage of the VINs (about 5%) are reported. So when you take the standard deviation, you get an idea of how much variation there is in the reported data, and based on that you can make a statistical guess at the highest assigned VIN. A downside is that the sample size for any given day is very small, so you can get an erroneously small or large standard deviation by chance, but over time that averages out.

It's also worth noting that these are assigned VINs, not delivered, which likely trail by 2 weeks or so. My guess is that the highest delivered VIN will be around 12,500, which is an increase of 9,500 or so for the quarter. But of course there are some gaps in there, so I think we're going to see about 8K Model 3 deliveries in Q1.

Even if you don't like that method - and any method has its pros and cons - just a simple graph of the averages looks very much the same: a similar slope (and thus rate).
 
Although fitting a line to a time series is certainly an easy thing to do, broadly and generally speaking, time series data are a prime example of data that violates the basic assumptions of linear regression....The primary problem with time series data is that typically, the best predictor for tomorrow is today. Linear regression specifically assumes that all points contribute approximately equally to the resulting line, and clearly that isn't true if yesterday is the best predictor for today

I generally agree, but you're describing a problem that exists with virtually any linear regression: it's only valid for the range you have data for, not beyond it. Using any linear regression to predict beyond the data is never okay. Doing so would only be valid if everything stays the same, and if we're going to assume everything stays the same, then we don't really need to be predicting it.
It's quite fair to use linear regression for time series data where you have data across the time range and you are trying to understand that data.
 
At least he had a dad willing to help his family, and he was allowed to go to college.

Most dads help their family; it's part of the job description. The ones who don't are the exceptions.

Besides, do you always believe the words of a known sociopath? Remember that this guy (Errol Musk) divorced his second wife of 18 years (Maye Musk was his first wife and Elon's mother), with whom he had 2 children, only to turn around and marry his stepdaughter and have a child with her, because it was "god's plan".

Nah, he gets no redemption.
 
Someone should plot the average VIN over time, not all the VINs over time. I started to do this, but my Excel crashes constantly and I don't have time to set this up in R.

Very nice idea. Why didn't I think of that!

Using the max VIN would be better, except Tesla assigns VINs non-sequentially, so a single batch of high VINs could really mess things up. So for a simple analysis, doing something like taking the average VIN for a date, and then adding 2x the standard deviation to that (roughly the 97.5th percentile) is about as good as possible with this data.

I am not convinced that adding the standard deviation is the right call. Let me try to understand better: which standard deviation exactly are you adding? The one for just that day, or the standard deviation over all the data?

The best fit is a polynomial trendline (R2 = 0.94). The good news is that it's a positive polynomial, so the production rate is increasing. This gives a production rate of about 1200 - 1300 cars over the last couple weeks. I think that's a safe minimum.

Which order polynomial did you fit?
 