It's been decades since I've done that level of statistical analysis, and that was long before spreadsheets and Excel/LibreOffice/OLE/DDE/COM etc. To answer your question: no, that's not really what I had in mind. I think using reservation date in the chart you did (the one with the big bubble on the left and what I'll call noise after it) was the wrong choice of variable. Not that what you did was in any way wrong; I just suspect (my hypothesis) that the result is statistically insignificant. If, however, you used config date, I believe it would correlate better. What separates multiple regression from simple linear regression is that you can add other independent variables, such as EAP or FSD. I believe the spreadsheet function is LINEST(), and I suspect what it gives you is a coefficient for each independent variable plus statistics indicating how strongly each one relates to the dependent variable.
A sure sign of age: I'm having a hard time coming up with what the dependent variable is. This is something Ben Sullins excels at: data science. There is also a flaw in using reservation date, since reservations were queued from 3/31/16 (or earlier) through June 18(?). When they were producing only a few hundred a week, that's nothing compared to 5500/wk. So if one came up with a 'best fit' line... but a line for what? Again, no obvious dependent variable. I think I'll review some examples of statistical analysis, maybe on Khan Academy.
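To make the multiple-regression idea concrete, here is a minimal Python sketch of the kind of ordinary least-squares fit LINEST() performs when given more than one independent variable. It assumes, hypothetically, that the dependent variable is delivery date expressed as a day number, and that the independent variables are config date (also a day number) and a 0/1 flag for EAP. The function names and the toy data are made up for illustration, not taken from anyone's actual spreadsheet, and the sketch only returns coefficients, not the extra statistics (standard errors, R², F) that LINEST() can also report when its stats argument is TRUE.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a square system a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so row operations apply to a and b together.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Use the row with the largest value in this column, for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: the predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every row below the pivot row.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution, from the last row up.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def multiple_regression(rows: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares with an intercept.

    rows[i] holds the independent variables for observation i and y[i] the
    dependent variable. Returns [intercept, b1, b2, ...].
    """
    # Prepend a column of 1s so the first coefficient is the intercept.
    x = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(x[0])
    # Normal equations: (X^T X) b = X^T y.
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


if __name__ == "__main__":
    # Hypothetical toy data, NOT real orders: (config_day, has_eap) -> delivery_day,
    # built from delivery = 30 + 1.0 * config_day - 5 * has_eap.
    predictors = [(0, 0), (10, 1), (20, 0), (30, 1), (40, 0), (50, 1)]
    delivery = [30 + 1.0 * c - 5 * e for c, e in predictors]
    intercept, b_config, b_eap = multiple_regression(predictors, delivery)
    print(f"intercept={intercept:.2f} config={b_config:.2f} eap={b_eap:.2f}")
```

Because the toy data were generated from a known formula, the fit recovers those coefficients; with real orders the EAP coefficient would be the estimated number of days EAP adds or removes, holding config date fixed.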
Null hypothesis 1 - there is no correlation between config date and delivery date.
Null hypothesis 2 - the presence of EAP (FSD requires EAP) has no effect on the time between config date and delivery date.
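One assumption-light way to test those two null hypotheses is a permutation test: shuffle the data so any real relationship is destroyed, and see how often chance alone produces a result as extreme as the observed one. This is a swapped-in technique, not anything from the original analysis; a t-test or LINEST()'s statistics would be the more conventional spreadsheet route. The sketch assumes dates have already been converted to day numbers and EAP to a True/False flag; the function names and the numbers are hypothetical. A small p-value is evidence against the null hypothesis.

```python
import random
from typing import Sequence, Tuple


def _mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = _mean(x), _mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_test_correlation(config_days: Sequence[float], delivery_days: Sequence[float],
                                 n_iter: int = 5000, seed: int = 0) -> Tuple[float, float]:
    """Null hypothesis 1: no correlation between config date and delivery date.

    Two-sided permutation test on Pearson's r. Returns (r, p_value).
    """
    rng = random.Random(seed)
    observed = pearson_r(config_days, delivery_days)
    shuffled = list(delivery_days)
    extreme = 0
    for _ in range(n_iter):
        rng.shuffle(shuffled)  # break any real pairing between the two dates
        if abs(pearson_r(config_days, shuffled)) >= abs(observed):
            extreme += 1
    # The +1 terms count the observed arrangement itself, so p is never exactly 0.
    return observed, (extreme + 1) / (n_iter + 1)


def permutation_test_mean_difference(gaps: Sequence[float], has_eap: Sequence[bool],
                                     n_iter: int = 5000, seed: int = 0) -> Tuple[float, float]:
    """Null hypothesis 2: EAP makes no difference to the config-to-delivery gap.

    Two-sided permutation test on (mean gap with EAP) - (mean gap without EAP).
    Returns (difference_in_days, p_value).
    """
    def diff(labels: Sequence[bool]) -> float:
        with_eap = [g for g, e in zip(gaps, labels) if e]
        without_eap = [g for g, e in zip(gaps, labels) if not e]
        return _mean(with_eap) - _mean(without_eap)

    rng = random.Random(seed)
    observed = diff(has_eap)
    labels = list(has_eap)
    extreme = 0
    for _ in range(n_iter):
        rng.shuffle(labels)  # reassign EAP flags at random, keeping the group sizes
        if abs(diff(labels)) >= abs(observed):
            extreme += 1
    return observed, (extreme + 1) / (n_iter + 1)


if __name__ == "__main__":
    # Hypothetical toy numbers, NOT real delivery data.
    config_days = [0, 5, 10, 15, 20, 25, 30, 35]
    delivery_days = [40, 44, 52, 55, 61, 64, 72, 75]
    r, p1 = permutation_test_correlation(config_days, delivery_days)
    print(f"H1: r={r:.3f}, p={p1:.4f}")

    gaps = [20, 22, 21, 23, 19, 24, 40, 42, 41, 43, 39, 44]
    has_eap = [True] * 6 + [False] * 6
    d, p2 = permutation_test_mean_difference(gaps, has_eap)
    print(f"H2: mean gap difference={d:.1f} days, p={p2:.4f}")
```

The fixed seed only makes the shuffles reproducible; with real data you would want a few thousand iterations or more so the p-value estimate is stable.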
Is that helpful? BTW, thank you for your work on this.
When I was at Monster.com, they had years' worth of query data. I scienced the sugar out of that. Talk about big data and data mining!