AlanSubie4Life
Efficiency Obsessed Member
I aced stats, I'm going to let you do your homework and figure out how a sample size of 100 out of hundreds of millions, with a human choice bias, cannot be statistically significant.
💩
I aced stats, I'm going to let you do your homework and figure out how a sample size of 100 out of hundreds of millions, with a human choice bias, cannot be statistically significant.
Ok I see the confusion. They report the mDAU's quarterly and do 9,000 samples per quarter (100 a day). I've removed my disagree.
Twitter claims 9000 samples per quarter for a metric reported quarterly. Elon was the one who claimed 100! (FUD! No one likes FUD!)

Sample Size Calculator (www.calculator.net)
385 is the minimum number of DAILY, RANDOM users to be surveyed for 95% confidence. This is based on a user count of 330 million users, as a quick Google search came up with.
All applicable formulas are at that link.
You guys forgot how to Google simple stuff like that?
EDIT - hopping on a red-eye. You are on your own from here on out kids.
Even if they were reporting it daily (they actually report quarterly), you used the calculator wrong. The actual confidence interval depends on the population proportion (you used 50%, the worst case). Also, the number of users is basically irrelevant: the margin of error in the example below is 0.43% even with a population size of only 100k (which is why Elon's talking point about the sample being 0.0000000000whatever percent of users makes me doubt his understanding of sampling statistics).

Sample Size Calculator (www.calculator.net)
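For anyone who wants to check the numbers without the calculator, here's a quick Python sketch of the standard formulas it uses (normal approximation for a proportion, plus the finite population correction). The population sizes and the ~5% bot proportion are just the figures discussed above; both the 385 figure and the ~0.43% margin of error fall out of it:

```python
import math

Z95 = 1.96  # z-score for a 95% confidence level

def sample_size(moe, p=0.5, population=None):
    # n0 = z^2 * p(1-p) / e^2, optionally shrunk by the finite population correction
    n = Z95**2 * p * (1 - p) / moe**2
    if population is not None:
        n = n / (1 + (n - 1) / population)
    return math.ceil(n)

def margin_of_error(n, p, population=None):
    # e = z * sqrt(p(1-p)/n), optionally shrunk by the finite population correction
    moe = Z95 * math.sqrt(p * (1 - p) / n)
    if population is not None:
        moe *= math.sqrt((population - n) / (population - 1))
    return moe

# The 385 figure: worst-case p=0.5, +/-5% margin, 330M users
print(sample_size(0.05, p=0.5, population=330_000_000))          # -> 385

# Population size barely matters: shrink it over 3,000x and n hardly moves
print(sample_size(0.05, p=0.5, population=100_000))              # -> 383

# 9,000 samples per quarter at a ~5% bot rate, 100k population
print(f"{margin_of_error(9000, 0.05, population=100_000):.2%}")  # -> 0.43%
```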
My point is that Twitter is already doing what you suggested they should do to detect bots, using "tangible, measurable, repeatable criteria that don't rely on human judgement".
Are you saying they should just stop there and not try to estimate how many bots they fail to detect? Obviously if there's a method that doesn't rely on human judgement they can just add it to the automated bot removal.
Still waiting on @bkp_duke to calculate that confidence interval and provide support for his claim that the sample of mDAU's is non-random.
385 is the minimum number of DAILY, RANDOM users to be surveyed for 95% confidence. This is based on a user count of 330 million users, as a quick Google search came up with.

Twitter reported 229 million users in the mDAU metric last quarter, not 330 million, which is the monthly active user number.
Again, what you are showing is that Twitter is auto-removing SOME of the bots each day.
But what we are discussing is their filing with the SEC about how many they MISS and are included in their reported DAU's. Twitter is not monetized or valued based on the million bots they dismiss each day - all companies deal with garbage signups. What matters is the number of supposedly active USERS they claim to have - which needs to be reduced by any non-humans in there. Twitter has a long tradition of not doing any form of deep analysis on the topic - instead they have a team of humans that looks at a very small sample and magically pronounces 95% of it "good". That process is deeply riddled with potential bias, which is why I was suggesting they consider something better.
But you haven't actually suggested what would be better!
By your logic Twitter could claim 0% of their mDAUs are bots because they've used "tangible, measurable, repeatable criteria that don't rely on human judgement" to remove all the bots from their count.
The sample size is plenty large to be able to make the quarterly <5% claim. I have no idea what the benefit would be of more precision.
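To put numbers on that (the 4.5% observed rate below is hypothetical, purely to show the scale of the uncertainty):

```python
import math

# Hypothetical result: 4.5% of the 9,000 sampled accounts judged to be bots
n, p_hat = 9000, 0.045
moe = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)  # normal-approximation 95% CI
print(f"95% CI: {p_hat - moe:.2%} to {p_hat + moe:.2%}")  # -> 4.07% to 4.93%
```

The whole interval sits below 5%, so the sample easily supports a claim at that level of precision.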
Twitter has a long tradition of not doing any form of deep analysis on the topic - instead they have a team of humans that looks at a very small sample
The sample is random from the prescreened list of mDAUs. The humans looking at the selection don't get to decide which accounts they review, only whether each one is a bot. Where exactly is the bias?

Doing so would eliminate the current system of human-opinion which is horrifically prone to bias.
How do you know what they are doing exactly for their analysis?
Twitter reported 229 million users in the mDAU metric last quarter, not 330 million, which is the monthly active user number.

So...we have two outcomes...either you are sober, in which case you are wrong...or you are drunk, and therefore I agree with you...who said stats is difficult...
I also aced engineering stats, and did it drunk. Literally. I took the final after an all night bender, went straight to class from the party at 8am and got 100%. Not bragging, just saying maybe it's not entirely still with me, so this might not be exactly correct:
These confidence interval measurement calculators are for finding a single sample that is representative (95% of the samples will contain the mean, IIRC) using the Central Limit Theorem. But they do not reflect repeated sampling of the same (or at least statistically identical) populations. Taking a random sample of 100 per day adds up to 385 (or whatever number matters) in just a few days. Assuming a statistically stable mDAU population, everything past that date just creates a statistically valid rolling average, does it not?
I understand the confound here, which is that the random sample could also theoretically resample the same users on consecutive days. But I think the percentages are such that it's equivalent to a research firm doing a survey and calling 100 people a day until they get their minimum sample size on day X.
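A quick simulation of that intuition, assuming (purely for illustration) a stable population with a true 5% bot rate:

```python
import numpy as np

rng = np.random.default_rng(42)
true_bot_rate = 0.05     # assumed ground truth, for illustration only
days, per_day = 90, 100  # 100 random accounts a day for a quarter

# Each day's count of bots found is an independent Binomial(100, 0.05) draw
daily_bot_counts = rng.binomial(per_day, true_bot_rate, size=days)

# Pooling the daily samples behaves like one n=9000 sample
pooled = daily_bot_counts.sum() / (days * per_day)
print(f"pooled quarterly estimate: {pooled:.2%}")  # lands close to the true 5%

# The resampling confound is negligible: expected duplicate draws across
# 9,000 picks from ~229M users is roughly 9000^2 / (2 * 229e6) ~= 0.18
```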
The humans looking at the selection don't get to decide which accounts they review, only whether each one is a bot. Where exactly is the bias?
I genuinely don't understand what you're trying to say.

I was trying to state that selection bias is not an issue. (Just to clarify, since your post was ambiguous on that.) They don't get to say…meh, I don't know…let's try another account.
Because they had to describe it in their SEC filing...

Please state the specific information in that filing that you are referring to, which describes their process.
But that same bias is also an issue with any approach if humans have to decide whether or not the approach chosen is correct.

I had not even read the link at the end of the above thread when I wrote this, but for example:
They don’t get to say…meh, I don’t know…let’s try another account.
Twitter claims that they are doing exactly what you're suggesting. Once their algorithms determine that a user is a bot, they go back and remove it from the mDAU count. There are two separate things: removal of spam accounts, and estimation of how many spam accounts remain despite your best efforts to remove them.

We are just not communicating here. There are two totally separate things:
1: Operationally blocking bots as they try to create accounts and banning them in realtime
2: Reporting your DAU's to the SEC quarterly, given 90 days to examine everything you want about your userbase including how they acted AFTER they signed up, if they got reported for being a bot, etc.
You want to conflate the above two items. They are completely different.
Twitter has chosen a pure-human approach to (2), and their humans (who know who signs their paychecks) report very low bot levels.
Even though I don't like Elon's obsession with Twitter - he IS right that you could use all kinds of deep data science, AI, multiple-regression and other serious techniques to make BACKWARD LOOKING highly accurate measurements about who your real users were over the last quarter, and who the bots were. Doing so would eliminate the current system of human-opinion which is horrifically prone to bias.
We are continually seeking to improve our ability to estimate the total number of spam accounts and eliminate them from the calculation of our mDAU, and have made improvements in our spam detection capabilities that have resulted in the suspension of a large number of spam, malicious automation, and fake accounts. We intend to continue to make such improvements. After we determine an account is spam, malicious automation, or fake, we stop counting it in our mDAU, or other related metrics.
If they remove one million bots a day, they can’t all have been supervised by a human
You don't see ANY possible bias problem in there?

Humans also write the algorithms (or label the data for training the NN) that determine the 1 million accounts to delete a day... of course there is human bias.
The bias problem is that they get to DECIDE if they think any of the 100 accounts are a bot, and they are hired, paid and potentially fired by the entity whose revenue figure depends on coming up with a LOW number for bot-count.

You don't see ANY possible bias problem in there?

They claim to sample 9,000 accounts per quarter and clearly say that the <5% is the average over the quarter.