AI is notoriously difficult to diagnose. If he really wants a "transparent" system, AI alone will not accomplish that.
But the question is: does he even want "moderation" at all? It looks like he just wants illegal content out, with no moderation beyond that, so abuse, harassment, etc. would be fine.
I feel that is what will make Twitter a graveyard. If you don't ban the fringe 1%, they will harass and abuse the other 99% until they leave. That would completely collapse Twitter's ad revenue too.
PS:
Let's get out of this left-right false dichotomy. The question is really about harassment and abuse. I'm not going to post on Twitter if all I see is a string of abuse because of my skin color. That might be covered under "free speech", but it wouldn't happen in my physical town square, yet it happens all the time on Twitter. In earlier years the biggest issue that posters (especially journalists) faced was abuse based on religion or gender.
So the question really is how he goes about removing only illegal content without turning Twitter into a graveyard full of harassers and abusers.
Yes, you can't selectively ban "misinformation". It is extremely difficult to determine what misinformation is, and I don't think Elon will step into that.
Yet there are so many possibilities for automating transparency and labeling, not necessarily even involving AI,
and not even needing to trigger outright posting bans. This can apply to topics rife with misinformation or
hate speech, or to any subject, really.
Consider Bot Sentinel, with its percentage display. Imagine combining this with other metrics, such as detectors
for verbiage loaded with personal insults, racist dogwhistling, etc. There are instant linguistic analyses for
reading level (Flesch-Kincaid and many others). How about displaying important tweet-specific metrics in an
optional "dashboard" for the reader?
Similar to Tesla FSD Beta "Safety Scores", the underlying components or algorithms can be made transparent
(open-sourced, say), which creates a useful feedback loop. One can then shape a personal preference "bubble"
by making custom filters, as sketched below.
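A hypothetical reader-side filter over those metrics might look like this; the field names and thresholds
are my assumptions, not anything Twitter actually exposes.

```python
from dataclasses import dataclass

@dataclass
class PersonalFilter:
    """Hypothetical per-reader preferences applied to the per-tweet
    metrics sketched above; every field here is an assumption."""
    max_insult_ratio: float = 0.05
    max_bot_likelihood: float = 0.7

    def allows(self, metrics: dict) -> bool:
        return (metrics["insult_ratio"] <= self.max_insult_ratio
                and metrics["bot_likelihood"] <= self.max_bot_likelihood)

# Each reader shapes their own "bubble":
strict = PersonalFilter(max_insult_ratio=0.0, max_bot_likelihood=0.5)
print(strict.allows({"insult_ratio": 0.0, "bot_likelihood": 0.2}))  # True
```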
Further, Twitter can use such metrics to suppress display of the absolute worst garbage, not necessarily
by banning it outright, but by segregating it the way email spam filters do. Shunning can work wonders,
with anyone still able to "unearth" the dreck if they want.
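As a sketch of that spam-folder idea (my own illustration, reusing the assumed metric names from above):
nothing gets deleted, posts just get routed.

```python
def allows(metrics: dict) -> bool:
    # Hard-coded reader preferences, for brevity.
    return metrics["insult_ratio"] <= 0.05 and metrics["bot_likelihood"] <= 0.7

# Hypothetical feed of already-scored tweets.
scored = [
    ("Interesting thread on battery chemistry.",
     {"insult_ratio": 0.0, "bot_likelihood": 0.1}),
    ("You idiots will believe anything.",
     {"insult_ratio": 0.2, "bot_likelihood": 0.9}),
]

timeline, quarantine = [], []
for text, metrics in scored:
    (timeline if allows(metrics) else quarantine).append(text)

# Nothing is banned or deleted; the reader can always open the
# quarantine "folder" and unearth the dreck on demand.
print("timeline:", timeline)
print("quarantine:", quarantine)
```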
Sure, all metrics can be "gamed", just as Tesla driving safety scores have been. If someone wants to spend
the effort to game this stuff, go for it and post a lot of crap, but it will be exposed to humans soon enough,
and it won't require Sherlock's smarter brother to discern.
Sheer volume of wacky 280-character tweets can feed into other monitors. Say someone with millions of
followers posts hundreds of low-brow tweets per day, indicating a superhuman level of bogosity. Then
their pearls of wisdom won't really be followed by the millions who have created their own filters. Re-tweeting
of crap exposed by various algorithms can be attenuated, too. If the ostensible multi-million-follower account
finds that it is filtered down to a tiny fraction of actual readers, that score can be displayed, so they can
see they are really an emperor with no clothes. Feedback can improve "reach" if that is a goal.
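One way such attenuation could work, purely as my sketch (the formula and constants are made up for
illustration): delivered reach shrinks smoothly with posting volume and with the fraction of posts the
metrics flag, and the resulting score is shown back to the author.

```python
import math

def effective_reach(followers: int, posts_per_day: float,
                    flagged_fraction: float) -> int:
    """Attenuate delivered reach for high-volume, low-quality posting.
    Constants are illustrative, not tuned against anything real."""
    excess = max(0.0, posts_per_day - 50) / 50         # posting beyond ~50/day
    volume_penalty = 1.0 / (1.0 + math.log1p(excess))  # soft, not a hard cutoff
    quality_penalty = 1.0 - flagged_fraction           # fraction flagged by metrics
    return int(followers * volume_penalty * quality_penalty)

# A 3M-follower account posting 300 low-brow tweets/day, 80% flagged:
print(effective_reach(3_000_000, posts_per_day=300, flagged_fraction=0.8))
```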
Nothing needs to be coded with "hard" thresholds; fuzzy logic can allow grey areas, as in the sketch below.
The system doesn't need to be "one way", either; it can favor metric/reader symbiosis.
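For the fuzzy-logic point, a smoothstep membership function is one simple stand-in (my choice, not a
claim about any real moderation stack): scores blend continuously instead of flipping at a hard threshold.

```python
def smoothstep(x: float, lo: float, hi: float) -> float:
    """Soft membership: 0 below lo, 1 above hi, smooth in between."""
    t = min(1.0, max(0.0, (x - lo) / (hi - lo)))
    return t * t * (3.0 - 2.0 * t)

def objectionable_degree(metrics: dict) -> float:
    """Fuzzy OR (max) over soft memberships; thresholds are illustrative."""
    return max(
        smoothstep(metrics["insult_ratio"], 0.02, 0.15),
        smoothstep(metrics["bot_likelihood"], 0.50, 0.90),
    )

# A borderline tweet lands in a grey zone, not a binary verdict:
print(objectionable_degree({"insult_ratio": 0.06, "bot_likelihood": 0.6}))
```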
Again, an apt Tesla-related analogy might be FSD Beta. At first the AI system seems opaque to a city driver,
but then the driver's own brain can suss out its limitations (say, undue hesitancy at seemingly clear
intersections, getting too close to clipping a wheel on a curb, or not understanding the "intent" of
a pedestrian (er, "VRU") stopping before walking into its path). Human/machine symbiosis here creates
a better system overall. I believe this can also be done with Twitter.