While Community Notes has the potential to be extremely effective, the difficult job of content moderation benefits from a mix of different approaches. As a professor of natural language processing at MBZUAI, I have spent most of my career researching disinformation, propaganda, and fake news online. So, one of the first questions I asked myself was: will replacing human factcheckers with crowdsourced Community Notes have harmful effects on users?

Wisdom of crowds
Community Notes got its start on Twitter as Birdwatch. It is a crowdsourced feature in which users who participate in the program can add context and clarification to tweets they deem false or misleading. The notes are hidden until community evaluation reaches a consensus, meaning that people who hold different perspectives and political views agree that a post is misleading. An algorithm determines when the threshold for consensus is reached, and the note then becomes publicly visible beneath the tweet in question, providing additional context to help users make informed judgments about its content.
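To make the consensus mechanism concrete, here is a minimal sketch in Python. It is an illustration under simplifying assumptions, not X's actual algorithm: the production system infers rater viewpoints via matrix factorization over rating histories, while the `Rating` class, cluster labels, and thresholds below are hypothetical.

```python
# Minimal sketch of bridging-based consensus for a single note.
# Simplifying assumption: each rater is already assigned to a viewpoint
# cluster; X's real system learns this from rating history.
from dataclasses import dataclass

@dataclass
class Rating:
    rater_cluster: str  # e.g. "A" or "B", a rater's inferred viewpoint group
    helpful: bool       # did this rater find the note helpful?

MIN_RATINGS_PER_CLUSTER = 5  # hypothetical minimum sample per viewpoint
HELPFULNESS_THRESHOLD = 0.8  # hypothetical agreement level required

def note_is_visible(ratings: list[Rating]) -> bool:
    """Show a note only when raters from every viewpoint cluster
    independently agree it is helpful; one-sided agreement is not enough."""
    clusters: dict[str, list[bool]] = {}
    for r in ratings:
        clusters.setdefault(r.rater_cluster, []).append(r.helpful)
    if len(clusters) < 2:
        return False  # no cross-perspective signal yet
    return all(
        len(votes) >= MIN_RATINGS_PER_CLUSTER
        and sum(votes) / len(votes) >= HELPFULNESS_THRESHOLD
        for votes in clusters.values()
    )
```

The key design choice is that agreement must bridge clusters: a note that only one political camp rates as helpful never becomes visible.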
Community Notes seems to work rather well. A team of researchers from the University of Illinois Urbana-Champaign and the University of Rochester found that X's Community Notes program can reduce the spread of misinformation and even lead authors to retract their posts. Facebook is largely adopting the same approach that is used on X today.
Having studied and written about content moderation for years, I am glad to see another major social media company implementing crowdsourcing for content moderation. If it works for Meta, it could be a true game-changer for the more than 3 billion people who use the company's products every day.
That said, content moderation is a complex problem. There is no single silver bullet that works in all situations. The challenge can only be addressed by employing a variety of tools, including human factcheckers, crowdsourcing, and algorithmic filtering. Each of these is best suited to different kinds of content, and they can and must work in concert.
Spam and LLM safety
There are precedents for addressing similar problems. Decades ago, spam email was a much bigger problem than it is today. We have largely defeated spam through crowdsourcing: email providers introduced reporting features that let users flag suspicious emails. The more widely distributed a particular spam message is, the more likely it is to be caught, because it is reported by more people.
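As a rough illustration of why scale works in the defenders' favor, the sketch below counts reports against a fingerprint of each message, so identical spam sent to millions of inboxes accumulates reports quickly. All names and thresholds here are hypothetical; real providers use fuzzier matching and many more signals.

```python
# Illustrative sketch: crowdsourced spam reports aggregated by message hash.
import hashlib
from collections import Counter

REPORT_THRESHOLD = 100  # hypothetical: reports needed before auto-flagging

report_counts: Counter[str] = Counter()

def fingerprint(message: str) -> str:
    """Hash a normalized copy so identical spam collapses to one counter.
    (Real systems use locality-sensitive hashing to catch near-duplicates.)"""
    normalized = " ".join(message.lower().split())
    return hashlib.sha256(normalized.encode()).hexdigest()

def report_spam(message: str) -> bool:
    """Record one user's report; return True once the message has been
    reported widely enough to be blocked for everyone."""
    key = fingerprint(message)
    report_counts[key] += 1
    return report_counts[key] >= REPORT_THRESHOLD
```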
Another useful comparison is how large language models (LLMs) approach harmful content. For the most dangerous queries, such as those related to weapons or violence, many LLMs simply refuse to answer. In other cases, these systems may add a disclaimer to their outputs, such as when they are asked to provide medical, legal, or financial advice. This tiered approach is one that my colleagues and I at MBZUAI explored in a recent study, in which we propose a hierarchy of ways LLMs can respond to different kinds of potentially harmful queries. Similarly, social media platforms can benefit from different approaches to content moderation.
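A simplified version of such a tiered policy can be expressed in a few lines. The risk categories and the keyword-based classifier below are placeholders of my own, not the taxonomy from our study; a real system would use a trained safety classifier.

```python
# Hedged sketch of a tiered response policy: refuse the most dangerous
# queries, add a disclaimer for regulated advice, answer everything else.
from enum import Enum

class Risk(Enum):
    SEVERE = "severe"        # e.g. weapons, violence
    SENSITIVE = "sensitive"  # e.g. medical, legal, financial advice
    BENIGN = "benign"

def classify(query: str) -> Risk:
    # Placeholder heuristic; a production system would use a trained model.
    q = query.lower()
    if any(w in q for w in ("make a weapon", "build a bomb")):
        return Risk.SEVERE
    if any(w in q for w in ("diagnose", "lawsuit", "invest my savings")):
        return Risk.SENSITIVE
    return Risk.BENIGN

def respond(query: str, draft_answer: str) -> str:
    risk = classify(query)
    if risk is Risk.SEVERE:
        return "I can't help with that request."
    if risk is Risk.SENSITIVE:
        return ("Note: this is general information, not professional advice.\n\n"
                + draft_answer)
    return draft_answer
```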
Automated filters can be used to identify the most dangerous information, preventing users from seeing and sharing it. These automated systems are fast, but they can only be used for certain kinds of content because they are not capable of the nuance required for most content moderation.
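The division of labor this implies can be sketched as a confidence-gated pipeline: machines act alone only when a classifier is nearly certain, and everything in the gray zone is routed to humans or community review. The thresholds and the `toxicity_score` input below are assumptions standing in for whatever model a platform trains.

```python
# Sketch of confidence-gated moderation: automate only the clear-cut cases.
BLOCK_THRESHOLD = 0.98   # hypothetical: auto-remove only when nearly certain
REVIEW_THRESHOLD = 0.70  # hypothetical: gray zone goes to human review

def moderate(toxicity_score: float) -> str:
    """Route a post based on a classifier's confidence that it is dangerous."""
    if toxicity_score >= BLOCK_THRESHOLD:
        return "blocked"       # fast automated path for clearly dangerous content
    if toxicity_score >= REVIEW_THRESHOLD:
        return "human_review"  # nuance required, so machines step aside
    return "published"
```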