Introduce AI sentiment analysis tools to detect hate speech and Nazism (Community Wishlist/W518)

Description

The advent of AI brings countless opportunities. I am hesitant to adopt AI tools too quickly, but I do like the idea of long-term experiments with new tools so that we can adopt them slowly and on our own terms, rather than being forced to use them in a crisis.

I would like the Wikimedia platform to have an in-house system for evaluating hate speech. I would like this sooner rather than later, so that the Wikimedia community can begin to use new tools and discuss the social and ethical consequences of having them.

From one perspective, extreme hate speech, like Nazi ideology, can seem obvious. From another perspective, there are social taboos against users calling out violent hate speech. People who use hate speech may themselves be threatening. They may use "dog whistles" - coded language which is recognized as being hateful, but which also shields the speech with ambiguity and deniability.

There is never going to be a time when Wikipedia is better for tolerating the worst hate speech. Hate speech will be around indefinitely, and we should be good at catching the worst of it.

The tool that I am imagining takes a user's edits as input and makes an objective evaluation, based on a public training dataset, of whether the text is hate speech. Having such an evaluation would give more confidence when making serious accusations against users who are using hate speech.

I have the idea that some negative characteristics, taken alone, could be a misunderstanding, but a user with a repeated history of hate speech could be more easily evaluated with automated processes. A combined human-and-tool process could better manage these cases.
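As a rough illustration of the human-and-tool process described above, here is a minimal Python sketch. Everything in it is hypothetical: the names `review_user_edits`, `score_fn`, `threshold`, and `min_flagged` are assumptions made for this example, and `placeholder_score` is only a stand-in for a real model trained on a public dataset. The sketch shows one possible design: a single high-scoring edit is recorded but not escalated, while a repeated pattern is sent to a human reviewer.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EditScore:
    text: str
    score: float  # scorer's estimate in [0, 1] that the text is hate speech


@dataclass
class UserReport:
    flagged: List[EditScore]
    needs_human_review: bool


def review_user_edits(
    edits: List[str],
    score_fn: Callable[[str], float],
    threshold: float = 0.9,   # assumed cutoff, would need community tuning
    min_flagged: int = 3,     # assumed number of flagged edits before escalation
) -> UserReport:
    """Score each edit; escalate to a human only on a repeated pattern."""
    flagged = []
    for text in edits:
        score = score_fn(text)
        if score >= threshold:
            flagged.append(EditScore(text, score))
    # The tool never sanctions anyone; it only decides whether to ask a human.
    return UserReport(flagged=flagged, needs_human_review=len(flagged) >= min_flagged)


def placeholder_score(text: str) -> float:
    # Stand-in scorer for demonstration only, not a real classifier.
    return 0.95 if "[flagged-example]" in text else 0.05


if __name__ == "__main__":
    history = ["fixed a typo", "[flagged-example] edit one"]
    report = review_user_edits(history, placeholder_score)
    print(len(report.flagged), report.needs_human_review)
```

Passing the scorer in as a function keeps the review logic separate from whichever model the community eventually chooses to test, so the thresholds and the model can be discussed and changed independently.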

English Wikipedia has a guideline, en:Wikipedia:No Nazis. Conversations about enforcing this guideline are quite defeatist, and presume that Nazi speech is not possible to detect reliably (see en:special:permalink/1342090967#Make_Wikipedia:No_Nazis_a_policy?). I think modern AI tools can detect this speech sufficiently well.

I have the idea that other platforms - the major social media platforms - manage hate speech very well with automated tools, and I want the Wikimedia platform to have a public plan and schedule for testing such tools.

Assigned focus area

Unassigned

Type of wish

Feature request