New technology automatically exposes German hate speech
22 February 2018
University of Antwerp and University of Hildesheim create social media hate speech monitor.
A recent study from the University of Warwick has shown a strong correlation between hate speech on German social media and physical violence towards refugees. The EU has been pressing tech companies such as Twitter, Facebook and Google to increase their efforts to counter online hate speech. In Germany, the new NetzDG law now forces social media platforms to delete hateful content within 24 hours, with remarkable consequences such as one AfD politician being temporarily suspended from Twitter.
Since 2016, Twitter has suspended hundreds of thousands of profiles that promote hate and violence, but most of these belong to users who communicate in English, while German, French or Dutch tweets often seem to slip through undetected. The challenge is enormous. Over 500 million new tweets are published every day. If Twitter employed 10,000 people to scan daily content, each of them would have to read 50,000 new tweets every day, or 1-2 tweets every second of an eight-hour working day. With no lunch breaks.
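The workload estimate above can be checked with a few lines of arithmetic. The sketch below uses the two figures quoted in the text (500 million tweets per day, 10,000 reviewers); the function names and the shift lengths are assumptions made for illustration.

```python
# Figures quoted in the article.
TWEETS_PER_DAY = 500_000_000
REVIEWERS = 10_000


def tweets_per_reviewer(tweets_per_day, reviewers):
    """Daily number of tweets each reviewer would have to read."""
    return tweets_per_day / reviewers


def tweets_per_second(per_reviewer, shift_hours):
    """Reading rate needed to finish within a shift of the given length."""
    return per_reviewer / (shift_hours * 3600)


per_reviewer = tweets_per_reviewer(TWEETS_PER_DAY, REVIEWERS)
print(per_reviewer)                                    # 50000.0

# Assumed shift lengths: an 8-hour working day and a full 24 hours.
print(round(tweets_per_second(per_reviewer, 8), 2))    # 1.74
print(round(tweets_per_second(per_reviewer, 24), 2))   # 0.58
```

The "1-2 tweets every second" figure matches the eight-hour case; spread over a full 24 hours the rate would be closer to one tweet every two seconds.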
During the past year, AI developer Tom De Smedt from the Computational Linguistics Research Group of the University of Antwerp (BE) and media linguist Sylvia Jaki from the department of Übersetzungswissenschaft & Fachkommunikation of the University of Hildesheim (DE) have developed a computer program that automatically detects inflammatory German words and word combinations in tweets, in real time.
Pictures and emojis
The new computer program is part of a study in which the two researchers are analysing political debate during the 2017 German elections. To find out how specific politicians express themselves and how people react to political topics on social media, they have been inspecting political TV debate shows and social media comments by, about, and for politicians and political parties. Sylvia Jaki elaborates: “In order to gain a better understanding of how political hate speech works on social media, we take into account that comments are not exclusively verbal and also take a look at non-verbal elements such as pictures or emojis.”
The Antwerp language technology group has a history of implementing cybersecurity systems, for example to detect jihadist content. “The software independently learns to spot hateful content and continually adapts, since the rhetoric tends to evolve quickly,” says Tom De Smedt. “In lab tests, we see that this approach is over 80% accurate. We see opportunities to collaborate with local police and security services, but we also have to be careful about how we use this kind of technology, as a society. The EU has no legal definition of what exactly constitutes hate speech.”
The algorithm developed by the two linguists shows that German hate speech is often preoccupied with African refugees (Afrikaner, Südländer, Nafris), Muslims (Araber, Syrer, Salafisten, Islam Terroristen), Jews (die Juden), others (Polen und Ungarn, Gebrochen Deutsch Sprechender, Gefährder und Kriminelle, Obdachlosen, Gutmenschen, Linksextremisten, Frauen), rhetoric involving violence (schlagen, schießen, überfallen, bekämpfen, Widerstand) and swearing in general (Scheißdeutschland).
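To give a rough idea of what spotting words and word combinations in a tweet involves, here is a minimal sketch of a plain lexicon lookup. It is not the researchers' software, which is described as learning and adapting on its own; the lexicon below is a small hand-picked subset of the violence-related terms listed above, and the names `LEXICON`, `ngrams` and `flag_terms` are hypothetical.

```python
import re

# Illustrative subset of terms mentioned in the article, lowercased.
# An assumption for this sketch, not the researchers' actual word list.
LEXICON = {
    "schlagen",
    "schießen",
    "überfallen",
    "bekämpfen",
    "widerstand",
    "gefährder und kriminelle",
}

# In Python 3, \w matches Unicode letters such as ä, ü and ß.
TOKEN = re.compile(r"\w+")


def ngrams(tokens, n):
    """All runs of n consecutive tokens, joined by single spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def flag_terms(tweet, lexicon=LEXICON, max_n=3):
    """Return the lexicon entries (single words or word combinations
    of up to max_n words) that occur in the tweet, sorted."""
    tokens = [t.lower() for t in TOKEN.findall(tweet)]
    hits = set()
    for n in range(1, max_n + 1):
        hits.update(g for g in ngrams(tokens, n) if g in lexicon)
    return sorted(hits)


print(flag_terms("Gefährder und Kriminelle bekämpfen!"))
# ['bekämpfen', 'gefährder und kriminelle']
```

A lookup like this flags any tweet that contains a listed word, including harmless uses of everyday verbs such as "bekämpfen", which is one reason a system that learns from context is needed in practice.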
Example of German hate cartoon