Language technology exposes hate messages

Date: 19 April 2016

Introduction: Textgain, a new UAntwerp spin-off, analyses huge amounts of social media data

Twitter does not want its platform to be used to promote terrorism. But how can you check millions of tweets per day? Textgain, a new spin-off from the University of Antwerp, has developed language technology that can automatically track hate messages posted by IS sympathisers.

Over the last few years, the University of Antwerp has built up significant expertise in the automatic analysis of massive amounts of text. In 2014, for example, our scientists screened all Twitter messages posted in Dutch for the name of a famous politician. This resulted in a ‘political barometer’ that indicated the particular sentiments being tweeted about a politician or political party. During the VTM programme K3 zoekt K3, the researchers analysed an overwhelming number of tweets in real time, allowing them to predict the winning trio before the results were announced.

With that experience under their belts, researchers Guy De Pauw, Tom De Smedt and Professor Walter Daelemans recently established the Textgain spin-off. Language technology is central to this new project: “We want to use this spin-off to commercialise the technology developed within the CLiPS research group (Computational Linguistics and Psycholinguistics)”, says De Pauw. “This technology allows us to extract facts, opinions and demographic information automatically from social media data, newspaper articles, emails and so on in a wide range of languages. That type of information has invaluable applications in big data and e-marketing.”

“It is important for companies to know what is being said about them on social media”, explains De Smedt. “However, so much is posted and tweeted that it is impossible for them to screen this data. This is where Textgain can help. It goes further than you might think: it seems obvious that the statement ‘I love it!’ is a positive one. But the technology can also conclude that this comment is more likely to have been written by a woman than by a man. Age and even personality traits can be identified. That type of information is very useful for marketers.”

Security agencies
It’s not only in marketing that Textgain has a role to play. The spin-off also puts its language technology to use in tracing hate messages on Twitter. De Smedt adds: “In February, Twitter announced that they did not want to see their platform being used to glorify terrorism. Around 125,000 accounts have already been closed, mostly those linked to IS and its sympathisers.”

But the fight against hate messages is a hard one. Textgain has now developed software able to detect hate speech and related combinations of words automatically. “In addition, our software continuously adapts itself to the evolving reality. It goes without saying that this technology must be used cautiously, but in the long run we do see opportunities for collaboration with other parties, such as the security forces.”