African Language Technology

Automatic Diacritic Restoration for African Languages

A demonstration system for a diacritic restoration method that automatically restores diacritics in the African languages Cilubà, Gĩkũyũ, Kĩkamba, Maa, Sesotho sa Leboa, Tshivenḓa and Yoruba.
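The page does not describe how the restoration method works. For reference only, a common baseline for this task is word-level lexicon lookup. You strip the diacritics from a diacritized training corpus and remember the most frequent diacritized form of each stripped word. That table is then used to restore new text. The sketch below is a minimal illustration that assumes a pre-tokenized corpus. The function names are hypothetical, and this is not the demo's method.

```python
import unicodedata
from collections import Counter, defaultdict

def strip_diacritics(word: str) -> str:
    """Remove combining marks, e.g. 'Gĩkũyũ' -> 'Gikuyu'."""
    decomposed = unicodedata.normalize("NFD", word)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", base)

def train_lexicon(corpus_tokens):
    """Baseline (assumption, not the demo's method): map each stripped form
    to its most frequent diacritized form in the training corpus."""
    counts = defaultdict(Counter)
    for token in corpus_tokens:
        counts[strip_diacritics(token)][token] += 1
    return {plain: forms.most_common(1)[0][0] for plain, forms in counts.items()}

def restore(tokens, lexicon):
    """Restore diacritics word by word; unseen words are returned unchanged."""
    return [lexicon.get(strip_diacritics(tok), tok) for tok in tokens]

lexicon = train_lexicon(["Gĩkũyũ", "Gĩkũyũ", "ọmọ", "ọmọ", "omo"])
print(restore(["Gikuyu", "omo", "abc"], lexicon))
```

This baseline always chooses the same form for a given stripped word. It therefore cannot resolve words whose diacritized variants differ in meaning without using context.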


English ↔ Luo Machine Translation

This is a rudimentary machine translation system for the English-Luo (Dholuo) language pair.


Northern Sotho Part-of-Speech Tagger

This demo showcases a part-of-speech tagger for Northern Sotho. It retrieves the morpho-syntactic categories for words in a sentence.


Swahili Part-of-Speech Tagger

This demo showcases a broad-coverage part-of-speech tagger for Kiswahili. It retrieves the morpho-syntactic categories for words in a sentence.



Analyze your writing style and compare yourself to other people (Dutch).



In an input text, all words are converted into diminutives using memory-based machine learning and additional rules.
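Memory-based learning stores training examples and labels a new word by analogy with its most similar stored neighbours. The sketch below applies that idea to diminutives. It assumes that the class to predict is the string appended to the noun, and it uses only the last three letters as features. Several parts are hypothetical and do not reflect the demo's actual data or rules: the toy memory, the position weights, and the -je rule. The position weights are a crude stand-in for the feature weighting that TiMBL derives from data.

```python
from collections import Counter

# Toy memory (illustrative only): noun -> string appended to form the diminutive.
MEMORY = {
    "huis": "je", "boek": "je", "kat": "je",
    "stoel": "tje", "tafel": "tje", "paal": "tje",
    "boom": "pje", "raam": "pje",
    "bal": "letje", "stal": "letje",
    "man": "netje", "ring": "etje",
}

# Hypothetical position weights for the last three letters: the final letter counts most.
WEIGHTS = (1, 2, 3)

def features(word, n=3):
    """Last n letters, right-aligned and padded with '_'."""
    return tuple(word[-n:].rjust(n, "_"))

def distance(a, b):
    """Weighted overlap metric: sum the weights of mismatching positions."""
    return sum(w for w, x, y in zip(WEIGHTS, a, b) if x != y)

def diminutive(word):
    # Extra rule (hypothetical): words already ending in -je are left alone.
    if word.endswith("je"):
        return word
    target = features(word)
    scored = [(distance(target, features(w)), suffix) for w, suffix in MEMORY.items()]
    best = min(d for d, _ in scored)
    # Majority vote among all neighbours at the nearest distance.
    votes = Counter(suffix for d, suffix in scored if d == best)
    return word + votes.most_common(1)[0][0]

print(diminutive("hal"), diminutive("zaal"))
```

With this toy memory, `diminutive("hal")` finds bal, stal and paal at equal distance. The majority vote among them (two votes for -letje) yields halletje.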

Semantic Analysis


The NeSp Scope Labeler takes a text as input and outputs the text split into sentences, with the scopes of negation and speculation cues marked. It is trained to process biomedical texts. More information can be found on the BiographTA software page.


NEON/DAESO Sentence compression

The compression system deletes parts of a sentence in order to compress it. It was developed for Dutch within the NEON and DAESO projects and builds on earlier research for English in the MUSA project. The system takes a hybrid rule-based and statistical approach. First, each sentence is parsed with the memory-based shallow parser, which tokenizes the sentence and assigns part-of-speech tags, IOB chunk tags and lemmas to every token. The compression system then uses the predicted chunk tags to determine which words or phrases are candidates for removal.
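The chunk-based candidate selection can be sketched as follows, assuming the input is a list of (word, IOB chunk tag) pairs. The deletion rule is a hypothetical stand-in for the system's actual rules: a PP chunk together with the NP chunk that follows it, or a lone ADVP chunk. The real system also has a statistical component that decides which candidates to remove. This sketch simply deletes every candidate.

```python
def group_chunks(tagged):
    """Group (word, chunk_tag) pairs into (chunk_type, [words]) using IOB tags."""
    chunks = []
    for word, tag in tagged:
        if tag == "O":
            chunks.append((None, [word]))  # token outside any chunk
            continue
        prefix, ctype = tag.split("-", 1)
        if prefix == "I" and chunks and chunks[-1][0] == ctype:
            chunks[-1][1].append(word)  # continue the current chunk
        else:
            chunks.append((ctype, [word]))  # B- tag (or stray I-) opens a chunk
    return chunks

def removal_candidates(chunks):
    """Hypothetical rule: PP (+ following NP) or ADVP chunks may be deleted.
    Returns half-open (start, end) spans over chunk indices."""
    spans, i = [], 0
    while i < len(chunks):
        ctype = chunks[i][0]
        if ctype == "PP":
            has_np = i + 1 < len(chunks) and chunks[i + 1][0] == "NP"
            end = i + 2 if has_np else i + 1
            spans.append((i, end))
            i = end
        elif ctype == "ADVP":
            spans.append((i, i + 1))
            i += 1
        else:
            i += 1
    return spans

def compress(tagged):
    """Delete every candidate chunk (the real system selects among them)."""
    chunks = group_chunks(tagged)
    drop = {k for start, end in removal_candidates(chunks) for k in range(start, end)}
    return " ".join(w for k, (_, words) in enumerate(chunks) if k not in drop for w in words)

sentence = [("De", "B-NP"), ("minister", "I-NP"), ("vertrok", "B-VP"),
            ("gisteren", "B-ADVP"), ("naar", "B-PP"), ("Brussel", "B-NP"), (".", "O")]
print(compress(sentence))
```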



The OntoBasis tools allow the extraction of semantic information from a text corpus.

Shallow Parsing

Memory-Based Shallow Parsing of English

MBSP is a set of linguistic tools based on the TiMBL and MBT memory-based learning applications developed at CLiPS and ILK. It provides tools for part-of-speech tagging, chunking, lemmatization, relation finding and (for medical language) semantic tagging. The general English version of MBSP was trained on data from the Wall Street Journal corpus. The (bio)medical English version was originally developed for the BioMint text-mining tool and uses training data from the GENIA corpus.

Reference: Daelemans, W., Buchholz, S., and Veenstra, J. (1999). Memory-Based Shallow Parsing. Proceedings of CoNLL-99, Bergen, Norway, pp. 53-60.
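The core idea behind memory-based tagging can be sketched from scratch. Each word becomes an instance built from a small context window, and all training instances are stored in memory. A new word then receives the tag of its nearest stored neighbours under the overlap metric, which counts mismatching features. The class and function names below are hypothetical and do not reflect the MBSP, TiMBL or MBT APIs. The real tools use richer features and feature weighting.

```python
from collections import Counter

def windows(words, size=1):
    """One instance per word: (left context, focus word, right context)."""
    padded = ["_"] * size + words + ["_"] * size
    return [tuple(padded[i:i + 2 * size + 1]) for i in range(len(words))]

class MemoryBasedTagger:
    """Hypothetical minimal tagger: store all instances, classify by nearest neighbours."""

    def __init__(self):
        self.memory = []  # list of (feature tuple, tag)

    def train(self, tagged_sentences):
        for sentence in tagged_sentences:
            words = [w for w, _ in sentence]
            tags = [t for _, t in sentence]
            self.memory.extend(zip(windows(words), tags))

    def tag(self, words):
        result = []
        for inst in windows(words):
            # Overlap metric: number of mismatching feature values.
            dists = [(sum(a != b for a, b in zip(inst, f)), t) for f, t in self.memory]
            best = min(d for d, _ in dists)
            votes = Counter(t for d, t in dists if d == best)
            result.append(votes.most_common(1)[0][0])
        return result

tagger = MemoryBasedTagger()
tagger.train([[("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
              [("a", "DT"), ("cat", "NN"), ("sleeps", "VBZ")]])
print(tagger.tag(["the", "cat", "sleeps"]))
```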

Text Mining

ATraNoS Dutch Summarization

This is a sentence summarization demo for Dutch, which was developed within the framework of the ATraNoS project.


Dutch Multi-Document Summarization

This automatic summarization demo can summarize up to three documents at once. The summarizer first segments the texts into sentences and computes an importance score for each sentence. It then sorts the sentences by score and extracts the top 25% (by default) as the summary.
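The steps described above can be sketched as follows: sentence recognition, scoring, sorting, and extraction of the top 25%. The page does not specify the importance score, so the sketch assumes a simple one: the mean frequency, across all input documents, of the words in the sentence. The regex sentence splitter is likewise a simplification.

```python
import math
import re
from collections import Counter

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    return re.findall(r"\w+", sentence.lower())

def summarize(documents, ratio=0.25):
    # 1. Recognize the sentences in every input document.
    sentences = [s for doc in documents for s in split_sentences(doc)]
    # 2. Importance score (assumption): mean corpus frequency of the sentence's words.
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(s):
        words = tokenize(s)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # 3. Sort by score and keep the top `ratio` share (at least one sentence).
    k = max(1, math.ceil(len(sentences) * ratio))
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:k])  # present the selection in original order
    return [sentences[i] for i in keep]

docs = ["De kat slaapt. De kat eet vis.", "Een hond blaft. De kat speelt."]
print(summarize(docs, ratio=0.5))
```

Returning the selected sentences in document order is a presentation choice made here; the page does not say how the demo orders its output.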


TACTiCS: Tool for Analyzing and Categorizing Text using Characteristics of Style

This web demo categorizes an unlabelled test document using a model trained on two labelled training texts (one per class).
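The page does not say which style characteristics TACTiCS uses. The sketch below illustrates the setup it describes: one labelled training text per class and one unlabelled test document. It assumes character trigram frequency profiles compared with cosine similarity. Both the features and the similarity measure are assumptions, and the function names are hypothetical.

```python
import math
import re
from collections import Counter

def profile(text, n=3):
    """Relative frequencies of character n-grams (assumed style marker)."""
    text = re.sub(r"\s+", " ", text.lower())
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values()) or 1
    return {g: c / total for g, c in grams.items()}

def cosine(p, q):
    """Cosine similarity between two sparse frequency profiles."""
    dot = sum(v * q.get(g, 0.0) for g, v in p.items())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0

def categorize(test_text, training):
    """training: {label: training_text}, one text per class."""
    test_profile = profile(test_text)
    return max(training, key=lambda label: cosine(test_profile, profile(training[label])))

training = {"A": "the cat sat on the mat", "B": "zzz yyy xxx www"}
print(categorize("the mat and the cat", training))
```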

Other Demos

Feature-Label Association

Visualize feature-label associations
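The page does not state which association measure is visualized. One common choice is pointwise mutual information (PMI) between a feature occurring in an instance and that instance's label: PMI(f, l) = log2 P(f, l) / (P(f) P(l)). Positive values mean the pair co-occurs more often than chance. The sketch assumes PMI and computes the scores that a plot, such as a feature-by-label heatmap, could display. The function name is hypothetical.

```python
import math
from collections import Counter

def association_scores(instances):
    """instances: list of (features, label). Returns {(feature, label): PMI}."""
    n = len(instances)
    feat, lab, joint = Counter(), Counter(), Counter()
    for features, label in instances:
        lab[label] += 1
        for f in set(features):  # count presence once per instance
            feat[f] += 1
            joint[(f, label)] += 1
    return {
        (f, l): math.log2((c / n) / ((feat[f] / n) * (lab[l] / n)))
        for (f, l), c in joint.items()
    }

data = [({"good"}, "pos"), ({"good"}, "pos"), ({"bad"}, "neg"), ({"bad"}, "neg")]
print(association_scores(data))
```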


Sequence Translation

This demo is an implementation of a sequence alignment script. The input is two collections of sequences, viz. tokens and tags. Each sequence in the collection of tokens is paired with a sequence from the collection of tags. The order of the elements of a sequence can be random and is not used by the algorithm.
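Because element order is ignored, each sequence can be treated as a bag of symbols. A token can then be linked to the tag it co-occurs with most consistently across the paired sequences. The sketch assumes the Dice coefficient as the association measure. The page does not name one, so this is not necessarily the demo's algorithm, and the function names are hypothetical.

```python
from collections import Counter

def learn_alignment(token_seqs, tag_seqs):
    """Order-free alignment: link each token type to the tag type with the
    highest Dice coefficient over the paired sequences (ties: arbitrary)."""
    tok_count, tag_count, pair_count = Counter(), Counter(), Counter()
    for tokens, tags in zip(token_seqs, tag_seqs):
        toks, tgs = set(tokens), set(tags)  # positions are ignored
        tok_count.update(toks)
        tag_count.update(tgs)
        pair_count.update((t, g) for t in toks for g in tgs)
    best = {}
    for (t, g), c in pair_count.items():
        dice = 2 * c / (tok_count[t] + tag_count[g])
        if dice > best.get(t, (None, -1.0))[1]:
            best[t] = (g, dice)
    return {t: g for t, (g, _) in best.items()}

def translate(tokens, alignment):
    """Map each token to its aligned tag (None for unseen tokens)."""
    return [alignment.get(t) for t in tokens]

tokens = [["the", "dog"], ["the", "runs"], ["dog", "runs"]]
tags = [["DT", "NN"], ["VB", "DT"], ["VB", "NN"]]
print(translate(["dog", "the", "cat"], learn_alignment(tokens, tags)))
```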