African Language Technology

Automatic Diacritic Restoration for African Languages

This demo showcases a method that automatically restores diacritics in text written in the African languages Cilubà, Gĩkũyũ, Kĩkamba, Maa, Sesotho sa Leboa, Tshivenḓa and Yoruba.


English ↔ Luo Machine Translation

This is a rudimentary machine translation system for the language pair English - Luo (Dholuo).


Northern Sotho Part-of-Speech Tagger

This demo showcases a part-of-speech tagger for Northern Sotho. It retrieves the morpho-syntactic categories for words in a sentence.


Swahili Part-of-Speech Tagger

This demo showcases a broad-coverage part-of-speech tagger for Kiswahili. It retrieves the morpho-syntactic categories for words in a sentence.



Analyze your writing style and compare yourself to other people (Dutch).

Semantic Analysis


The OntoBasis tools allow the extraction of semantic information from a text corpus.

Shallow Parsing

Memory-Based Shallow Parsing of English

MBSP is a set of linguistic tools based on the Timbl and Mbt memory-based learning applications developed at CLiPS and ILK. It provides tools for part-of-speech tagging, chunking, lemmatizing, relation finding and (for medical language) semantic tagging. The general English version of MBSP has been trained on data from the Wall Street Journal corpus; the (bio-)medical English version was originally developed for use in the BioMint Text Mining tool and uses training data from the GENIA corpus.

Reference: Daelemans, W., Buchholz, S., and Veenstra, J. (1999) Memory-Based Shallow Parsing. Proceedings of CoNLL-99, Bergen, Norway, pp. 53-60.

Text Mining

ATraNoS Dutch Summarization

This is a sentence summarization demo for Dutch that was developed in the framework of the ATraNoS project.


Dutch Multi-Document Summarization

This automatic summarization demo can summarize up to three documents simultaneously. The summarizer works as follows. It starts by identifying the separate sentences in the texts. Next, it computes an importance score for each sentence. The system then sorts the sentences by importance and extracts the most important 25% (by default) as the summary.
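
The steps described above (sentence splitting, scoring, sorting, extraction) can be illustrated with a minimal Python sketch. It is not the demo's actual implementation: the function names `split_sentences` and `summarize` are hypothetical, the punctuation-based sentence splitter is a simplification, and the importance score (average word frequency across the input) is an assumed stand-in, since the scoring method is not specified here. Only the three-document limit and the 25% default come from the description.

```python
import re
from collections import Counter
from math import ceil


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize(documents, ratio=0.25):
    """Return the highest-scoring sentences, most important first."""
    if len(documents) > 3:
        # Mirrors the demo's stated limit of three documents.
        raise ValueError("at most three documents are supported")

    # Step 1: recognize the separate sentences in all texts.
    sentences = [s for doc in documents for s in split_sentences(doc)]
    if not sentences:
        return []

    # Step 2: compute an importance score per sentence.
    # Assumed score: mean frequency of the sentence's words over all input.
    tokenized = [re.findall(r"\w+", s.lower()) for s in sentences]
    freq = Counter(w for words in tokenized for w in words)
    scores = [
        sum(freq[w] for w in words) / len(words) if words else 0.0
        for words in tokenized
    ]

    # Step 3: sort sentence indices by importance, highest first.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)

    # Step 4: extract the top `ratio` share (at least one sentence).
    keep = max(1, ceil(ratio * len(sentences)))
    return [sentences[i] for i in ranked[:keep]]
```

Calling `summarize([text_a, text_b])` returns a list of sentences in rank order; a real system would likely use a language-aware sentence splitter for Dutch and might restore the original sentence order before presenting the summary.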


TACTiCS Tool for Analyzing and Categorizing Text using Characteristics of Style

This web demo allows you to categorize an unlabelled test document based on a model trained on two labelled training texts (one for each class).

Other Demos

Sequence Translation

This demo is an implementation of a sequence alignment script. The input consists of two collections of sequences, viz. tokens and tags. Each sequence in the collection of tokens is paired with a sequence from the collection of tags. The order of the elements within a sequence is arbitrary and is not used by the algorithm.