A demonstration system for a method that automatically restores diacritics in the African languages Cilubà, Gĩkũyũ, Kĩkamba, Maa, Sesotho sa Leboa, Tshivenḓa and Yoruba.
This is a rudimentary machine translation system for the language pair English - Luo (Dholuo).
This demo showcases a part-of-speech tagger for Northern Sotho. It retrieves the morpho-syntactic categories for words in a sentence.
This demo showcases a broad coverage part-of-speech tagger for Kiswahili. It retrieves the morpho-syntactic categories for words in a sentence.
Analyze your writing style and compare yourself to other people (Dutch).
The OntoBasis tools allow the extraction of semantic information from a text corpus.
MBSP is a set of linguistic tools based on the Timbl and Mbt memory-based learning applications developed at CLiPS and ILK. It provides tools for part-of-speech tagging, chunking, lemmatizing, relation finding and (for medical language) semantic tagging. The general English version of MBSP has been trained on data from the Wall Street Journal corpus; the (bio-)medical English version was originally developed for use in the BioMint Text Mining tool and uses training data from the GENIA corpus.
Reference: Daelemans, W., Buchholz, S., and Veenstra, J. (1999) Memory-Based Shallow Parsing. Proceedings of CoNLL-99, Bergen, Norway, pp. 53-60.
This is a sentence summarization demo for Dutch that was developed in the framework of the ATraNoS project.
This automatic summarization demo can summarize up to three documents simultaneously. The summarizer works as follows. It first splits the texts into separate sentences. Next, it computes an importance score for each sentence. Finally, it sorts the sentences by importance and extracts the most important 25% (by default) as the summary.
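The three steps above can be sketched in a few lines of Python. This is only an illustration, not the demo's actual implementation: the description does not say how sentences are split or how importance is computed, so the regular-expression sentence splitter and the average word frequency score used here are assumptions, as are the function names `split_sentences` and `summarize`. The sketch also does not enforce the three-document limit.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive sentence splitter (assumption): break after '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize(documents, ratio=0.25):
    """Return the top `ratio` share of sentences, kept in their original order.

    The importance score is an assumed stand-in: the average frequency,
    over all input documents, of the words in a sentence.
    """
    # Step 1: recognize the separate sentences in all texts.
    sentences = [s for doc in documents for s in split_sentences(doc)]
    if not sentences:
        return []

    # Step 2: compute an importance score for each sentence.
    tokenized = [re.findall(r"\w+", s.lower()) for s in sentences]
    freq = Counter(word for words in tokenized for word in words)
    scores = [
        sum(freq[w] for w in words) / len(words) if words else 0.0
        for words in tokenized
    ]

    # Step 3: sort by importance and extract the most important share.
    keep = max(1, math.ceil(len(sentences) * ratio))
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:keep])]


if __name__ == "__main__":
    docs = ["Cats purr. Cats and cats purr. Dogs bark."]
    print(summarize(docs))
```

Returning the selected sentences in document order rather than score order is a design choice of this sketch; it tends to keep the summary readable, but the demo may present them differently.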
This web demo allows you to categorize an unlabelled test document based on a model trained on two labelled training texts (one per class).
This demo is an implementation of a sequence alignment script. The input consists of two collections of sequences, viz. tokens and tags. Each sequence in the collection of tokens is paired with a sequence from the collection of tags. The order of the elements within a sequence may be arbitrary and is not used by the algorithm.