Fine-Grained Sentiment and Opinion Mining of Political Social Network Messages. 01/02/2014 - 31/12/2014

Abstract

This project aims to develop an annotated corpus for fine-grained sentiment and opinion mining of social network messages. As a case study, we will monitor messages on politics in the run-up to the 2014 Belgian elections. We will not only annotate the sentiment expressed in each message in a more robust way, but also mark the opinion holder, the object of the opinion and the features of that object.

The computational learnability of morphologically complex languages. 01/10/2009 - 30/09/2012

Abstract

Goals of the project: Traditional spell checkers make use of an extensive word list. If a word does not occur in this list, it is marked as a spelling error. More recent systems (e.g. Németh 2009) approach the problem of spell checking for agglutinating languages from a different angle: a word is considered a spelling error if it cannot be generated by an underlying morphological model of the language. In this project, we investigate how such a spell checker can be used as a tool in the automatic induction of a morphotactic system for Swahili.
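
The contrast between the two spell-checking strategies can be illustrated with a minimal sketch. The word list, the affix inventories and the verb template (subject prefix + tense marker + root + final vowel "a") are toy assumptions chosen for illustration; they are not the project's actual morphological model, and the function names are hypothetical.

```python
import re

# Toy word list: a traditional checker accepts only the forms listed here.
WORD_LIST = {"ninasoma", "anapika"}

# Toy morphotactic model (illustrative assumption, not the project's model):
# verb = subject prefix + tense marker + root + final vowel "a".
SUBJECT_PREFIXES = ["ni", "u", "a", "tu", "m", "wa"]
TENSE_MARKERS = ["na", "li", "ta", "me"]
ROOTS = ["som", "pik", "lal"]

VERB_PATTERN = re.compile(
    "^(?:{})(?:{})(?:{})a$".format(
        "|".join(SUBJECT_PREFIXES),
        "|".join(TENSE_MARKERS),
        "|".join(ROOTS),
    )
)


def word_list_check(word):
    """Accept a word only if it occurs literally in the word list."""
    return word in WORD_LIST


def morphological_check(word):
    """Accept a word if the toy morphotactic template can generate it."""
    return VERB_PATTERN.match(word) is not None


if __name__ == "__main__":
    for word in ["ninasoma", "tutasoma", "tutasomx"]:
        print(word, word_list_check(word), morphological_check(word))
```

In this sketch the form "tutasoma" is missing from the word list and would be flagged by the first checker, while the template accepts it because it can be built from listed morphemes.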

Computational Techniques for Stylometry for Dutch. 01/01/2007 - 31/12/2010

Abstract

In this project we investigate a methodology for the automatic extraction and analysis of style that we want to apply to both individual authors (authorship attribution, both fiction and non-fiction) and groups of authors (extraction of stylistic characteristics associated with gender and age). This methodology covers several aspects: (1) Automatic linguistic analysis of documents by means of available text analysis tools on the level of morphological structure, part of speech, global syntactic structures and semantic roles (subject, object, temporal, location) for the construction of potentially relevant stylistic characteristics. (2) Unsupervised and supervised learning techniques for selecting characteristics with high information value and constructing a model of authorial style. (3) Evaluation of these models by (a) comparison with stylistic analyses in linguistics and literary science and (b) empirical testing of the predictive power of the models.
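
As a rough illustration of the general idea of turning texts into stylistic characteristics and using them for attribution, the sketch below uses relative frequencies of a few Dutch function words and a nearest-profile comparison. This is a deliberately simple stand-in, not the linguistic analysis or the learning techniques described in the abstract; the function-word list, the sample sentences and the author labels are all made-up assumptions.

```python
from collections import Counter

# Hypothetical feature set: a handful of Dutch function words.
FUNCTION_WORDS = ["de", "het", "een", "en", "van", "ik", "niet"]


def style_vector(text):
    """Relative frequency of each function word in a whitespace-tokenised text."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[word] / total for word in FUNCTION_WORDS]


def attribute(text, profiles):
    """Return the author whose profile is closest in Manhattan distance."""
    target = style_vector(text)

    def distance(author):
        return sum(abs(a - b) for a, b in zip(target, profiles[author]))

    return min(profiles, key=distance)


# Made-up training snippets standing in for two authors' known writing.
profiles = {
    "author_a": style_vector("ik zie de man en ik zie de hond en ik lach niet"),
    "author_b": style_vector("het boek van de schrijver is een van de beste van het jaar"),
}

if __name__ == "__main__":
    print(attribute("ik loop en ik zing niet", profiles))
```

A realistic system would replace the hand-picked word list with automatically selected features and the distance comparison with a trained classifier, evaluated on held-out documents.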

The Sawa Corpus: a parallel corpus "English - Kiswahili". 01/01/2007 - 31/12/2008

Abstract

This project aims to develop an aligned parallel corpus for the language pair English - Kiswahili by means of semi-automatic annotation. This alignment not only facilitates research on statistical machine translation, but also enables projection of annotation between the two languages. In this project we investigate how dependency analyses can be projected from a source language (English) onto a target language (Kiswahili).
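
Projection of dependency annotation through a word alignment can be sketched as follows. The sketch assumes a one-to-one alignment and simply copies a head relation whenever both the dependent and its head are aligned; the function name, the data layout and the example sentence pair are illustrative assumptions, not the project's actual procedure.

```python
def project_dependencies(source_heads, alignment):
    """Project dependency heads from source tokens onto aligned target tokens.

    source_heads: dict mapping source token index -> head index (None for root).
    alignment:    dict mapping source token index -> target token index
                  (assumed one-to-one; unaligned source tokens are absent).
    Returns a dict mapping target token index -> target head index
    (None for the projected root).
    """
    target_heads = {}
    for dependent, head in source_heads.items():
        if dependent not in alignment:
            continue  # nothing to project for unaligned source tokens
        if head is None:
            target_heads[alignment[dependent]] = None
        elif head in alignment:
            target_heads[alignment[dependent]] = alignment[head]
        # If the head is unaligned, the relation is left unprojected here.
    return target_heads


# English: I(0) am(1) reading(2) a(3) book(4); root is "reading".
source_heads = {0: 2, 1: 2, 2: None, 3: 4, 4: 2}
# Kiswahili: ninasoma(0) kitabu(1); "reading" -> ninasoma, "book" -> kitabu.
alignment = {2: 0, 4: 1}

if __name__ == "__main__":
    print(project_dependencies(source_heads, alignment))
```

In this example the English pronoun, auxiliary and article have no separate Kiswahili counterpart, so only the verb and its object are projected.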

Linguistic description of resource-scarce languages using machine learning techniques. 01/10/2006 - 30/09/2009

Abstract

Linguistically annotated corpora are an important tool in the development of Natural Language Processing (NLP) applications. For commercially interesting languages, these corpora can be used to induce accurate and robust NLP tools to process new data. If no such corpora exist, which is by definition the case for resource-scarce languages, the traditional data-driven algorithms are largely useless. This project investigates the automated linguistic description of minority languages on the basis of alternative classification techniques. The algorithms researched in this project avoid the need for annotated data in the target language by automatically inducing a classification, either on the basis of free text (technique: "unsupervised learning") or by using existing annotated corpora in another language (technique: "knowledge transfer"). The methodology proposed in this project allows for a systematic comparison and evaluation of these techniques, which has so far remained largely unexplored.

Exploitation of CGN annotation for portability to new information sources. 01/05/2005 - 31/12/2006

The bottleneck of syntax in language technology research: integration of memory-based machine learning in the AI-investigation of the Origins of Language. Optimising both paradigms through and for syntactic research. 01/10/2000 - 30/09/2002

Abstract

This project investigates the potential integration of two Artificial Intelligence domains by investigating the problematic role of syntax within both lines of research. Syntactic research within the subfield of Memory-Based Reasoning (MBR) is concerned with optimising two classification tasks: classification of segmentation (delimiting constituents) and classification of disambiguation (assigning grammatical labels). The robotic experiments that are being conducted within the Origins of Language (OoL) research at the AI-lab (VUB) can likewise be interpreted as classification experiments. This classification task is problematic in both domains. Joint experiments, in which properties of both MBR and the OoL research will be combined, aim to contribute new insights to both research areas, so that a number of important limitations can be resolved.

The bottleneck of syntax in language technology research: integration of memory-based machine learning in the AI-investigation of the Origins of Language. Optimising both paradigms through and for syntactic research. 01/10/1998 - 30/09/2000

Abstract

This project investigates the potential integration of two Artificial Intelligence domains by investigating the problematic role of syntax within both lines of research. Syntactic research within the subfield of Memory-Based Reasoning (MBR) is concerned with optimising two classification tasks: classification of segmentation (delimiting constituents) and classification of disambiguation (assigning grammatical labels). The robotic experiments that are being conducted within the Origins of Language (OoL) research at the AI-lab (VUB) can likewise be interpreted as classification experiments. This classification task is problematic in both domains. Joint experiments, in which properties of both MBR and the OoL research will be combined, aim to contribute new insights to both research areas, so that a number of important limitations can be resolved.
