Ongoing projects

Scientific Chair "International Francqui Professor 2021-2022" (Prof. dr. Bruce Connell). 01/10/2021 - 30/09/2022

Abstract

This International Chair focuses on language endangerment and the phonetic description of the speech sounds that occur in the languages of the world. It investigates which phonetic dimensions are frequent across the world's languages and which typical sound patterns can be distinguished.

Researcher(s)

Research team(s)

The development of fundamental frequency in babbles and early words of typically developing children and children with hearing impairment: the case of intrinsic vowel pitch. 01/01/2021 - 31/12/2024

Abstract

In all languages of the world, high vowels (such as /i/ in 'key' and /u/ in 'who') are pronounced with a higher pitch than low vowels (such as /a/ in 'far'). This phenomenon is known as 'intrinsic vowel pitch'. In the past, this phenomenon has been explained in two ways. On the one hand, intrinsic vowel pitch has to do with the operation of the speech organs: during the articulation of /i/ and /u/ the tongue is lifted far forward in the mouth. This tension pulls on the larynx and stretches the vocal folds, so that a higher pitch is obtained. In vowels like /a/ the vocal folds are not stretched to the same degree, so that a lower tone is heard. On the other hand, this phenomenon supports the intentions of speakers who aim to make vowels sound as different as possible from each other in order to speak clearly. Scientists do not agree on which explanation is correct, but they do agree on the following: if the first explanation is correct, then intrinsic vowel pitch is expected to occur in the babble of deaf babies. Remarkably, this has never been systematically investigated in a large-scale study, and this is precisely what this project aims to investigate.

Researcher(s)

Research team(s)

Exploring the limits of language non-selectivity: How do multilinguals process non-native cognates and interlingual homographs in sentences? 01/11/2020 - 31/10/2022

Abstract

When a bi- or multilingual reads a word in one language, does he/she automatically activate lexical representations from all his/her languages? This is what the widely accepted language non-selective account suggests, but the studies that support this hypothesis evince various methodological pitfalls. In a multilingual context like Flanders, it is especially important to investigate multilingual language processing in a systematic and thorough manner, to achieve an accurate understanding of how multilinguals process languages in their everyday lives. By doing so, we can find a reliable answer to the question whether they access their mental lexicon in a language non-selective or a language selective manner. The vast majority of studies that have examined bi-/multilingual language processing have used cognates (words that exist in two languages with the same meaning, e.g., "water") or interlingual homographs (IHs; words with two different meanings in two different languages, e.g., "fee" is Dutch for "fairy") that exist in the native language (L1). However, it is exactly this native language that is supposedly qualitatively different from any other language that a multilingual knows, and, hence, using words that occur in L1 may yield results that are not representative of lexical processing in general. Besides, studies often show words in isolation, which is not how we normally read. The present proposal circumvents these issues by embedding L2-L3 cognates and IHs in sentences.

Researcher(s)

Research team(s)

Dialect Syntax Revisited 01/01/2020 - 31/12/2024

Abstract

The scientific research network Re-Examining Dialect Syntax (REEDS-network) brings together linguistic researchers from Flanders, Europe and the US from different empirical and theoretical backgrounds and with complementary expertise, in an attempt to arrive at a deeper, more rounded and better grounded understanding of dialect syntax in particular and language variation in general.

Researcher(s)

Research team(s)

European Language Grid. 12/12/2019 - 30/11/2021

Abstract

ELG will strengthen the commercial European Language Technology landscape by establishing a pan-European marketplace. CLiPS is the National Competence Centre (NCC) for Belgium. ELG has set up 32 NCCs to establish a strong European network. They will act as regional bridges to the project. The NCCs will support ELG in collecting regional information about companies, research centres, resources, services and projects. They will organise regional ELG workshops, promote ELG in their area and establish bridges to funding agencies.

Researcher(s)

Research team(s)

Artificial intelligence for creative language use. 01/12/2019 - 30/11/2021

Abstract

Recent progress in Natural Language Processing (NLP) has resulted in reliable pattern matching techniques (mostly based on deep neural networks) for many NLP tasks (text to speech, speech to text, text generation, text translation, multimodality, text analysis, …). The creative use of language (e.g. in advertising slogans, song texts, humor, irony, metaphor, …) has remained out of reach of current approaches. We will investigate how the improved state of the art in 'literal' language processing can push the design of creative language processing systems. Valorisation roadmap: The research addresses two types of users and applications: (i) professional writers who will be able to use tools to generate ideas and concepts (puns, jokes, titles, short texts with metaphors) and (ii) language enthusiasts who will be provided with tools that can boost their output by producing examples and ideas. Approach: 1. Development of proofs of concept of domain-dependent creative writing 2. Design of applications in copy-writing 3. Design of applications in entertainment writing

Researcher(s)

Research team(s)

Accommodation and non-accommodation in adolescents' informal online writing: Social determiners and linguistic effects. 01/10/2019 - 19/01/2023

Abstract

The proposed study will analyze how teenagers adapt their informal online writing to their conversation partner, and by which social and contextual factors this process of accommodation is influenced. Since linguistic accommodation remains largely un(der)explored for social media writing, the project fills a gap. It will investigate the impact of multiple aspects of adolescents' socio-demographic profile and their interaction on a wide range of linguistic and pragmatic features. We will examine whether divergent patterns of linguistic adjustment can be observed for teenagers with distinct socio-demographic profiles, and which language features appear to be most or least affected. A major distinction will be made between analyses of robust intergroup accommodation and in-depth diachronic analyses of accommodation between particular individuals. This unique design might lead to challenging sociolinguistic findings with respect to the profile of (non-)accommodators. While it will primarily increase our understanding of the social, linguistic and pragmatic parameters that govern accommodative language behavior, it may in the end also open up a unique perspective on language change. Moreover, on a more general, theoretical level, this project aims to accurately delimit the concept of accommodation, in order to answer the fundamental question of whether we can unambiguously discriminate between true accommodation and other instances of linguistic adaptation.

Researcher(s)

Research team(s)

The linguistic landscape of hate speech on social media. 01/01/2019 - 31/12/2022

Abstract

Hate speech online is a widespread social phenomenon that frequently receives a lot of media attention. We are interested in the language that is being used to express hate in social media, specifically hate against migrants and LGBT people. After gathering enough examples from public Facebook pages, we will develop methods to automatically analyze the language in these texts. The analysis will be on different levels. Some simple forms of analysis include counting words, looking at spelling mistakes, and investigating grammatical aspects. In the more complex analysis we will examine the use of metaphors, the context of the hate speech and how the hate speech can be implicit in the text, rather than overtly present. Apart from the linguistic description of this phenomenon, we strive to build systems that can automatically recognize hate speech in social media text. The project is in cooperation with research groups in Slovenia and targets Dutch, Slovene, and English.

Researcher(s)

Research team(s)

The development and representation of Dutch syntax in learners of Dutch as a foreign language and learners of Dutch as a second native language. 01/01/2019 - 31/12/2022

Abstract

In these days of mass migration, many people learn a brand new language at a later age. This is not easy: Languages have both similarities and differences in the sentence structures with which they express particular meanings. For instance, the Dutch and French active sentences are similar (Le chat chasse la souris - De kat jaagt op de muis [The cat chases the mouse]), but Dutch has three different forms for the full passive sentence, whereas French has only one (La souris est chassée par le chat). How do learners deal with this? Previous research suggests that bilinguals share information about sentence structure across their languages, whenever these structures are similar enough. We proposed a developmental model for second language syntax in which learners go through 5 consecutive learning stages before they share syntax between languages. The goal of this project is to test and refine that theory. We will investigate the syntactic representations in different speakers of Dutch: 1) Flemish students with Dutch as their only native language; 2) Arabic-Dutch simultaneous bilinguals; 3) Walloon students who learned Dutch at the age of 10; 4) first generation immigrants learning Dutch as a second Indo-European language. This will provide valuable information on the learning trajectory for Dutch syntax (with its possible problems) and on the influence of native language syntax on the development and the final representation of Dutch syntax.

Researcher(s)

Research team(s)

The development of Dutch syntax in learners of Dutch as a foreign language: effects of immersion, language background and training by means of syntactic priming. 01/10/2018 - 30/09/2022

Abstract

Background: In these days of mass migration, many people learn a brand new language at a later age. This is not easy: Languages have both similarities and differences in the sentence structures with which they express particular meanings. For instance, the Dutch and French active sentences are similar (Le chat chasse la souris - De kat jaagt op de muis [The cat chases the mouse]), but Dutch has three different forms for the full passive sentence, whereas French has only one (La souris est chassée par le chat). How do learners deal with this? Aims: Previous research suggests that bilinguals share information about sentence structure across their languages, whenever these structures are similar enough. Hartsuiker and Bernolet (2017) proposed a developmental model for second language syntax in which learners go through several consecutive learning stages before they share syntax between languages. The challenging aspect is our goal to test that theory in ecologically valid settings. More specifically, we investigate the influence of immersion in the L2 and of knowledge of related languages on the development and the representation of Dutch syntax in students who learn Dutch as a foreign language. Additionally, we investigate whether and how syntactic priming experiments can aid the development of native-like production preferences in Dutch as an L2. Methodology: All studies in the project use syntactic priming as a tool (Branigan & Pickering, 2017): all sentences that need to be produced or comprehended are preceded by a prime sentence with the same or a competing syntactic structure. If a prime structure is represented in memory, it will influence the production and the comprehension of the upcoming sentence, within and across languages.
We will investigate the syntactic representations in different speakers of Dutch: 1) Flemish students with Dutch as their only native language; 2) Walloon students who learned Dutch at the age of 10; 3) first generation immigrants learning Dutch as their first or second Indo-European language. The first production study compares groups 1 and 2. We investigate the representation of Dutch syntactic structures that lack a similar counterpart in the learners' native language (French) and we compare the production preferences of Walloon learners of Dutch living in an immersion context with the preferences of learners living in a monolingual French context. The second study investigates how we can boost the production of Dutch syntactic structures that are dispreferred due to the influence of a native language. Study 3 is a longitudinal study that explores the differences between the learning trajectories for Dutch syntax in native Arabic speakers who learn Dutch as their first or second Indo-European language (after English). Impact: By documenting the different stages in L2 syntactic development with actual learner data, this project will have a strong impact on both the psychology of language and on second language acquisition research. Additionally, this project will provide valuable information on the learning trajectory for Dutch syntax, more specifically on the influence of native language syntax, and on the effects of immersion, knowledge of related languages and specific training on the development and the final representation of Dutch syntax. Hence, the project outcome will be relevant to teachers and trainers of Dutch as a foreign language.

Researcher(s)

Research team(s)

Solving Combinatorial and Probabilistic Problems in Natural Language. 01/01/2018 - 31/12/2021

Abstract

This project aims to develop a fully automated approach to solving exercises about combinatorics and probability that can be found in introductory textbooks on discrete mathematics. The ability to solve such problems is an important cognitive and intellectual skill, as it is evaluated as part of academic admission tests such as SAT, GMAT and GRE. The combinatorics and probability questions will be formulated in natural language and the task will be to automatically answer these questions. We shall develop a two-step approach for tackling this task. In the first step, a question formulated in natural language will be analysed and transformed into a high-level model specified in a declarative language. In the second step, the high-level model will be solved using the inference mechanisms of the declarative modeling language. The language and its solvers will be based on principles of probabilistic programming, an increasingly popular programming paradigm. While the immediate goal is to solve textbook exercises, the long-term goal is to contribute to the automation of probabilistic and combinatorial problem solving and to enable the modeling and programming of such problems in natural language, two goals that are highly relevant to cognitive computing and artificial intelligence.
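The two-step approach described in this abstract can be illustrated with a minimal Python sketch. It is not the project's system: the `ProbabilityModel` class, the `parse_question` stub and the `solve` function are hypothetical names introduced here, the "declarative model" is reduced to a list of value domains plus an event, and the inference step is plain enumeration rather than a probabilistic programming solver. Step 1 is only stubbed for a single hard-coded question, since the actual language analysis is the research problem itself.

```python
from dataclasses import dataclass
from fractions import Fraction
from itertools import product
from typing import Callable, Sequence, Tuple

# Hypothetical example question, used only to run the sketch end to end.
QUESTION = "Two fair dice are rolled. What is the probability that the sum is 7?"


@dataclass(frozen=True)
class ProbabilityModel:
    """Hypothetical declarative model: independent uniform variables plus an event."""
    domains: Tuple[Sequence[int], ...]        # possible values of each variable
    event: Callable[[Tuple[int, ...]], bool]  # event whose probability is asked


def parse_question(question: str) -> ProbabilityModel:
    """Step 1 (stubbed): map a natural-language question to a model.

    A real system would analyse the text; this placeholder recognises
    only the one hard-coded question above.
    """
    if question == QUESTION:
        die = tuple(range(1, 7))
        return ProbabilityModel(domains=(die, die), event=lambda o: sum(o) == 7)
    raise ValueError("question not covered by this sketch")


def solve(model: ProbabilityModel) -> Fraction:
    """Step 2: exact inference by enumerating all equally likely outcomes."""
    outcomes = list(product(*model.domains))
    favourable = sum(1 for outcome in outcomes if model.event(outcome))
    return Fraction(favourable, len(outcomes))


if __name__ == "__main__":
    print(solve(parse_question(QUESTION)))  # prints 1/6
```

Keeping the model separate from the solver mirrors the split between the two steps: the same `solve` function works for any model that the first step produces, however that model was obtained.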

Researcher(s)

Research team(s)

Errors outside the lab: the interaction of psycholinguistic and sociolinguistic variables in the production of verb spelling errors in informal computer-mediated communication. 01/01/2018 - 31/12/2021

Abstract

We will investigate how social and mental processes interact in the production of spelling errors in informal computer-mediated communication (CMC). Unlike many CMC-studies, the research will not focus on prototypical CMC-features, but on unintentional spelling deviations in verb forms whose pronunciation corresponds to two spelling forms (homophones). We will study an extensive corpus of informal CMC produced by Flemish adolescents. The correct rendering of verb homophones presupposes the time-consuming application of grammatically informed spelling rules. Psycholinguistic findings show that, when working memory runs out of resources, the higher-frequency homophone can cause intrusion errors. While we expect social variables to affect (1) the NUMBER of spelling errors, we assume that they are less likely to affect (2) the PATTERN of these errors. Hypothesis (1) is inspired by sociolinguistic findings on gender and age differences with respect to norm sensitivity. Norm sensitivity should affect working memory (conscious processing); hence, only error rates. We will also include the youngsters' educational track. Hypothesis (2) is related to the online writing process, which triggers speedy interaction. We will investigate whether the CMC-context leads to the same intrusion errors that writers find so hard to control under time pressure. This interdisciplinary approach should lead to innovative contributions to psycholinguistics, sociolinguistics and CMC-studies.

Researcher(s)

Research team(s)

A longitudinal approach to phonetic enhancement in infant directed speech: normally hearing infants and hearing-impaired infants with a cochlear implant. 01/01/2018 - 31/12/2021

Abstract

The aim of the present project is to investigate Infant Directed Speech (IDS). Since the pathbreaking work of i.a. Snow & Ferguson (1977) a consensus has grown that IDS exhibits particular characteristics that distinguish it from Adult Directed Speech (ADS). A case in point is the production of vowels: in IDS vowels are produced more "clearly" than in ADS, as can be inferred from the larger vowel space in IDS (Kuhl 2000). This "received wisdom" has recently been fundamentally questioned. For instance, Martin et al. (2015) conclude their study of Japanese IDS and ADS: "Mothers speak less clearly to infants than to adults." We want to further investigate this contradiction by replicating the findings reported in the literature using a large database of Dutch IDS and ADS, and by systematically scrutinizing two variables that have been largely neglected up till now: 1. longitudinal development: how does IDS change relative to chronological age and, more importantly, "linguistic age" as represented by a.o. the child's evolving cumulative vocabulary and utterance length? 2. characteristics of the child as interlocutor: does speech directed to a child with normal hearing (NH) differ from speech directed to a deaf child with a cochlear implant (CI)?

Researcher(s)

Research team(s)

Optimization of the adaptability of clinical information extraction systems: deep learning and use of feedback propagation techniques. 01/09/2017 - 31/08/2021

Abstract

Large amounts of unstructured medical data (for example clinical notes) are available today, which offers opportunities for optimizing healthcare quality and patient security. Although Natural Language Processing technology already offers great tools and solutions to automate the processing of medical documents, the performance of this technology often decreases when the extraction context changes (medical specialty, hospital, physician's writing style). This project will study the possibility of a scalable NLP engine able to adapt to such new contexts. To reach this goal, we will explore and combine approaches based on deep neural networks, the human-in-the-loop paradigm and persistent learning. The project is a collaboration with LynxCare Clinical Informatics, a medical IT company focusing on promoting access to medical information and reducing administrative costs in hospitals.

Researcher(s)

Research team(s)

Past projects

FWO Sabbatical 2019-2020 (Steven Gillis). 01/10/2019 - 30/09/2020

Abstract

The aims of the planned research are: 1. Study of speech and language development in congenitally deaf children with a cochlear implant: preparation of a state-of-the-art review of the recent literature, including our own empirical findings; 2. Study of speech and language development in congenitally deaf children with an auditory brainstem implant: analysis of a recently collected longitudinal corpus; 3. Preparation of the longitudinal and cross-sectional corpora collected by our research group over the last 40 years: integration into TalkBank.

Researcher(s)

Research team(s)

Language development after pediatric brainstem implantation. 01/04/2019 - 30/03/2020

Abstract

This project aims to examine the oral language development of congenitally deaf children with auditory brainstem implants (ABI). Thus far, only a handful of studies have examined the oral language skills of children and adolescents with ABI, without going into linguistic detail. In this project, the development of their speech production will be investigated in linguistic detail and compared to that of children with typical hearing and another group of congenitally deaf children, viz. children with cochlear implants. The outcomes of this project will be crucial on different levels. First, they are theoretically important to further our understanding of the role of auditory input and brain stimulation in language development. Second, results can guide speech and language therapy for these children with auditory brainstem implants, since the current therapy is entirely based on that for children with cochlear implants, without any linguistic comparisons between both groups of children. Finally, the resulting information is crucial, e.g. for parents, to determine whether the benefit of ABI implantation outweighs the surgical risk.

Researcher(s)

Research team(s)

The role of semantics in modeling the bilingual mental lexicon. 01/10/2018 - 18/06/2020

Abstract

Bilinguals, people who simultaneously know and use two or more languages, are an interesting source of clues for discovering the internal make-up of our language system. Specifically, it is interesting how bilinguals are able to reliably access the right words in the right language without making mistakes, even though languages contain significant amounts of overlap in terms of semantics, orthography and phonology. In computational psycholinguistics, we model phenomena such as word retrieval via computer models. Despite the fact that we do not have access to the actual word store embedded in our mind, modeling can provide us with clues as to how it is organized, more particularly, by constructing models that can simulate key findings in psycholinguistic experiments. Having said that, current models for bilingual word reading can account for most of the facts, but largely underspecify a crucial component of our day-to-day word retrieval: meaning. Moreover, and related to this shortcoming, most models of word access have only modeled words in isolation. In reality, however, words are always embedded in sentences and larger linguistic and non-linguistic contexts, which also influence the way we access our words. By creating models of sentence processing, we can make sure that meaning has a more central role in our models, and thereby give new explanations for several phenomena in bilingual word processing.

Researcher(s)

Research team(s)

Sabbatical Leave Project, 2018-2019 01/10/2018 - 30/09/2019

Abstract

Two sub-projects are addressed: in stylometry, methodological issues are addressed, especially related to personality prediction from text: feature optimization, data acquisition and quality, model selection, and especially explanation of trained machine learning models. In machine learning for natural language, approaches are investigated on how to combine knowledge and reasoning with the currently predominant deep learning "black boxes".

Researcher(s)

Research team(s)

Auditory brainstem implantation and language development 01/10/2017 - 30/09/2020

Abstract

This project aims to investigate the oral language development of congenitally hearing-impaired children with an auditory brainstem implant (ABI). ABI is a relatively new development to restore the hearing of children with a severe-to-profound hearing loss due to i.a. the absence of the auditory nerve. The speech perception outcomes of children with ABI have been investigated, but detailed linguistically underpinned studies of their speech production are virtually lacking. The goal of the present research project is to provide a first linguistically motivated description of the lexical and phonological development of children with ABI. Their development will be evaluated against the background of the acquisition process of normally hearing children and that of severe-to-profound hearing-impaired children who received a cochlear implant. The focus is on the longitudinal development of the word productions of children with ABI. First, we investigate their cumulative vocabularies and the balance between their spoken and signed words (lexical development). Second, their word productions are analysed from a phonological perspective: in what order are segments acquired and what phonological regularities account for that order and (possible) deviations from that order? Which segmental substitution and deletion patterns occur? What is the consistency and variability of their productions and how does the accuracy of their word productions develop relative to the adult target forms?

Researcher(s)

Research team(s)

Identifiability and intelligibility of the speech of hearing impaired children using a cochlear implant 01/10/2017 - 30/09/2019

Abstract

Until recently, children who were born "deaf" remained "deaf", and thus were unable to acquire spoken language. Fortunately, deaf children with a cochlear deficit can nowadays be helped with a surgical intervention: they receive a cochlear implant (CI) very early in life so that they can "hear", i.e., can experience sound sensations. The first concern that the parents of these children voice is: "will my child hear with an implant?" The answer is definitely positive. The second question usually is: "will my child speak and sound like a normal hearing (NH) child of the same age?" This question remains unanswered. We want to address this issue from two perspectives: the identifiability and intelligibility of CI children. Recent findings indicate that the speech of 6- to 7-year-old CI users deviates from that of NH peers in particular fine details. But are those details that we can measure also detectable by the human ear? Are they sufficient to reliably identify CI children's speech? This will be investigated by having people listen to recordings of speech of CI children, children with an acoustic hearing aid (HA), and NH children. A second main research question concerns the intelligibility of CI children's speech. When the children enter mainstream primary school, it is essential to know if they are intelligible to people not familiar with them. In this project we will assess their intelligibility using different methodologies.

Researcher(s)

Research team(s)

The role of semantics in modeling the bilingual mental lexicon. 01/10/2016 - 30/09/2018

Abstract

Bilinguals, people who simultaneously know and use two or more languages, are an interesting source of clues for discovering the internal make-up of our language system. Specifically, it is interesting how bilinguals are able to reliably access the right words in the right language without making mistakes, even though languages contain significant amounts of overlap in terms of semantics, orthography and phonology. In computational psycholinguistics, we model phenomena such as word retrieval via computer models. Despite the fact that we do not have access to the actual word store embedded in our mind, modeling can provide us with clues as to how it is organized, more particularly, by constructing models that can simulate key findings in psycholinguistic experiments. Having said that, current models for bilingual word reading can account for most of the facts, but largely underspecify a crucial component of our day-to-day word retrieval: meaning. Moreover, and related to this shortcoming, most models of word access have only modeled words in isolation. In reality, however, words are always embedded in sentences and larger linguistic and non-linguistic contexts, which also influence the way we access our words. By creating models of sentence processing, we can make sure that meaning has a more central role in our models, and thereby give new explanations for several phenomena in bilingual word processing.

Researcher(s)

Research team(s)

Deep linguistic features for computational stylometry. 01/10/2016 - 30/09/2018

Abstract

The goal of stylometry is to understand and model how variations in writing style are related to (properties of) the author of a text. This research provides insight into how psychological and sociological properties of the author such as age, gender, region, personality, and others, are reflected in his or her idiolect. Such models can also be used to predict these author properties on the basis of text analysis. Applications range from literary studies to forensic science.

Researcher(s)

Research team(s)

ACCUMULATE: Acquiring crucial medical information using language technology. 01/01/2016 - 30/06/2020

Abstract

The ACCUMULATE project will automatically recognise crucial information in the free text of clinical reports written in English and Dutch by designing, developing and evaluating advanced language technology (LT) for deep semantic processing of the texts that are often morpho-syntactically not well-formed. An additional focus is on easy portability of the technology across domains and languages and on the use of visualisation techniques.

Researcher(s)

Research team(s)

An acoustic analysis of lexical stress and rhythm in early speech interactions of Dutch children and their primary caretakers: A longitudinal study. 01/10/2015 - 30/09/2018

Abstract

The main objective of this study is to investigate the acquisition of "lexical" stress and rhythm in the period when children produce canonical babbling and their first identifiable words. A good understanding of these phenomena in children's speech is of prime importance because it has been shown that prosody plays a cardinal role in children's language acquisition.

Researcher(s)

Research team(s)

Identifiability and intelligibility of the speech of hearing impaired children using a cochlear implant. 01/10/2015 - 30/09/2017

Abstract

Until recently, children who were born "deaf" remained "deaf", and thus were unable to acquire spoken language. Fortunately, deaf children with a cochlear deficit can nowadays be helped with a surgical intervention: they receive a cochlear implant (CI) very early in life so that they can "hear", i.e., can experience sound sensations. The first concern that the parents of these children voice is: "will my child hear with an implant?" The answer is definitely positive. The second question usually is: "will my child speak and sound like a normal hearing (NH) child of the same age?" This question remains unanswered. We want to address this issue from two perspectives: the identifiability and intelligibility of CI children. (1) Identifiability: Recent findings indicate that the speech of 6- to 7-year-old CI users deviates from that of NH peers in particular fine details. But are those details that we can measure also detectable by the human ear? Are they sufficient to reliably identify CI children's speech? This will be investigated by having people listen to recordings of speech of CI children, children with an acoustic hearing aid (HA), and NH children. (2) Intelligibility: A second main research question concerns the intelligibility of CI children's speech. When the children enter mainstream primary school, it is essential to know if they are intelligible to people not familiar with them. In this project we will assess their intelligibility using different methodologies.

Researcher(s)

Research team(s)

Francqui Chair 2015-2016 Prof. Peter Mariën. 01/10/2015 - 30/09/2016

Abstract

Proposed by the University, the Francqui Foundation each year awards two Francqui Chairs at the UAntwerp. These are intended to enable the invitation of a professor from another Belgian University or from abroad for a series of ten lessons. The Francqui Foundation pays the fee for these ten lessons directly to the holder of a Francqui Chair.

Researcher(s)

Research team(s)

Digital Humanities Flanders. 01/01/2015 - 31/12/2019

Abstract

This is a fundamental research project financed by the Research Foundation – Flanders (FWO). The project was subsidized after selection by the FWO-expert panel. Its aim is to initiate cooperation between research groups.

Researcher(s)

Research team(s)

The interaction of gender and social class in Flemish online teenage talk. 01/01/2015 - 31/12/2018

Abstract

Social class differences in teenage speech remain largely unexplored, while gender has been focused on in quite a lot of sociolinguistic research on adolescent peer group language. The interest in gender differences has also pervaded the research on informal computer-mediated communication (CMC) and more specifically on the online writing practices of adolescents in chat or texting media, but then again, the link with social class is generally absent. Yet some studies (though not on CMC) suggest that gender differences manifest themselves in different ways in different social class groups. The present research is a first attempt to fill this gap, by focusing on the interaction between social class and gender in Flemish chat language produced by adolescents with a low versus a high level of education.

Researcher(s)

Research team(s)

Deep linguistic features for computational stylometry. 01/10/2014 - 30/09/2016

Abstract

The goal of stylometry is to understand and model how variations in writing style are related to (properties of) the author of a text. This research provides insight into how psychological and sociological properties of the author such as age, gender, region, personality, and others, are reflected in his or her idiolect. Such models can also be used to predict these author properties on the basis of text analysis. Applications range from literary studies to forensic science.

Researcher(s)

Research team(s)

Text analytics web services for profiling and opinion mining. 01/02/2014 - 31/01/2015

Abstract

Our aim is to implement commercial web services for automatic opinion detection and author profiling (age, gender, personality, education, dialect) in text. In this project we will develop the core technology: data mining and annotation, machine learning and setting up the server. In a follow-up project we will then launch a spin-off company. This kind of language technology is useful for a wide range of big data applications, and does not yet exist for Dutch, and only in part for English.

Researcher(s)

Research team(s)

Fine-Grained Sentiment and Opinion Mining of Political Social Network Messages. 01/02/2014 - 31/12/2014

Abstract

This project aims to develop an annotated corpus for the purpose of fine-grained sentiment and opinion mining of social network messages. As a case study, we will monitor messages on politics in the run-up to the 2014 Belgian elections. We will annotate not only the sentiment expressed in the message in a more robust way, but also mark information on the opinion holder, the object of the opinion and the features of the object.

Researcher(s)

Research team(s)

Cognitive control in the lexical processing of interlingual and intralingual homographs. 01/01/2014 - 31/12/2017

Abstract

The research project has two major objectives: (1) an in-depth study of cognitive control in the process of visual word recognition, and (2) the integration of research on intralingual and interlingual lexical processing.

Researcher(s)

Research team(s)

Bootstrapping operations in language acquisition: a computational psycholinguistic approach. 01/01/2014 - 31/12/2017

Abstract

The acquisition of abstract linguistic categories is investigated. Computational models of bootstrapping operations are constructed in order to investigate how knowledge from one domain can be instrumental in acquiring knowledge of another domain. In our simulations the language addressed to very young children is used in an attempt to elucidate how grammatical categories and grammatical gender are acquired given a combination of distributional, phonological and morphological bootstrapping.

Researcher(s)

Research team(s)

Stress and Rhythm in Early Speech Productions of Hearing and Congenitally Deaf Children with a Cochlear Implant: A Longitudinal Study. 01/11/2013 - 31/10/2017

Abstract

Newborn babies have been shown to be sensitive to the speech melody of the language that they hear: they recognise the word stress patterns of their mother's language, and they are sensitive to the rhythm of that language (for instance, babies can distinguish what has been called the 'Morse Code' rhythm of Germanic languages and the 'Machine Gun' rhythm of Romance languages). Thus, already in the first year of life, infants seem to know a lot about how their ambient language sounds. Nevertheless, it is not known when and how they use this knowledge in their own speech production. This project investigates infants' babbling (adult-sounding syllable sequences) and their early word productions in the first two years of life. The main research question is: when and how do they produce stress (the relative prominence of syllables) and when do we find evidence that they adopt the speech rhythm of the ambient language? This is investigated by means of an acoustic analysis of children's speech and an analysis of the speech of their primary caretakers, which will represent the adult target model. A second aim is to investigate whether congenitally hearing-impaired children who received a cochlear implant very early in life show similar acoustic correlates of stress marking in their speech and display similar rhythmicity to their hearing peers.

Researcher(s)

Research team(s)

Evaluation of tools within the SUCCEED project. 25/10/2013 - 24/10/2014

Abstract

This project represents a formal service agreement between UA and the University of Alicante. UA provides the University of Alicante with the research results mentioned in the title of the project under the conditions stipulated in this contract.

Researcher(s)

Research team(s)

What masked words can do: Preactivation or Retrospective recruitment? 01/10/2013 - 30/09/2015

Abstract

The purpose of the present research proposal is to find out whether Bodner & Masson's view can be upheld. The general rationale that will guide all experiments is the question whether masked priming effects activate episodic memory traces when access to lexical memory (the mental lexicon) is sufficient for performing the experimental task. This general question will be approached in two ways: (i) can masked primes access episodic traces that were created in a training phase prior to the experiment and (ii) do masked primes themselves leave episodic traces?

Researcher(s)

Research team(s)

An acoustic analysis of lexical stress and rhythm in early speech interactions of Dutch children and their primary caretakers: a longitudinal study. 01/10/2013 - 30/09/2015

Abstract

The main objective of this study is to investigate the acquisition of "lexical" stress and rhythm in the period when children produce canonical babbling and their first identifiable words. A good understanding of these phenomena in children's speech is of prime importance because it has been shown that prosody plays a cardinal role in children's language acquisition.

Researcher(s)

Research team(s)

Speech accuracy in young children: hearing and hearing impaired toddlers with a cochlear implant. 01/01/2013 - 31/12/2016

Abstract

The aim of the current project is to investigate early sound development in two populations differing in access to spoken language: children with normal hearing (NH) and congenitally deaf children with "received hearing" due to cochlear implantation (CI) at an early age. In comparing speech accuracy of these two groups with "different degrees of hearing", we aim to gain a better insight into the role of the auditory perception system in language development.

Researcher(s)

Research team(s)

Automatic Monitoring for Cyberspace Applications (AMiCA). 01/01/2013 - 31/12/2016

Abstract

This project represents a research agreement between UA and IWT. UA provides IWT with the research results mentioned in the title of the project under the conditions stipulated in this contract.

Researcher(s)

Research team(s)

Project website

Digital Archive of Belgian Neo-Avant-garde Periodicals (DABNAP). 01/01/2013 - 31/12/2014

Abstract

Post-war artists' periodicals are a prime example of the neo-avant-garde DIY ethos, and simultaneously constitute a crucial source of information about this movement. This project aims to digitize a substantial and representative corpus of Belgian neo-avant-garde periodicals. Subsequently, innovative language processing tools will be applied in order to extract and visualize the network of artists who were behind the periodicals.

Researcher(s)

Research team(s)

Project website

Audio-description in Dutch: A corpus-based study into the linguistic features of a new, multimodal text type. 01/10/2012 - 30/09/2016

Abstract

The project presented here is a corpus-based study of the linguistic features of a new, multimodal text type within Audiovisual Translation (AVT): Audio-description (AD) for the blind and visually impaired. The aim of this interdisciplinary project is to describe the lexico-grammatical features of AD-scripts and examine the role they play in the specific communicative function of the text. The object is to explore one of the key-issues in AD research: How are images put into words and what are the implications for the language use in AD? A recent pilot study confirmed the hypothesis that the language of AD contains distinctive grammatical (morpho-syntactic) and lexical features and that these specific patterns can be identified by corpus analysis. Firstly, the current project aims to develop an extensive and varied text corpus of AD scripts of Dutch audio-described films and series. Secondly, this text corpus will provide the basis for quantitative linguistic research, aiming to identify the prominent lexico-grammatical features of the text type. Finally, the quantitative analysis will be combined with a qualitative analysis of the (communicative) function of these features. In this last stage, special attention must be paid to the multimodal nature of the text type, since the AD-script only makes sense in combination with the dialogues, music and sound effects of the original film or series with which it forms a coherent whole. A qualitative analysis into the (communicative) function of the features will explore the unique interaction between the language of AD and the other channels of the audiovisual text. Ultimately, the project's ambition is to conduct an extensive linguistic audience design oriented analysis of the AD-discourse. 
This will allow us to identify the features that characterise the AD text type, will clarify how these linguistic and stylistic features are used to ensure maximum communicative efficiency, and how these features are related to the function and multimodal character of AD. The project presented here is a pioneer in the field: AD has become an international research topic recently but for Flanders and the Netherlands no study of AD is available yet. In addition, it can offer the basis for future application-oriented studies. AD in Flanders is in its infancy (public broadcaster VRT only started with its first audio-described series in January 2012). In brief, basic research projects like the one presented here support the development of a local AD tradition in Flanders that meets international quality standards.

Researcher(s)

Research team(s)

Antwerp Yiddish Noun Plurals (AYNP). 01/09/2012 - 31/08/2013

Abstract

The project will explore structure and acquisition in contemporary Yiddish used by the Jewish Ultra-Orthodox community in Antwerp, Belgium. This community lives in a unique multilingual situation that includes three main languages: Yiddish and Dutch - two living languages competing as native tongues, and Loshn Koydesh (Classical Hebrew) - restricted only for praying and not used for daily communication. Our window onto native Antwerp Yiddish is the system of noun plurals, whereby a singular noun takes on a plural suffix. The aim of the project is two-fold: first, to describe the system of noun plurals as it is currently used by adults, taking into account the intensive contact with Dutch, and second, to understand how this system is acquired by children from the same community.

Researcher(s)

Research team(s)

Automatic Compound Processing. 01/07/2012 - 31/12/2013

Abstract

The central problem addressed in this project is a multidisciplinary investigation into the sharing of knowledge and resources between closely related languages, specifically relating to the automatic processing of compounds. We will explore the possibility of creating new knowledge about closely related languages and of efficiently developing additional, more advanced resources for (a) compound segmentation; and (b) the semantic analysis of compounds.

Researcher(s)

Research team(s)

Morphosyntactic language skills in deaf children with cochlear implant: a cross-linguistic study on Dutch and German (MORLAS). 01/07/2012 - 30/09/2013

Abstract

The proposed project will investigate the speech and language skills of children with a cochlear implant at the onset of their school career. The project will focus on these children's achievements in a major aspect of language, its morphosyntax.

Researcher(s)

Research team(s)

Abstract rules or statistical learning? The impact of lexical and sublexical homophony in spelling and reading homophonous verb forms. 01/01/2012 - 31/12/2015

Abstract

Homophone intrusions in the spelling of regularly inflected Dutch verb forms are used to address a central question in psycholinguistics – and cognitive science in general: do people rely on symbolic mental rules or on a knowledge base that captures the co-occurrence probabilities in the learning domain (statistical learning)? Earlier findings in our research group indicated an effect of homophone dominance in the pattern of intrusion errors when spelling homophonic verb forms: such errors occur more often when the target is the lower-frequency homophone and the intruder the higher-frequency homophone. This is compatible with a statistical learning view but cannot reject a rule-based account enriched with a frequency-sensitive mechanism. To disentangle the two accounts we will compare error patterns in the lexical and sublexical domains. An effect of homophone dominance at the sublexical level cannot be explained by a rule model. Errors in the lexical and sublexical domains will be studied in spelling and reading tasks. Finally, we will attempt to simulate the experimental patterns with two types of computational models: a symbolic model, using morphemes and rules, and a memory-based model, storing whole word forms only and using a similarity metric that can 'discover' patterns in its memory store. Together, the experimental and simulation data should enable us to formulate an answer to the question about mental rules.

Researcher(s)

Research team(s)

What masked words can do: Preactivation or Retrospective recruitment? 01/10/2011 - 30/09/2013

Abstract

The purpose of the present research proposal is to find out whether Bodner & Masson's view can be upheld. The general rationale that will guide all experiments is the question whether masked priming effects activate episodic memory traces when access to lexical memory (the mental lexicon) is sufficient for performing the experimental task. This general question will be approached in two ways: (i) can masked primes access episodic traces that were created in a training phase prior to the experiment and (ii) do masked primes themselves leave episodic traces?

Researcher(s)

Research team(s)

Understandable Dutch: the accessibility of the language of the news for different audiences. 14/09/2011 - 31/12/2012

Abstract

This project represents a formal research agreement between UA and VRT. UA provides VRT with the research results mentioned in the title of the project under the conditions stipulated in this contract.

Researcher(s)

Research team(s)

Child-directed speech and language development: hearing children of different SES backgrounds and deaf children with a cochlear implant. 01/01/2011 - 31/12/2014

Abstract

In this project we want to test the hypothesis that this relative poverty of the input is already manifest during the prelinguistic and early linguistic stages of language acquisition: particular aspects of the input make the discrimination of sounds more difficult, and make the segmentation of speech into sounds, and words, and phrases much more difficult.

Researcher(s)

Research team(s)

OPTI-FOX - Optimization of the automated fitting to outcomes expert with language-independent hearing-in-noise test battery and electro-acoustical test box for cochlear implant users. 01/11/2010 - 31/10/2012

Abstract

The main objectives of the research project are (i) to turn an existing theoretical automated fitting model into a clinical application by means of various techniques from statistics, machine learning and optimisation; (ii) to develop an evaluation tool to measure functional hearing capacities, in casu the ability to understand speech-in-noise, representative for day-to-day listening situations.

Researcher(s)

Research team(s)

AMICA - Automatic monitoring for cyberspace applications. 01/10/2010 - 30/09/2011

Abstract

This project represents a research agreement between UA and IWT. UA provides IWT with the research results mentioned in the title of the project under the conditions stipulated in this contract.

Researcher(s)

Research team(s)

Statistical Relational Learning of Natural Language. 01/01/2010 - 31/12/2013

Abstract

This project investigates how techniques of statistical relational learning can be used for natural language processing. The focus will be on challenging natural language processing tasks, such as semantic role labeling, where syntactic and semantic dependencies, structured and unstructured data, local and global models, and probabilistic and logical information must be combined with one another. As far as statistical relational learning is concerned, the emphasis will lie on probabilistic extensions of the programming language Prolog. The project aims not only at improved natural language processing techniques but also at better algorithms and systems for statistical relational learning.

Researcher(s)

Research team(s)

Adolescent chat language in Flanders: the language geography of Flemish (sub)standardization processes against the background of the international chat scene. 01/01/2010 - 31/12/2013

Abstract

RESEARCH QUESTIONS:
- To what extent do Flemish teenagers integrate morpho-syntactic and phonological features of the Brabantic regiolect in their chat language: what are the relative frequency scores of the Brabantic regiolect variants, the non-Brabantic regiolect variants and the Standard Dutch variants for several variables?
- What is the impact of the independent variable 'hometown'? Is there a correlation between the relative representation of Brabantic regiolect features and the region where the chatters come from? To what extent do teenagers from the provinces of West-Flanders, East-Flanders and Limburg integrate morpho-syntactic and phonological features from the Brabantic regiolect in their chat language? In other words: to what extent do the data reveal an expansion of Brabantic features?
- What is the impact of local versus supraregional communication? Do teenagers who do not live in the Brabantic dialect area use more morpho-syntactic and phonological Brabantic features in 'interregional' than in 'intraregional' or local chat communication? Do the answers to the previous questions confirm the hypothesis that the linguistic situation in Flanders is marked by an autonomous informal standardisation process which is marked by a generalization/increasing use of the Brabantic 'tussentaal' (regiolect, intermediate language)? What are the implications of this study with respect to the relevance/applicability of chat data for the study of language variation and language change in progress?
- What is the pragmatic function of several varieties? What is the position and function of English in the linguistic repertoire of Flemish teenagers? How does the chat language of Flemish teenagers connect with the international chat culture? What kind of appropriation (localization processes) can be discriminated?

Researcher(s)

Research team(s)

A Safer Internet: (Semi)automatically Recognizing Internet Paedophilia in Multilingual Online Social Networks. 01/01/2010 - 31/12/2013

Abstract

In this project we on the one hand propose a methodology to (semi)automate the manual control of peer-to-peer networks and on the other hand a methodology for the automatic extraction and analysis of stylistic characteristics (associated to personality, age group and deceptive language usage) which we want to apply to both individual internet paedophiles and groups of paedophiles in chat rooms.

Researcher(s)

Research team(s)

Project website

Interpersonal communication training of natural language interaction with autonomous virtual characters (deLearyous). 01/01/2010 - 31/12/2012

Abstract

The goal of the deLearyous project is to develop an interactive serious 3D-game for training interpersonal communication skills in a professional context, e.g., employer-employee or customer-seller relations. The game allows trainees to interact with virtual autonomous characters who react in a realistic and expressive way to the input of the trainee. In this way, the trainee can exercise different behavioural patterns and roles in a safe virtual environment. The role of CLiPS in the project is to develop algorithms and methods for emotion analysis of text, topic detection in text, and dialogue management.

Researcher(s)

Research team(s)

Project TST Tools for Dutch as Web services in a Workflow (TTNWW). 01/01/2010 - 30/09/2012

Abstract

This project represents a formal research agreement between UA and the Flemish Public Service. UA provides the Flemish Public Service with the research results mentioned in the title of the project under the conditions stipulated in this contract.

Researcher(s)

Research team(s)

A web service for stylometry and readability research for the Dutch language (STYLENE). 01/01/2010 - 31/12/2011

Abstract

The goal of this project is to implement a robust, modular system for stylometry and readability research on the basis of existing techniques for automatic text analysis and machine learning, and the development of a web service that allows researchers in the humanities and social sciences to analyze texts with this system. In this way, the project will make available to researchers recent advances in research on the computational modeling of style and readability.

Researcher(s)

Research team(s)

The computational learnability of morphologically complex languages. 01/10/2009 - 30/09/2012

Abstract

Goals of the project: Traditional spell checkers make use of an extensive word list. If a word does not occur in this list, it is marked as a spelling error. More recent systems (e.g. Németh 2009) approach the problem of spell checking for agglutinating languages from a different angle: a word is considered a spelling error if it cannot be generated by an underlying morphological model of the language. In this project, we investigate how such a spell checker can be used as a tool in the automatic induction of a morphotactic system for Swahili.

Researcher(s)

Research team(s)

Towards a synthesis of knowledge based and data-based methods in computer linguistics. 01/10/2009 - 30/09/2010

Abstract

Hybrid systems for natural language processing that combine deep analysis, based on linguistic insight, with inductive data-oriented methods, can provide a significant improvement of the accuracy and applicability of computational linguistics. There are, however, many different ways in which this kind of hybridisation can be achieved. In this project, I will look at cognitive science as a source of inspiration for new hybrid approaches. This work will build on earlier work on memory-based language processing as a cognitively relevant model.

Researcher(s)

Research team(s)

Machine learning for data mining and its applications. 01/01/2009 - 31/12/2013

Abstract

The research community aims at strengthening and coordinating Flemish research on machine learning for data mining in general, and on important applications such as bioinformatics and text mining in particular. Flemish participants: Computational Modeling Lab (VUB), CNTS (UA), ESAT-SISTA (KU Leuven), DTAI (KU Leuven), ISLab (UA).

Researcher(s)

Research team(s)

The source of masked priming effects: lexical or episodic memory? 01/01/2009 - 31/12/2012

Abstract

Masked priming is a technique in which a word is presented so briefly that it cannot be consciously perceived, while at the same time it has an effect on the processing speed of a subsequently presented word. For this reason the technique is often used to investigate the nature of memory structures and processes underlying word recognition. However, recently the lexical nature of these masked priming effects has been called into question by Bodner and Masson (2003, 2004, 2006). Do these effects reflect the structure of the mental lexicon or do they reflect residual activation in episodic memory, where personal experiences are stored? A series of experiments is planned to investigate whether a lexical interpretation of the effect can be defended. Given the popularity of the technique the results of this research can have far-reaching consequences with respect to the theory formation on the mental lexicon.

Researcher(s)

Research team(s)

Project website

Artificial Creativity in visual communication and arts: an algorithm for inventive and evolving development of concepts and visualization of data. 01/10/2008 - 30/09/2012

Abstract

Using common techniques in Artificial Intelligence a software algorithm is developed that summarizes, interprets and processes textual content (or data sets). In an attempt to simulate human creativity the key concepts in this content are interrelated and recombined into creative and innovative graphical solutions and visualizations. The visual output evolves as the source data changes and expands.

Researcher(s)

Research team(s)

FlaReNet: Fostering Language resources Network. 01/09/2008 - 01/09/2011

Abstract

International cooperation and re-creation of a community are the most important drivers for a coherent evolution of the Language Resource (LR) area in the next years. FlaReNet will be a European forum to facilitate interaction among LR stakeholders. Its structure considers that LRs present various dimensions and must be approached from many perspectives: technical, but also organisational, economic, legal, political. The Network also addresses multicultural and multilingual aspects, which are essential when facing access to and use of digital content in today's Europe.

Researcher(s)

Research team(s)

DUAL-PRO. Dual electric-acoustic speech processor with linguistic assessment tools for deaf individuals with residual low frequency hearing. 01/07/2008 - 30/06/2010

Abstract

To date, individuals with sensori-neural hearing loss may benefit from either acoustic stimulation (classical hearing aids) or electric stimulation (cochlear implants). Classical hearing aids are best suited for moderate and severe hearing losses and cochlear implants for profound losses. Cochlear implants enable profoundly deaf patients to reach high levels of speech intelligibility, but they are inadequate for the perception of music. The reason for this is that implants are conceived to code for the mid and high frequencies of sound ("spectral coding") since speech information is mainly contained in these frequencies. Implants are not performing well in the coding of low frequencies ("temporal coding"). These frequencies contain mainly information related to tonality, musicality, timbre, etc. Hearing aids perform much better in the temporal coding of low frequencies. Since most profoundly deaf persons have profound losses in the mid and high frequencies while they often have residual hearing in the low frequencies, the combination of the spectral coding of a cochlear implant with the temporal coding of a hearing aid, seems promising in improving the auditory performance of implant-wearers. In addition, temporal information seems of specific importance for the linguistic development in young children and it is anticipated that improving the low frequency perception may significantly enhance their linguistic capacities, thus decreasing their handicap and increasing the probability of mainstream integration. Main objectives of the proposed project: (i) to optimise deaf patients' hearing experience by developing a new hearing device which combines both types of stimulation in the same ear; (ii) to develop a test battery for prosody reception, i.e. the perception of language rhythm and melody; and (iii) to use this new prosody test battery as a quality measure for the current generation of cochlear implants and classical hearing aids, as well as for the newly developed hybrid electric-acoustic prototype.

Researcher(s)

Research team(s)

NEON: subtitling in Dutch. 01/06/2008 - 31/05/2009

Abstract

In this project, CNTS develops a system for automatic subtitling on the basis of the output of speech recognition. Such a system allows the simplification and shortening of sentences when needed without making them ungrammatical and without losing their essential meaning. As a methodology, a combination of rule-based and statistical techniques was chosen. In the project, we cooperate with, among others, Belgian and Dutch television and the speech recognition research group of the University of Leuven.

Researcher(s)

Research team(s)

An evaluation corpus for automatic summarization. 01/01/2008 - 30/04/2009

Abstract

In this BOF project we aim to create an evaluation and development corpus for automatic summarization for Dutch. Automatically generated summaries can help to search and present large amounts of information. An important aspect in the development of automatic summarizers is the use of evaluation methods. We will construct an evaluation corpus consisting of 200 texts and minimally 5 different summaries per text.

Researcher(s)

  • Promotor: Hendrickx Iris

Research team(s)

Adolescent chat language in Flanders: the language geography of Flemish (sub)standardization processes. 01/01/2008 - 31/12/2008

Abstract

Research questions:
- To what extent are morpho-syntactic Brabantic regiolectic features integrated in the chat language of Flemish adolescents?
- What is the impact of the regional background of the chatters?
- Do the analyses confirm the hypothesis that Flanders is subject to an autonomous informal standardisation process characterized by an increasing supraregional use of Brabantic regiolect?
- Does chat language offer us new possibilities and challenges for the study of language variation and language change in progress?

Researcher(s)

Research team(s)

Mind your Syntax. Oral language development and development of Theory of Mind in deaf children with a cochlear implant. 01/12/2007 - 30/11/2010

Abstract

Research in the field of developmental cognitive neuroscience has shown that audition and language exhibit plasticity, i.e. the ability to modify pre-existing neural synaptic connections dedicated to particular cognitive systems, depending on the quantity and quality of the environmental stimuli during a specific developmental stage. However, there is very little consensus in the literature with respect to the precise limits of these windows of opportunity. In this project we will tackle the issue of plasticity of the auditory system and its effect on language and general cognitive development. Two main hypotheses will be tested: (i) the development of sensory, language and higher cognitive systems is triggered by qualitatively and quantitatively sufficient stimuli within a well-determined time window; and (ii) language plays a crucial role in higher cognitive development, more particularly in Theory of Mind development. These hypotheses will be tested on populations of children who have been deprived of sound due to congenital deafness. Comparative cohort studies of oral Dutch deaf children who have received cochlear implants at different ages will enable us to answer the central question of this project, namely whether cochlear implantation early in life leads to better auditory perception, providing the redundancy necessary for incidental language learning and higher cognitive development.

Researcher(s)

Research team(s)

Publication of the monograph "Vocaalreductie in het Standaardnederlands in Vlaanderen en Nederland" (Vowel reduction in Standard Dutch in Flanders and the Netherlands). 09/10/2007 - 31/12/2007

What verbs want: an exemplar-based model of human sentence processing. 01/10/2007 - 30/09/2009

Abstract

Researcher(s)

Research team(s)

Prize of the Research Council 2007. 01/08/2007 - 31/08/2007

Abstract

Researcher(s)

  • Promotor: Schauwers Karen

Research team(s)

Text Mining on heterogeneous knowledge bases. An application to optimised discovery of disease relevant genetic variants. 01/07/2007 - 30/06/2011

Abstract

The project proposes a methodology for text mining with heterogeneous information sources and its application to molecular genetics/genomics and knowledge management. State of the art text analysis and graph-based data mining techniques will be extended to make the methodology possible, and the methodology will be applied in a biomedical application (ranking of candidate disease-causing genes) and a knowledge management application (person profiling from www information).

Researcher(s)

Research team(s)

Project website

Computational Techniques for Stylometry for Dutch. 01/01/2007 - 31/12/2010

Abstract

In this project we investigate a methodology for the automatic extraction and analysis of style that we want to apply to both individual authors (authorship attribution, both fiction and non-fiction) and groups of authors (extraction of stylistic characteristics associated with gender and age). This methodology covers several aspects: (1) Automatic linguistic analysis of documents by means of available text analysis tools on the level of morphological structure, part of speech, global syntactic structures and semantic roles (subject, object, temporal, location) for the construction of potentially relevant stylistic characteristics. (2) Unsupervised and supervised learning techniques for selecting characteristics with high information value and constructing a model of authorial style. (3) Evaluation of these models by (a) comparison with stylistic analyses in linguistics and literary science and (b) empirical testing of the predictive power of the models.

Researcher(s)

Research team(s)

Unlocking the teachers' room. Archiving and making available a collection of spoken Standard Dutch, produced by teachers of Dutch. 01/01/2007 - 31/12/2008

Abstract

The aim of this project consists in systematically archiving and making available 200 hours of spoken Standard Dutch, produced by 160 Flemish and Dutch teachers of Dutch. The speech collection concerned is highly valuable. With respect to the composition of the corpus several social and linguistic variables were taken into account. Furthermore, the recordings are of high (stereo) quality. Therefore this corpus can be used for phonetic, phonological as well as for sociolinguistic purposes.

Researcher(s)

Research team(s)

The Sawa Corpus - a parallel corpus "English - Kiswahili". 01/01/2007 - 31/12/2008

Abstract

This project aims to develop an aligned parallel corpus for the language pair English - Kiswahili by means of semi-automatic annotation. This alignment not only facilitates research on statistical machine translation, but also enables projection of annotation between the two languages. In this project we investigate how dependency analyses can be projected from a source language (English) onto a target language (Kiswahili).

Researcher(s)

Research team(s)

Gravital: parsing and problem-solving in natural language as an engine for generating visual communication and art. 01/01/2007 - 31/12/2008

Abstract

The project addresses the application of natural language parsing and problem solving as tools for the generation of visual communication and art. Within the context of the NodeBox application, we will adapt the MBSP shallow parser to the domain of design and visual communication and help integrate it into the NodeBox application.

Researcher(s)

Research team(s)

The influence of hearing on the early lexical development of deaf children with and without cochlear implants. 01/01/2007 - 31/10/2007

Abstract

In congenitally deaf children with cochlear implants (CI), early language is acquired in two modalities, with both spoken words and signs; deaf children without a CI normally acquire their language monolingually, namely by signs. By studying the early lexical acquisition of both groups longitudinally and by comparing the results with those of normally hearing children, this study will answer the question whether children with a CI acquire both modalities simultaneously, with one modality influencing the other, or follow two separate developmental paths, one for each modality.

Researcher(s)

Research team(s)

The acquisition of Romanian morphosyntax. 01/01/2007 - 31/10/2007

Abstract

Drawing on the first longitudinal corpus of child Romanian, the research project aims at a systematic analysis of the morphosyntactic development of Romanian monolingual children. Organized around the acquisition of the main functional domains within the clause, the research focusses on the relationship between the features of functional categories and syntactic operations.

Researcher(s)

  • Promotor: Coene Martine
  • Fellow: Avram Larisa

Research team(s)

Linguistic description of resource-scarce languages using machine learning techniques. 01/10/2006 - 30/09/2009

Abstract

Linguistically annotated corpora are an important tool in the development of Natural Language Processing (NLP) applications. For commercially interesting languages, these corpora can be used to induce accurate and robust NLP tools to process new data. If no such corpora exist, which is by definition the case for resource-scarce languages, the traditional data driven algorithms are largely useless. This project investigates the automated linguistic description of minority languages on the basis of alternative classification techniques. The algorithms researched in this project avoid the need for annotated data in the target language by automatically inducing a classification, either on the basis of free text (technique: "unsupervised learning") or by using existing annotated corpora in another language (technique: "knowledge transfer"). The methodology proposed in this project allows for a hitherto largely unexplored systematic comparison and evaluation of these techniques.

Researcher(s)

Research team(s)

Conceptual viewpoints: Elements of a cognitive account of English tense. 01/10/2006 - 31/03/2008

Abstract

The main objective of this project is to provide an abstract and comprehensive account of English tense, on the basis of cognitive mechanisms which may be independently motivated. The empirical work on tense, aspect, and modal markers in English will serve to inform this account, which aims at a level of explicitness deemed necessary for the purpose of modeling a language's tense system.

Researcher(s)

Research team(s)

DAESO - Detecting and exploiting semantic overlap. 01/06/2006 - 31/05/2009

Abstract

The well-known fact that similar information can be expressed in many different ways is one of the major challenges in building robust NLP applications. It is commonly assumed that such applications can be improved with knowledge of how natural language expressions relate to each other, for instance in terms of paraphrases (same semantic content, different wording) or entailments (one expression implied by the other). DAESO investigates the detection of semantic overlap between Dutch sentences and the exploitation of this knowledge in a range of NLP applications. For this purpose, tools will be developed for the automatic alignment and classification of semantic relations (between words, phrases and sentences) for Dutch, as well as for a Dutch text-to-text generation application which fuses related sentences into a single grammatical sentence, which may be a generalization, a specification or a reformulation of the input sentences. To facilitate development and testing of these tools, an annotated monolingual Dutch parallel/comparable corpus of 1M words will be developed, consisting of pairs of texts that express comparable information. The utility of the resources and tools will be demonstrated in the context of three applications: (1) question-answering systems (improved recall, more complete answers), (2) information extraction (improved recall), and (3) summarization (beyond extraction: sentence compression, sentence fusion, anaphora resolution).

Researcher(s)

Research team(s)

Project website

Literacy development in bilingual children: Evidence from French-English and French-Dutch Immersion programs. 01/06/2006 - 31/05/2008

Abstract

Researcher(s)

Research team(s)

APRO: Annotation of gender agreement and of the pleonastic versus anaphoric use of pronouns in Dutch. 01/03/2006 - 31/12/2007

Abstract

This project focuses on two problems in Dutch pronominal anaphora resolution: the detection of the pleonastic and anaphoric use of pronouns and the detection of pronouns referring to the linguistic gender of their antecedent. This project aims at the annotation of Dutch text material with regard to these two specific problems. This newly annotated text material will be integrated in an existing system for reference resolution in Dutch.

Researcher(s)

  • Promotor: Hoste Veronique

Research team(s)

Computational Linguistics and Language and Speech Technology. 01/01/2006 - 31/12/2010

Abstract

CLIF is the Flemish organization for computational linguistics, language technology and speech technology. The goal of the association is to stimulate research cooperation among the groups and the development of tools and resources that the individual participating groups could not develop on their own.

Researcher(s)

Research team(s)

Speech and language acquisition in Dutch speaking children with different degrees of hearing: Hearing children and deaf children with a cochlear implant. 01/01/2006 - 31/12/2009

Abstract

The aim of this project is to investigate segmental, intrasyllabic and intersyllabic co-occurrence patterns in prelexical babbling, and the acquisition of phonological segments and patterns in the early lexical period. Longitudinal data of deaf children with a cochlear implant (implanted in the first/second year of life) will be compared with those of an age-matched hearing cohort in order to establish whether they develop language in the same sequence and according to the same patterns as hearing children, and whether the delay that children implanted at an older age show in reaching language acquisition milestones still exists for the very early implanted children.

Researcher(s)

Research team(s)

Acoustic phonetic analysis of the speech of very young children with a cochlear implant. 01/10/2005 - 30/09/2009

Abstract

The aim of this project is to investigate acoustic-phonetic characteristics of the speech of young congenitally deaf children who received a cochlear implant in their first year of life. In particular the acoustic characteristics of their babbling will be investigated in order to detect discrepancies with the babbling of hearing infants. In addition we will analyze spontaneous speech of these children at the age of six, and investigate whether it displays the typical characteristics of "deaf speech", and we will try to relate these characteristics to the infants' vocalizations in their first year of life.

Researcher(s)

Research team(s)

Variation in the pronunciation of Standard Dutch: schwa epenthesis in Flanders and The Netherlands. 01/10/2005 - 30/09/2008

Electropalatographic investigation of articulatory settings in geographically determined language varieties of Dutch. 01/10/2005 - 31/12/2007

Coreference Resolution for Extracting Answers. (STEVIN - COREA) 01/05/2005 - 31/10/2007

Abstract

Coreference resolution is a key ingredient for the automatic interpretation of text. It has been studied mainly from a linguistic perspective, with an emphasis on establishing potential antecedents for pronouns. Practical applications, such as Information Extraction (IE), summarization and Question Answering (QA), require accurate identification of coreference relations between noun phrases in general. Computational systems for assigning such relations automatically require a sufficient amount of annotated data for training and testing. For Dutch, annotated data is scarce and coreference resolution systems are lacking. In COREA, a robust system for assigning such relations automatically will be developed, and we will investigate the effect of making coreference relations explicit on the accuracy of systems for IE and QA.

Researcher(s)

Research team(s)

Phonological segmentation processes in English- and Dutch-speaking kindergartners and beginning readers. The role of language and phonetic factors. 01/05/2005 - 31/12/2006

Abstract

This study examines how English- and Dutch-speaking prereaders segment speech at an unconscious level, more specifically which cohesion patterns they prefer to others. Important variables are language, phonetic characteristics of segments, letter knowledge and literacy. A further aim is to investigate whether individual differences in implicit segmentation processes are also reflected in the children's later early reading development.

Researcher(s)

  • Promotor: Geudens Astrid

Research team(s)

Morphosyntactic annotation of three Dutch child language databases. 01/05/2005 - 31/12/2006

Abstract

The goal of this project is to add morphosyntactic annotation to three child language databases: the Maarten database, the CLPF database and the CI database. We will add a high-quality morphological coding, we will apply a consistent interpretation and annotation of filler syllables, and we will indicate all base NPs in the databases.

Researcher(s)

  • Promotor: Taelman Helena

Research team(s)

Exploitation of CGN annotation for portability to new information sources. 01/05/2005 - 31/12/2006

Lexical and morphosyntactic development in young children with a cochlear implant: A crosslinguistic study of Dutch and Hebrew. 01/01/2005 - 31/12/2008

Abstract

The first aim of this project is to study patterns of productive spoken language acquisition in children who received a CI early in their second year of life. The children's language acquisition will be compared with that of a matched group of normally hearing (NH) children. The second aim of the project is to study language acquisition crosslinguistically: the language acquisition of children acquiring Dutch will be compared with that of children acquiring Hebrew as their native language. In the language-specific part of the project as well as in the crosslinguistic part, we will focus on the following aspects:
  • The study of early lexical and morphosyntactic development of children with a cochlear implant (implantation age: between 1;0 and 1;06);
  • Comparison of CI children with normally hearing children of the same age/level of language acquisition;
  • Comparison of CI children's and NH children's development in two typologically different languages, viz. Dutch and Hebrew, which enables the testing of specific hypotheses concerning the mechanisms of language acquisition.

Researcher(s)

Research team(s)

The purpose and desirability of Dutch-Dutch subtitling of tv programmes in Flanders: an audience-focused investigation. 01/01/2005 - 31/12/2006

Abstract

This research project investigates a new development on Flemish television, i.e. the increasing occurrence of Dutch-Dutch subtitled programmes. It aims to investigate the desirability of this trend with respect to the way in which Flemish viewers experience their linguistic identities, that is, which 'Dutch' or 'Flemish' they consider to be their mother tongue, which variants are readily understood (and which are not), and which are experienced as 'foreign'.

Researcher(s)

Research team(s)

The link between implicit segmentation patterns and the development of explicit segmentation, reading, and writing skills. 01/10/2004 - 20/11/2007

Abstract

This longitudinal study examines how prereaders segment speech at an unconscious (implicit) and an intentional (explicit) level, and investigates whether individual differences in the early, implicit segmentation process are also reflected in the children's later development of explicit segmentation skills, early reading, and writing.

Researcher(s)

Research team(s)

Syntactic aspects of the impaired acquisition of determiners. 01/10/2004 - 30/09/2007

Abstract

The project aims to study the developmental pattern of early morphosyntax in three groups of language-impaired children (children with specific language impairment, SLI; hearing-impaired (HI) children with classical hearing aids, HA; and children with a cochlear implant, CI) and to verify whether the results are related to the clinical characteristics of the children. We focus on one particular aspect of nominal syntax, i.e. the acquisition of determiners in SLI, HI and CI children compared to a control group of normally developing hearing children. The following research questions will be addressed: (i) in which way does the acquisition of the determiner system in SLI children differ from that in children with normal language development: is there a temporary or permanent delay in the projection of a syntactic D-level and, if so, what is the cause of the delay?; (ii) does the syntactic development of CI children surpass that of children who use conventional HA (cf. Van den Broek 1998 contra Geers 2003 for speech perception and production)?; (iii) does the syntax of very early implanted CI children develop at the same pace as that of a normally hearing control group, or are there similarities with other language impairments which typically show grammatical deficits?; (iv) which factors positively influence the acquisition of determiner syntax in CI children?; (v) from a theory-internal point of view: is neurological maturation responsible for the projection of a D-position in syntax? Is it input-sensitive and therefore positively influenced by an increase in auditory perception?

Researcher(s)

Research team(s)

A constructivist analysis of 'fillers' in Dutch child language. 01/10/2004 - 30/09/2007

Abstract

Young children often insert 'fillers' in their first multiword utterances: vocalizations that do not correspond to conventional words. For instance, it is hard to determine the meaning of the syllables [m] and [ə] in utterance (a). Fillers often have the shape of a syllabic nasal or a schwa, as in utterances (a) and (b). But sometimes they consist of several syllables, as in utterance (c).
(a) [m] pick [ə] flowers (English-learning boy, age 1;6; from Peters and Menn, 1993)
(b) [ə] oiseau [ə] vole (French-learning girl, age 1; from Veneziano and Sinclair, 2000)
(c) [lala] open door (English-learning girl, age 1;10; from Feldman and Menn, 2003)
Fillers typically occur at positions that are occupied by function morphemes in the adult language (like articles or pronouns). They are instantiations of an important language learning mechanism that has only recently been recognized as such: 'form-driven' learning. 'Form-driven' learning entails that the child first acquires the form, and fully grasps the meaning and function of this form only later on. In other words, the child has discovered sound material at particular positions in the input, but has not yet analyzed the form and the function of this material accurately. Nevertheless, the child tries to integrate these elements in her own utterances. Little by little the child discovers the full distribution, function and shape of what turn out to be function morphemes. This learning mechanism contrasts with function-driven acquisition, as proposed by nativist theories: morphosyntactic acquisition is interpreted as a self-unfolding plan of morphosyntactic functions that need to be filled with lexical material. Until now, fillers in Dutch child language have not been studied (except in the limited analysis of Wijnen et al., 1994).
The aim of this research project is to investigate the role of fillers in the acquisition of Dutch, and to analyze the mechanism of 'form-driven' learning from a constructivist perspective on language acquisition.

Researcher(s)

Research team(s)

Semi-supervised learning of Information Extraction. 01/10/2004 - 31/12/2005

Abstract

Information Extraction (IE) is concerned with extracting relevant data from a collection of structured or semi-structured documents. Current systems are trained using annotated corpora that are expensive and difficult to obtain in real-life applications. In this project we therefore want to focus on the development of IE systems using semi-supervised learning, a technique that makes use of a large collection of unannotated and easily available data.

Researcher(s)

Research team(s)

Situational Factors in Producing Inflected wordforms: a Psycholinguistic and Computational Approach. 01/01/2004 - 31/12/2007

Abstract

The production of inflected word forms like plurals or past tenses is traditionally assumed to be a process that relies primarily on morphological, phonological and syntactic characteristics of the base form. Although descriptive grammars also mention metalinguistic factors in this context, they receive no attention in recent influential models of language production such as Steven Pinker's 1999 Words and Rules theory. However, in a recent experiment, we demonstrated that Dutch speakers do rely on metalinguistic information when producing plurals for Dutch pseudowords. Not only do these results undermine Pinker's assumption that Dutch has two default plurals that are applied solely on the basis of phonological information, but they also question whether models that have a rule-based component are capable in principle of capturing metalinguistic information.

Researcher(s)

Research team(s)

Database of 14th-century non-literary Dutch texts. Construction and linguistic exploration. 01/01/2004 - 31/12/2007

Abstract

Researcher(s)

Research team(s)

PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning. 01/12/2003 - 29/02/2008

Abstract

The objective of this FP6 network of excellence (PASCAL) is to build a Europe-wide Distributed Institute which will pioneer principled methods of pattern analysis, statistical modelling and computational learning as core enabling technologies for multimodal interfaces that are capable of natural and seamless interaction with and among human users. The role of CNTS in the network is the application of machine learning techniques to problems in natural language processing.

Researcher(s)

Research team(s)

Time and subjectivity: a cognitive and comparative inquiry into the conceptual status of aspect and tense categories in grammar. 01/10/2003 - 30/09/2006

Abstract

This study investigates the relation between categories of tense (and certain manifestations of grammatical aspect) on the one hand, and differences in the degree of 'subjectification' marking their semantic pole on the other. Besides their grammatical status as grounding predications, these categories are subject to additional processes of subjectification, operating on the products of grammaticalization and thus transcending the transformation of a lexical into a grammatical predication. They give rise to the development of nonreferential meanings for items containing distinct elements of temporal reference in their prototypical uses. The present study therefore concentrates on usage types that are removed from the description of objective relations in time, moving towards the expression of subjective concerns. It is anticipated that (clausal) grounding predications demonstrate subtle internal as well as external distinctions in subjectivity, and thus in semantic status. Despite the clear focus on English in the case studies that are proposed, these remarks should be construed as holding universally.

Researcher(s)

Research team(s)

Semantic activation in reading interlingual homographs. 01/10/2003 - 31/12/2005

Abstract

Dutch-English bilinguals are tested in English experiments to investigate to what extent they suppress their knowledge of Dutch while reading in English. The critical items are Dutch/English homographs (e.g. <step>, meaning 'scooter' in Dutch) that are employed in a semantic priming paradigm to test whether their Dutch meaning is activated automatically. The participants are tested on single words and on complete sentences to study the effect of sentence context on lexical processing. Primes are presented visually or auditorily. Control experiments are conducted with English monolinguals.

Researcher(s)

  • Promotor: Martensen Heike

Research team(s)

Reduction Phenomena in present-day Standard Dutch in Flanders and the Netherlands. 01/10/2003 - 30/09/2005

Abstract

The aim of this project is the study of reduction phenomena in spontaneous (= non-read) Standard Dutch. Reduction is studied in mono-, bi- and trisyllabic words, especially in pronouns, suffixes and loan words. We use speech that has already been collected, digitized and transcribed for the Corpus Gesproken Nederlands (Spoken Dutch Corpus), and as a part of the VNC project Variation in the pronunciation of Standard Dutch. The VNC speech consists of interviews with teachers of Dutch. From the Corpus Gesproken Nederlands, three components are selected: speeches, (non-read) lectures and lessons from high school teachers (except for Dutch lessons). These three types of spontaneous speech are fully comparable: it is non-broadcast speech, produced by one speaker before an audience. A more specific aim of this project is to verify the claim that the pronunciation of highly educated speakers without linguistic training differs from the pronunciation of teachers of Dutch, who are often considered to be prototypical speakers of Standard Dutch. This project links up with the renewed interest in standard language, where variation patterns in Standard Dutch in Flanders and the Netherlands are studied from a perspective of convergence and divergence. This study is also in line with international research on variation in standard languages, for instance in German (e.g. Germany, Austria, Switzerland) and in French (e.g. France, Canada, Belgium).

Researcher(s)

Research team(s)

The use of very large text corpora in the automatic discovery of structure in natural language. 01/10/2003 - 28/02/2005

Abstract

Large repositories of language samples exist today. Some examples are the text on the internet, and texts and dictionaries in many languages. However, these corpora are not always used when examining language hypotheses, or fundamental language questions. This gap is in the process of being filled, and this research hopes to be part of this development. The general aim is to arrive at a better use of existing language technologies in order to test specific hypotheses about the structure and function of language and about language change and typology.

Researcher(s)

Research team(s)

The relevance of an onset-rime structure in implicit and explicit phonological awareness: A cross-linguistic study with English and Dutch speaking preschoolers and beginning readers. 01/03/2003 - 31/12/2005

Abstract

This study examines whether onset and rime are units in the child's developing phonological awareness. The onset-rime hypothesis is widely accepted but is mainly based on English research. Recent experiments in Dutch failed to support this hypothesis. To find out whether language differences account for this dissociation, a systematic cross-linguistic comparison will be conducted with English and Dutch preschoolers and first-graders. Tasks tapping into implicit and explicit phonological awareness will be used (e.g., recall task versus segmentation task).

Researcher(s)

Research team(s)

Are morphological representations in the mental lexicon modality-specific or modality-independent? An approach through masked cross-modal priming. 01/01/2003 - 31/12/2006

Abstract

The purpose of the current project proposal is to build on the existing knowledge from cross-modality effects in written and spoken word processing on the one hand and the priming literature on the other hand. We can take a step forward if we can remove the shortcomings of intra-modal priming. Indeed, in the case of visual-visual priming we cannot really address the issue of cross-modality integration, as the phonological information is activated by the visually presented prime and not by an auditory stimulus presented to the participant. Using a different technique would allow us to better address the integration of information that is originally associated with different modalities (i.e., at stimulus input).

Researcher(s)

Research team(s)

Biological Text Mining (BioMinT). 01/01/2003 - 31/03/2006

Abstract

The goal of the BioMinT project is to develop a generic text mining tool that (1) interprets diverse types of query, (2) retrieves relevant documents from the biological literature, (3) extracts the required information, and (4) outputs the result as a database slot filler or as a structured report. The consortium consists of biologists (University of Manchester, Swiss Institute of Bioinformatics) and data/text mining groups (CNTS Antwerp, PharmaDM, Austrian Research Institute for AI, University of Geneva AI Lab).

Researcher(s)

Research team(s)

Semi-supervised learning of Information Extraction. 01/01/2003 - 30/09/2004

Abstract

Information Extraction (IE) is concerned with extracting relevant data from a collection of structured or semi-structured documents. Current systems are trained using annotated corpora that are expensive and difficult to obtain in real-life applications. In this project we therefore want to focus on the development of IE systems using semi-supervised learning, a technique that makes use of a large collection of unannotated and easily available data.

Researcher(s)

Research team(s)

FLaVoR: Flexible Large Vocabulary Recognition: Incorporating linguistic knowledge sources through a modular recogniser architecture. 01/10/2002 - 30/09/2006

Abstract

In this project we investigate whether the 'all-in-one' strategy currently used in speech recognizers, in which task-specific, syntactic, and lexical knowledge are fused into a single model based on simple formalisms, can be replaced by a modular architecture in which, apart from acoustic-phonetic and intonational features, generic and domain-specific linguistic information sources can also be used.

Researcher(s)

Research team(s)

Functions of audiovisual prosody. 01/10/2002 - 30/09/2005

Abstract

This research proposal is concerned with a functional approach to verbal and visual prosody in spoken conversations. The project addresses the combined use of specific auditory cues (such as intonation, tempo, voice quality and pausing) and specific visual cues (such as facial expressions and specific body gestures) for marking different dialogue phenomena. First, we will explore how audiovisual prosody can be exploited to highlight the information status of words. Then, we will investigate how it can be used to signal whether or not the process of information exchange in a dialogue is going well. Next, we will explore how it can support the turn-taking mechanism in spontaneous interactions. Finally, we will see to what extent audiovisual prosody may reflect speakers' emotions and attitudes. The results of these different substudies will be integrated in one coherent, functional model of audiovisual prosody. All the questions will be tackled from the point of view of both the speaker and the listener, and from a crosslinguistic perspective. Insight into functional aspects of audiovisual prosody is relevant from both a theoretical and an applied perspective. First, it is remarkable to observe that this important communicative device is still largely unexplored. Knowledge about how audiovisual prosody works may yield new insights into how people mark important words, deixis, turn-taking, discourse structure, etc., and more generally into how languages can differ in the way they signal linguistic and paralinguistic phenomena. Second, there is an increasing interest in computer interfaces that rely on what is termed 'embodied conversational agents', i.e., specific software components that appear to users as animated characters. To make these agents 'believable' and 'communicative', it is important to know in full detail how specific auditory and visual parameters contribute to speech communication.

Researcher(s)

Research team(s)

Children's acquisition of phonotactic and prosodic knowledge: an empiricist, inductive alternative for current nativist, deductive approaches. 01/10/2002 - 30/09/2004

Abstract

Optimality Theory (OT) is the central paradigm in current theorizing about phonological acquisition. OT is a deductive model: (a priori) linguistic knowledge is represented in the child's linguistic (grammatical) competence. In this project we explore an empiricist, inductive alternative to this approach. An empiricist, inductive model is defined as a model in which the mental lexicon is central in acquisition. Linguistic knowledge is collected and stored in the lexicon. The contrast between grammatical system and lexicon will be developed according to four core dimensions: (1) rules versus analogy, (2) stages versus lexical diffusion, (3) minimal versus maximal role for input, and (4) competence versus processing. We focus on the acquisition of phonotactic and prosodic knowledge, because these two areas are often presented as examples of deductive acquisition.

Researcher(s)

Research team(s)

Multilingual subtitling of multimedia content (MUSA). 01/09/2002 - 28/02/2005

Abstract

MUSA aims at the creation of a multimodal multilingual system that converts audio streams into text transcriptions, translates the transcriptions into other languages and then generates subtitles from these translated transcriptions. MUSA will operate in English, French and Greek. A state-of-the-art Speech Recognition system will be enhanced and improved to meet the project settings. An innovative Machine Translation scenario will be designed that combines a Machine Translation engine with a Translation Memory and a Term Substitution module. The Antwerp group will be involved in sentence condensing for subtitle generation, performed by an automatic analysis of the linguistic structure of the sentence.

Researcher(s)

Research team(s)

Machine learning for data mining and its applications. 01/01/2002 - 31/12/2006

Abstract

The research community aims at strengthening and coordinating Flemish research on machine learning for data mining in general, and on important applications such as bio-informatics and text mining in particular. Flemish participants: Computational Modeling Lab (VUB), CNTS (UA), ESAT-SISTA (KU Leuven), DTAI (KU Leuven), ADReM (UA).

Researcher(s)

Research team(s)

Tonal dialects in Dutch : Structure, Perception and Function. 01/01/2002 - 31/12/2005

Abstract

This project investigates the phonetic and phonological nature of the Limburgian lexical tone distinction, its perceptibility, and its functioning in the interpretation of the information structure. Its aim is threefold. First, a broader database, both phonological and phonetic, will be created by investigating two dialects in the Belgian province of Limburg, to complement existing Dutch data. Second, the variability in the phonetic salience of the tone contrast will be related to a dialect's geographical proximity to nontonal dialect areas, to further our understanding of the nature of dialect contact and the erosion of the tone contrast during phonological change. Third, the extent to which the expression of the tone contrast depends on the expression of focus is to be investigated in two groups of dialects, a northern and a southern group, in order to establish the nature of the interaction between lexical tone distinctions and the possible expression of focus structure.

Researcher(s)

  • Promotor: Swerts Marc

Research team(s)

The interaction between phonology and orthography in the process of visual word recognition: does dependency cause unity? 01/01/2002 - 31/12/2005

Abstract

In most languages with an alphabetical writing system, the pronunciation of a word is not simply the sum of the pronunciations of all its letters. There are several cases where the pronunciation of one letter is determined by another letter. Compare, e.g., the Dutch words MOOT-MOET-MORT, in which the pronunciation of the letter O is determined by the following letter. Languages differ in the extent to which letters depend on each other for pronunciation. This research project aims to establish how such interdependencies between letters with respect to their pronunciation affect the processes in word recognition. The innovative aspect of this approach is that it places the dissociation of rime effects in English and Dutch in a broader perspective. If one letter's pronunciation is determined by another letter, how does this affect word recognition? The onset-rime effects are only one specific manifestation of this more general question.

Researcher(s)

Research team(s)

Semaduct : combining deductive and inductive techniques for lexical semantics. 01/01/2002 - 31/12/2005

Abstract

The goal of the project is to confront and integrate deductive and inductive approaches to computational linguistics in the area of lexical semantics. Subprojects include the combination of supervised and unsupervised machine learning methods for semantic knowledge acquisition and disambiguation, the incorporation of linguistic semantic knowledge in inductive approaches, and the refinement of existing semantic tag sets with machine learning techniques.

Researcher(s)

Research team(s)

OntoBasis: Extraction of ontologies from text. 01/01/2002 - 31/12/2005

Abstract

The main goal of CNTS for this project is the application and adaptation of shallow parsing technology for (i) extraction of lexons (ontological relations) from unstructured and semi-structured sources, (ii) evaluation of ontologies, and (iii) adaptation of ontologies (e.g. WordNet) to specific domains. A secondary goal is to investigate the use of ontologies to improve text analysis using shallow parsing.

Researcher(s)

Research team(s)

Incremental semantic processing of sentences: how do we arrive at specific interpretations? 01/10/2001 - 30/09/2004

Abstract

The goal of this proposal is to link notions from my own psycholinguistic research in semantic processing with the most recent linguistic theories in generative semantics. Eye-tracking experiments will be conducted that investigate linguistic principles that have been proposed to describe how enriched semantic interpretations are generated. This way, the Underspecification Model that I proposed for the processing of figurative language can be extended and refined. The ultimate aim is to arrive at a more general model of the on-line, incremental semantic processing of written texts.

Researcher(s)

Research team(s)

Psycholinguistics: processing and acquisition aspects of reading and spelling. 01/01/2001 - 31/12/2005

Abstract

The purpose of this scientific research network is to integrate the Flemish, Dutch, and international expertise in the study of (i) the acquisition of reading and spelling and (ii) the on-line processes in experienced readers and spellers. The central focus is the study of the reading and spelling of words (written word recognition and production), more particularly, the role of phonology and morphology and the importance of the way in which the spelling of the language represents these linguistic dimensions. Concrete goals are: the realisation of joint empirical work by several sub-teams of the research network (experiments, corpus analyses, simulation studies), more particularly within a cross-linguistic perspective, the exchange of expertise in the form of people and tools, and the organisation of workshops and one international conference.

Researcher(s)

Research team(s)

Psycholinguistics: Processing and Acquisition Processes of Reading and Spelling 01/01/2001 - 31/12/2005

Abstract

The purpose of this Scientific Research Community is to integrate the Flemish, Dutch, and international expertise in the study of (i) the acquisition of reading and spelling and (ii) the on-line processes in experienced readers and spellers. The central focus is the study of the reading and spelling of words (written word recognition and production), more particularly, the role of phonology and morphology and the importance of the way in which the spelling of the language represents these linguistic dimensions. Concrete goals are: the realisation of joint empirical work by several subteams of the Research Community (experiments, corpus analyses, simulation studies), more particularly within a cross-linguistic perspective, the exchange of expertise in the form of people and tools, and the organisation of workshops and one international conference.

Researcher(s)

Research team(s)

Text Analysis and Machine Learning for Prosody. 01/01/2001 - 31/12/2004

Abstract

The aim of the project is to perform empirical investigations to determine whether adequate prosody can be generated on the basis of two methods that have recently shown success in other language processing domains: (a) robust analysis of text by analyses and metrics from information retrieval and information extraction, and (b) advanced machine learning systems and meta learners.

Researcher(s)

Research team(s)

Language acquisition by children with cochlear implants: A longitudinal investigation 01/01/2001 - 31/12/2004

Abstract

In this project we study the auditory development and the speech and language acquisition of congenitally deaf children who received a cochlear implant (CI) during their second year of life. Our aim is to systematically investigate the effect of the CI on different aspects of language and speech development:

  • The effect of a CI on the auditory level;
  • The effect of a CI on the articulatory level (the speech);
  • The effect of a CI on language acquisition and communicative development.

In essence, we want to investigate how access to the auditory information evolves and what impact that access to spoken language has on the child's own spontaneous speech and language. The scientific aims of the research proposal are (i) descriptive and (ii) fundamental.

(i) Descriptive: a longitudinal description of the auditory development and the speech, language and communicative development after a CI. On the basis of this description we will be able to provide an answer to the following questions: Does language acquisition after a CI proceed in a qualitatively and/or quantitatively similar fashion to that in normally hearing babies? What is the level of spoken language development in CI babies, as compared to normally hearing babies? Is there a qualitative and/or quantitative difference in the auditory, speech and language development between babies, depending on the age at which they receive a CI?

(ii) Fundamental psycholinguistic aims:

  • Study of the perception of segmental and suprasegmental characteristics of speech in relation to its production.
  • Study of the phonological development on the segmental and suprasegmental level, focussing on the evolution of truncation patterns.
  • Study of the lexical and morphosyntactic acquisition, focussing on the evolution of 'function words' or closed class words with respect to open class words, an opposition related to perceptual salience.
  • Study of communicative development, focussing on (1) the use and place of speech versus (conventional) signs, (2) the use of interactional means (attention seeking/fixing/...), (3) the magnitude and use of types of interaction turns by child and adult conversation partner.

Researcher(s)

Research team(s)

A computational psycholinguistic model of language acquisition. 01/10/2000 - 30/09/2012

Abstract

This project aims at developing a computational psycholinguistic model of children's primary language acquisition. Ultimately the model is meant to provide a computational psycholinguistic account of acquisition in a data-driven way, incorporating the structural aspects of input, the child's 'intake' of the input and the self-organizing mechanisms of the learner. The term 'computational psycholinguistic' is not only meant as a characterization of the type of theory to be developed, it also defines the methodology to be adopted: the acquisition of particular linguistic domains will be studied from a psycholinguistic perspective, viz. the investigation of child language corpora (and experimental testing of hypotheses), and from a computational perspective, viz. the use of artificial learning algorithms in simulations. Both methodologies will be implemented in an integrated fashion so as to maximize mutual informativeness and theoretical relevance. The relationship between the psycholinguistic and the computational perspective is twofold: (i) The articulation of a model of children's language acquisition, in which structural aspects of the input language and the self-organizing mechanisms of the learner are related, acts as the unifying framework. (ii) Particular aspects of the acquisition of the phonology, lexicon and morphosyntax of Dutch will be studied both from a psycholinguistic and a computational perspective. Corpora will be used as primary data in psycholinguistic analyses and they will be used as input material for the artificial language learners. The performance of the latter can be evaluated using the actual acquisition patterns of the children studied.

Researcher(s)

Research team(s)

Atranos: automatic transcription and normalisation of speech 01/10/2000 - 30/09/2004

Abstract

The project aims at contributing to the development of better products for the automatic verbatim transcription of speech, and for the conversion of these transcriptions to a form that is better adapted to the needs of the end-user. One application which will be studied as a case study is the generation of subtitles for the benefit of hearing-impaired people. CNTS will investigate learning techniques for the transcription of out-of-vocabulary items, and statistical techniques for aligning and predicting subtitle text from transcriptions.

Researcher(s)

Research team(s)

Scientific Research Community for Computational Linguistics and Language and Speech Technology 01/01/2000 - 31/12/2004

Abstract

The goal of this scientific research community (CLIF, Computational Linguistics in Flanders) is to bring together the academic research expertise on language and speech technology for Dutch present in Flanders. CLIF will promote and facilitate fundamental, multidisciplinary, and application-oriented research in this area and provide advice to users of language and speech technology.

Researcher(s)

Research team(s)

Corpus Spoken Dutch - Flemish part. 01/06/1998 - 30/11/2003

Abstract

The Dutch-Flemish project 'Corpus Spoken Dutch' aims at collecting 10 million spoken words of present-day (standard) Dutch. This corpus will have important technological applications since it will play an essential role in the development of automatic speech recognition, and in this way it will prove to be invaluable in safeguarding the position of Dutch as a (minority) language in multilingual Europe. The corpus will be important for other disciplines as well: lexicography, teaching, children's speech and language development, sociolinguistics, psycholinguistics, phonetics and phonology and conversational analysis.

Researcher(s)

Research team(s)

Computational psycholinguistics : natural and artificial language acquisition and processing. 01/01/1998 - 31/12/2003

Abstract

The issue of abstract representations in the domains of language acquisition and adult language processing is addressed in this project. Is it possible to learn a subdomain of language without prior linguistic knowledge in this domain? Can one achieve the final learning stage (adult performance) without developing abstract representations? A new methodology will be used to study these questions. The research will explicitly combine the techniques that are used in three separate disciplines: language acquisition research, psycholinguistics, and artificial intelligence. Whereas the former two take the real language learner/user as their object of study, the latter one studies the artificial language learner/user. Thus far artificial learning models have always been used to simulate effects observed in actual language use. Whereas simulation reveals the computational power of the learning system and suggests interesting hypotheses on the real language learner/user, it does not falsify hypotheses generated in, for instance, psycholinguistic work. In our research we want to use artificial language learners/users in a radically different way. Apart from having them simulate effects from real language use, we want to isolate factors that affect the model's behaviour and then study the effects of these same factors in psycholinguistic experiments and in language acquisition data. In case of a different outcome, the effects observed in real language users can then be used to adapt the architecture of the artificial learning model and see whether its performance can eventually be matched to that of the language user. This method of relating the results from acquisition and psycholinguistic research to computational work and vice versa is essentially a heuristic for discovering properties of the representational architecture for language in the real language learner/user.
This basic issue, and the methodology to study it, will be approached in two linguistic domains: phonology and inflectional morphology. In phonology, the linguistic representation of stress patterns, phonotactic restrictions, and syllable structure will be studied. In morphology, irregularity effects in the past tense formation in Dutch will be used to study the issue of the single-route versus dual-route architecture (i.e., rules for regular forms, a lexicon for the irregular ones). A study of the factors causing interference errors in the spelling of (highly regular) past tense forms in Dutch (regular forms affecting other regulars) will shed light on the issue.

Researcher(s)

Research team(s)