Date: 29 January 2016

Venue: Annexe, Building R - Lange Winkelstraat - 2000 Antwerpen

Time: 3:00 PM - 4:30 PM

Organization / co-organization: CLiPS Research Center

Short description: CLiPS Colloquium by Gertjan Van Noord (Rijksuniversiteit Groningen)

Improving Automatically Parsed Dutch Treebanks (Gertjan Van Noord)

TITLE - Improving Automatically Parsed Dutch Treebanks
Prof. Gertjan Van Noord, Center for Language and Cognition, Rijksuniversiteit Groningen

ABSTRACT - In this presentation, we will describe the efforts (some in vain) to improve the available automatically parsed Lassy Large treebank. We describe some aspects of the Alpino parser, and some recent attempts at improving the parser. Alpino is a hybrid system in which a hand-written grammar and a large dictionary are combined with a statistical disambiguation component. The disambiguation component uses co-occurrence information extracted from large treebanks for improved disambiguation accuracy. We describe a recent experiment to add word embedding features to the disambiguation component.

We further zoom in on the part-of-speech annotation layer of the existing Lassy Large treebanks, suggesting that the part-of-speech labels, originally provided by a separate POS-tagger, are of questionable quality. We analyse some of the reasons for this and describe our efforts to provide part-of-speech labels as a side-effect of parsing. We also provide some initial experimental results indicating a huge potential improvement in POS-tagging accuracy when the parser is used as a tagger.

BIO - Gertjan Van Noord is professor of Language Technologies at the Rijksuniversiteit Groningen (RUG), where he has been working since 1999. He obtained his M.A. in General Linguistics at the University of Utrecht with a major in Computational Linguistics, a subject he pursued further in a PhD at the same university, focusing on Reversibility in Language Processing. During his PhD he also spent one year at the University of Saarland, in Saarbrücken, working on Bidirectional Linguistic Deduction. In 1990, he was one of the initiators of the CLIN meetings, which he helped to shape and which have been promoting the study of Computational Linguistics in the Low Countries over the past 26 years. He has supervised many PhD students and post-docs who now work in top universities worldwide. Among the many conferences and workshops he chaired and organized, in 2006 he chaired the European Chapter of the Association for Computational Linguistics (EACL), and in 2009 he was elected to the Executive Board of the Association for Computational Linguistics (ACL), becoming Vice-President Elect in 2012 and President in 2014.

