Improving Automatically Parsed Dutch Treebanks
29 January 2016
Annexe, Building R - Lange Winkelstraat - 2000 Antwerpen
3:00 PM - 4:30 PM
Organization / co-organization:
CLiPS Research Center
CLiPS Colloquium by Gertjan Van Noord (Rijksuniversiteit Groningen)
We are very pleased to invite you to the next CLiPS Colloquium.
On January 29, Gertjan Van Noord will present his work on leveraging recent advances in natural language processing to improve automatically parsed Dutch treebanks. Below you can find more detailed information about the talk and the speaker. We look forward to welcoming you on this occasion, which will be a great opportunity to discover recent perspectives on a classic problem of computational linguistics, presented by one of the leading scholars in the computational linguistics community of the Low Countries.
We ask you to confirm your participation at this link:
Date: Friday 29 January 2016, 15:00
Location: Annexe, Building R, Stadscampus, Rodestraat 14, Antwerpen
TITLE - Improving Automatically Parsed Dutch Treebanks
Prof. Gertjan Van Noord, Center for Language and Cognition, Rijksuniversiteit Groningen
ABSTRACT – In this presentation, we describe our efforts (some in vain) to improve the available automatically parsed Lassy Large treebank. We describe some aspects of the Alpino parser and some recent attempts at improving it. Alpino is a hybrid system in which a hand-written grammar and a large dictionary are combined with a statistical disambiguation component. The disambiguation component uses co-occurrence information extracted from large treebanks to improve disambiguation accuracy. We describe a recent experiment in which word embedding features were added to the disambiguation component.
We then zoom in on the part-of-speech annotation layer of the existing Lassy Large treebanks, suggesting that the part-of-speech labels, originally provided by a separate POS tagger, are of questionable quality. We analyse some of the reasons for this and describe our efforts to provide part-of-speech labels as a side effect of parsing. Initial experimental results indicate that using the parser as a tagger could yield a very large improvement in POS-tagging accuracy.
BIO - Gertjan Van Noord is professor of Language Technologies at the Rijksuniversiteit Groningen (RUG), where he has been working since 1999. He obtained his M.A. in General Linguistics at the University of Utrecht with a major in Computational Linguistics, a subject he pursued further in his PhD at the same university, focusing on Reversibility in Language Processing. During his PhD he also spent one year at Saarland University in Saarbrücken, working on Bidirectional Linguistic Deduction. In 1990 he was one of the initiators of the CLIN meetings, which he helped shape and which have promoted the study of Computational Linguistics in the Low Countries over the past 26 years. He has supervised many PhD students and post-docs who now work at leading universities worldwide. He has chaired and organized many conferences and workshops: in 2006 he chaired the European Chapter of the Association for Computational Linguistics (EACL), and in 2009 he was elected to the Executive Board of the Association for Computational Linguistics (ACL), becoming Vice-President Elect in 2012 and President in 2014.
Walter Daelemans, CLiPS Research director
Giovanni Cassani, organizer of CLiPS Colloquia
Contact email: email@example.com