The Mind-Bending Grammars project combines biggish data analysis with historical and theoretical linguistics. The project team welcomes interns from:

  1. MA in Artificial Intelligence (MAI)
  2. MA in Linguistics or a related field (MAL)

Beyond the internship work itself, interns will also experience what it is like to be part of a prestigious team-based research project.

Topics for MAI students

  • Building a classifier with a GUI training interface to identify cleft sentences (a complex type of syntactic structure)
    • The starting point is existing training data provided by an in-house researcher
    • The intern combines a similarity assessment algorithm (to be selected at the start of the internship), syntactic parsing, and user feedback to increase accuracy
    • The intern becomes familiar with English historical text corpora and how to process them (e.g., spelling normalization)
  • Building a genre classifier for EMMA, a large corpus of historical texts, starting from a gold standard. Targets are:
    • Automatic assignment of the most probable genre to untagged texts
    • A top 3 of the most probable genres for in-between cases
    • Automatic identification of parts of texts (and their boundaries) that represent different genres (e.g., a biography may contain narrative prose, letters, and diary fragments)
  • Refining a within-text language identifier to identify foreign language passages in English historical texts
    • Refining an existing algorithm for the detection of multi-word passages in Latin/French
    • This includes training the algorithm on a contemporary corpus of Latin and French texts
    • Extending the algorithm to other foreign languages (e.g., Welsh)
  • We are open to other topics related to automatic enrichment and annotation of corpus data
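
To give prospective MAI interns a feel for the genre classifier targets listed above, here is a minimal sketch of how the "most probable genre" and "top 3 for in-between cases" outputs could be derived from a classifier's probability scores. The function name rank_genres, the margin threshold, and the genre labels are hypothetical choices made for illustration; they are not part of the EMMA corpus or of any existing project code.

```python
from typing import Dict, List, Tuple


def rank_genres(
    probs: Dict[str, float], margin: float = 0.2, k: int = 3
) -> List[Tuple[str, float]]:
    """Return a single genre when the winner is clear, otherwise the top k.

    `probs` maps genre labels to classifier probabilities (hypothetical input).
    `margin` is an assumed threshold: if the best genre leads the runner-up
    by at least this much, the text is not treated as an in-between case.
    """
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    if len(ranked) < 2 or ranked[0][1] - ranked[1][1] >= margin:
        return ranked[:1]
    return ranked[:k]


# Illustrative scores only; the genre labels are invented for this example.
clear = rank_genres({"sermon": 0.81, "letter": 0.10, "diary": 0.06, "drama": 0.03})
mixed = rank_genres({"biography": 0.34, "letter": 0.31, "diary": 0.25, "drama": 0.10})

print(clear)
print(mixed)
```

In this sketch the first text receives one genre, while the second is treated as an in-between case and receives its three most probable genres. How to set the margin, and whether a fixed threshold is appropriate at all, would be a design decision for the internship itself.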

Topics for MAL students

  • Interns will be introduced to constructionist corpus linguistics, including
    • Annotation of one of the case studies
    • Support in the compilation of the corpus