Measurements of Grammaticalization: Developing a quantitative index for the study of grammatical change

Date: 7 June 2019

Venue: Université de Neuchâtel - Espace Louis-Agassiz 1 - CH-2000 Neuchâtel, Switzerland

PhD candidate: David Correia Saavedra

Principal investigators: Martin Hilpert, Peter Petré

Short description: PhD defence of David Correia Saavedra - Faculty of Arts


There is a broad consensus that grammaticalization is a gradual and largely unidirectional process: lexical elements acquire grammatical functions, and grammatical elements can undergo further grammaticalization (Hopper and Traugott 2003). This consensus is based on substantial cross-linguistic evidence. While most existing studies present qualitative evidence, this dissertation discusses a quantitative approach that aims to measure degrees of grammaticalization using corpus-based variables, thereby complementing existing qualitative work. The proposed measurement is calculated from several parameters that are known to play a role in grammaticalization (Lehmann 2002, Hopper 1991), such as token frequency, phonological length, collocational diversity, colligate diversity, and dispersion. These variables serve as predictors in a binary logistic regression model, which assigns each linguistic item a score that reflects its degree of grammaticalization.
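To make the corpus-based variables more concrete, the following minimal sketch computes toy versions of some of them for a single word form. It is not the dissertation's operationalization: the function name `corpus_variables`, the use of orthographic length as a stand-in for phonological length, the count of distinct neighbouring word types as a stand-in for collocational diversity, and the chunk-based dispersion measure are all assumptions made for illustration. Colligate diversity is left out because it would require part-of-speech tags.

```python
def corpus_variables(tokens, target, n_chunks=4):
    """Compute toy versions of corpus variables for one word form.

    All measures are simplified illustrations (assumptions), not the
    definitions used in the dissertation.
    """
    if not tokens:
        raise ValueError("the corpus must contain at least one token")

    positions = [i for i, tok in enumerate(tokens) if tok == target]

    # Distinct word types immediately to the left and right of the target.
    left = {tokens[i - 1] for i in positions if i > 0}
    right = {tokens[i + 1] for i in positions if i + 1 < len(tokens)}

    # Dispersion proxy: share of corpus chunks that contain the target.
    # The chunk size is rounded up so that at most n_chunks chunks result.
    chunk_size = max(1, -(-len(tokens) // n_chunks))
    chunks = [tokens[i:i + chunk_size]
              for i in range(0, len(tokens), chunk_size)]
    dispersion = sum(target in chunk for chunk in chunks) / len(chunks)

    return {
        "token_frequency": len(positions),   # raw count in the corpus
        "length": len(target),               # characters, not phonemes
        "left_diversity": len(left),
        "right_diversity": len(right),
        "dispersion": dispersion,
    }


# Invented toy corpus, used only to exercise the function.
toy_corpus = "the cat sat on the mat the dog sat on the rug".split()
for word in ("the", "cat"):
    print(word, corpus_variables(toy_corpus, word))
```

On this toy corpus the grammatical item "the" is more frequent, shorter, and more evenly spread than the lexical item "cat", which is the kind of contrast the regression model is meant to pick up.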

Grammaticalization can be viewed both synchronically and diachronically. The synchronic view is concerned with the fact that some items are more grammaticalized than others, which is commonly referred to as gradience. The diachronic view is concerned with the development of grammatical elements over time, whereby elements become increasingly grammatical through small incremental steps, which is known as gradualness. This dissertation presents studies that address each of these views.

To quantify the gradience of grammaticalization, data from the British National Corpus are used. A total of 264 lexical and 264 grammatical elements are selected to train a binary logistic regression model. This model can rank these items and determine which ones are more lexical or more grammatical. The results indicate that the model makes successful predictions overall. In addition, the results support generalizations about grammaticalization, such as the relevance of key variables (e.g. token frequency, diversity to the left of a given item) or the ranking of morphosyntactic categories as a whole (e.g. adverbs fall on average between the lexical and grammatical categories).
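The training and ranking step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's model: it uses only two predictors, fits the regression with plain gradient descent, and relies on eight items whose feature values are invented for the example (they are not British National Corpus figures). The names `fit_logistic`, `score`, and `items` are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.05, epochs=2000):
    """Fit a binary logistic regression by batch gradient descent."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this item under the current weights.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def score(x, w, b):
    """Predicted probability of the grammatical class, between 0 and 1."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# (item, [log10 frequency, length in characters], label)
# Label 1 = grammatical, 0 = lexical. Frequency values are INVENTED.
items = [
    ("the", [4.7, 3], 1), ("of", [4.4, 2], 1),
    ("in", [4.2, 2], 1), ("will", [3.5, 4], 1),
    ("table", [2.3, 5], 0), ("garden", [2.0, 6], 0),
    ("whisper", [1.0, 7], 0), ("house", [2.7, 5], 0),
]

X = [features for _, features, _ in items]
y = [label for _, _, label in items]
w, b = fit_logistic(X, y)

# Rank items from most to least grammatical according to the model.
ranked = sorted(((name, score(f, w, b)) for name, f, _ in items),
                key=lambda pair: pair[1], reverse=True)
for name, s in ranked:
    print(f"{name:8s} {s:.3f}")
```

The predicted probability serves as the score here: values near 1 place an item toward the grammatical end of the scale, values near 0 toward the lexical end.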

The gradualness of grammaticalization is investigated using a selection of twenty elements that have grammatical and lexical counterparts in English (e.g. keep). The Corpus of Historical American English (1810s-2000s) is used to retrieve the relevant data. The aim is to examine how the individual variables and the grammaticalization scores develop over time. The main theoretical value of this approach is that it offers an empirically operationalized way of measuring unidirectionality in grammaticalization, as opposed to qualitative observation based on individual case studies.
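One possible way to summarize how a score develops over time is a rank correlation between time and score. The sketch below uses Kendall's tau for this purpose. This choice is an assumption made for illustration, since the abstract does not say which statistic the dissertation uses, and the per-decade scores are invented rather than taken from the Corpus of Historical American English. The names `kendall_tau` and `scores_by_decade` are hypothetical.

```python
def kendall_tau(values):
    """Kendall's tau-a between time order and a sequence of values.

    Returns 1.0 for a strictly increasing sequence and -1.0 for a
    strictly decreasing one.
    """
    n = len(values)
    if n < 2:
        raise ValueError("at least two time points are needed")
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            if values[j] > values[i]:
                concordant += 1
            elif values[j] < values[i]:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# INVENTED grammaticalization scores for one hypothetical element.
scores_by_decade = {1810: 0.31, 1860: 0.38, 1910: 0.36,
                    1960: 0.52, 2000: 0.61}

ordered = [scores_by_decade[d] for d in sorted(scores_by_decade)]
print(kendall_tau(ordered))
```

Under this reading, a value close to 1 would indicate a steady increase in the score over time, which is what unidirectionality predicts, while values near 0 or below would indicate no consistent trend or a reversal.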