Title: "Complexity in complementation: understanding long-term change in verb complementation in terms of inter- and intra-individual variation"

Abstract: For many linguists – regardless of their theoretical framework – linguistic variation occurs at the level of speech communities. Some have even ventured to say that the individual is “reduced below the level of linguistic significance” (Labov 2012: 265) regarding language change. This view leaves important questions unanswerable: How and why does variation spread? How do individuals accommodate this change in their understanding and use of the language? To answer these questions, the behaviour of the individual must be studied in detail.

This paper aims to provide such a detailed study of sixty writers across three generations born between the 1650s and the 1850s. It investigates changes in their use of the competing finite and nonfinite variants of complement clauses (CCs) with a select group of complement-taking predicates (CTPs). An example of the variation at issue is given in (2).

(2)   a. They believed that the Bible was the word of God. (1821, CLMET)

      b. They believed the Bible to be the word of God. (adapted from (2a))

      c. The Bible was believed to be the word of God. (adapted from (2a))

In this type of stable variation the newer variant coexists with the older variant rather than replacing it. The study thus complements the variationist literature, which has focused on replacement of the older counterpart (e.g. Nevalainen et al. 2011). It has also been theorised that syntactic change often resides below the level of awareness (Labov 2001: 28), which lowers the influence of social variables and makes this an ideal case for studying the role of cognitive representations and their flux. With this analysis of an unstudied type of syntactic change from a new perspective, we aim to add to Fonteyn & Nini’s (2020: 18) usage-based model of individual variation. Further, in studying stable variation we seek to contribute to a theory of language as a complex adaptive system (Beckner et al. 2009).

The data consist of over 500,000 words per individual, annotated for CCs featuring a selection of CTPs that fall within two semantic clusters (private and public factual verbs). Each instance is coded for eight functional variables (semantic, structural and discourse-related). Multifactorial classification models (conditional inference tree and random forest algorithms) are then employed to determine which language-internal factors an individual uses to condition the variation in their linguistic output, and to compare the relative importance of the constraints across individuals. An important advantage of the proposed statistical methods is that they are robust even with a relatively small amount of data (Fonteyn & Nini 2020). To expand the scope of the project, near-synonyms of the selected CTPs are identified using distributional models (Budts & Petré 2020). A BERT model (Devlin et al. 2019) is then trained to semi-automatically identify the complementation patterns of these near-synonyms, and to assess broadly to what extent the hypothesis holds that more closely related verbs have more similar complementation patterns, and for which individuals.
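The near-synonym step can be illustrated with a minimal sketch. It assumes a simple count-based representation in which each verb is a vector of co-occurrence counts, and it ranks candidate verbs by cosine similarity to a target CTP. This is a simplified stand-in for the distributional models cited above, not the procedure used in the study; the function names and the toy counts are hypothetical and invented purely for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse count vectors stored as dicts."""
    dot = sum(u[key] * v[key] for key in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


def nearest_neighbours(target, vectors, k=2):
    """Return the k verbs most similar to `target`, with their scores."""
    scores = [
        (verb, cosine(vectors[target], vec))
        for verb, vec in vectors.items()
        if verb != target
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Toy co-occurrence counts: invented for illustration, NOT corpus data.
toy_vectors = {
    "believe": {"that": 8, "truth": 3, "god": 2},
    "suppose": {"that": 7, "truth": 2, "case": 1},
    "declare": {"that": 3, "publicly": 5, "war": 2},
    "walk": {"home": 6, "slowly": 4},
}

neighbours = nearest_neighbours("believe", toy_vectors, k=2)
for verb, score in neighbours:
    print(f"{verb}: {score:.2f}")
```

In this toy setting, verbs that share more contexts with the target rank higher. The same similarity measure could in principle be applied to vectors of complementation-pattern frequencies, which would allow semantic proximity and complementation behaviour to be compared verb by verb.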


