corpus Archives - Prof Pérez-Paredes

Exploring Part of Speech (POS)-tag sequences in a large-scale learner corpus of L2 English: A developmental perspective

08/03/2023, 06/03/2023 by perezparedes

Abstract: This research explores the POS-tag sequences that shape the transition from upper intermediate (B2 CEFR) to near-native proficiency (C2 CEFR) in a corpus of essays (n=32,410) from the Cambridge Learner Corpus. Gilquin (2018) and others have shown that POS-tag sequences offer a holistic approach to extracting the most commonly used patterns without a …
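As a rough illustration of what extracting POS-tag sequences can involve, the sketch below counts contiguous tag n-grams over texts that are already tagged. It is not the procedure used in the study, whose details are not given in this excerpt. The function names `pos_ngrams` and `count_pos_sequences`, the Penn Treebank-style tags, and the toy sentences are all assumptions made for the example.

```python
from collections import Counter


def pos_ngrams(tags, n):
    """Return every contiguous sequence of n tags from a list of POS tags."""
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]


def count_pos_sequences(tagged_texts, n=3):
    """Count POS-tag n-grams across texts given as lists of (word, tag) pairs."""
    counts = Counter()
    for tagged in tagged_texts:
        # Keep only the tags; the words themselves are ignored.
        tags = [tag for _word, tag in tagged]
        counts.update(pos_ngrams(tags, n))
    return counts


# Hypothetical, hand-tagged toy data; NOT taken from the Cambridge Learner Corpus.
essays = [
    [("The", "DT"), ("results", "NNS"), ("of", "IN"), ("the", "DT"), ("study", "NN")],
    [("The", "DT"), ("aim", "NN"), ("of", "IN"), ("the", "DT"), ("essay", "NN")],
]

counts = count_pos_sequences(essays, n=3)
for sequence, freq in counts.most_common(3):
    print(" ".join(sequence), freq)
```

In a developmental comparison, counts like these would typically be computed separately for each proficiency level and normalised by corpus size before being compared.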

Structural equation modelling & corpus linguistics

23/10/2022 by perezparedes

With Tove Larsson and Gregory Hancock. Check this out: Larsson, T., Plonsky, L., & Hancock, G. R. (2021). On the benefits of structural equation modeling for corpus linguists. Corpus Linguistics and Linguistic Theory, 17(3), 683-714.

John Sinclair and language theory

22/09/2022 by perezparedes

The following is an extract from Hunston (2022, p. 256). Hunston, S. (2022). Corpora in applied linguistics. Cambridge University Press. Sinclair made a number of generalisations in the 1980s (Sinclair 1991, 2004; see also Francis 1993; Hoey 2005; Hunston 2002; Stubbs 2001) which might be summarised as follows: • In describing the meanings of a word, …

Corpus of North American Spoken English (CoNASE)

12/08/2021 by perezparedes

The Corpus of North American Spoken English (CoNASE), a 1.25-billion-word corpus of geolocated automatic speech-to-text transcripts, is now available in a beta version. See http://cc.oulu.fi/~scoats/CoNASE.html for more information. The corpus was created from 301,847 ASR transcripts from 2,572 YouTube channels, corresponding to 154,041 hours of video. The size of the corpus is 1,252,066,371 word tokens. …

Incorporating corpora in teaching symposium, Mittuniversitetet, Sweden

21/10/2020 by perezparedes

Check out the programme here.

5 recent papers on language complexity and learner language

04/06/2020 by perezparedes

Bulté, B., & Roothooft, H. (2020). Investigating the interrelationship between rated L2 proficiency and linguistic complexity in L2 speech. System, 102246. Abstract: This study investigates the relationship between nine quantitative measures of L2 speech complexity and subjectively rated L2 proficiency by comparing the oral productions of English L2 learners at five IELTS proficiency levels. We carry …