Corpus of North American Spoken English (CoNASE)

The Corpus of North American Spoken English (CoNASE), a 1.25-billion-word corpus of geolocated automatic speech-to-text transcripts, is now available in a beta version. See http://cc.oulu.fi/~scoats/CoNASE.html for more information. The corpus was created from 301,847 ASR transcripts from 2,572 YouTube channels, corresponding to 154,041 hours of video. The size of the corpus is 1,252,066,371 word tokens. … Read more

TAALES 2.2 is out: automatic analysis of lexical sophistication, Windows and Mac

From the TAALES website: Kyle, K. & Crossley, S. A. (2015). Automatically assessing lexical sophistication: Indices, tools, findings, and application. TESOL Quarterly 49(4), pp. 757-786. doi: 10.1002/tesq.194 TAALES is a tool that measures over 400 classic and new indices of lexical sophistication, and includes indices related to a wide range of sub-constructs. TAALES indices have … Read more

The Conference on #NLP KONVENS: new deadline

KONVENS 2016 http://www.linguistics.rub.de/konvens16/ The Conference on Natural Language Processing (“Konferenz zur Verarbeitung natürlicher Sprache”, KONVENS) aims at offering a broad perspective on current research and developments within the interdisciplinary field of natural language processing. It allows researchers from all disciplines relevant to this field of research to present their work. The conference will take place … Read more

CFP Language and the new (instant) media

2016 PLIN Day, hosted by the Linguistics Research Unit of UCLouvain in Belgium. After last year’s successful edition on lexical complexity, this year’s topic is ‘Language and the new (instant) media’. The PLIN Day will take place on 12 May 2016 in Louvain-la-Neuve. More information and registration (free for all Belgian participants). The main objective … Read more

Adam Kilgarriff: a selection of papers and talks

Some readings to remember one of the most influential corpus linguists of the 20th and 21st centuries.
Using corpora for language research: https://www.sketchengine.co.uk/documentation/attachment/wiki/AK/Papers/SkE_for_lingResearch2013.ppt?format=raw
Googleology is bad science: http://www.kilgarriff.co.uk/Publications/2007-K-CL-Googleology.pdf
Grammar is to meaning as the law is to good behaviour. Corpus Linguistics and Linguistic Theory 3 (2): 195-198. http://www.kilgarriff.co.uk/Publications/2007-K-CLLT-grammarlaw.doc