Categories
CFP

Corpus linguistics in the south 13: presentations #cls13

 


Corpus Linguistics in the South 13

The Hillary Clinton emails: corpus linguistics meets the real world

Rachele de Felice, University College London

Link

AntFileConverter: PDF to plain text

Redactions: materially annoying. So much is deleted that POS tagging becomes problematic.

Lots of repeated material because of e-mail chains.

What about names and entities? H, HC, Hillary and MS all have the same referent.
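One way to handle the referent problem is to map every surface form to a single label before counting. The sketch below is only an illustration: the four aliases come from the notes above, while the label HILLARY_CLINTON and the function name are assumptions, not anything presented in the talk.

```python
import re

# Surface forms are taken from the notes; the canonical label is a
# hypothetical choice made for this illustration.
ALIASES = {
    "H": "HILLARY_CLINTON",
    "HC": "HILLARY_CLINTON",
    "Hillary": "HILLARY_CLINTON",
    "MS": "HILLARY_CLINTON",
}

# Try longer aliases first and match whole words only, so that the "H"
# in "He" or "Hello" is left alone. Matching is case-sensitive.
_PATTERN = re.compile(
    r"\b("
    + "|".join(sorted(map(re.escape, ALIASES), key=len, reverse=True))
    + r")\b"
)


def normalise_referents(text: str) -> str:
    """Replace each known alias with its canonical label."""
    return _PATTERN.sub(lambda m: ALIASES[m.group(1)], text)
```

Short forms such as H or MS can be ambiguous in real e-mails, so a table like this would probably need checking against concordance lines before being trusted.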

Which device was used? iPad, desktop, phone?

What about times?

Unique IDs for everything (G. Garretson): each e-mail is given a unique ID.
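The notes do not say how the IDs were built, so the following is just a minimal sketch of the idea under my own assumptions: a sequential ID per e-mail (the "EM" prefix and five-digit padding are invented here), plus a content hash that makes it easy to spot text repeated across e-mail chains.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Email:
    uid: str     # sequential ID, e.g. "EM00001" (format is an assumption)
    digest: str  # SHA-1 of the whitespace-normalised text
    text: str    # original text, unchanged


def index_emails(texts, prefix="EM"):
    """Give every e-mail a unique ID and a digest of its content."""
    records = []
    for n, text in enumerate(texts, start=1):
        # Collapse runs of whitespace so trivial layout differences
        # do not hide an otherwise identical message.
        normalised = " ".join(text.split())
        digest = hashlib.sha1(normalised.encode("utf-8")).hexdigest()
        records.append(Email(uid=f"{prefix}{n:05d}", digest=digest, text=text))
    return records
```

Two records with the same digest but different IDs are candidates for the repeated material mentioned above.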

 

Grain and scale: Looking at small data sets in broader sociocultural contexts

Colleen Cotter, Lisa McEntee-Atalianis and Danniella Samos
Queen Mary University of London and (LMA) Birkbeck, University of London

Tracing the migrant voice in UK media

Voice: linguistic construction of social personae (Keane 2003: 268)

They looked at quotations.

Data: Jan-Dec 2015; migrants & Calais; LexisNexis

Investigating obesity

Corpus linguistics and news representations: a corpus-assisted framing analysis of mental health and arts participation messages in the British press
Dimitrinka Atanasova and Nelya Koteyko, Queen Mary University of London

Framing analysis (qualitative): framing is selecting aspects of reality and making them more salient (Entman 1993).

Keywords play a role in both framing analysis and corpus linguistics
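On the corpus linguistics side, keywords are usually found with a keyness statistic. The notes do not say which measure the speakers used; the sketch below assumes log-likelihood, a common choice, and the function name is mine.

```python
import math


def log_likelihood(freq_target, size_target, freq_ref, size_ref):
    """Log-likelihood keyness of one word in a target vs. a reference corpus.

    freq_* are the word's frequencies, size_* the corpus sizes in tokens.
    """
    total = size_target + size_ref
    combined = freq_target + freq_ref
    # Frequencies we would expect if the word were equally common in both.
    expected_target = size_target * combined / total
    expected_ref = size_ref * combined / total

    ll = 0.0
    for observed, expected in (
        (freq_target, expected_target),
        (freq_ref, expected_ref),
    ):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll
```

A value above 3.84 corresponds to p < 0.05 with one degree of freedom. The statistic does not say in which corpus the word is overused, so the relative frequencies still need comparing.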

Recovery was the most prominent frame

CL gives more opportunities to look at a wider range of articles (e.g. local newspapers, which cater for smaller communities).

Categories
corpora corpus linguistics learner corpus learner language text analysis text tools Tools

Graphic Online Language Diagnostic

 


The Graphic Online Language Diagnostic (“GOLD”) is a corpus tool that allows language educators to submit and analyze language data. GOLD was developed by the Center for Advanced Language Proficiency Education and Research (“CALPER”) at The Pennsylvania State University (“PSU”), University Park, PA, USA under a grant from the U.S. Department of Education (Title VI, P229A060003 and P229A020010).

Link here: http://gold.gwserver1.net

Categories
Sketch Engine software text tools Translation

Some resources for the building of parallel corpora

 


Michael Barlow’s site (Link).

European Parallel Corpus (Link)

OPUS (Link)

Parallel corpora in Sketch Engine (Link)

AntPConc software (Link)

 

Categories
analysis of language applied linguistics automated tools computational linguistics English English Language Knowledge engineering language analysis linguistics text analysis text tools vocabulary

TAALES 2.2 is out: automatic analysis of lexical sophistication, Windows and Mac

From the TAALES website:

Kyle, K. & Crossley, S. A. (2015). Automatically assessing lexical sophistication: Indices, tools, findings, and application. TESOL Quarterly 49(4), pp. 757-786. doi: 10.1002/tesq.194

TAALES is a tool that measures over 400 classic and new indices of lexical sophistication, and includes indices related to a wide range of sub-constructs. TAALES indices have been used to inform models of second language (L2) speaking proficiency, first language (L1) and L2 writing proficiency, spoken and written lexical proficiency, genre differences, and satirical language.

Starting with version 2.2, TAALES provides comprehensive index diagnostics, including text-level coverage output (i.e., the percent of words/bigrams/trigrams in a text covered by the index) AND individual word/bigram/trigram index coverage information.

TAALES takes plain text files as input (it will process all plain text files in a particular folder) and produces a comma separated values (.csv) spreadsheet that is easily read by any spreadsheet software.
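To make the idea of text-level coverage and the folder-in, spreadsheet-out workflow more concrete, here is a rough sketch. It is not TAALES code and does not reproduce its indices: the tokeniser, the function names and the column headers are all my own assumptions, and index_words stands for any word list you supply.

```python
import csv
import re
from pathlib import Path


def coverage(text, index_words):
    """Percent of word tokens in text that appear in index_words."""
    # Very simple tokeniser (an assumption): lowercase letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    covered = sum(1 for token in tokens if token in index_words)
    return 100.0 * covered / len(tokens)


def folder_to_csv(folder, index_words, out_path):
    """Write one row per .txt file in folder: file name and coverage."""
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["filename", "coverage_percent"])
        for path in sorted(Path(folder).glob("*.txt")):
            text = path.read_text(encoding="utf-8")
            writer.writerow([path.name, f"{coverage(text, index_words):.1f}"])
```

Calling folder_to_csv("my_texts", {"the", "cat"}, "coverage.csv") would produce a file that opens in any spreadsheet software; the folder and file names here are placeholders.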

 

You can find all the info here. Windows and Mac versions available for free.