Categories
AELINCO analysis of language corpus linguistics

Corpora: a superfluous ceremony

Bergenholtz, Henning (University of Aarhus, Denmark):

A corpus analysis is a superfluous ceremony and a complete waste of your time and the government’s money.


This presentation is an interesting review of attitudes towards corpus-based analyses of language.

 

Categories
AELINCO

Big data and corpus linguistics

AELINCO 2015 Conference, U. Valladolid, Spain

Andrew Hardie keynote

What follows are my own notes on, and understanding of, Hardie’s keynote.

How big is big data?

N= ALL?

Non-manual curation of the database

Must be mined or statistically summarised (manual analysis is not possible)

Pattern finding: trend modelling, data mining & machine learning

Language big data: Google n-gram

A revolutionary change for language and linguistics?

Textual big data studies done by non-linguistic specialists

Limitations of Google when used with no language training

Michel et al., “Quantitative Analysis of Culture Using Millions of Digitized Books”, Science 331 (2011). Culturomics. What is there?

Quantitative findings, otherwise pretty predictable and very much frequency counts. In actual fact, the study was not backed by any expert in corpus linguistics. Steven Pinker was involved in the paper and the whole thing was treated as if they invented the wheel.
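To make "very much frequency counts" concrete, here is a minimal sketch of the kind of count that sits behind an n-gram trend line: occurrences of a word per million tokens, year by year. The function name, the whitespace tokenisation and the tiny sample texts are my own illustrative assumptions, not anything taken from Michel et al. or from the Google n-gram tooling.

```python
from collections import Counter


def per_million(word, texts_by_year):
    """Return {year: occurrences of `word` per million tokens}.

    Tokenisation is a naive lowercase whitespace split (an assumption
    made for brevity; real corpora need proper tokenisation).
    """
    result = {}
    for year, text in texts_by_year.items():
        tokens = text.lower().split()
        if tokens:  # skip empty years to avoid dividing by zero
            result[year] = Counter(tokens)[word] / len(tokens) * 1_000_000
    return result


# Hypothetical toy data, invented for illustration only.
sample = {1900: "the war and the peace", 1950: "the war the war the end"}
trend = per_million("war", sample)
print(trend)
```

A count like this says nothing about sense, part of speech or spelling variation, which is roughly the gap the keynote was pointing at.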

Borin et al. papers trying to “salvage” the whole culturomics movement from its ignorance.

New “happiness” analyses are trendy, but what do they have to offer? They come with many problems and shortcomings. I think that corpus analysis is becoming mainstream and is more visible in specialized journals. The price of fame?

Linguistically risibly naive research done by non-linguists


Paul Rayson keynote

Larger corpora have been available since Brown in the 1960s

Mura Nava’s resource: an interesting timeline of corpus analysis tools.

SAMUELS: Semantic Annotation and Mark-Up for Enhancing Lexical Searches

Overcoming problems when doing textual analysis: fused forms, archaic forms, apostrophes, and many others. Searching for words is a challenge because frequencies are split across multiple spellings.
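A minimal sketch of the frequency-splitting problem: the same word counted under several spellings, then merged after normalisation. The variant table below is a toy of my own and the lookup is a plain dictionary; it is not how any real normalisation tool works internally.

```python
from collections import Counter

# Hypothetical toy mapping from variant spellings to a modern form.
VARIANTS = {"loue": "love", "vpon": "upon", "doe": "do"}


def count_words(tokens, normalise=False):
    """Count tokens, optionally mapping known variants to one spelling."""
    if normalise:
        tokens = [VARIANTS.get(t, t) for t in tokens]
    return Counter(tokens)


# Invented example tokens, for illustration only.
tokens = "loue is loue and love is vpon vs".split()
raw = count_words(tokens)
merged = count_words(tokens, normalise=True)
print(raw["love"], merged["love"])
```

Searching the raw counts for "love" misses the occurrences spelled "loue"; the normalised counts bring them together under one form.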

VARD

USAS semantic tagger

Full text tagging (as opposed to trends in “textual big data” analysis).

Modern & historical taggers

Disambiguation methods are essential

Paul discusses the Historical Thesaurus of English

The whole annotation system: [photo]

 

I guess this is the missing part in big data as practised by non-linguists.


Categories
applied linguistics English Language language language analysis

Everyday Expressions Borrowed From the Bible

Read the entry on mentalfloss.com

 

Categories
Art Painting

Burne-Jones

 

Two of my favourite paintings by Edward Burne-Jones.

The Hours

[Image: Edward Burne-Jones, The Hours]

 

The Wheel of Fortune

[Image: Edward Burne-Jones, The Wheel of Fortune]