Extracting n-word phrases from large texts

This is a summary of resources posted on [Corpora-List] in early 2014.
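As a rough illustration of what the tools listed below compute, here is a minimal Python sketch that counts contiguous n-word phrases (n-grams) in a text. It assumes naive lowercase whitespace tokenization, and the function name `count_ngrams` is hypothetical, not part of any of these tools. For genuinely large texts, the dedicated tools are far more memory-efficient than keeping every n-gram in a Python `Counter`.

```python
from collections import Counter


def count_ngrams(text, n):
    """Count contiguous n-word sequences (n-grams) in `text`.

    Hypothetical helper for illustration: tokenization is a naive
    lowercase whitespace split, not a real tokenizer.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    tokens = text.lower().split()
    # Slide a window of width n over the token list; each window is one n-gram.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


if __name__ == "__main__":
    sample = "the cat sat on the mat and the cat slept"
    for gram, freq in count_ngrams(sample, 2).most_common(3):
        print(" ".join(gram), freq)
```

On the sample sentence, the bigram "the cat" is counted twice and every other bigram once.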

CMU-Cambridge Statistical Language Modeling toolkit

http://mi.eng.cam.ac.uk/~prc14/toolkit.html

Sketch Engine

http://www.sketchengine.co.uk/documentation/wiki/SkE/NGrams

Lawrence Anthony’s AntConc 

http://www.antlab.sci.waseda.ac.jp/software.html

kfNgram

http://www.kwicfinder.com/kfNgram/kfNgramHelp.html

Colibri

Software for extracting n-grams as well as non-consecutive patterns (skipgrams). It is written in C++ for speed and memory efficiency, but comes with a Python binding for use from Python scripts. It also includes a standalone command-line tool that can perform this kind of extraction directly.
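To make the difference between n-grams and skipgrams concrete, the following minimal Python sketch counts skipgrams: fixed-length windows of consecutive words in which one or more words are replaced by a gap. This is not Colibri's API or algorithm; the function `count_skipgrams` and the `_` gap marker are hypothetical choices for illustration. It assumes pre-tokenized input and, as a design choice of this sketch, only places gaps in interior positions, so every pattern starts and ends with a real word.

```python
from collections import Counter
from itertools import combinations

GAP = "_"  # hypothetical gap marker chosen for this sketch, not Colibri's notation


def count_skipgrams(tokens, n):
    """Count n-token skipgrams in a list of tokens.

    A skipgram here is a window of n consecutive tokens in which a
    non-empty subset of the interior positions is replaced by GAP.
    """
    if n < 3:
        return Counter()  # a gap needs at least one interior position
    counts = Counter()
    interior = range(1, n - 1)  # positions that may become gaps
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        # Every non-empty subset of interior positions yields one pattern.
        for k in range(1, n - 1):
            for gap_positions in combinations(interior, k):
                pattern = tuple(GAP if j in gap_positions else tok
                                for j, tok in enumerate(window))
                counts[pattern] += 1
    return counts


if __name__ == "__main__":
    tokens = "the big dog and the old dog".split()
    for pattern, freq in count_skipgrams(tokens, 3).most_common():
        print(" ".join(pattern), freq)
```

On "the big dog and the old dog", the pattern `the _ dog` occurs twice even though each contiguous trigram ("the big dog", "the old dog") occurs only once. This is the kind of discontinuous regularity that skipgrams are meant to capture.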

https://github.com/proycon/colibri-core

http://proycon.github.io/colibri-core/doc/

Maarten van Gompel


ICAME35: 2nd Call for Papers: deadline Dec 15


The Centre for Research in Applied Linguistics (CRAL) at the University of Nottingham is hosting the 35th ICAME conference.

The theme of the conference is: Corpus Linguistics, Context and Culture.

** Date: 30 April to 4 May 2014.
Venue: University of Nottingham, UK, University Park Campus.

The main conference will open with a talk by Ronald Carter (University of Nottingham), and there will be a wine reception sponsored by John Benjamins on the Wednesday night.

Keynote speakers:

Beatrix Busse (University of Heidelberg)

Susan Hunston (University of Birmingham)

Tony McEnery (Lancaster University)

Ute Roemer (Georgia State University)

Wolfgang Teubert (University of Birmingham)

The conference aims to explore English corpus linguistics and its intersections with other fields, as well as its applications in a range of contexts of language use. We invite abstracts for papers, work-in-progress reports, posters and software demonstrations on any topic relevant to the conference theme. Areas for submission include, but are not limited to:

– corpus and discourse analysis
– corpus linguistics and its theoretical implications
– diachronic corpus studies
– corpora and new media
– varieties of English
– contrastive linguistics
– sociolinguistics
– mixed methods approaches in corpus linguistics
– corpus stylistics
– corpora in English language education

** Deadline for the submission of abstracts: 15 Dec 2013

Please submit your abstract through the conference website.
Proposals for pre-conference workshops should be sent directly to the organizers at ICAME2014@nottingham.ac.uk

For more details, please see the conference website:

http://www.nottingham.ac.uk/conference/fac-arts/english/icame-35/index.aspx

We are looking forward to seeing you in Nottingham in 2014.

The ICAME 35 Team
Michaela Mahlberg, Gavin Brookes, Kathy Conklin, Rachele De Felice, Dave Evans, Kat Gupta, Kevin Harvey, Tony Fisher, Lorenzo Mastropierro, Rebecca Peck, Ana Pellicer-Sánchez, Viola Wiegand