Reading concordances is not a trivial task

The methodological transfer from the CL research area to the applied ring of language learning and teaching underwent no adaptation, and thus learners were presented with the same tools, corpora and analytical tasks as well-trained, professional linguists.

[…]

Reading concordances is by no means a trivial task. Sinclair (1991) recommends a complex procedure which involves five distinct stages. Let us review very briefly what they entail. The first stage is that of initiation: learners look to the left and to the right of the nodes and determine the dominant pattern. Secondly, learners are prompted to interpret and hypothesize about what it is that these words have in common. Thirdly, in the consolidation stage, students corroborate their hypothesis by looking more closely at variations of it. After this, the findings have to be reported and, finally, a new round of observations starts. Although typically reduced in language classrooms, this procedure is common in the possibilities scenario and certainly characterises the so-called bottom-up approach (Mishan, 2004: 223). A recent analysis (Kreyer, 2008) breaks the idea of corpus competence down into different skills, namely, interpreting corpus data, knowledge about corpus design, knowledge about resources on the Internet, some linguistic background, knowledge about how to use concordances and, finally, some corpus linguistics background. This is a positive step in the right direction, as the author admits the need to create the conditions for the use of corpora in the language classroom or, in other words, Kreyer recognizes that pedagogic mediation is necessary if we want to turn the corpus into a learning tool. Notwithstanding, the challenges are significant.

Pérez-Paredes, P. (2010). Corpus Linguistics and Language Education in Perspective: Appropriation and the Possibilities Scenario. In T. Harris & M. Moreno Jaén (Eds.), Corpus Linguistics in Language Teaching (pp. 53-73). Peter Lang.
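
To make the initiation stage concrete, here is a minimal Python sketch of what "looking to the left and to the right of the node" can mean in practice. It is only an illustration: the function names (`kwic`, `right_collocates`), the sample sentences and the three-word window are assumptions made for this example, not part of Sinclair's procedure or of any concordancer.

```python
import re
from collections import Counter


def kwic(text, node, width=3):
    """Return (left, node, right) tuples for every occurrence of `node`.

    `left` and `right` hold up to `width` tokens of context on each side.
    Tokenisation is deliberately crude: lowercase runs of word characters.
    """
    tokens = re.findall(r"\w+", text.lower())
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            lines.append((left, tok, right))
    return lines


def right_collocates(lines):
    """Count the word immediately to the right of the node in each line."""
    return Counter(right[0] for _, _, right in lines if right)


# Hypothetical sample text, invented for this sketch.
sample = (
    "She depends on her family. The outcome depends on the weather. "
    "It all depends on who you ask. He depends heavily on coffee."
)

lines = kwic(sample, "depends")
for left, node, right in lines:
    # Right-align the left context so the node forms a vertical column.
    print(f"{' '.join(left):>25}  {node}  {' '.join(right)}")

# The most frequent right-hand neighbour hints at the dominant pattern.
print(right_collocates(lines).most_common(2))
```

Counting the immediate right-hand neighbour only covers the first stage; interpreting why the pattern holds, and checking it against variations, is still left to the learner.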

Extracting n word phrases in large texts

This is a summary of resources posted on [Corpora-List] in early 2014.

CMU-Cambridge Statistical Language Modeling toolkit

http://mi.eng.cam.ac.uk/~prc14/toolkit.html

Sketch Engine

http://www.sketchengine.co.uk/documentation/wiki/SkE/NGrams

Lawrence Anthony’s AntConc 

http://www.antlab.sci.waseda.ac.jp/software.html

kfNgram

http://www.kwicfinder.com/kfNgram/kfNgramHelp.html

Colibri

Software for the extraction of n-grams as well as patterns that are not consecutive (skipgrams). The software is written in C++ for speed and memory efficiency but comes with a Python binding for use from Python scripts. It also has a standalone CLI tool that can perform this kind of extraction.

https://github.com/proycon/colibri-core

http://proycon.github.io/colibri-core/doc/

Maarten van Gompel

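
For readers who want to see what these tools compute before installing one, the following is a minimal pure-Python sketch of n-gram and skipgram counting. It does not use the API of Colibri or of any other tool listed above; the function names, the `*` gap marker and the sample sentence are assumptions made for illustration, and the skipgram definition (one wildcarded interior slot) is a simplification. For large texts the dedicated tools will be far more efficient.

```python
import re
from collections import Counter


def tokenize(text):
    """Crude tokeniser: lowercase runs of word characters."""
    return re.findall(r"\w+", text.lower())


def ngrams(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(
        tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )


def skipgrams(tokens, n=3, gap="*"):
    """Count n-token windows with one interior token replaced by `gap`.

    Simplified definition for this sketch: the first and last tokens of
    the window are always kept, and each interior slot is wildcarded in turn.
    """
    counts = Counter()
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        for j in range(1, n - 1):
            pattern = tuple(window[:j] + [gap] + window[j + 1:])
            counts[pattern] += 1
    return counts


# Hypothetical sample text, invented for this sketch.
text = "The cat sat on the mat and the dog sat on the rug"
tokens = tokenize(text)

trigram_counts = ngrams(tokens, 3)
skipgram_counts = skipgrams(tokens, 3)

# Show only the patterns that occur more than once.
for pattern, freq in trigram_counts.most_common():
    if freq > 1:
        print("n-gram  ", " ".join(pattern), freq)
for pattern, freq in skipgram_counts.most_common():
    if freq > 1:
        print("skipgram", " ".join(pattern), freq)
```

The skipgram count picks up "the * sat", a pattern shared by "the cat sat" and "the dog sat" that the plain trigram count misses.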
