This is a summary of n-gram extraction resources posted on [Corpora-List] in early 2014:
CMU-Cambridge Statistical Language Modeling toolkit
http://mi.eng.cam.ac.uk/~prc14/toolkit.html
Sketch Engine
http://www.sketchengine.co.uk/documentation/wiki/SkE/NGrams
Laurence Anthony’s AntConc
http://www.antlab.sci.waseda.ac.jp/software.html
kfNgram
http://www.kwicfinder.com/kfNgram/kfNgramHelp.html
Colibri
Software for extracting n-grams as well as patterns whose tokens are not consecutive (skipgrams). It is written in C++ for speed and memory efficiency, and comes with a Python binding for use from Python scripts. It also has a standalone CLI tool, so the extraction can be run without writing any code.
https://github.com/proycon/colibri-core
http://proycon.github.io/colibri-core/doc/
Maarten van Gompel
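
To make the difference between n-grams and skipgrams concrete, here is a minimal sketch in plain Python. It does not use Colibri or any of the tools listed above, and it is not how they are implemented. The function names and the max_skip parameter are made up for illustration, and "skipgram" is taken here to mean an n-token pattern that may skip up to max_skip tokens in total, which is one common definition and may differ in detail from what a given tool reports.

```python
from collections import Counter
from itertools import combinations


def ngrams(tokens, n):
    """Yield every run of n consecutive tokens as a tuple."""
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])


def skipgrams(tokens, n, max_skip):
    """Yield n-token patterns that skip at most max_skip tokens in total.

    The first token is fixed at position i; the remaining n - 1 tokens
    are chosen, in order, from the next n - 1 + max_skip positions.
    Contiguous n-grams (zero skips) are included in the output.
    """
    for i in range(len(tokens)):
        window = tokens[i + 1:i + n + max_skip]
        for rest in combinations(window, n - 1):
            yield (tokens[i],) + rest


if __name__ == "__main__":
    tokens = "to be or not to be".split()

    # Frequency list of bigrams (consecutive pairs).
    for gram, freq in Counter(ngrams(tokens, 2)).most_common():
        print(freq, " ".join(gram))

    # Frequency list of pairs that may skip over one token.
    for gram, freq in Counter(skipgrams(tokens, 2, 1)).most_common():
        print(freq, " ".join(gram))
```

This approach holds every pattern in memory and is only suitable for small texts; for large corpora the dedicated tools above are the better choice.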