Categories
Academic discourse COCA corpora corpus linguistics text analysis text tools writing

Videos: Corpus of Contemporary American English

This is a follow-up to our post "Writing tools for researchers".

The basics

Using POS tags

Collocations

 

BNC & COCA Basic Query Syntax


Categories
applied linguistics corpora corpus corpus linguistics data language analysis MAC Manipulating text resources software text analysis text tools

How to Batch Convert Text Files to Other Formats in Mac via the Terminal


Source: www.maketecheasier.com
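
As a rough illustration of the batch conversion described in the title, here is a minimal Python sketch. It assumes the conversion relies on macOS's built-in `textutil` command and its `-convert` option; the function names `build_textutil_command` and `batch_convert`, the `.docx` source extension, and the folder layout are hypothetical choices made for this example, not taken from the linked article.

```python
import subprocess
import sys
from pathlib import Path


def build_textutil_command(files, target_format="txt"):
    """Return the argument list for one textutil call covering every file.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    return ["textutil", "-convert", target_format, *[str(f) for f in files]]


def batch_convert(folder, source_ext=".docx", target_format="txt"):
    """Convert every file ending in source_ext inside folder.

    Returns the list of files that matched. The conversion itself is
    only attempted on macOS, because textutil ships with that system.
    """
    files = sorted(Path(folder).glob(f"*{source_ext}"))
    if files and sys.platform == "darwin":
        # By default textutil writes each converted file next to its source.
        subprocess.run(build_textutil_command(files, target_format), check=True)
    return files
```

Calling `batch_convert("my_corpus")` on a Mac would then turn every `.docx` file in the (assumed) folder `my_corpus` into plain text, which is usually the first step before loading files into a concordancer.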

Categories
COCA corpora corpus linguistics

Full-text data for the two largest BYU corpora

I received this through the CORPORA List:
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

At http://corpus.byu.edu/full-text/ you can now download full-text data for the two largest BYU corpora:

Corpus of Contemporary American English (COCA). 440 million words of downloadable text; the largest, most up-to-date, publicly available corpus of English that is balanced for genre (spoken, fiction, magazine, newspaper, and academic).

Corpus of Global Web-Based English (GloWbE). 1.8 billion words of downloadable text, divided into groups from twenty different English-speaking countries (US, UK, Canada, Australia, India, etc.). About 60% of the text comes from blogs, which makes it a good source of very informal language.

With this full-text data, you will have the actual corpora on your computer, and you can search the data in any way that you’d like. You can generate your own frequency data, collocates, n-grams, or concordance lines; you can search by word, lemma, and part of speech; and you can carry out complex syntactic and semantic searches offline. You can even modify the lexicon and sources tables to search the corpora in ways that are not possible via the standard web interfaces.
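
To make the idea of generating your own n-grams and concordance lines more concrete, here is a minimal Python sketch. It works on an already tokenised list of words; the sample sentence and the function names `ngrams` and `concordance` are invented for illustration and say nothing about how the BYU data is actually packaged.

```python
from collections import Counter


def ngrams(tokens, n=2):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def concordance(tokens, keyword, width=3):
    """Return keyword-in-context lines with `width` tokens on each side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


# Invented sample text, standing in for a slice of corpus data.
sample = "the cat sat on the mat and the cat slept".split()
bigrams = ngrams(sample, 2)
kwic = concordance(sample, "cat", width=2)
```

With real corpus files the same two functions would simply be fed a much longer token list, read from disk in chunks if memory is a concern.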

The data comes in three different formats (see samples): data for relational databases (info), word/lemma/PoS (vertical), and linear text (horizontal). When you purchase the data, you purchase the rights to any and all of these formats.
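
As an illustration of what the word/lemma/PoS (vertical) format makes possible, the sketch below counts lemma frequencies filtered by part of speech. It assumes one token per line with three tab-separated columns (word, lemma, PoS tag); that column layout, the tag values, and the sample rows are assumptions made for the example, so check the published samples before relying on them.

```python
from collections import Counter

# Invented sample in an assumed layout: word <TAB> lemma <TAB> PoS tag.
SAMPLE = (
    "The\tthe\tat\n"
    "corpora\tcorpus\tnn2\n"
    "are\tbe\tvbr\n"
    "large\tlarge\tjj\n"
    "corpus\tcorpus\tnn1\n"
)


def read_vertical(text):
    """Parse vertical text into (word, lemma, pos) tuples, skipping odd lines."""
    rows = []
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) == 3:
            rows.append(tuple(parts))
    return rows


def lemma_frequencies(rows, pos_prefix=""):
    """Count lemmas whose PoS tag starts with pos_prefix (empty = all tags)."""
    return Counter(lemma for _, lemma, pos in rows if pos.startswith(pos_prefix))


rows = read_vertical(SAMPLE)
noun_lemmas = lemma_frequencies(rows, pos_prefix="nn")
```

Here both "corpora" and "corpus" are counted under the lemma "corpus", which is the kind of grouping that plain linear text cannot give you without a separate lemmatiser.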
