Corpus of North American Spoken English (CoNASE)
The Corpus of North American Spoken English (CoNASE), a 1.25-billion-word corpus of geolocated automatic speech recognition (ASR) transcripts, is now available in a beta version. See http://cc.oulu.fi/~scoats/CoNASE.html for more information. The corpus was created from 301,847 ASR transcripts from 2,572 YouTube channels, corresponding to 154,041 hours of video. It contains 1,252,066,371 word tokens.