SACODEYL stands for System Aided Compilation and Open Distribution of European Youth Language. It was an EU Socrates Minerva project running from 2005 to 2008, focused on creating an ICT-based language learning platform built around spoken youth corpora. Its core idea was to collect authentic speech produced by European teenagers and make it usable for language teaching and learning.
SACODEYL focuses on spoken interviews with British, French, German, Italian, Lithuanian, Rumanian and Spanish teenagers between 13 and 18 years of age. The interview transcripts are stored in online corpora that are pedagogically annotated and enriched for language learning and teaching purposes.
In terms of academic lineage, SACODEYL sits within early European work on spoken pedagogic corpora and Data-Driven Learning. Its value was that it moved corpus work closer to school language learning by using authentic teenage speech, multimodal data and classroom-oriented annotation, rather than treating corpora only as research databases.
SACODEYL adopts a "small pedagogic corpus" approach. Each of the seven corpora — English, French, German, Italian, Lithuanian, Rumanian and Spanish — contains 20 to 25 video-recorded interviews of about 10 minutes each.
To ensure thematic comparability, a common set of interview questions was used, covering a wide range of topics: personal information, home and family, present and past living routines, hobbies and interests, holidays, school and education, job experiences, plans for the future, and open discussion topics.
A SACODEYL corpus consists of orthographic interview transcripts in XML format. Each transcript is divided into short thematic sections and annotated for pedagogically relevant characteristics, e.g. topic, grammatical and lexical properties, discourse markers and CEF level.
You can download the XML corpora here:
https://perezparedes.es/sacodeyl-xml-corpora/
A rationale for the annotation can be found in this paper:
This tool lets you explore authentic spoken English from British and American teenagers aged 13–18. All the interviews come from the SACODEYL corpus. Here is what each section does.
Type any word or phrase into the search box and click Search. The tool will show you every time that word appears in the corpus, with a few words of context on each side — this is called a concordance line.
- Case — tick this if you want the search to be case-sensitive (e.g. only find "I" not "i").
- Exact — tick this to find only the exact word. Untick it to find any word that contains your search (e.g. searching "go" would also find "going", "goes").
- ±5 / ±8 / ±12 — controls how many words of context you see on each side of the keyword.
- +15 button — on any concordance line, click +15 to expand it and see 15 more words of context. Click − to collapse it back.
- ✦ Explain this — click this after searching to generate a ready-made analysis prompt. The prompt is copied to your clipboard automatically. Then click "Open Claude.ai", paste with Ctrl+V (or Cmd+V), and Claude will group the concordance lines by topic and give you learner tips.
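As a rough illustration of how the concordance options above fit together, here is a minimal Python sketch. It is not the tool's actual code: the function names `tokenize` and `kwic`, the tokenising pattern and the sample sentence are all assumptions made for this example, and it handles single words only, not phrases.

```python
import re

def tokenize(text):
    """Split text into word tokens (assumed pattern: letters, digits, apostrophes)."""
    return re.findall(r"[A-Za-z0-9']+", text)

def kwic(tokens, query, window=5, case_sensitive=False, exact=True):
    """Return a (left context, keyword, right context) tuple for every hit.

    window         -- words of context on each side (the ±5 / ±8 / ±12 control)
    case_sensitive -- mirrors the Case tick box
    exact          -- mirrors the Exact tick box; False matches any token
                      that contains the query
    """
    norm = (lambda s: s) if case_sensitive else str.lower
    q = norm(query)
    lines = []
    for i, tok in enumerate(tokens):
        t = norm(tok)
        hit = (t == q) if exact else (q in t)
        if hit:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines

# Invented sample text, not taken from the corpus.
sample = "I like going out and I go to the cinema when my friends go too"
tokens = tokenize(sample)
for left, kw, right in kwic(tokens, "go", window=3):
    print(f"{left:>25} [{kw}] {right}")
```

Calling `kwic(tokens, "go", exact=False)` on the same sample would also return the line for "going", which is the behaviour described for the unticked Exact box.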
This shows you the most common words in the corpus, ranked from most frequent to least. For each word you can see how many times it appears and what percentage of the total corpus it makes up.
- The bar chart on the right gives you a quick visual sense of how frequent each word is relative to the top word.
- Click any word in the list and the tool will automatically search for it and show you its concordance lines in the KWIC tab.
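The frequency list described above can be approximated in a few lines of Python. This is only a sketch under stated assumptions: `frequency_list` is a hypothetical name, words are lower-cased before counting, and the bar length is computed relative to the top word as the bar chart description suggests.

```python
from collections import Counter

def frequency_list(tokens, top_n=10):
    """Return rows of (word, count, percent of corpus, share of top word)."""
    counts = Counter(t.lower() for t in tokens)
    total = sum(counts.values())
    top = counts.most_common(top_n)
    if not top:
        return []
    max_count = top[0][1]  # count of the most frequent word
    return [(w, c, 100 * c / total, c / max_count) for w, c in top]

# Invented sample text, not taken from the corpus.
sample_tokens = "i like football and i like music and i play guitar".split()
for word, count, pct, rel in frequency_list(sample_tokens, top_n=5):
    bar = "#" * round(rel * 20)  # text stand-in for the bar chart
    print(f"{word:<10} {count:>3} {pct:5.1f}% {bar}")
```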
After searching for a word, switch to this tab to see which words tend to appear near it. Words that appear frequently and consistently near your search word are called collocates.
- Left collocates — words that appear to the left of your search word.
- Right collocates — words that appear to the right.
- freq — how many times that collocate appeared near your word.
- MI (Mutual Information) — a score showing how strongly attracted two words are to each other. A higher MI means the words are more likely to appear together than by chance.
- Coll ±2 / ±3 / ±4 / ±5 — the window size: how many words away from the keyword to look for collocates.
- ✦ Explain this — generates a prompt for Claude.ai to group the collocates by meaning and give you learner tips.
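To make the MI score more concrete, the sketch below counts left and right collocates within a window and scores them with one common formulation of Mutual Information: the base-2 logarithm of the observed co-occurrence count divided by the count expected by chance, where expected = freq(keyword) × freq(collocate) × window ÷ corpus size. Whether the tool uses exactly this formula is an assumption, and the function name `collocates` is hypothetical.

```python
import math
from collections import Counter

def collocates(tokens, node, window=3):
    """Return (left_rows, right_rows); each row is (word, freq, MI)."""
    toks = [t.lower() for t in tokens]
    node = node.lower()
    n = len(toks)
    freq = Counter(toks)              # overall frequency of every word
    left, right = Counter(), Counter()
    for i, t in enumerate(toks):
        if t != node:
            continue
        left.update(toks[max(0, i - window):i])
        right.update(toks[i + 1:i + 1 + window])

    def score(side):
        rows = []
        for w, observed in side.items():
            # Assumed formula: chance co-occurrence within `window` slots.
            expected = freq[node] * freq[w] * window / n
            rows.append((w, observed, math.log2(observed / expected)))
        # Strongest attraction first; ties broken by frequency, then word.
        return sorted(rows, key=lambda r: (-r[2], -r[1], r[0]))

    return score(left), score(right)

# Invented sample text, not taken from the corpus.
sample_tokens = "i play football and i play guitar but i like music".split()
left_rows, right_rows = collocates(sample_tokens, "play", window=1)
for word, f, mi in right_rows:
    print(f"{word:<10} freq={f} MI={mi:.2f}")
```

On very small samples MI tends to favour rare words, which is one reason it is usually read together with the freq column rather than on its own.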
After searching for a word, this tab shows you where in the corpus it appears — which interviews it occurs in, and whereabouts in each interview.
- Each row is one interview. The tick marks show you the position of each hit within that interview.
- A word spread evenly across many interviews has a high D score, a measure of dispersion (close to 1.0), meaning it is used consistently by many speakers.
- A word concentrated in just one or two interviews has a low D score, meaning it may be a personal or topic-specific preference.
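The text above does not say how the D score is calculated. One widely used dispersion measure is Juilland's D, and the sketch below uses it purely as an assumption to show how a score between 0 and 1 can arise. The function names `juilland_d` and `hit_positions` are hypothetical, and frequencies are normalised by interview length so that longer interviews do not dominate.

```python
import math
from statistics import mean, pstdev

def juilland_d(interviews, word):
    """Juilland's D over a list of tokenised interviews (assumed measure).

    D = 1 - V / sqrt(n - 1), where V is the standard deviation of the
    per-interview relative frequencies divided by their mean.
    """
    if len(interviews) < 2:
        raise ValueError("need at least two interviews")
    word = word.lower()
    rel = []
    for toks in interviews:
        lowered = [t.lower() for t in toks]
        rel.append(lowered.count(word) / len(lowered) if lowered else 0.0)
    avg = mean(rel)
    if avg == 0:
        return 0.0  # the word never occurs
    v = pstdev(rel) / avg
    return 1 - v / math.sqrt(len(rel) - 1)

def hit_positions(tokens, word):
    """Relative position (0.0 = start) of each hit, as for the tick marks."""
    word = word.lower()
    return [i / len(tokens) for i, t in enumerate(tokens) if t.lower() == word]

# Invented sample interviews, not taken from the corpus.
interviews = [
    ["i", "like", "school"],
    ["school", "is", "fine"],
    ["my", "school", "rocks"],
]
print(juilland_d(interviews, "school"))  # evenly spread across interviews
print(juilland_d(interviews, "rocks"))   # concentrated in one interview
```

With this measure, a word that occurs at the same rate in every interview scores 1, and a word found in only one interview scores 0.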
This is a unique feature of SACODEYL. The corpus has been hand-annotated by linguists — specific words and phrases have been tagged with grammatical, lexical, and topic labels. This tab lets you browse all those annotations.
- The tree is organised into six categories: Topics, Grammatical Features, Lexical Features, Textual Organisation, Variety / Style, and CEF Level.
- Click any category name with a blue badge to see the corpus sections tagged with that annotation. The tagged words are highlighted in yellow.
- Use the toggle buttons at the top of the results to show or hide specific annotation types within a section.
- Select sections using the checkboxes, then click ⬇ Download .txt to save them to your computer.
The ✦ Explain this button (found in the KWIC and Collocates tabs) connects the Corpus Explorer to Claude.ai, Anthropic's free AI assistant, for deeper analysis.
- Search for a word and click ✦ Explain this.
- A green message will appear: "Prompt copied to clipboard!"
- Click Open Claude.ai ↗ — this opens Claude.ai in a new tab.
- In the Claude.ai chat box, paste the prompt with Ctrl+V (Windows/Linux) or Cmd+V (Mac) and press Enter.
- Claude will analyse the concordance lines or collocates and give you a structured explanation with topic groups and learner tips.
You need a free Claude.ai account to use this feature. No payment is required.