Transcription

Here you can find some useful resources to carry out your transcription project.

MacWhinney, B. (2000). The CHILDES Project: Tools for Analyzing Talk. 3rd Edition. Mahwah, NJ: Lawrence Erlbaum Associates.

MacWhinney, B. (2019). Tools for Analyzing Talk. Part 1: The CHAT Transcription Format. URL: https://childes.talkbank.org/

Leech (2004): types of annotation

phonetic annotation: e.g. adding information about how a word in a spoken corpus was pronounced.

prosodic annotation: again in a spoken corpus, adding information about prosodic features such as stress, intonation and pauses.

syntactic annotation: e.g. adding information about how a given sentence is parsed, in terms of syntactic analysis into such units as phrases and clauses.

semantic annotation: e.g. adding information about the semantic category of words; the noun 'cricket' as a term for a sport and as a term for an insect belong to different semantic categories, although there is no difference in spelling or pronunciation.

pragmatic annotation: e.g. adding information about the kinds of speech act (or dialogue act) that occur in a spoken dialogue; thus the utterance 'okay' on different occasions may be an acknowledgement, a request for feedback, an acceptance, or a pragmatic marker initiating a new phase of discussion.

discourse annotation: e.g. adding information about anaphoric links in a text, for example connecting the pronoun 'them' and its antecedent 'the horses' in: I’ll saddle the horses and bring them round. [an example from the Brown corpus]

stylistic annotation: e.g. adding information about speech and thought presentation (direct speech, indirect speech, free indirect thought, etc.).

lexical annotation: adding the identity of the lemma of each word form in a text, i.e. the base form of the word, such as would occur as its headword in a dictionary (e.g. 'lying' has the lemma LIE).
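As a rough illustration of how the annotation types above can sit side by side on the same word form, here is a minimal Python sketch. The `Token` class, the `annotate` method and the tag values ("SPORT", "INSECT") are hypothetical names invented for this example; they are not part of Leech's scheme or of any of the tools listed on this page.

```python
from dataclasses import dataclass, field

# Layer names follow the annotation types listed above.
# The tag values used below are illustrative, not a standard tagset.
LAYERS = {"phonetic", "prosodic", "syntactic", "semantic",
          "pragmatic", "discourse", "stylistic", "lexical"}


@dataclass
class Token:
    """A word form plus any number of annotation layers (hypothetical model)."""
    form: str
    layers: dict = field(default_factory=dict)

    def annotate(self, layer: str, value: str) -> "Token":
        # Reject layer names that are not in the list above.
        if layer not in LAYERS:
            raise ValueError(f"unknown annotation layer: {layer}")
        self.layers[layer] = value
        return self


# Lexical annotation: the word form 'lying' has the lemma LIE.
lying = Token("lying").annotate("lexical", "LIE")

# Semantic annotation: same spelling, different semantic categories.
cricket_sport = Token("cricket").annotate("semantic", "SPORT")
cricket_insect = Token("cricket").annotate("semantic", "INSECT")

print(lying.form, lying.layers)
print(cricket_sport.form == cricket_insect.form)
print(cricket_sport.layers == cricket_insect.layers)
```

Keeping each layer under its own key means a token can carry several kinds of annotation at once without one layer overwriting another.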

Online services:

https://transcribe.wreally.com/

https://otranscribe.com/

BRAT: http://brat.nlplab.org/introduction.html

Backbone Transcriptor. URL

Gate: https://gate.ac.uk/teaching.html

Folia: https://proycon.github.io/folia/

Metadata for corpus work: http://users.ox.ac.uk/~lou/wip/metadata.html

Annotation on Sketch Engine: https://www.sketchengine.eu/guide/annotating-corpus-text/

TEI by example website: https://teibyexample.org/modules/TBED02v00.htm

Usage-based approaches in a nutshell (Ellis 2012)

Usage-based (UB) approaches: some references

Usage-based theories of language hold that learners acquire constructions in a similar fashion, from the statistical abstraction of patterns of form-meaning correspondence in their usage experience, and that the acquisition of linguistic constructions can be understood in terms of the cognitive science of concept formation following the general associative principles of the induction of categories from experience of the features of their exemplars. In natural language, the Zipfian-type token-frequency distributions of the occupants of each of these construction islands, their prototypicality and generality of function in these roles, and the reliability of mappings between these together conspire to make language learnable. Phrasal teddy bears, formulaic phrases with routine functional purposes, play a large part in this experience, and the analysis of their components gives rise to abstract linguistic structure and creativity.

Is the notion of language acquisition being seeded by formulaic phrases and yet learner language being formula-light having your cake and eating it too?

Ellis, N. (2012). Formulaic Language and Second Language Acquisition: Zipf and the Phrasal Teddy Bear. Annual Review of Applied Linguistics, 32, 17-44.
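To make the idea of a Zipfian token-frequency distribution in the passage above more concrete, the following Python sketch ranks the words that fill one slot of a construction by how often they occur. The verbs and their counts are invented toy values chosen so that the pattern is easy to see; they are an assumption for illustration, not data from Ellis (2012) or from any corpus. The function name `rank_frequency` is likewise hypothetical.

```python
from collections import Counter

# Invented toy counts (an assumption for illustration, not corpus data):
# verbs occupying one slot of a hypothetical construction.
slot_fillers = ["put"] * 12 + ["move"] * 6 + ["place"] * 4 + ["drop"] * 3


def rank_frequency(tokens):
    """Return (rank, word, token frequency) triples, most frequent first."""
    counts = Counter(tokens)
    return [(rank, word, freq)
            for rank, (word, freq) in enumerate(counts.most_common(), start=1)]


table = rank_frequency(slot_fillers)
for rank, word, freq in table:
    # In an idealised Zipfian distribution, frequency is roughly proportional
    # to 1 / rank, so rank * frequency stays roughly constant.
    print(rank, word, freq, rank * freq)
```

With real corpus counts the product of rank and frequency would only be approximately constant; the toy numbers here were picked so that a single high-frequency item and a tail of rarer ones are clearly visible.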

Five tenets that shape usage-inspired L2 instruction (Tyler & Ortega, 2018)

The following is from Tyler, A., & Ortega, L. (2018). Usage-inspired L2 instruction: An emergent, researched pedagogy. In Tyler, A., Ortega, L., Uno, M., & Park, H. I. (Eds.), Usage-inspired L2 Instruction: Researched Pedagogy. Amsterdam: John Benjamins Publishing Company.

There is little question that learning language is one of the most complex accomplishments humans achieve. This is true for the first language learner and perhaps even more so for the second language learner.

There is no one, definitive usage-based model of language and language learning; rather a usage-based perspective encompasses a family of linguistic and language developmental approaches, including cognitive linguistics, emergentism, constructionism, and complex dynamic systems theory. They are united by their emphasis on the notion that actual language use is a primary shaper of linguistic form and the foundation for language learning.

Tenet 1: language learning is meaning based

A first usage-based tenet is that language and language learning are meaning based. The centrality of meaning in grammar is acknowledged in most contemporary thinking about communicative language teaching (Larsen-Freeman, 2012). However, in usage-based theories it is taken to a new radical level of theoretical commitment. First, contrary to the traditional axiom that the linguistic sign is arbitrary, in usage-based theories a large proportion of the connections between a form and its meaning are understood to be motivated. An example is the traditional position that lexical forms with more than one meaning are understood as unrelated homophones, whose many meanings simply have to be memorized. […] For second language learners, understanding that nearly all words have multiple meanings and that the meanings are systematically related into polysemy networks can provide powerful tools for learning vocabulary (e.g., Tyler, 2012). […] Second, not just words, but all units of grammar are said to be meaningful beyond the sum of the meanings of their parts (Langacker, 1991). For instance, syntactic patterns such as English ‘Noun-Verb-Noun-Noun’ (Homer gave Bart a puppy) are seen not to receive their meaning from the verb, but to convey the abstract constructional meaning ‘Someone Causes Someone to Receive Something,’ with several extended senses organized around this central meaning in a polysemy network (Goldberg, 1995). […] At the broadest level, the centrality of meaning tenet posits that linguistic structure cannot be fully understood if isolated from the study of how language is employed to create meaning.

Tenet 2: meaning is embodied

A second usage-based tenet posits that meaning is grounded in the physical world and is embodied (Barsalou, 2016) and therefore language and language learning are too. Namely, basic human interactions with the physical world provide a foundation for human conceptual and cognitive representations, which are in turn reflected in language. To illustrate, across languages, vision verbs (look at, see) are more frequent than other sensing verbs (of hearing, touching, tasting, or smelling) because sight is the dominant human sense and human cognition orients universally to visual phenomena, for example engaging brain activity for up to 50% of the cortex (San Roque et al., 2015). Another oft-cited illustration is that humans’ physical experience of upright stance and gravity shapes metaphors pervasive in everyday language involving the two orientations ‘up’ and ‘down’ as positive and negative, respectively (e.g., keep up the good work!, I feel down) (Lakoff & Johnson, 1980). In the usage-based family of theories, cognitive linguistics (CL) has made the deepest commitment to experientially grounded and embodied meaning (e.g., Lakoff & Johnson, 1980; Langacker, 1991).

Tenet 3: language learning is contextualized interaction

The third usage-based tenet is that language and language learning are critically situated in contextualized social interactions. Actual language use is culturally, socially, and contextually embedded, because all usage events are tied to particular speech communities. Natural language always occurs in context, and the user’s choices in crafting an utterance are influenced by an array of contextual factors. Context itself is complex and multidimensional and gives rise to subtle, interacting linguistic reflexes. For instance, all usage-based models have recognized the audience or the participants in an interaction as a major aspect of context, be it in relation to genre (Martin & Rose, 2008), listener expectations (Gumperz, 1982; Tyler, 1994a, 1994b, 2012), or ground, a technical term in CL that posits participants make mental contact by coorienting to a shared construal in which one concept, the ground, is anchor for another concept, the figure (Langacker 1991; Taylor, 2002). Each syntactic pattern or construction is analyzed as serving to present a particular perspective or speaker stance. While the notion of the importance of context in language production and interpretation is, of course, not unique to usage-based approaches, the search for linguistic reflexes of context and the view of syntax as constructional templates replete with pragmatic information are unique technical operationalizations of the tenet, beyond just asserting that context is important. Speakers craft their message by choosing from an array of subtle resources gleaned from the surrounding discourse community. Subtle changes in the relationship between the speaker and the audience result in changes in the speaker’s language choices and, conversely, subtle changes in language choices can change the relationship between the speaker and the audience. For multilingual as for monolingual users, creating (and learning) language is a social, purpose-driven endeavor (Douglas Fir Group, 2016).

Tenet 4: language learning emerges from domain-general mechanisms

Language and language learning emerge from the same general cognitive mechanisms involved in all aspects of learning, driven by various aspects of input, particularly frequency. In the usage-based family of theories, constructionism, CL, and emergentism have made the deepest commitment to these general cognitive mechanisms (e.g., pattern finding, abstraction, induction, schematization) and to frequency-driven statistical learning from the input (Saffran, 2003). As Nick Ellis and his colleagues have shown (Ellis, Römer, & O’Donnell, 2016), statistical learning constrains all language learning (including the learning of second languages), because humans are delicately sensitive to the frequencies and contexts in which they have encountered linguistic units. Much of language learning is thought to take place implicitly, and implicit and incidental learning are considered to represent a substantial portion of language learning in children as much as in adults. For this reason, usage-based approaches can often be assumed to accord little theoretical status to explicit learning.

Tenet 5: language learning is open to variability

The fifth and final usage-based tenet we submit for consideration is that language and language learning are open to variability and change all throughout the life span. Nonlinearity and variability have always been acknowledged in interlanguage theory. A usage-based perspective goes further by questioning the assumption, as do Dąbrowska (2012) for L1 and Larsen-Freeman (2006) for L2, that certain aspects of the language are categorically acquired without variation by L1 users, on the one hand, and variably and perhaps impossibly learned by L2 users, on the other. Moreover, also under scrutiny is the notion of developmental sequences that are valid for all learners and learning trajectories (Lowie & Verspoor, 2015). Fine-tuned, corpus-based exploration of constructions within a language helps reveal subtle, regular variation in grammatical patterns, such as articulation or omission of 'that' complementizers under particular, systematic conditions (e.g., Wulff, Lester, & Martinez-Garcia, 2014) and many other phenomena (Ellis, Römer, & O’Donnell, 2016). Studies show that as they advance in proficiency L2 learners change their production from patterns that more closely match those of the L1 to those they hear with sufficient frequency in the meaningful surrounding linguistic ambiance. Even at highly advanced levels, learners continue to be sensitive to the frequencies with which they hear patterns in the target language and implicitly adjust their production accordingly. Further, as no linguistic unit is ever produced exactly the same in exactly the same context, the input itself is constantly variable. Thus, language learning is ever open to change and variation. Since language is thought to be inseparable from the users and the usage events that bring it about, as long as there is use, there can be learning (Larsen-Freeman, 2006). This is true all along the life span, and for all the languages and language varieties of multilingual and monolingual users.

Tyler, A., & Ortega, L. (2018). Usage-inspired L2 instruction: An emergent, researched pedagogy. In Tyler, A., Ortega, L., Uno, M., & Park, H. I. (Eds.), Usage-inspired L2 Instruction: Researched Pedagogy. Amsterdam: John Benjamins Publishing Company.