SyntagNet

This information comes from corpora-bounces@uib.no

SyntagNet, a resource with 88,000 lexical-semantic combinations, is now out!

We are proud to announce that SyntagNet 1.0 (http://syntagnet.org) is now available for download at http://syntagnet.org/download. Developed at the Sapienza NLP group (http://nlp.uniroma1.it), the multilingual Natural Language Processing group at the Sapienza University of Rome, SyntagNet is a manually-curated large-scale lexical-semantic combination database which associates pairs of concepts with pairs of co-occurring words. The goal of SyntagNet is to capture sense distinctions evoked by syntagmatic relations (e.g. mouse.n.1 and squeak.v.1 vs mouse.n.2 and click.n.4), hence providing information which complements the essentially paradigmatic knowledge shared by currently available Lexical Knowledge Bases such as WordNet. Its main features are:

  • Wide coverage, with 78,000 noun-verb and noun-noun lexical combinations extracted from the English Wikipedia and the British National Corpus.
  • High-quality, fully manual disambiguation for all of the lexical combinations, according to the WordNet 3.0 sense inventory.
  • A resulting Lexical Knowledge Base made up of 88,019 semantic combinations linking 20,626 WordNet 3.0 unique synsets with a relation edge.
  • A user-friendly web interface for looking up terms and their lexical-semantic combinations, with complete linkage to BabelNet 4.0.
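To make the idea of a lexical-semantic combination concrete, here is a minimal Python sketch. It is not the SyntagNet file format or API: the `Combination` type, the helper names and the in-memory toy list are all assumptions made for illustration, and the only data used are the two sense pairs quoted in the announcement above. It shows how a pair of co-occurring words can point to a pair of senses.

```python
from collections import defaultdict
from typing import NamedTuple


class Combination(NamedTuple):
    """One sense-level pair (hypothetical type): two 'lemma.pos.number' labels."""
    sense_a: str
    sense_b: str


# Toy data: the two pairs quoted in the announcement, NOT the real resource.
COMBINATIONS = [
    Combination("mouse.n.1", "squeak.v.1"),
    Combination("mouse.n.2", "click.n.4"),
]


def lemma_of(sense: str) -> str:
    """Return the lemma part of a 'lemma.pos.number' label."""
    return sense.rsplit(".", 2)[0]


def build_index(combinations):
    """Map each unordered pair of lemmas to the sense pairs recorded for it."""
    index = defaultdict(list)
    for combo in combinations:
        key = frozenset((lemma_of(combo.sense_a), lemma_of(combo.sense_b)))
        index[key].append(combo)
    return index


def senses_for(index, word_a, word_b):
    """Look up the sense pair(s) recorded for two co-occurring words."""
    return index.get(frozenset((word_a, word_b)), [])


index = build_index(COMBINATIONS)
print(senses_for(index, "click", "mouse"))
```

In this toy setting, seeing "mouse" next to "click" selects the second sense of "mouse", while "mouse" next to "squeak" selects the first. This is the kind of syntagmatic evidence the announcement describes.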

And much more! Please check out our EMNLP 2019 paper:

M. Maru, F. Scozzafava, F. Martelli, R. Navigli. SyntagNet: Challenging Supervised Word Sense Disambiguation with Lexical-Semantic Combinations, Proc. of EMNLP-IJCNLP 2019

or http://syntagnet.org for more details!

SyntagRank, a state-of-the-art knowledge-based Word Sense Disambiguation system which uses SyntagNet to perform disambiguation in five languages (English, French, German, Italian and Spanish), is also available from the same website (it will be demoed at ACL 2020!).

SyntagNet is an output of the MOUSSE ERC Consolidator Grant No. 726487 and of the ELEXIS project No. 731015 under the European Union’s Horizon 2020 research and innovation programme. Babelscape proudly developed the online interface and API, and provides the infrastructure for maintaining the service. 

The Sapienza NLP group

Baby TaLC 2020 postponed

URL: https://langident.hypotheses.org/talc2020/baby-talc2020-e-grapheles

In light of the latest information on the coronavirus and the official recommendations in France, we can no longer hold the conference in person in May. The option of holding it by technological means is also compromised, since the restrictions apply to all staff and premises. We therefore plan to postpone the conference until October (16-18 October, to be confirmed), a solution already mentioned in our last message (02.03.20). As for registration, we propose that those who have already paid online keep their payment on account for October, if that suits them (please contact us). Those who chose the “NO TRAVEL” option but wish to travel in October will be asked to pay the difference (please contact us). Registration fees can also be refunded, provided the request is made before 28 May, the date on which the conference was due to take place. Thank you for your understanding. The July conference is going ahead for the time being.

V online summer school: Writing science in English

  • 08/09/2020 – 11/09/2020
  • Online

You can join this course here. Registration: URL.

2019 edition, unforgettable memories!!!!!!!!!

Writing for the masses. A practical introduction to writing scientific-academic English in blogs – September 8

Dissemination and impact of our research have become as important as research itself in many ways. Doing science in the 21st century cannot be understood as a practice that is situated exclusively in highly specialised journals. Writing for the ‘public’ out there is absolutely essential these days to make sure our research is both visible and, most importantly, fundable. However, blog writing is rarely discussed as a soft skill. In this workshop we will use ad-hoc blog data to discuss the differences between research article writing and blog writing. Particularly, we will look at stance-making and positioning in blogs. During this session, students will write their own posts.

Pascual Pérez-Paredes is a Professor in Applied Linguistics and Linguistics, U. Murcia, Lecturer in Research in Second Language Education at the University of Cambridge and Salvador de Madariaga Research Fellow, NAU, US. His main research interests are learner language variation, the use of corpora in language education and corpus-assisted discourse analysis. He has published extensively in journals such as CALL, Discourse & Society, English for Specific Purposes, Journal of Pragmatics, Language Learning & Technology, System, ReCALL and the International Journal of Corpus Linguistics. He was the Overall Coordinator of the MEd Research Methods Strand at the University of Cambridge. He is an Official Translator, Intérprete Jurado, appointed by the Ministerio de Asuntos Exteriores de España. He is the Assistant Editor of Cambridge University Press ReCALL (Q1, 31 out of 187 in Linguistics).

Niall Curry is a Lecturer in Academic Writing at the Centre for Academic Writing at Coventry University. His research is interdisciplinary and centres on language pedagogy and the application of corpus linguistic approaches to different areas of applied linguistics. Among these areas is a focus on corpus-based studies of academic writing and metadiscourse in English, French, and Spanish, corpus-based contrastive linguistics, corpus-based studies of English language and language change, and corpus linguistics for TESOL and language teaching materials development. For further details on his background, areas of interest, projects, and current research, see his website.

Conference and paper abstracts – September 8

This session presents what abstracts are and how they are organised from the point of view of genre analysis, paying special attention to their organisation in moves and steps. Next, we will focus on how language can serve as a tool to identify their different parts. We will analyse different types of abstracts, such as structured and non-structured abstracts. Finally, we will study in detail how conference abstracts are organised and written, look at examples of conference abstracts and propose ways to improve them.

Pilar Aguado-Jiménez joined the English Department at the University of Murcia in 1990. She taught business English (1990/99) and has taught general English since 1990. She completed her PhD in 1997. She was a language advisor for the CAGE Panel, Cambridge University Press (2003/05). Her main current areas of research are TEFL and ESP. She has been involved in several international projects, such as TELLOP (Erasmus+ 2020), VGCLIL and VGCLIL for Migrants.

The research article: the literature review – September 9

In the first part of the session, the structure of the research paper will be analysed in detail from the perspective of genre analysis, since the analysis of language accounts not only for the way a text is constructed, but also for the way it is likely to be interpreted, used and exploited in specific contexts to achieve specific goals. Within this genre perspective, particular attention will be paid to the steps or “moves” that perform a particular communicative function associated with the writer’s purpose. This “move” perspective will be applied to all sections of the research paper. In the second part, the different “moves” or steps in the Literature Review will be analysed. Both sessions will combine a theoretical and a practical approach.

Purificación Sánchez is a Lecturer at the Department of English Studies, Faculty of Humanities, University of Murcia. Her main areas of research are corpus-assisted discourse analysis and English for Specific Purposes. She has published in RESLA, Ibérica, System, English Text Construction, Higher Education in Europe and Discourse & Society.

How to write good academic/scientific English using corpora – September 9

This workshop centres on the construction and analysis of academic writing corpora using AntCorGen, TagAnt, and AntConc. Initially, we will focus on academic writing with the intention to highlight practices and foci within the field. Next, we will consider academic writing corpora, highlighting corpus construction considerations and tagging processes necessary for effective corpus-based analyses of academic writing. Using a bespoke academic corpus, we will analyse both untagged and tagged corpus data to learn more about writing and disciplinary conventions. The workshop will close with a brief analysis of what is called metadiscourse in academic writing with participants having the opportunity to present key insights they have gleaned from their own analyses.

Niall Curry is a Lecturer in Academic Writing at the Centre for Academic Writing at Coventry University. His research is interdisciplinary and centres on language pedagogy and the application of corpus linguistic approaches to different areas of applied linguistics. Among these areas is a focus on corpus-based studies of academic writing and metadiscourse in English, French, and Spanish, corpus-based contrastive linguistics, corpus-based studies of English language and language change, and corpus linguistics for TESOL and language teaching materials development. For further details on his background, areas of interest, projects, and current research, see his website.

The research article: results and discussion – September 10

This session is designed for multilingual students who grew up speaking a language other than English. It will help you develop your ability to express the results and discussion of your research effectively. You will become aware of the aspects you find most troublesome and learn strategies for improving vocabulary use, discourse options, paragraph format and organization. In academic English, readers will assess your results and discussion on your control of certain conventions, which may change depending on your audience and purpose. With this in mind, the course will give you some tips for expressing results and commenting on data. It is divided into 4 units which deal with the written characteristics of the results and discussion sections in academic English. At the end of every unit you will have to take a compulsory test as part of the evaluation of this course.

María Luisa Carrió-Pastor is a Professor of English language at Universitat Politècnica de València, Spain. She is currently the head of the Department of Applied Linguistics and the coordinator of the doctoral programme “Languages, Literature, Culture and their Applications”. Her research areas are contrastive linguistics, pragmatics and the study of academic and professional discourse, both for second language acquisition and discourse analysis.

Writing for the reader – September 10

This workshop puts the reader at the centre of writing. We’ll look at signposting, use of headings, consistency, formatting, style and referencing. We’ll identify what you can do before you start and while you’re in the different draft stages of your writing, and how to polish your final piece with detailed editing and proof-reading.

Geraldine Mark has over 30 years’ experience as editor, author, researcher and lecturer, drawn from applied linguistics, language teaching and learning, language analysis and materials development. Principal interests: corpus linguistics and its pedagogical applications.

Writing grant proposals – September 11

Grant proposals have a great impact on the scientific and academic community, since getting funding is paramount for our scientific progress and social development. Thus, knowing the essentials of a grant proposal may help applicants to succeed in their goal. This course will give a general picture of what we understand by grant proposals and the types of grant proposals we might find. I will also provide participants with some resources to work autonomously on their future grant proposals. Course materials include PandaDoc grant proposal templates and the Spencer Foundation's Research Grants on Education.

Recommended book:
Carlson, M. and O’Neal-McElrath, T. (2002). Winning Grants Step by Step. New York: John Wiley and Sons.

Begoña Bellés Fortuño, PhD, is a senior lecturer in the Department of English Studies at Universitat Jaume I, Spain, where she teaches students on the English Studies degree as well as on the degree in Medicine. She also supervises MA projects in the Secondary Education, Vocational Training and Language Teaching Master's degree. She is currently the Director of the Interuniversity Institute of Modern Applied Languages (IULMA) at Universitat Jaume I. She is the Editor-in-Chief of the Language Value journal and one of the executive directors of the IBERICA journal. She has reviewed articles for JEAP, System and Language and Communication, among other journals.

Writing for an international audience: the Anglo tradition – September 11

It is a fact that the best science is ‘heavily biased’ toward journals published in English from English-speaking countries (Lillis and Curry 2010), which requires that all authors master the craft of writing in such contexts. The Saxonic intellectual style (Galtung, 1985) is characterized by an intensive use of data, typically in ‘what is often a team effort’. This style is interested in hypothesis generation, actively engages in ‘dialogue with their peers and seeks to smooth out divergences of opinion’. Other styles use radically different angles to approach writing and reading science. In what is described here as the Anglo tradition, reader orientation and essay form are of the utmost importance to academic culture (Hermanns, 1985). In this session we will examine the fundamental tenets of this tradition: criticality, development of argument, rhetorical transfer and clarity.

Pascual Pérez-Paredes is a Professor in Applied Linguistics and Linguistics, U. Murcia, Lecturer in Research in Second Language Education at the University of Cambridge and Salvador de Madariaga Research Fellow, NAU, US. His main research interests are learner language variation, the use of corpora in language education and corpus-assisted discourse analysis. He has published extensively in journals such as CALL, Discourse & Society, English for Specific Purposes, Journal of Pragmatics, Language Learning & Technology, System, ReCALL and the International Journal of Corpus Linguistics. He was the Overall Coordinator of the MEd Research Methods Strand at the University of Cambridge. He is an Official Translator, Intérprete Jurado, appointed by the Ministerio de Asuntos Exteriores de España. He is the Assistant Editor of Cambridge University Press ReCALL (Q1, 31 out of 187 in Linguistics).

Transcription

Here you can find some useful resources to carry out your transcription project.

MacWhinney, B. (2000). The CHILDES Project: Tools for Analyzing Talk. 3rd Edition. Mahwah, NJ: Lawrence Erlbaum Associates.

MacWhinney, B. (2019). Tools for Analyzing Talk. Part 1: The CHAT Transcription Format. URL: https://childes.talkbank.org/

Leech (2004): types of annotation

  • Phonetic annotation: e.g. adding information about how a word in a spoken corpus was pronounced.
  • Prosodic annotation: again in a spoken corpus, adding information about prosodic features such as stress, intonation and pauses.
  • Syntactic annotation: e.g. adding information about how a given sentence is parsed, in terms of syntactic analysis into such units as phrases and clauses.
  • Semantic annotation: e.g. adding information about the semantic category of words. The noun cricket as a term for a sport and as a term for an insect belong to different semantic categories, although there is no difference in spelling or pronunciation.
  • Pragmatic annotation: e.g. adding information about the kinds of speech act (or dialogue act) that occur in a spoken dialogue. Thus the utterance okay on different occasions may be an acknowledgement, a request for feedback, an acceptance, or a pragmatic marker initiating a new phase of discussion.
  • Discourse annotation: e.g. adding information about anaphoric links in a text, for example connecting the pronoun them and its antecedent the horses in: I’ll saddle the horses and bring them round. [an example from the Brown corpus]
  • Stylistic annotation: e.g. adding information about speech and thought presentation (direct speech, indirect speech, free indirect thought, etc.).
  • Lexical annotation: adding the identity of the lemma of each word form in a text, i.e. the base form of the word, such as would occur as its headword in a dictionary (e.g. lying has the lemma LIE).
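The annotation types above can be pictured as independent layers attached to the same tokens. The following is a minimal Python sketch of that idea, using the Brown corpus sentence quoted above. The `Token` class, the layer names ("lemma", "antecedent") and the helper functions are hypothetical and chosen only for illustration. They do not follow CHAT, TEI or any of the tools listed on this page.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    """A word form plus any number of named annotation layers (hypothetical model)."""
    form: str
    annotations: dict = field(default_factory=dict)  # layer name -> value


def annotate(tokens, layer, values):
    """Attach one annotation layer; `values` maps token index -> label."""
    for i, label in values.items():
        tokens[i].annotations[layer] = label
    return tokens


def antecedent_of(tokens, i):
    """Return the text of the span that token i points back to, or None."""
    span = tokens[i].annotations.get("antecedent")
    if span is None:
        return None
    start, end = span
    return " ".join(t.form for t in tokens[start:end + 1])


sentence = "I'll saddle the horses and bring them round".split()
tokens = [Token(word) for word in sentence]

# Lexical layer: the lemma of selected word forms.
annotate(tokens, "lemma", {1: "SADDLE", 3: "HORSE", 5: "BRING"})
# Discourse layer: link the pronoun "them" (index 6) to "the horses" (indices 2-3).
annotate(tokens, "antecedent", {6: (2, 3)})

print(tokens[3].annotations["lemma"])
print(antecedent_of(tokens, 6))
```

Keeping each layer under its own key means a corpus can carry lexical and discourse annotation side by side without one overwriting the other.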

Online services:

https://transcribe.wreally.com/

https://otranscribe.com/

BRAT: http://brat.nlplab.org/introduction.html

Backbone Transcriptor. URL

Gate: https://gate.ac.uk/teaching.html

Folia: https://proycon.github.io/folia/

Metadata for corpus work: http://users.ox.ac.uk/~lou/wip/metadata.html

Annotation on Sketch Engine: https://www.sketchengine.eu/guide/annotating-corpus-text/

TEI by example website: https://teibyexample.org/modules/TBED02v00.htm