Tono Linguistic feature extraction #cefr #cl2015

Yukio Tono

Linguistic feature extraction and evaluation using machine learning to identify “criterial” grammar constructions for the CEFR levels


L2 learner profile

English Profile – CEFR for English

Criterial features: Hawkins & Filipovic 2012

CEFR-J RLD Project: aims to prepare a list of vocabulary and grammar items to be taught and assessed at each CEFR level

CEFR Coursebook Corpus


Weka format (Weka 3.6.12)

158 features

Attribute selection
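
The notes above record only that attribute selection was run over 158 features in Weka; they do not say which evaluator or search method was used. As a rough illustration of the general idea, the sketch below ranks discrete features by information gain with respect to a CEFR level label. The feature names, the toy data, and the choice of information gain are all assumptions made for this example, not details from the talk.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(values, labels):
    """Reduction in label entropy after splitting on one discrete feature."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_features(rows, labels):
    """Return (feature_name, gain) pairs, most informative first."""
    names = rows[0].keys()
    gains = {
        name: information_gain([row[name] for row in rows], labels)
        for name in names
    }
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Toy data invented for illustration: one row per text, one label per row.
rows = [
    {"has_relative_clause": False, "has_past_tense": True},
    {"has_relative_clause": False, "has_past_tense": False},
    {"has_relative_clause": True, "has_past_tense": True},
    {"has_relative_clause": True, "has_past_tense": False},
]
labels = ["A1", "A1", "B1", "B1"]

ranking = rank_features(rows, labels)
for name, gain in ranking:
    print(f"{name}: {gain:.3f}")
```

In this toy set the hypothetical feature has_relative_clause separates the two levels perfectly, while has_past_tense carries no information about the level, so the first would be a candidate "criterial" feature and the second would not.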


#cfp @fetlt2015 Future and Emerging Trends in Language Technologies

Via the AESLA mailing list

:::::::::::::::::::::::::::::::::::::::


Workshop on Future and Emerging Trends in Language Technologies

Universidad de Sevilla, 19-20 November 2015

http://www.glc.us.es/fetlt2015/

 

The workshop ‘Future and Emerging Trends in Language Technology’ has been conceived as a meeting point where experts and professionals in language technologies and other converging areas will discuss the state of the art and the emerging trends in this sector. The main objective of the workshop is to serve as a bridge between academia, industry, and representatives of agencies that coordinate research and innovation policies. The workshop thus guarantees a shared multidisciplinary spirit in which experts will be able to present and analyze the trends that will shape the immediate future of the sector.

Following this approach, the workshop organizers welcome papers in the following categories:

NEW APPLICATIONS OF KEY CONSOLIDATED APPROACHES: Authors can submit their paper on new strategies, models and consolidated techniques at the academic or industrial level that are being used right now to tackle any issue in the field of Language Technology. Papers under this category must provide a brief explanation of the foundations of the approaches proposed and the areas and applications for which those techniques are useful in the present.

EMERGING RESEARCH: Authors can submit their paper under this category when they have preliminary results obtained from ongoing research projects. Papers must describe the motivation of the approach, as well as the scientific, methodological and/or technological approach chosen. Papers must also analyze the advantages and benefits derived from such approaches for a broad application in the field of Language Technology.

CHALLENGE PAPERS: Authors can submit a paper on fields and convergent areas related to Language Technology that describes new and persistent challenges for both academia and industry. These papers must indicate which areas and specific problems currently pose a concrete technological and/or methodological challenge. Papers under this category must explain why present-day techniques should be considered insufficient to tackle the issues at hand, presenting preliminary research/development results as justification. Additionally, papers in this category should propose research strategies that promise sound solutions to the problems defined, supported by clear scientific and technical argumentation.
—————————————————————
LIST OF TOPICS
—————————————————————

Topics should be related to any area of Speech Technology, including those studies that can be considered coming from convergent areas or even industrial applications.
Topics of interest include, but are not restricted to:

Core areas of interest

A.1) Speech recognition:
Speech assistants, Voice search
A.2) Information retrieval, Information extraction and Text mining
Topic spotting and classification
Entity extraction
Spoken document retrieval
A.3) Semantics and Ontologies
A.4) Dialog Modelling and Management
Open domains, Incrementality, Statistical DM, Hybrid models, World knowledge, Metacognition
A.5) Machine Translation
Fully-automated MT services in Global Business and Government Services
Speech-to-speech MT
A.6) Development Frameworks
A.7) Multimodality
A.8) Multilinguality
A.9) Mathematical foundations
A.10) Language resources and Evaluation
Multilingual resources
Metadata, annotation, tools

Convergent areas of interest:
B.1) Mobile Devices
B.2) Robotics and Vision
B.3) Machine Learning
B.4) Games & Social Networks
B.5) Brain-computer Interfaces
B.6) Technology background: Mobile, Cloud, Social Media, and Big Data
B.7) The Internet of Things (IoT)

Industrial areas of interest:
Integration of state-of-the-art LT in support of multilingual global business applications:
C.1) Speech-to-Speech Translation
C.2) Cross-lingual Information Retrieval
C.3) Multilingual global marketing
C.4) Sentiment analysis
Applications to industrial sectors
C.5) Healthcare and BioMedicine NLP
C.6) Social Media
C.7) Smart Cities
C.8) Opinion mining
C.9) Public Administration
C.10) Instruction & Teaching
C.11) Communications
LT in the Web World
C.12) Crowdsourcing for LT

—————————————————————
IMPORTANT DATES
—————————————————————

Paper submission deadline: 25th July 2015
Acceptance notification: 15th September 2015
Paper final version submission: 1st October 2015
Early registration deadline: 1st October 2015
Workshop dates: 19th – 20th November 2015

—————————————————————
LOCATION
—————————————————————

FETLT-2015 will be held at the University of Seville, Spain.
For more information, please visit: http://www.glc.us.es/fetlt2015/
—————————————————————
SUBMISSION PROCEDURE
—————————————————————

Authors are invited to submit non-anonymized papers in English presenting original and unpublished research, not currently submitted elsewhere.

Regular papers should not exceed 12 single-spaced pages (including any appendices) and should be formatted according to the standard format for the Springer-Verlag LNCS series (see http://www.springer.com/computer/lncs?SGWID=0-164-6-793341-0).

Files must be sent via https://www.easychair.org/conferences/?conf=fetlt2015

Submitted papers must indicate their category and up to three of the main topics listed above.

—————————————————————
INVITED SPEAKERS
—————————————————————

Nuria Bel, University Pompeu Fabra
Asunción Gómez, Polytechnic University of Madrid
Sebastian Moeller, TU Berlin, Telekomm
Steve Renals, University of Edinburgh
Giuseppe Riccardi, University of Trento
Pierre-Paul Sondag, European Commission
Steve Young, University of Cambridge

PROGRAM COMMITTEE AND ADVISORY GROUP

Alex Acero (Apple)
Roberto Basili (University of Rome)
Nuria Bel (University Pompeu Fabra)
Johan Bos (University of Groningen)
Nicoletta Calzolari (CNR-ILC)
Khalid Choukri (ELDA)
Walter Daelemans (University of Antwerp)
Thierry Declerck (DFKI)
Marc Dymetman (Xerox Research Centre Europe)
Antonio Ferrandez (University of Alicante)
Ana García-Serrano (UNED)
Jesús Giménez (Nuance Communications)
Xavier Gómez-Guinovart (University of Vigo)
Gregory Grefenstette (Inria)
Veronique Hoste (University of Ghent)
Eduard Hovy (Carnegie Mellon University)
Rebecca Jonson (Artificial Solutions)
Alon Lavie (Carnegie Mellon University)
Ramón López-Cózar (University of Granada)
Teresa López-Soto (University of Seville)
Roberto Manione (AlliumTech)
Daniel Marcu (USC)
Joseph Mariani (LIMSI-CNRS and IMMI)
Patricio Martínez-Barco (University of Alicante)
Ruslan Mitkov (University of Wolverhampton)
Antonio Moreno-Sandoval (Autonomous University of Madrid)
Sergei Nirenburg (Rensselaer Polytechnic Institute)
Mirko Plitt (Modula Language Automation)
Massimo Poesio (University of Essex; U. of Trento)
Andrei Popescu-Belis (Idiap Research Institute)
Jose F. Quesada (University of Seville)
Manny Rayner (University of Geneva)
Steve Renals (University of Edinburgh)
Giuseppe Riccardi (University of Trento)
Francisco J. Salguero (University of Seville)
Kepa Sarasola (University of the Basque Country)
Javier Sastre (Ateknea Solutions)
Mark Steedman (University of Edinburgh)
David Suendermann-Oeft (ETS)
Khiet Truong (University of Twente)
Alfonso Ureña (University of Jaen)
Jason D. Williams (Microsoft Research)

PROGRAM CHAIR
José F. Quesada (University of Seville)

ORGANIZING COMMITTEE
Joaquín Borrego-Díaz (University of Seville)
Juan Galán-Páez (University of Seville)
Diego Jiménez (University of Seville)
Teresa López-Soto (University of Seville)
Francisco J. Martín-Mateos (University of Seville)
Ángel Nepomuceno (University of Seville)
José F. Quesada (University of Seville)
Francisco J. Salguero (University of Seville)

 

CFP Posters on late-breaking results June 15 deadline

Through the corpora list

:::::::::::::::::::::::::::::::::
CORPUS LINGUISTICS 2015

The CL2015 organising committee is pleased to issue a call for posters on late-breaking results on any of the topics in the conference’s scope. By “late-breaking” we mean research which was not at a sufficiently advanced stage for an abstract submission to be made in the main submission cycle, but which has now reached that point.

We anticipate that the research in question will still be in its earliest phases. “Late-breaking results” include – but are not necessarily limited to – pilot study results, corpus creation activities currently in hand, newly-developed software, and so on.

· Abstracts should be 400-750 words in length. They must be formatted using the conference stylesheet (available to download from http://ucrel.lancs.ac.uk/cl2015/call.php )

· We especially encourage submission of abstracts from early-career researchers, including postgraduate research students and postdoctoral researchers.

· Abstracts which were previously submitted for the January deadline, and not accepted, are NOT eligible to be resubmitted.

· Abstracts should be submitted by email to cl2015@lancaster.ac.uk by 15th June 2015.

· As with all presentations, at least one author of any late-submission poster must attend the conference.
For more details see http://ucrel.lancs.ac.uk/cl2015

An archive copy of the previously-circulated CL2015 Call for Participation may be found here: http://ucrel.lancs.ac.uk/cl2015/doc/CL2015-CallParticipation.pdf

Andrew Hardie, Tony McEnery, Amanda Potts, Vaclav Brezina, and Paul Rayson
The CL2015 Organising Committee

Corpus Linguistics 2015: @UCREL_Lancaster registration open

From the Corpora List
:::::::::::::::::::::::::::::::::::::::::::::::::::

Corpus Linguistics 2015: In honour of the life and work of Geoffrey Leech

The eighth international Corpus Linguistics conference (CL2015) will be held at Lancaster University from Tuesday 21st July 2015 to Friday 24th July 2015. The main conference will be preceded by a workshop day on Monday 20th July.

This series of conferences began in 2001 with an event celebrating the career of Professor Geoffrey Leech, on the occasion of his retirement. In August of 2014, we reported with great sadness Geoff’s sudden death.

By dedicating this eighth conference in the Corpus Linguistics series once again to a celebration of Geoff’s life, his career, and his truly remarkable influence on the field, we once more pay tribute to, and commemorate, a remarkable intellect and a sorely-missed colleague and friend.

Conference themes and topics

The goals of the conference are:

. To gather together current and developing research in the study and application of corpus linguistics;
. To push the field forwards by promoting dialogue among the many different users of corpora across interconnected sub-disciplines of linguistics – be they descriptive, theoretical, applied or computational;
. To explore new challenges both within corpus linguistics, and in the extension of corpus approaches to new fields of study.

CL2015 will have three thematic streams and a general programme.

Stream A: A tribute to Geoffrey Leech

For this stream we invite contributions using corpus methods in any of the branches of linguistics with which Geoffrey Leech’s research was especially closely associated, namely:

. Pragmatics
. Stylistics
. Description of English grammar and grammatical change
. Grammatical annotation of corpus texts

Stream B: Discourse, Politics and Society

For this stream we invite contributions in the following areas:

. The use of corpora in discourse analysis
. Corpus approaches to the study of new media
. Applications of corpus approaches in the social sciences and humanities

Stream C: Language learning and teaching

For this stream we invite contributions in the following areas:

. Learner corpus research
. Corpus-based work in English language teaching, including ESP and EAP
. Use of corpora in second language acquisition studies
. Data-driven learning
. Development of learner materials

General Programme

For the general programme, we invite contributions on as broad and inclusive a basis as possible. The areas in which we particularly welcome submissions include but are not limited to:

. Corpus methodology:
o Critical explorations of existing measures and methods in corpus linguistics;
o New methods and techniques in corpus development, annotation and analysis;
o New tools and techniques developed in corpus-based computational linguistics;
o Advances in quantitative techniques.
. Theoretical corpus linguistics:
o The interface between corpus and linguistic theory;
o Syntax, morphology, semantics;
o Psycholinguistic and cognitive explorations;
o Multi-lingual comparative and contrastive analysis;
o Historical linguistics.
. Lexis and lexicon:
o Lexicography;
o Collocation and meaning in context.
. Sociolinguistics, language variation and applied linguistics:
o Regional and social variation in language;
o Code-switching and bilingualism;
o Forensic linguistics;
o Genre, register and textual variation.

Plenary speakers

We are delighted to announce that the following speakers have accepted our invitation to give plenary lectures at CL2015:

. Douglas Biber (Northern Arizona University, USA)
. Sylviane Granger (Université catholique de Louvain, Belgium)
. Michaela Mahlberg (University of Nottingham, UK)
. Alan Partington (Università di Bologna, Italy)

Call for pre-conference workshops

As noted above, CL2015 will include a workshop day on Monday 20th July 2015. We hereby issue a call for workshop proposals on any theme relevant to the conference.

“Workshops” may take two main forms.

The first type is the colloquium-style workshop, which operates as a mini-conference with its own programme committee and call for papers to be presented: proposals for this type of workshop should specify the scope of the workshop, who its organisers will be, and whether the creation of workshop proceedings is envisaged. Proposals should also provide an initial version of the text of the call for papers.

The other main type of workshop is a practical or applied workshop providing a demonstration of or training in some particular corpus linguistic technique or piece of software. In this case the proposal must explain the content of the workshop, provide an initial version of the text of a call for participation, and give an indication of the workshop’s IT requirements, if any.

We are also happy to consider innovative forms of workshop intermediate between colloquium-style workshop and practical workshop.

All proposals must in addition specify the proposed running time. Our timetable allows for the following lengths of workshop:

. Full-day workshop – up to 7 hours (plus lunch/breaks)
. Half-day workshop – up to 3.5 hours (plus break)
. Short workshop – up to 2 hours (single session)

There is no fixed format for workshop proposals, as long as they include all the details specified above. Proposals should be sent by email to Andrew Hardie by 15th December. We are happy to respond to informal expressions of interest in advance of formal submission of a proposal.

Call for papers, posters and panels

We invite submission of abstracts for papers, posters and panels on any topic relevant to the conference themes.

For this conference, we are requesting extended abstracts (750-1500 words), as we do not plan to produce a volume of conference proceedings. All abstracts will be peer-reviewed by the conference programme committee.

Paper presentations will consist of a 20 minute talk followed by 10 minutes for questions and discussion. Please note: paper submissions should present either complete research, or research in progress where at least some substantial results have been achieved. Work in progress which has yet to produce results can instead be submitted as a poster abstract.

Submissions for panel discussions should take the form of a single 1500 word abstract on behalf of all speakers to be on the panel. The abstract should include a note to specify whether the panel is intended to be 1 hour or 1.5 hours in length.

Submissions for poster presentations should be shorter (400-750 words). We especially welcome poster abstracts that (a) report on innovative research that is in its very earliest phases (b) report on new software or corpus data resources.

We especially encourage abstract submissions from early-career researchers, including postgraduate research students and postdoctoral researchers.

All abstracts must be submitted via the conference website; the submission system is now live (see http://ucrel.lancs.ac.uk/cl2015/call.php ). Details on how to submit an abstract to a specific conference stream are available on the website.

Key dates

. End October 2014 – call for papers; call for proposals for pre-conference workshops
. 7th January 2015 – deadline for abstract submission
. 16th January 2015 – earlybird registration opens
. 24th January 2015 – all abstract review outcomes will be returned by this date
. 30th March 2015 – end of earlybird registration (rates rise)
. 21st June 2015 – end of main registration (late registration not guaranteed, though we’ll try)
. 21st June 2015 – final deadline for cancellation with refund of registration fees
. 20th July 2015 – pre-conference workshop day
. 21st July to 24th July 2015 – main conference

General information

For information on registration, accommodation, travel, etc., see the conference website: http://ucrel.lancs.ac.uk/cl2015 ; email: cl2015@lancaster.ac.uk

The conference is hosted by the UCREL research centre (http://ucrel.lancs.ac.uk), which brings together the Department of Linguistics and English Language (http://www.ling.lancs.ac.uk/) with the School of Computing and Communications (http://www.scc.lancs.ac.uk/).

Local organising committee of CL 2015: Andrew Hardie (chair), Tony McEnery, Paul Rayson.

1st Intl. NLP for Informal Text- Deadline 17/4


The 1st International Workshop on Natural Language Processing for Informal Text (NLPIT 2015)
In conjunction with The International Conference on Web Engineering (ICWE 2015)
June 23, 2015, Rotterdam, The Netherlands
http://wwwhome.cs.utwente.nl/~badiehm/nlpit2015/

Overview
The rapid growth of Internet usage in the last two decades adds new challenges to understanding the informal user-generated content (UGC) on the Internet. Textual UGC refers to textual posts on social media, blogs, emails, chat conversations, instant messages, forums, reviews, or advertisements that are created by end-users of an online system. A large portion of the language used in textual UGC is informal. Informal text is a style of writing that disregards grammar rules and uses a mixture of abbreviations and context-dependent terms. The straightforward application of state-of-the-art Natural Language Processing approaches to informal text typically results in significantly degraded performance for the following reasons: the lack of sentence structure; the lack of sufficient context; the rarely seen entities involved; the noisy, sparse content of users’ contributions; and the untrusted facts contained. The aim of this workshop is to draw researchers’ attention to the opportunities and challenges involved in informal text processing. In particular, we are interested in discussing informal text modeling, normalization, mining, and understanding, in addition to various application areas in which UGC is involved.

Topics

We invite submissions on topics that include, but are not limited to, the following core NLP approaches for informal UGC: language identification, classification, clustering, filtering, summarization, tokenization, segmentation, morphological analysis, POS tagging, parsing, named entity extraction, named entity disambiguation, relation/fact extraction, semantic annotation, sentiment analysis, language normalization, informality modeling and measuring, language generation, handling uncertainties, machine translation, ontology construction, dictionary construction, etc.

Submission

Authors are invited to submit original work not submitted to another conference or workshop. Workshop submissions can be full papers or short papers. Paper length should not exceed 12 pages for full papers and 6 pages for short papers. All papers should follow Springer’s LNCS format. Papers in PDF can be sent via the EasyChair Conference System https://easychair.org/conferences/?conf=nlpit2015. Each submission will receive, in addition to a meta-review, at least two double-blind peer reviews. Each full paper will get 25 minutes of presentation time. Short papers will get 5 minutes of presentation time in addition to a poster. Besides papers, we also plan to have an invited talk by a renowned scientist on a topic relevant to the workshop. Workshop proceedings will be published as part of the ICWE2015 workshop proceedings. To contact the NLPIT 2015 organization team, please send an e-mail to: nlpit2015@easychair.org.

Deadlines

– Submission deadline: April 17, 2015
– Notification deadline: May 17, 2015
– Camera-ready version: May 24, 2015
– Workshop date: June 23, 2015

Msg. distributed through the corpora list

CFP Terminology & Artificial Intelligence 2015

TIA 2015: FIRST CALL FOR PAPERS

—————————————————–
Terminology and Artificial Intelligence 2015
4 November – 6 November 2015
University of Granada, Spain

http://lexicon.ugr.es/tia2015/Home.html

Terminology and Artificial Intelligence (TIA) 2015 will highlight the close connection between multilingual terminology, ontologies, and the representation of specialized knowledge. Knowledge, as regarded in Terminology, is something more complex than a simple hierarchy or a thesaurus-like structure. In this sense, ontologies, understood as a shared conceptualization of a domain that can be communicated between people and/or systems, are better suited for accounting for multilinguality and contextual constraints. The link between Terminology and knowledge representation has been widely acknowledged with the advent of multilingual ontologies.

This is particularly relevant since today’s networked society has generated an increasing number of contexts where multilingualism challenges current knowledge representation methods and techniques. To meet these challenges, it is necessary to deal with semantics since information can be organized, presented, and searched, based on meaning and not just text. Ideally, this would mean that language-independent specialized knowledge could be accessed across different natural languages. There is thus the urgent need for high-quality multilingual knowledge resources that are able to bridge communication barriers, and which can be linked and shared.

Such issues can only be successfully addressed with creative collaborative solutions within disciplines, such as knowledge engineering, terminology, ontology engineering, cognitive sciences, corpus lexicology, and computational linguistics. Accordingly, the TIA 2015 Conference will provide a forum for interdisciplinary research that focuses on the intersection of different disciplines dealing with terminology, multilingualism, lexicology, ontology, and knowledge representation. Papers may address both theoretical questions and methodological aspects on these issues, as well as interdisciplinary approaches developed to facilitate convergence and co-operation in terminological aspects of importance to an increasingly multilingual society.

TIA 2015 solicits both regular papers (8 pages), which present significant work, and short papers (4 pages), which typically present work in progress or a smaller, focused contribution. Regardless of the language of the paper (English, Spanish, or French), all paper presentations will be in English. The submission deadline is June 15. See the conference webpage for more specific submission details.

TOPICS
1. Terminology and ontology acquisition and management
· Applying pattern recognition to enriching terminological resources
· Lexicons, thesauri and ontologies as semantic resources
· Lexicons and ontologies as means for knowledge transfer
· Reusing, standardizing and merging terminological or ontological resources
· Multilingual terminology extraction
· Multilinguality and multimodality in terminological resources
· Management of language resources
2. Terminology and knowledge representation
· Ontological semantics and linguistics
· Ontology localization
· Development of multimedia terminological resources
· Terminology alignment in parallel corpora and other lexical resources
· Representation of terms and conceptual relations in knowledge-based applications
· Comparative studies of terminological resources and/or ontological resources
· Terminological resources in the 21st century
· Harmonization of format and standards in terminological resources
3. Terminology and ontologies for applications
· Interoperability and reusability in knowledge-based tools and applications
· Models and metamodels in annotating semantic and terminological resources
· New R&D directions in terminology for industrial uses and needs
· Terminology for machine translation and natural language processing

Featured plenary speakers
Paul Buitelaar, National University of Ireland, Galway, Ireland
Ricardo Mairal Usón, Universidad Nacional de Educación a Distancia (UNED), Madrid, Spain

TIA 2015 CHAIRS
Pamela Faber, University of Granada
Thierry Poibeau, CNRS

The PROGRAMME COMMITTEE members are distinguished experts from all over the world.

SUBMISSION INFORMATION
See the TIA 2015 website: http://lexicon.ugr.es/tia2015/Submission.html

IMPORTANT DATES
Paper submissions (long and short papers): 15 June 2015
Notification to authors: 4 September 2015
Final camera-ready paper: 24 September 2015
Conference: 4-6 November 2015

VENUE:
University of Granada,
Faculty of Translation and Interpreting,
18071 Granada, Spain

Contact information: termai2015@gmail.com

Message distributed through the AESLA mailing list