Description

In natural languages, there are many ways to express complex human thoughts and ideas. This can be achieved by exploiting compositionality, i.e. concatenating simplex elements of language and thus yielding a more complex meaning that can be computed from the meaning of the original parts and the way they are combined. However, non-compositional phrases are also very frequent in any human language. These complex phrases can often be decomposed into single meaningful units, but the meaning of the whole phrase cannot (or can only partially) be computed from the meaning of its parts. Such phrases are often called multiword expressions (MWEs) and display lexical, syntactic, semantic, pragmatic and/or statistical idiosyncrasies (Baldwin & Kim 2010). In addition to idiomatic constructions, MWEs encompass closely related linguistic constructs such as light verb constructions, rhetorical figures and institutionalised phrases or collocations (Sag et al. 2002). MWEs pose problems for linguistic processing, especially in language learning and natural language processing (NLP), for instance in machine translation and in syntactic and semantic parsing, to name just a few applications.

Researchers from several disciplines such as computer science, linguistics and psychology have been jointly working on MWE modeling and processing. For instance, designing guidelines for the annotation of MWEs in corpora, and prominently in treebanks, has been undertaken in various languages and linguistic frameworks (Rosén et al. 2015). Lexical resources with MWEs in dozens of languages exist and are still being developed (Losnegaard et al. 2016). Many papers describe methods to discover new MWEs in texts, applying a wide variety of tools and techniques such as association measures, distributional methods and machine learning. Interactions of MWE processing with deeper levels of linguistic analysis, notably parsing and semantic processing, are being increasingly investigated (e.g. in SEMEVAL 2016 task 10 - DiMSUM). Special issues on MWEs have been published by leading journals (CSL in 2005, LR&E in 2010, ACM TSLP in 2013). Several funded projects focusing on MWEs are indicative of the growing importance of the field within the NLP community. For instance, the EU H2020 program currently supports the COST Action PARSEME (2013-2017), which addresses the role of MWEs in parsing and gathers more than 200 researchers from 33 countries covering 30 languages. It has also inspired several national spin-off projects on MWEs.

Many of these advances are described and published in the annual MWE workshop. It attracts the attention of an ever-growing community working on a variety of languages and linguistic phenomena. The workshop has been held since 2001 in conjunction with major computational linguistics conferences (ACL, COLING, LREC, EACL). It represents an important venue for the community to interact, share resources and tools, and collaborate on efforts for advancing the computational treatment of MWEs.

We call for papers on major challenges in MWE processing, both from the theoretical and the computational viewpoint, focusing on original research related (but not limited) to the following topics:

Manually and automatically constructed lexical resources
MWE representation in lexical resources
MWE annotation in corpora and treebanks
MWEs in non-standard language (e.g. tweets, forums, spontaneous speech)
Original MWE discovery methods (e.g. using word embeddings, parallel corpora)
Original MWE in-context identification methods (e.g. using deep learning, topic models)
MWE processing in syntactic frameworks (e.g. HPSG, LFG, TAG, universal dependencies)
MWE processing in semantic frameworks (e.g. WSD, semantic parsing)
MWE processing in end-user applications (e.g. summarization, machine translation)
Orchestration of MWE processing with respect to applications
Evaluation of MWE processing techniques
Models of first and second language acquisition of MWEs
Theoretical and psycholinguistic studies on MWEs
Crosslinguistic studies on MWEs

This year, we extend the traditional workshop with a special track for shared task papers. The shared task will compare and evaluate systems for the automatic identification of verbal MWEs in sentences (see below). Participants will have the opportunity to submit shared task system description papers and present their approach and results at the workshop.

The organizers are planning to publish selected papers from the workshop in a volume of the Phraseology and Multiword Expressions book series, whose creation has been accepted by Language Science Press.

Special Track: PARSEME Shared Task on Automatic Verbal MWE Identification

In addition to the main scientific program, we propose to have a special track dedicated to the shared task on automatic verbal MWE identification. A separate session will be allocated for the special track within the workshop. Language teams will present their systems and summarize their results in a poster/demo session. Authors may submit papers either to the special track or to the regular workshop. They should follow common submission instructions, based on those of the main conference. The special track is endorsed and partly funded by PARSEME.

Submission modalities

The main track will feature long and short papers:

Long papers (8 content pages + references): Long papers should report on solid and finished research including new experimental results, resources and/or techniques.
Short papers (4 content pages + references): Short papers should report on small experiments, focused contributions, ongoing research, negative results and/or philosophical discussion.

The shared task track will feature system description papers:

System description papers (4 content pages + references): System description papers briefly describe the approach implemented to solve the problem. They may include references and links to more detailed descriptions in other documents.

There is no limit on the number of reference pages. Authors will be granted an extra page for the final version of their papers.

For the main track, submission will be double-blind and the reported research should be substantially original. Accepted papers will be presented orally or as posters. The decision as to which papers will be presented orally and which as posters will be made by the program committee based on the nature rather than on the quality of the work.

The shared task system description papers will go through a separate reviewing process. As in SEMEVAL, submissions will be double-blind and will be reviewed by the shared task organizers and participants according to the schedule below. The selected papers will be presented as posters. Shared task participants are not required to submit a system description paper; when one is submitted, its acceptance depends on the quality of the paper rather than on the results obtained in the shared task.

For all types of submission, the EACL 2017 LaTeX templates should be used. In accordance with the EACL 2017 submission policy, using these templates is a condition for a paper to be accepted for review. Final versions of accepted papers will be submitted in both PDF and LaTeX source formats.

All papers should be submitted via the START space. Please choose the appropriate submission type, according to the category of your paper.