7th Workshop on the Representation and Processing of Sign Languages:

Corpus Mining


CALL FOR PAPERS

Abstracts are invited for a full-day workshop on sign language resources, to take place following the 2016 LREC conference on May 28th, 2016. Recent technological developments allow sign language researchers to create relatively large video corpora of sign language use that were unimaginable ten years ago. Several national projects are currently underway, and more are planned. This workshop aims to share experiences from current and past efforts. What problems were encountered, and what solutions were created? What linguistic decisions were taken? How have the data been analyzed?

The special focus of this workshop is on Corpus Mining. If one counts Big Data by the storage capacity needed, sign language corpora do qualify as Big Data. It is a different story, however, if one counts by any linguistic measure, such as tokens. Even so, many people working on sign language corpora feel that their data hold much more than they can currently extract, now that the material far exceeds what any one person can know intimately. Thus, there is an increasing demand for methods to detect interesting data within sign language corpora. There are at least three dimensions to address:

  • traditional linguistic as well as statistical and machine learning approaches on the basis of hand-made annotation,
  • computer vision operating on the sign language video data, and
  • in the case of translated material, language processing on the spoken language side identifying areas of interest in the original sign language.

We see the first applications drawing synergies from combining these methods.

The workshop will discuss methodologies, best-practice examples, linguistic data, and also applications of corpora within and beyond sign language linguistics. For sign language technologies, five areas will be in focus:

  • Large-scale data visualization
  • Statistical analysis of corpus content
  • Integration of supervised and unsupervised machine learning into corpus environments
  • Sign language recognition (video image processing) leading to (semi-)automatic annotation
  • Synergies between analysis on the manually created annotation, computer vision, and mix-ins from spoken language technologies

It is expected that two of the four sessions will be devoted to the focus topics, while the other two will cover more general sign language corpus issues. We therefore invite abstracts for 20-minute papers or posters (with or without demonstrations) on the following topics:

Corpus Mining

  • Tagging to detect structure
  • Large-scale data visualization
  • Statistical analysis of corpus content
  • Integration of supervised and unsupervised machine learning into corpus environments
  • Sign language recognition (video image processing) leading to (semi-)automatic annotation
  • Synergies between analysis on the manually created annotation, computer vision, and mix-ins from spoken language technologies
  • User interface design to integrate new approaches into corpus linguistics workbenches that sign language researchers work with

General Issues on Sign Language Corpora and Tools

  • Experiences in building sign language corpora
  • Elicitation methodology appropriate for corpus collection
  • Proposals for standards for linguistic annotation or for metadata descriptions
  • Experiences from linguistic research using corpora
  • Use of (parallel) corpora and lexicons in translation studies and machine translation
  • Language documentation and long-term accessibility for sign language data
  • Video compression and streaming for sign language
  • Tool development
  • Linking corpora and lexicons
  • Integrated presentation of corpus and dictionary contents
  • Avatar technology as a tool in sign language corpora and corpus data feeding into advances in avatar technology

Papers (4-8 pages) for both oral/signed and poster presentations at this workshop will be published in the workshop proceedings on the conference website.

Please submit your abstract through the LREC START system no later than Feb 12th, 2016 (extended from Feb 6th, 2016), indicating whether you prefer an oral/signed or a poster presentation. In the latter case, please also indicate whether you plan to combine the poster with a demo.

When submitting a paper from the START page, authors will be asked to provide essential information about resources (in a broad sense, i.e. also technologies, standards, evaluation kits, etc.) that have been used for the work described in the paper or are a new result of their research. Moreover, ELRA encourages all LREC authors to share the described LRs (data, tools, services, etc.) to enable their reuse and the replicability of experiments, including evaluation.