sign-lang@LREC Anthology

Sign language corpora and the problems with ELAN and the ECHO annotation conventions

Herrmann, Annika


Volume:
Proceedings of the LREC2008 3rd Workshop on the Representation and Processing of Sign Languages: Construction and Exploitation of Sign Language Corpora
Venue:
Marrakech, Morocco
Date:
1 June 2008
Pages:
68–73
Publisher:
European Language Resources Association (ELRA)
License:
CC BY-NC
sign-lang ID:
08015

Content Categories

Languages:
German Sign Language, Irish Sign Language, Sign Language of the Netherlands
Corpora:
Corpus Herrmann
Editors:
ELAN

Abstract

Large corpus projects require logistical, technical and personal expertise and, most importantly, a conventionalized annotation system. In addition, relatively small projects with a definite set of data can also be an invaluable contribution to linguistic sign language research and should therefore use the same technical methods and annotation conventions to allow comparison. The poster presents the process of building a corpus needed for a cross-linguistic study currently under way and focuses on the problems that arise with regard to annotation. The respective solutions are offered as suggestions towards a unified convention.
In this project, elicited data from three European sign languages and altogether 20 informants provide a set of approx. 900 sentences and short dialogues. Metadata about participants and the recording situation will be edited in the IMDI metadata set. ELAN provides the most suitable annotation system for my purposes, as the main interest of the study lies in the use of nonmanuals. The tool is widely used for sign language annotation, and I try to ensure comparability by mainly adopting the ECHO annotation system with a few necessary adaptations.
The problems listed below include frequently asked questions that have not yet been clearly resolved:
a) How are the on- and offsets of signs determined? Should we annotate the separate signs, or the signing stream including the transition period?
b) How should pointing signs or constructions with many meaning components be transcribed?
c) Despite more or less clear definitions of what each tier should be used for, the GLOSS-tier is sometimes intertwined with external information not fitting the tier. How can these problems be avoided?
d) What kinds of disadvantages occur if the eye gaze and eye blink annotations are not accurate?
Possible Solutions:
a) Even though the on- and offsets of signs can be defined more precisely than those of words, the sign syllable does not always have clear boundaries. Signing should be annotated as a continuous stream that is interrupted only when there is a hold or a significant pause. The transition from one sign to the next is often clearly visible through handshape change, which seems to be the more suitable marker for annotation. (The only remaining problem is sign duration, which the vague separate-sign annotation cannot entirely solve either.)
b) A proposal for a more detailed, theory-neutral distinction of pointing signs (e.g. at least IX-1 for the signer and IX-dual (excl., incl.)) and of poly-meaning constructions (e.g. BE-LOCATED-CL:vehicle instead of (p-)vehicle-be-located; BLEAK instead of (p-)bleaking sheep when SHEEP is already introduced; a decision between HOLD-CL:potato and HOLD-CL:round object).
c) The GLOSS tier should only be used for manual signs or gestures; nonmanuals should not be included (*WALK-PURPOSEFUL). An additional tier is useful: other NMFs/look/other facial expressions.
d) Continuous eye gaze and eye aperture annotation is necessary to determine exactly whether an eye gaze change occurs with or without an eye blink, and to determine the duration and timing of blinks. This can be especially relevant for prosodic analysis.
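As an illustration of point d), the following is a minimal Python sketch of how continuously annotated eye gaze and eye aperture tiers could be queried. It assumes each tier has already been exported from the annotation tool as a list of time-aligned intervals. The `Annotation` type, the function names and the tier values ("addressee", "down", "open", "blink") are hypothetical and are not taken from the ECHO conventions.

```python
from typing import List, NamedTuple, Tuple


class Annotation(NamedTuple):
    """One time-aligned annotation on a tier (hypothetical structure)."""
    start_ms: int
    end_ms: int
    value: str


def blink_durations(aperture: List[Annotation]) -> List[int]:
    """Return the duration in ms of every annotation labelled 'blink'."""
    return [a.end_ms - a.start_ms for a in aperture if a.value == "blink"]


def gaze_changes(
    gaze: List[Annotation], aperture: List[Annotation]
) -> List[Tuple[int, str, str, bool]]:
    """List each gaze change and whether a blink overlaps it.

    A change is the boundary between two consecutive gaze annotations
    with different values. With continuous annotation, the end of one
    annotation coincides with the start of the next.
    """
    blinks = [a for a in aperture if a.value == "blink"]
    changes = []
    for prev, nxt in zip(gaze, gaze[1:]):
        if prev.value == nxt.value:
            continue
        # The change lies in the window [prev.end_ms, nxt.start_ms];
        # a blink accompanies it if the two intervals overlap.
        with_blink = any(
            b.start_ms <= nxt.start_ms and b.end_ms >= prev.end_ms
            for b in blinks
        )
        changes.append((prev.end_ms, prev.value, nxt.value, with_blink))
    return changes


# Invented example data, for illustration only.
gaze = [
    Annotation(0, 800, "addressee"),
    Annotation(800, 1500, "down"),
    Annotation(1500, 2200, "addressee"),
]
aperture = [
    Annotation(0, 740, "open"),
    Annotation(740, 860, "blink"),
    Annotation(860, 2200, "open"),
]

print(blink_durations(aperture))
print(gaze_changes(gaze, aperture))
```

If either tier were annotated only sporadically, the boundaries compared here would be missing or imprecise, and neither the timing of blinks nor their co-occurrence with gaze changes could be recovered.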

BibTeX Export

@inproceedings{herrmann:08015:sign-lang:lrec,
  author    = {Herrmann, Annika},
  title     = {Sign language corpora and the problems with {ELAN} and the {ECHO} annotation conventions},
  pages     = {68--73},
  editor    = {Crasborn, Onno and Efthimiou, Eleni and Hanke, Thomas and Thoutenhoofd, Ernst D. and Zwitserlood, Inge},
  booktitle = {Proceedings of the {LREC2008} 3rd Workshop on the Representation and Processing of Sign Languages: Construction and Exploitation of Sign Language Corpora},
  maintitle = {6th International Conference on Language Resources and Evaluation ({LREC} 2008)},
  publisher = {{European Language Resources Association (ELRA)}},
  address   = {Marrakech, Morocco},
  day       = {1},
  month     = jun,
  year      = {2008},
  language  = {english},
  url       = {https://www.sign-lang.uni-hamburg.de/lrec/pub/08015.pdf}
}