sign-lang@LREC Anthology

Adding value to, and extracting of value from, a signed language corpus through secondary processing: implications for annotation schemas and corpus creation

Johnston, Trevor


Volume:
Proceedings of the LREC2010 4th Workshop on the Representation and Processing of Sign Languages: Corpora and Sign Language Technologies
Venue:
Valletta, Malta
Date:
22 and 23 May 2010
Pages:
137–142
Publisher:
European Language Resources Association (ELRA)
License:
CC BY-NC
sign-lang ID:
10002

Content Categories

Languages:
Auslan
Corpora:
Auslan Corpus

Abstract

A basic signed language (SL) corpus is created through primary processing of video recordings using multimedia annotation software. Primary processing entails the tokenization and identification of SL units. For the purposes of linguistic research a corpus also needs secondary processing. Secondary processing entails appending tags for specific linguistic features to primary annotations. I draw on the experience from the Auslan corpus project to describe how primary and secondary processing can be used in corpus-based SL research. In particular, I show how the tier structure of ELAN can be used to tag SL units in a variety of ways, and how this information can be used to glean new information from the corpus which can then be added as new annotations to the corpus. Value-adding by principled and systematic primary and secondary processing of digital recordings is thus not only essential for corpus creation ('machine-readability'), it also enables further enriching of the corpus so that even more value can be extracted. I conclude by discussing the implications for annotation software and standardized annotation schemas used in the creation of SL corpora.

BibTeX Export

@inproceedings{johnston:10002:sign-lang:lrec,
  author    = {Johnston, Trevor},
  title     = {Adding value to, and extracting of value from, a signed language corpus through secondary processing: implications for annotation schemas and corpus creation},
  pages     = {137--142},
  editor    = {Dreuw, Philippe and Efthimiou, Eleni and Hanke, Thomas and Johnston, Trevor and Mart{\'i}nez Ruiz, Gregorio and Schembri, Adam},
  booktitle = {Proceedings of the {LREC2010} 4th Workshop on the Representation and Processing of Sign Languages: Corpora and Sign Language Technologies},
  maintitle = {7th International Conference on Language Resources and Evaluation ({LREC} 2010)},
  publisher = {{European Language Resources Association (ELRA)}},
  address   = {Valletta, Malta},
  day       = {22--23},
  month     = may,
  year      = {2010},
  language  = {english},
  url       = {https://www.sign-lang.uni-hamburg.de/lrec/pub/10002.pdf}
}