Adding value to, and extracting of value from, a signed language corpus through secondary processing: implications for annotation schemas and corpus creation
Johnston, Trevor 
- Volume: Proceedings of the LREC2010 4th Workshop on the Representation and Processing of Sign Languages: Corpora and Sign Language Technologies
- Venue: Valletta, Malta
- Date: 22 and 23 May 2010
- Pages: 137–142
- Publisher: European Language Resources Association (ELRA)
- License: CC BY-NC
- sign-lang ID: 10002
Content Categories
- Languages: Auslan
- Corpora: Auslan Corpus
Abstract
A basic signed language (SL) corpus is created through primary processing of video recordings using multimedia annotation software. Primary processing entails the tokenization and identification of SL units. For the purposes of linguistic research a corpus also needs secondary processing. Secondary processing entails appending tags for specific linguistic features to primary annotations. I draw on the experience from the Auslan corpus project to describe how primary and secondary processing can be used in corpus-based SL research. In particular, I show how the tier structure of ELAN can be used to tag SL units in a variety of ways, and how this information can be used to glean new information from the corpus which can then be added as new annotations to the corpus. Value-adding by principled and systematic primary and secondary processing of digital recordings is thus not only essential for corpus creation ('machine-readability'), it also enables further enriching of the corpus so that even more value can be extracted. I conclude by discussing the implications for annotation software and standardized annotation schemas used in the creation of SL corpora.
Cite as
Citation in ACL Citation Format
Trevor Johnston. 2010. Adding value to, and extracting of value from, a signed language corpus through secondary processing: implications for annotation schemas and corpus creation. In Proceedings of the LREC2010 4th Workshop on the Representation and Processing of Sign Languages: Corpora and Sign Language Technologies, pages 137–142, Valletta, Malta. European Language Resources Association (ELRA).
BibTeX Export
@inproceedings{johnston:10002:sign-lang:lrec,
  author    = {Johnston, Trevor},
  title     = {Adding value to, and extracting of value from, a signed language corpus through secondary processing: implications for annotation schemas and corpus creation},
  pages     = {137--142},
  editor    = {Dreuw, Philippe and Efthimiou, Eleni and Hanke, Thomas and Johnston, Trevor and Mart{\'i}nez Ruiz, Gregorio and Schembri, Adam},
  booktitle = {Proceedings of the {LREC2010} 4th Workshop on the Representation and Processing of Sign Languages: Corpora and Sign Language Technologies},
  maintitle = {7th International Conference on Language Resources and Evaluation ({LREC} 2010)},
  publisher = {{European Language Resources Association (ELRA)}},
  address   = {Valletta, Malta},
  day       = {22--23},
  month     = may,
  year      = {2010},
  language  = {english},
  url       = {https://www.sign-lang.uni-hamburg.de/lrec/pub/10002.pdf}
}