We report on a method used to develop a sizable parallel corpus of English and American Sign Language (ASL). The effort is part of the Gallaudet University Documentation of ASL (GUDA) project, which is currently coordinated by an interdisciplinary team from the Department of Linguistics and the Department of Interpretation and Translation at Gallaudet University. Creation of the parallel corpus makes use of the available SRT (SubRip Subtitle) files of ASL videos that have been interpreted into or from English, or captioned into English. The corpus allows for one-way searches based on the English translation or interpretation, which is useful for translators, interpreters, and those conducting comparative analyses. We conclude with a discussion of important considerations for this method of constructing a parallel corpus, as well as next steps that will help to refine the development and utility of this type of corpus.
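As a rough illustration of the idea summarized above, the following minimal sketch reads the time-stamped cues of an SRT file and runs a one-way, case-insensitive search over the English text, returning the time span of each hit so that the matching stretch of ASL video can be located. It is not the GUDA project's actual pipeline: the names `Cue`, `parse_srt`, and `search_english` are hypothetical, and the sample captions are invented for the example.

```python
import re
from dataclasses import dataclass


@dataclass
class Cue:
    """One SRT subtitle cue: sequence number, time span in ms, English text."""
    index: int
    start_ms: int
    end_ms: int
    text: str


# SRT timing line, e.g. "00:00:01,000 --> 00:00:03,500"
_TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def _to_ms(hours, minutes, seconds, millis):
    """Convert the captured timestamp fields to milliseconds."""
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def parse_srt(srt_text):
    """Parse SRT text into a list of Cue objects, skipping malformed blocks."""
    cleaned = srt_text.lstrip("\ufeff").replace("\r\n", "\n").strip()
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", cleaned):
        lines = block.split("\n")
        if len(lines) < 2 or not lines[0].strip().isdigit():
            continue
        match = _TIMING.search(lines[1])
        if match is None:
            continue
        fields = match.groups()
        # A cue's text may span several lines; join them into one string.
        text = " ".join(line.strip() for line in lines[2:])
        cues.append(Cue(int(lines[0]), _to_ms(*fields[:4]), _to_ms(*fields[4:]), text))
    return cues


def search_english(cues, query):
    """Return the cues whose English text contains the query (case-insensitive)."""
    needle = query.lower()
    return [cue for cue in cues if needle in cue.text.lower()]


# Invented sample captions, used only to exercise the functions above.
SAMPLE_SRT = """1
00:00:01,000 --> 00:00:03,500
Hello, welcome to the lecture.

2
00:00:03,600 --> 00:00:06,000
Today we discuss
sign language corpora.
"""
```

Calling `search_english(parse_srt(SAMPLE_SRT), "corpora")` returns the second cue, whose `start_ms` and `end_ms` give the interval of the video to review. The search runs only from English to the video, which mirrors the one-way searching described in the abstract.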
Keywords
Language documentation and long-term accessibility for sign language data
Experiences in building sign language corpora
Experiences from linguistic research using corpora
Annotation and Visualization Tools
Use of (parallel) corpora and lexicons in translation studies and machine translation
Rafael O. Treviño, Julie A. Hochgesang, Emily P. Shaw, Nic Willow. 2020. One Side of the Coin: Development of an ASL-English Parallel Corpus by Leveraging SRT Files. In Proceedings of the LREC2020 9th Workshop on the Representation and Processing of Sign Languages: Sign Language Resources in the Service of the Language Community, Technological Challenges and Application Perspectives, pages 224–230, Marseille, France. European Language Resources Association (ELRA).
BibTeX Export
@inproceedings{trevino:20038:sign-lang:lrec,
author = {Trevi{\~n}o, Rafael O. and Hochgesang, Julie A. and Shaw, Emily P. and Willow, Nic},
title = {One Side of the Coin: Development of an {ASL-English} Parallel Corpus by Leveraging {SRT} Files},
pages = {224--230},
editor = {Efthimiou, Eleni and Fotinea, Stavroula-Evita and Hanke, Thomas and Hochgesang, Julie A. and Kristoffersen, Jette and Mesch, Johanna},
booktitle = {Proceedings of the {LREC2020} 9th Workshop on the Representation and Processing of Sign Languages: Sign Language Resources in the Service of the Language Community, Technological Challenges and Application Perspectives},
maintitle = {12th International Conference on Language Resources and Evaluation ({LREC} 2020)},
publisher = {{European Language Resources Association (ELRA)}},
address = {Marseille, France},
day = {16},
month = may,
year = {2020},
isbn = {979-10-95546-54-2},
language = {english},
url = {https://www.sign-lang.uni-hamburg.de/lrec/pub/20038.pdf}
}