sign-lang@LREC Anthology

An Arabic Sign Language Corpus for Instructional Language in School

Almohimeed, Abdulaziz | Wald, Mike | Damper, Robert


Volume: Proceedings of the LREC2010 4th Workshop on the Representation and Processing of Sign Languages: Corpora and Sign Language Technologies
Venue: Valletta, Malta
Date: 22 and 23 May 2010
Pages: 7–10
Publisher: European Language Resources Association (ELRA)
License: CC BY-NC
sign-lang ID: 10025

Content Categories

Languages: Arabic Sign Languages
Corpora: Arabic Sign Language Corpus

Abstract

Machine translation (MT) technology has made significant progress over the last decade and now offers the potential for Arabic sign language (ArSL) signers to access text published in Arabic. The dominant model of MT is now corpus based. In this model, the accuracy of translation correlates directly with the size and coverage of the corpus. The corpus is a collection of translation examples constructed from existing documents such as books and newspapers; however, no writing system for sign language (SL) comparable to those used for spoken languages has yet been developed. Hence, no SL documents exist, which complicates the construction of an SL corpus. In countries such as Ireland and Germany, a number of corpora have already been developed from scratch and used for MT. No ArSL corpus for MT exists, so a new ArSL corpus for instructional language had to be created. The goal of building this corpus is to develop an automatic translation system from Arabic text to ArSL.
This paper presents the ArSL corpus for instructional language constructed for use in schools, and the methodology used to create it. The corpus was collected at the College of Computer and Information Sciences at Imam Muhammad bin Saud University in Riyadh, Saudi Arabia. A group of interpreters and native signers with backgrounds in education were involved in this work.
The corpus was constructed by collecting instructional sentences used daily in schools for the deaf. The syntax and morphology of each sentence were then manually analysed. Each sentence was individually translated, recorded on video, and stored in MPEG format. The corpus contains video data from three native signers. The videos were then annotated using the ELAN annotation tool. The annotated video data contain isolated signs accompanied by detailed information, such as manual and non-manual features. The final step in constructing the corpus was to create a bilingual dictionary from the annotated videos.
The corpus comprises two main parts. The first part is the annotated video data, which contain isolated signs with detailed information such as manual and non-manual features. It also contains the Arabic translation script, including syntax and morphology details. The second part is the bilingual dictionary, delivered with the annotated videos.

BibTeX Export

@inproceedings{almohimeed:10025:sign-lang:lrec,
  author    = {Almohimeed, Abdulaziz and Wald, Mike and Damper, Robert},
  title     = {An {Arabic} {Sign} {Language} Corpus for Instructional Language in School},
  pages     = {7--10},
  editor    = {Dreuw, Philippe and Efthimiou, Eleni and Hanke, Thomas and Johnston, Trevor and Mart{\'i}nez Ruiz, Gregorio and Schembri, Adam},
  booktitle = {Proceedings of the {LREC2010} 4th Workshop on the Representation and Processing of Sign Languages: Corpora and Sign Language Technologies},
  maintitle = {7th International Conference on Language Resources and Evaluation ({LREC} 2010)},
  publisher = {{European Language Resources Association (ELRA)}},
  address   = {Valletta, Malta},
  day       = {22--23},
  month     = may,
  year      = {2010},
  language  = {english},
  url       = {https://www.sign-lang.uni-hamburg.de/lrec/pub/10025.pdf}
}