sign-lang@LREC Anthology

The Corpus NGT: an online corpus for professionals and laymen

Crasborn, Onno | Zwitserlood, Inge


Volume:
Proceedings of the LREC2008 3rd Workshop on the Representation and Processing of Sign Languages: Construction and Exploitation of Sign Language Corpora
Venue:
Marrakech, Morocco
Date:
1 June 2008
Pages:
44–49
Publisher:
European Language Resources Association (ELRA)
License:
CC BY-NC
sign-lang ID:
08003

Content Categories

Languages:
Sign Language of the Netherlands
Corpora:
Corpus NGT

Abstract

The Corpus NGT is an ambitious effort to record and archive video data from Sign Language of the Netherlands (NGT), guaranteeing online access and long-term availability. In this presentation, we share our experiences in building this corpus, viz. preparing for comparable data, both elicited and (semi)spontaneous, the recording set-up and procedure, processing of the data, annotation, metadata, licenses and publishing.
The initial plan was to record 24 native signers using two variants of NGT and to annotate a large amount of the data; it changed into recording many more signers (100) using all five reported variants of NGT. This much larger collection of data ensures a good sample of the current state of the language and, since participants are of various ages, also captures its older stages (facilitating the study of language change). The consequence is that there is less time for making annotations. However, it will be easier to add annotations later than to make new recordings that are comparable in every respect to the initial recordings.
The project strives towards a completely open access policy: not only will the video data and annotations be available to everyone, but so will the workflows and manuals for the tools that have been used. Use and reuse of the data are protected by Creative Commons licenses. For now, the corpus will be published by the Max Planck Institute for Psycholinguistics, as part of their growing set of language corpora. We follow their IMDI standard for creating metadata descriptions and corpus structuring. The extension of their annotation tool ELAN as well as the integration of ELAN and IMDI (the data and metadata domains) formed a substantial part of the project.
The Corpus NGT project is funded by the Dutch Science Foundation to facilitate linguistic research. However, since there is a dire need for NGT data among several groups of people, we are now happy to include everyone in our target audience. Other interested scientists may be psychologists, educators, and those involved in constructing (sign) dictionaries. Deaf and hearing professionals in deaf schools and in the Deaf community are interested, including teachers of NGT, developers of teaching materials, and interpreters. Many hearing learners of NGT will benefit from open access to a large set of data in their target language. Deaf people themselves may be interested in the discussion on deaf issues that forms part of every recording session.
Participants were recorded in pairs. They performed several language tasks (producing narratives, prompted discussions, but also non-elicited signing), resulting in ±1.5 hours of usable signed data per pair. Both an upper-body view and a top view of each signer were recorded. In combination, these recordings approximate a three-dimensional view of the signing. For extra information on the facial expressions, MPEG-1 movies showing only the face are extracted from the recordings of the body (shot in full HD resolution).
Due to time and budget limitations, it was only possible to make crude gloss annotations in ELAN of a small subset of the data. In order to make as much of the data set as possible accessible to a large audience, a voice-over done by interpreters is provided with most of the data.

BibTeX Export

@inproceedings{crasborn:08003:sign-lang:lrec,
  author    = {Crasborn, Onno and Zwitserlood, Inge},
  title     = {The {Corpus} {NGT}: an online corpus for professionals and laymen},
  pages     = {44--49},
  editor    = {Crasborn, Onno and Efthimiou, Eleni and Hanke, Thomas and Thoutenhoofd, Ernst D. and Zwitserlood, Inge},
  booktitle = {Proceedings of the {LREC2008} 3rd Workshop on the Representation and Processing of Sign Languages: Construction and Exploitation of Sign Language Corpora},
  maintitle = {6th International Conference on Language Resources and Evaluation ({LREC} 2008)},
  publisher = {{European Language Resources Association (ELRA)}},
  address   = {Marrakech, Morocco},
  day       = {1},
  month     = jun,
  year      = {2008},
  language  = {english},
  url       = {https://www.sign-lang.uni-hamburg.de/lrec/pub/08003.pdf}
}