sign-lang@LREC Anthology

Open access to sign language corpora

Crasborn, Onno


Volume:
Proceedings of the LREC2008 3rd Workshop on the Representation and Processing of Sign Languages: Construction and Exploitation of Sign Language Corpora
Venue:
Marrakech, Morocco
Date:
1 June 2008
Pages:
33–38
Publisher:
European Language Resources Association (ELRA)
License:
CC BY-NC
sign-lang ID:
08037

Content Categories

Languages:
Sign Language of the Netherlands
Corpora:
Corpus NGT

Abstract

One of the ongoing developments on the internet is the increasing attention to open content: data of all kinds, whether text, images or video, are made publicly available. While there may be restrictions on the type of use that is allowed, selling content and strictly protecting it under copyright law appears to be neither desirable nor necessary for some types of content. This development is sometimes characterised as a change from copyright to ‘copyleft’: rather than stating that “all rights are reserved”, people are encouraged to use materials for their own benefit. This presentation sketches this development and explores how it can apply to sign language corpora. As a case study, it describes the Corpus NGT project, which publishes a large systematic collection of sign language data online. A total of 100 signers are being recorded, yielding over 75 hours of material in 2,000 video segments. The wish to publish this material not only for research purposes (cf. the Dutch Science Foundation’s funding) stems from its potentially great value to various parties in the Netherlands: deaf signers themselves, second language learners of sign language, interpreting students, etc.
One of the problems in publishing sign language data online is privacy protection. Because sign language movies inevitably show the signer, signers reveal more of themselves than contributors to uni-modal speech or text corpora do: the recordings combine the content of their language production with visual information about their identity. In the Corpus NGT, we try to protect the privacy of the informants in several ways: we urge signers not to reveal too much personal information about themselves or about others in their stories and discussions, we limit the amount of metadata that we publish online (leaving out many of the standard fields of the IMDI metadata standard), and we never mention or refer to the names of the signers.
We aim to protect the use of the material by publishing all materials under a Creative Commons license. Creative Commons is an international organisation that was set up specifically as a bridge between national copyright laws and open content material on the internet. Of the different types of licenses available, we chose the ‘BY-NC-SA’ license. This license states that people may re-use the material provided they credit the authors, make no commercial use of it, and distribute the material (and any modifications of it) under the same conditions. The Creative Commons licenses are attractive because they are available in various forms: a plain language statement (as in the previous sentence), a formal legal text, and a machine-readable version for use by software. The plain language version is attached to every movie in the Corpus NGT as a short text preceding and following the video, which allows relatively easy replacement should future policy changes require it.
Finally, a few ethical questions are raised in relation to publishing sign language materials as open access data. Although the signers in the corpus are asked for permission for open access publication, to what extent can they foresee the consequences at that point in time? Will future technologies allow easy face recognition on the basis of movies and thereby undo the privacy protection measures that have been taken? What (normative) effect will publishing the signing of a group of 100 signers from a small community have? Publishing sign language data without answers to these questions carries a clear risk. The solution adopted in the Corpus NGT project is to invest substantial time and energy in publicity within the deaf community, explaining the goal and nature of the corpus and encouraging use by deaf people.

BibTeX Export

@inproceedings{crasborn:08037:sign-lang:lrec,
  author    = {Crasborn, Onno},
  title     = {Open access to sign language corpora},
  pages     = {33--38},
  editor    = {Crasborn, Onno and Efthimiou, Eleni and Hanke, Thomas and Thoutenhoofd, Ernst D. and Zwitserlood, Inge},
  booktitle = {Proceedings of the {LREC2008} 3rd Workshop on the Representation and Processing of Sign Languages: Construction and Exploitation of Sign Language Corpora},
  maintitle = {6th International Conference on Language Resources and Evaluation ({LREC} 2008)},
  publisher = {{European Language Resources Association (ELRA)}},
  address   = {Marrakech, Morocco},
  day       = {1},
  month     = jun,
  year      = {2008},
  language  = {english},
  url       = {https://www.sign-lang.uni-hamburg.de/lrec/pub/08037.pdf}
}