Publishing DGS Corpus Data: Different Formats for Different Needs

Jahn, Elena | Konrad, Reiner | Langer, Gabriele | Wagner, Sven | Hanke, Thomas

Volume:: Proceedings of the LREC2018 8th Workshop on the Representation and Processing of Sign Languages: Involving the Language Community
Venue:: Miyazaki, Japan
Date:: 12 May 2018
Pages:: 83–90
Publisher:: European Language Resources Association (ELRA)
Licence:: CC BY-NC 4.0
sign-lang ID:: 18018
ISBN:: 979-10-95546-01-6

Content Categories

Projects:: DGS-Korpus project
Languages:: German Sign Language
Corpora:: DGS Corpus
Dictionaries:: DW-DGS
Lexical Databases:: DGS Corpus types list

Abstract

In 2010-2012, the DGS-Korpus project collected a large corpus of German Sign Language (DGS). Now, a substantial subset of the data is published, namely the Public DGS Corpus. We describe the considerations and decisions taken regarding what part of the data is to be made public, the quality assurance measures required during data preparation, and the formats of the published data. The corpus is published in three different ways in order to fulfil the needs of a variety of different users. First of all, the data is made available to the language community whose members allowed us to share their recorded language. In addition, we hope that a large number of non-scientific users with various backgrounds will find the data useful. Last but not least, we aim to make the data attractive for users with a scientific background and provide the possibility to conduct studies based on it, irrespective of whether they are familiar with DGS or not.

Keywords

Language documentation and long-term accessibility for sign language data

Cite as

Citation in ACL Citation Format

Elena Jahn, Reiner Konrad, Gabriele Langer, Sven Wagner, Thomas Hanke. 2018. Publishing DGS Corpus Data: Different Formats for Different Needs. In Proceedings of the LREC2018 8th Workshop on the Representation and Processing of Sign Languages: Involving the Language Community, pages 83–90, Miyazaki, Japan. European Language Resources Association (ELRA).

BibTeX Export

@inproceedings{jahn:18018:sign-lang:lrec,
  author    = {Jahn, Elena and Konrad, Reiner and Langer, Gabriele and Wagner, Sven and Hanke, Thomas},
  title     = {Publishing {DGS} {Corpus} Data: Different Formats for Different Needs},
  pages     = {83--90},
  editor    = {Bono, Mayumi and Efthimiou, Eleni and Fotinea, Stavroula-Evita and Hanke, Thomas and Hochgesang, Julie A. and Kristoffersen, Jette and Mesch, Johanna and Osugi, Yutaka},
  booktitle = {Proceedings of the {LREC2018} 8th Workshop on the Representation and Processing of Sign Languages: Involving the Language Community},
  maintitle = {11th International Conference on Language Resources and Evaluation ({LREC} 2018)},
  publisher = {{European Language Resources Association (ELRA)}},
  address   = {Miyazaki, Japan},
  day       = {12},
  month     = may,
  year      = {2018},
  isbn      = {979-10-95546-01-6},
  language  = {english},
  url       = {https://www.sign-lang.uni-hamburg.de/lrec/pub/18018.html}
}