From 2010 to 2012, the DGS-Korpus project collected a large corpus of German Sign Language (DGS). A substantial subset of this data has now been published as the Public DGS Corpus. We describe the considerations and decisions regarding which part of the data is made public, the quality assurance measures required during data preparation, and the formats of the published data. The corpus is published in three different ways in order to meet the needs of a variety of users. First of all, the data is made available to the language community whose members allowed us to share their recorded language. In addition, we hope that a large number of non-scientific users with various backgrounds will find the data useful. Last but not least, we aim to make the data attractive for users with a scientific background and to enable them to conduct studies based on it, irrespective of whether they are familiar with DGS or not.
Keywords
Language documentation and long-term accessibility for sign language data
@inproceedings{jahn:18018:sign-lang:lrec,
author = {Jahn, Elena and Konrad, Reiner and Langer, Gabriele and Wagner, Sven and Hanke, Thomas},
title = {Publishing {DGS} {Corpus} Data: Different Formats for Different Needs},
pages = {83--90},
editor = {Bono, Mayumi and Efthimiou, Eleni and Fotinea, Stavroula-Evita and Hanke, Thomas and Hochgesang, Julie A. and Kristoffersen, Jette and Mesch, Johanna and Osugi, Yutaka},
booktitle = {Proceedings of the {LREC2018} 8th Workshop on the Representation and Processing of Sign Languages: Involving the Language Community},
maintitle = {11th International Conference on Language Resources and Evaluation ({LREC} 2018)},
publisher = {{European Language Resources Association (ELRA)}},
address = {Miyazaki, Japan},
day = {12},
month = may,
year = {2018},
isbn = {979-10-95546-01-6},
language = {english},
url = {https://www.sign-lang.uni-hamburg.de/lrec/pub/18018.pdf}
}