DGS Corpus Project – Development of a Corpus Based Electronic Dictionary German Sign Language / German

Prillwitz, Siegmund | Hanke, Thomas | König, Susanne | Konrad, Reiner | Langer, Gabriele | Schwarz, Arvid

Volume:: Proceedings of the LREC2008 3rd Workshop on the Representation and Processing of Sign Languages: Construction and Exploitation of Sign Language Corpora
Venue:: Marrakech, Morocco
Date:: 1 June 2008
Pages:: 159–164
Publisher:: European Language Resources Association (ELRA)
Licence:: CC BY-NC 4.0
sign-lang ID:: 08018

Content Categories

Projects:: DGS-Korpus project
Languages:: German Sign Language
Corpora:: DGS Corpus
Dictionaries:: DW-DGS
Lexical Databases:: DGS Corpus types list

Abstract

The poster introduces a 15-year project accepted for funding by the Hamburg Academy of Sciences. The proposed project aims to combine the collection of a large corpus with the development and production of a comprehensive, corpus based electronic dictionary of German Sign Language (DGS).
To this end, a corpus of approximately 350–400 hours from 250–300 informants will be collected in a variety of elicitation settings. In size and scope, this is comparable to large spoken language corpora. The design allows the corpus to be used for various tasks, among them: (i) the validation by corpus data of a basic vocabulary compiled from different published sources; (ii) research on DGS grammar based on detailed transcription data; (iii) the identification of different meanings and collocations of a sign from appropriate contexts. Furthermore, the design anticipates a comparative sociolinguistic study similar in kind and quality to Lucas et al. (2001) and Schembri/Johnston (2004). The corpus thus provides a starting point for in-depth research into the structure and lexicon of German Sign Language as well as into the visual-gestural mode of sign languages in general. Parts of the annotated corpus, i.e. transcription files with English translations, will be made available online to the international linguistic community.
The corpus data will undergo two stages of transcription. First, a basic transcription serves to segment utterances and to identify lexical items, and thus provides initial access to the data. Second, approximately 50 % of the transcribed material will be transcribed again in more detail. This serves to clarify grammatical questions for the dictionary grammar and to deal with lexicological and lexicographic issues. The annotation of the corpus will be closely intertwined with the requirements of lexical analysis. A high quality of transcription will be achieved through continuous verification by native signers. A relational database (iLex, cf. Hanke/Storz) supports this process, especially the consistency of type-token matching.
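As a rough illustration of how a relational database can keep type-token matching consistent, the following minimal sketch stores sign types and their tokens in two tables linked by a foreign key, so that a token cannot be attached to a type that does not exist. It is not the iLex schema: the table names, column names, the helper functions and the glosses used in the usage note are all assumptions made for this example.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical type/token schema."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE sign_type (
            id    INTEGER PRIMARY KEY,
            gloss TEXT NOT NULL UNIQUE      -- one row per lexical type
        );
        CREATE TABLE token (
            id       INTEGER PRIMARY KEY,
            type_id  INTEGER NOT NULL REFERENCES sign_type(id),
            video    TEXT NOT NULL,         -- hypothetical recording label
            start_ms INTEGER NOT NULL,
            end_ms   INTEGER NOT NULL
        );
        """
    )
    return conn


def add_token(conn, type_id, video, start_ms, end_ms):
    """Insert one token; return False if it refers to an unknown type."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO token (type_id, video, start_ms, end_ms) "
                "VALUES (?, ?, ?, ?)",
                (type_id, video, start_ms, end_ms),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def token_counts(conn):
    """Return a dict mapping each gloss to its number of tokens."""
    rows = conn.execute(
        "SELECT t.gloss, COUNT(k.id) FROM sign_type t "
        "LEFT JOIN token k ON k.type_id = t.id "
        "GROUP BY t.id ORDER BY t.gloss"
    ).fetchall()
    return dict(rows)
```

In this sketch, after inserting two invented types such as HOUSE and TREE, a call to `add_token` with an existing `type_id` succeeds, while a call with an unknown `type_id` is rejected by the database itself rather than by application code. `token_counts` then lists every type with its token count, including types that have no tokens yet.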
Lexical analysis and lexicographic decisions concerning, for example, lexical status, language change, and lemma selection will be continuously validated by a deaf focus group and through a general voting web interface open to all interested members of the deaf community.
The dictionary will be based entirely on the corpus with respect to the list of lemmas to be included, but it will go decidedly beyond a mere compilation of corpus references. Rather, we will systematically abstract from the references to obtain a generalized description of lexical items. Examples of sign uses will be taken directly from the corpus.
For cross-linguistic research and comparability of results across projects, we consider it essential to push for standardisation, or at least compatibility, of annotation and transcription conventions. To this end, we have arranged to cooperate with several other national corpus projects and look forward to cooperating with further projects currently in preparation.
References
Lucas, Ceil / Bayley, Robert / Valli, Clayton (2001): Sociolinguistic Variation in American Sign Language. Washington, DC: Gallaudet Univ. Press.
Schembri, Adam / Johnston, Trevor (2004): Sociolinguistic variation in Auslan (Australian Sign Language). A research project in progress. In: Deaf Worlds 20 (1), 78-90.

Cite as

Citation in ACL Citation Format

Siegmund Prillwitz, Thomas Hanke, Susanne König, Reiner Konrad, Gabriele Langer, Arvid Schwarz. 2008. DGS Corpus Project – Development of a Corpus Based Electronic Dictionary German Sign Language / German. In Proceedings of the LREC2008 3rd Workshop on the Representation and Processing of Sign Languages: Construction and Exploitation of Sign Language Corpora, pages 159–164, Marrakech, Morocco. European Language Resources Association (ELRA).

BibTeX Export

@inproceedings{prillwitz:08018:sign-lang:lrec,
  author    = {Prillwitz, Siegmund and Hanke, Thomas and K{\"o}nig, Susanne and Konrad, Reiner and Langer, Gabriele and Schwarz, Arvid},
  title     = {{DGS} {Corpus} Project -- Development of a Corpus Based Electronic Dictionary {German} {Sign} {Language} / {German}},
  pages     = {159--164},
  editor    = {Crasborn, Onno and Efthimiou, Eleni and Hanke, Thomas and Thoutenhoofd, Ernst D. and Zwitserlood, Inge},
  booktitle = {Proceedings of the {LREC2008} 3rd Workshop on the Representation and Processing of Sign Languages: Construction and Exploitation of Sign Language Corpora},
  maintitle = {6th International Conference on Language Resources and Evaluation ({LREC} 2008)},
  publisher = {{European Language Resources Association (ELRA)}},
  address   = {Marrakech, Morocco},
  day       = {1},
  month     = jun,
  year      = {2008},
  language  = {english},
  url       = {https://www.sign-lang.uni-hamburg.de/lrec/pub/08018.html}
}