The Sign Language Dataset Compendium


Welcome to the Sign Language Dataset Compendium, an overview of digital resources for signed languages suitable for research. The compendium covers both corpora and lexical resources. It also provides an overview of commonly used data collection tasks and the corpora in which they were used. For those looking for datasets for a specific language, a language index is provided.

The compendium is also available as a downloadable PDF. For archival copies of all PDF releases, visit https://doi.org/10.25592/uhhfdm.12016.

Should you know of additional resources, know of information that is missing from an entry, spot inaccuracies or wish to provide us with any other feedback, please contact us at sldc@dgs-korpus.de.

About

The information provided in the compendium is compiled from public resource documentation, research articles, inspection of public data and personal correspondence with resource creators. Each compendium entry consists of a free-form text description, a structured info table and a list of references. As we follow the terminology of each individual resource, differences in terminology may occur, such as different size indications (sign, token, type) or the use of deaf vs. Deaf. Where possible, we use consistent terminology, enriched with comments if needed. All entries are interconnected, providing links between related resources, between languages and resources and between tasks and corpora. Resources can be filtered using keywords.

How to Cite

To credit the compendium, please cite the following paper:

Kopf, M., Schulder, M., & Hanke, T. (2022). The Sign Language Dataset Compendium: Creating an Overview of Digital Linguistic Resources. Proceedings of the LREC2022 10th Workshop on the Representation and Processing of Sign Languages: Multilingual Sign Language Resources, pp. 102–109.

BibTeX

@inproceedings{kopf:22025:sign-lang:lrec,
  author    = {Kopf, Maria and Schulder, Marc and Hanke, Thomas},
  title     = {The {Sign} {Language} {Dataset} {Compendium}: Creating an Overview of Digital Linguistic Resources},
  pages     = {102--109},
  editor    = {Efthimiou, Eleni and Fotinea, Stavroula-Evita and Hanke, Thomas and Hochgesang, Julie A. and Kristoffersen, Jette and Mesch, Johanna and Schulder, Marc},
  booktitle = {Proceedings of the {LREC2022} 10th Workshop on the Representation and Processing of Sign Languages: Multilingual Sign Language Resources},
  publisher = {European Language Resources Association (ELRA)},
  address   = {Marseille, France},
  year      = {2022},
  isbn      = {979-10-95546-86-3},
  url       = {https://www.sign-lang.uni-hamburg.de/lrec/pub/22025.pdf}
}