The Sign Language Dataset Compendium


More sources of information

Couldn't find what you were looking for? A number of other reports and repositories might help you. We used several of them ourselves when compiling the Compendium.

Surveys

There are a number of manually curated surveys that focus on or include sign language datasets. They summarise information either as freeform text or as a structured table. Which datasets they cover and what information they provide depend on the purpose and research area for which they were written.

  1. Konrad (2012) provides a detailed tabular overview of 17 sign language corpora, identifying various linguistic properties of each corpus.
  2. The survey article by Schmaling (2012) provides a detailed overview of dictionaries for African sign languages. It focuses on print-media dictionaries, but also describes two resources providing video materials.
  3. The CLARIN Sign Language Resources page provides a list of corpora and lexical resources, both those hosted within the CLARIN infrastructure and those outside of it. Apart from links and a brief description, it also provides information on size, annotations and licence where possible.
  4. The website Sign Language Processing by Moryossef and Goldberg (2021) provides an overview of the state of natural language processing for sign languages, aimed at computer scientists. It includes a discussion of relevant resources and a table of 47 datasets (as of March 2025) with information on their size, licence, primary reference and data location.
  5. The website of the African Sign Language Resource Center provides information on sign languages used in African countries. While some parts of the website are still empty (as of March 2025), it does contain profiles for 54 countries, offering general information on their deaf populations and the sign languages used there. In several cases, the profiles identify existing language resources, although not necessarily where to find them.
  6. Hartzell (2022) created an informal compilation of language resources for minority languages in Egypt, including eight resources for Egyptian Sign Language.

Repositories

Information on sign language datasets can also be found in a number of online archives and repositories. Some of these host the datasets themselves, while others are metadata repositories that describe the datasets and link to where they can be found, as the Compendium does.

  1. The Language Archive (TLA), hosted by the Max Planck Institute for Psycholinguistics in Nijmegen.
  2. Endangered Languages Archive (ELAR), run by the Berlin-Brandenburg Academy of Sciences and Humanities.
  3. Open Language Archives Community (OLAC)
  4. CLARIN Virtual Language Observatory (VLO)
  5. Meta-Share
  6. European Language Grid (ELG)
  7. LRE Map

References