The Sign Language Dataset Compendium


Corpora

The compendium contains 42 linguistic corpora from around the world.

To be included, a corpus must contain (semi-)spontaneous signing, provide transcriptions or translations for at least some of its content, contain at least 10 hours of sign language recordings, and fulfill the general curation criteria of the compendium. If none of a language's corpora meet the size requirement, corpora with at least 5 hours of recordings may still be included. Multilingual corpora are included irrespective of their size.
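
The inclusion rules above can be read as a small decision procedure. The following Python sketch is one possible interpretation, not the compendium's actual tooling. The `Corpus` record, its field names, and the function `inclusion_candidates` are hypothetical. The sketch also makes three assumptions the text does not state: multilingual corpora must still meet the non-size criteria, only corpora that pass the non-size criteria count when deciding whether a language already has a corpus of sufficient size, and multilingual corpora do not count towards any single language. Since the text says smaller corpora "may" be included, the result is a list of candidates rather than a final selection.

```python
from dataclasses import dataclass

FULL_HOURS = 10.0      # regular size requirement from the text
FALLBACK_HOURS = 5.0   # relaxed requirement for languages with no larger corpus


@dataclass(frozen=True)
class Corpus:
    # Hypothetical record; field names are illustrative only.
    name: str
    language: str
    hours: float
    spontaneous: bool = True             # contains (semi-)spontaneous signing
    has_text: bool = True                # some transcriptions or translations
    meets_general_criteria: bool = True  # general curation criteria
    multilingual: bool = False


def inclusion_candidates(corpora):
    """Return the corpora that qualify under the rules as interpreted here."""
    # Non-size criteria apply to every corpus (assumed to include multilingual ones).
    base = [c for c in corpora
            if c.spontaneous and c.has_text and c.meets_general_criteria]

    # Languages that already have a single-language corpus of regular size.
    covered = {c.language for c in base
               if not c.multilingual and c.hours >= FULL_HOURS}

    selected = []
    for c in base:
        if c.multilingual or c.hours >= FULL_HOURS:
            selected.append(c)
        elif c.language not in covered and c.hours >= FALLBACK_HOURS:
            # Relaxed threshold: only for languages without a larger corpus.
            selected.append(c)
    return selected


# Made-up example data; names and languages are placeholders.
examples = [
    Corpus("A", "X", 12),                     # regular size
    Corpus("B", "X", 6),                      # too small, X already covered
    Corpus("C", "Y", 6),                      # fallback applies for Y
    Corpus("D", "Y", 3),                      # below the fallback threshold
    Corpus("E", "multi", 1, multilingual=True),
    Corpus("F", "Z", 50, has_text=False),     # fails a non-size criterion
]
selected_names = [c.name for c in inclusion_candidates(examples)]
print(selected_names)  # ['A', 'C', 'E']
```

Keeping the thresholds as named constants makes it easy to see where the 10-hour and 5-hour limits from the text enter the logic.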