The data is available for download in several formats. All files are encoded in UTF-8.
Are you a data scientist working with sign languages for the first time? Please take the time to familiarise yourself with the topic. There are many pitfalls to be aware of. We suggest reading the paper Fox, N., Woll, B. & Cormier, K. Best practices for sign language technology research. Univ Access Inf Soc (2023).
Note that some fields may contain commas; such fields are enclosed in quotation marks where needed.
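
As a minimal sketch of reading such a file, Python's standard `csv` module handles quoted fields that contain commas without extra configuration. The sample data below is hypothetical: the column names `gloss` and `keywords` and the confidence value `placeholder` are invented for illustration and are not taken from the actual files.

```python
import csv
import io

# Hypothetical sample: only the "confidence" column name comes from this page;
# "gloss", "keywords" and the value "placeholder" are made up for illustration.
SAMPLE = 'gloss,keywords,confidence\nLION-B,"lion, big cat",placeholder\n'


def read_rows(handle):
    """Return the rows of a CSV file as a list of dicts keyed by the header."""
    return list(csv.DictReader(handle))


rows = read_rows(io.StringIO(SAMPLE))
# The quoted field keeps its embedded comma as part of a single value.
print(rows[0]["keywords"])
```

For a file on disk, the same function can be called on a handle opened with `open(path, newline="", encoding="utf-8")`, where `path` is whichever CSV file you downloaded.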
The possible values for the “confidence” column are as follows:
This format allows you to easily import the new languages into the NLTK Wordnet library, and allows you to use NLTK's usual functions with the new languages.
NLTK does not allow custom synsets, so this format is missing the synsets we created and the signs linked to them.
As per NLTK's requirements, languages are identified by their ISO 639-3 language codes, so, for example, German Sign Language is referred to by the code gsg, rather than its common acronym DGS.
For each language we provide three formats for representing signs textually as wordnet lemmas: by the type ID used in our resource, by their gloss/keywords, or by their video URL. Note that the video URL format only contains entries for which such a URL is available.
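
To inspect one of these files without loading it into NLTK, a small parser can help. The sketch below assumes the files use the three-column, tab-separated layout that NLTK's `custom_lemmas` reads (a synset key such as `02129165-n`, a label such as `dse:lemma`, and the lemma itself); this layout is an assumption and should be checked against the downloaded file. The function name `read_lemma_tab` and the sample lines are hypothetical.

```python
import io
from collections import defaultdict


def read_lemma_tab(handle):
    """Map each 'offset-pos' synset key to the list of lemmas found for it."""
    lemmas = defaultdict(list)
    for line in handle:
        if line.startswith("#"):
            continue  # skip header or comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        synset_key, label, value = fields[0], fields[1], fields[-1]
        # Keep only lemma rows, e.g. a label such as "dse:lemma".
        if label.split(":")[-1] == "lemma":
            lemmas[synset_key].append(value)
    return dict(lemmas)


# Hypothetical sample lines in the assumed layout (not copied from a real file).
SAMPLE = (
    "# hypothetical header line\tdse\n"
    "02129165-n\tdse:lemma\tLEEUW-B\n"
    "02129165-n\tdse:lemma\tLION-B\n"
)

table = read_lemma_tab(io.StringIO(SAMPLE))
print(table["02129165-n"])
```

Because the lemma is read from the last column as an opaque string, the same sketch should apply to all three lemma formats, provided the layout assumption holds.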
Lang. | ISO | Lemma is Type ID | Lemma is Gloss/keyword | Lemma is Video URL |
---|---|---|---|---|
BSL | bfi | sign_wordnet_video_bfi.tab | sign_wordnet_gloss_bfi.tab | sign_wordnet_video_bfi.tab |
DGS | gsg | sign_wordnet_video_gsg.tab | sign_wordnet_gloss_gsg.tab | sign_wordnet_video_gsg.tab |
DSGS | sgg | sign_wordnet_video_sgg.tab | sign_wordnet_gloss_sgg.tab | sign_wordnet_video_sgg.tab |
GSL | gss | sign_wordnet_video_gss.tab | sign_wordnet_gloss_gss.tab | sign_wordnet_video_gss.tab |
LSF | fsl | sign_wordnet_video_fsl.tab | sign_wordnet_gloss_fsl.tab | sign_wordnet_video_fsl.tab |
NGT | dse | sign_wordnet_video_dse.tab | sign_wordnet_gloss_dse.tab | sign_wordnet_video_dse.tab |
PJM | pso | sign_wordnet_video_pso.tab | sign_wordnet_gloss_pso.tab | sign_wordnet_video_pso.tab |
STS | swl | sign_wordnet_video_swl.tab | sign_wordnet_gloss_swl.tab | sign_wordnet_video_swl.tab |
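
When working with several languages at once, it can be convenient to derive the file name from a language's common acronym. The sketch below encodes the acronym-to-ISO mapping from the table and covers only the gloss and video URL variants, whose naming pattern is visible in the table. The names `ISO_CODES` and `tab_filename` are hypothetical helpers, not part of the resource or of NLTK.

```python
# Acronym to ISO 639-3 code, as listed in the table.
ISO_CODES = {
    "BSL": "bfi",
    "DGS": "gsg",
    "DSGS": "sgg",
    "GSL": "gss",
    "LSF": "fsl",
    "NGT": "dse",
    "PJM": "pso",
    "STS": "swl",
}


def tab_filename(acronym, lemma_format):
    """Build the .tab file name for a language acronym and lemma format."""
    if lemma_format not in ("gloss", "video"):
        raise ValueError("lemma_format must be 'gloss' or 'video'")
    iso = ISO_CODES[acronym.upper()]  # raises KeyError for unknown acronyms
    return f"sign_wordnet_{lemma_format}_{iso}.tab"


print(tab_filename("NGT", "gloss"))
```

The ISO code looked up here is also the value to pass as the language argument when loading the file into NLTK.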
from nltk.corpus import wordnet as wn
with open("sign_wordnet_video_dse.tab", mode="r", encoding="utf-8") as f:
    wn.custom_lemmas(f, "dse")  # register the NGT lemmas (video URLs) under the code dse
wn.synset_from_pos_and_offset("n", 2129165).lemma_names("dse")
>>>> ['https://signbank.cls.ru.nl/dictionary/protected_media/glossvideo/NGT/LE/LEEUW-B-22.mp4', 'https://signbank.cls.ru.nl/dictionary/protected_media/glossvideo/NGT/LE/LEEUW-A-1759.mp4']
from nltk.corpus import wordnet as wn
with open("sign_wordnet_gloss_dse.tab", mode="r", encoding="utf-8") as f:
    wn.custom_lemmas(f, "dse")  # register the NGT lemmas (glosses/keywords) under the code dse
wn.synset_from_pos_and_offset("n", 2129165).lemma_names("dse")
>>>> ['LEEUW-B', 'LION-B', 'LEEUW-A', 'LION-A']
sense = wn.synsets("LION-B", lang="dse")[0]
sense.offset()
>>>> 2129165
sense.pos()
>>>> n
sense.definition()
>>>> large gregarious predatory feline of Africa and India having a tawny coat with a shaggy mane in the male