A Video-Based Reverse Dictionary for Sign Language Using Gesture Similarity
Orazumbekov, Batyrbek | Bayanov, Daniyal | Kaltay, Aruzhan | Sandygulova, Anara 
- Volume: Proceedings of the LREC2026 12th Workshop on the Representation and Processing of Sign Languages: Language in Motion
- Venue: Palma, Mallorca, Spain
- Date: 16 May 2026
- Pages: 398–407
- Publisher: European Language Resources Association (ELRA)
- Licence: CC BY-NC 4.0
- sign-lang ID: 26060
- ISBN: 978-2-493814-82-1
Abstract
Sign language recognition systems are usually modeled as classifiers that map gesture videos to pre-defined glosses. However, such systems do not support similarity search, in which a user looks for similar gestures without knowing the corresponding gloss. This paper presents a pose-based video-to-video search framework for isolated signs that acts as a reverse gesture dictionary. The system operates on skeletal keypoints rather than RGB images. Two architectures are proposed for modeling temporal information: a Transformer encoder with self-attention and a Spatial-Temporal Graph Convolutional Network (ST-GCN). The embedding space is optimized with metric learning objectives, including supervised contrastive learning and the ArcFace angular margin loss. Retrieval performance is evaluated on the WLASL dataset using ranking metrics such as Recall@K and mean Average Precision (mAP). Experiments show that Transformer-based temporal modeling outperforms the graph-based approach in the low-shot learning scenario. Attention-based temporal pooling further improves ranking quality, with the best-performing model achieving an mAP of 0.237 on the WLASL validation set. Cross-dataset evaluation on the 226-class AUTSL dataset shows non-trivial generalization to the unseen dataset, despite training only on WLASL.
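To make the retrieval idea in the abstract concrete, the following is a minimal PyTorch sketch of the general approach described there: pose sequences are embedded with a Transformer encoder, pooled over time with a learned attention head, and queries are matched to a gallery by cosine similarity, scored with Recall@K. This is not the authors' implementation; all names, dimensions, and hyperparameters (PoseSignEncoder, d_model=256, 54 keypoints, etc.) are illustrative assumptions, and the metric-learning losses (supervised contrastive, ArcFace) used to train such an embedding are omitted.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PoseSignEncoder(nn.Module):
    """Illustrative pose-sequence encoder with attention-based temporal pooling."""
    def __init__(self, num_keypoints=54, coord_dim=2, d_model=256,
                 n_heads=4, n_layers=4, max_len=64):
        super().__init__()
        # Flatten per-frame keypoints (x, y per joint) into one token per frame.
        self.input_proj = nn.Linear(num_keypoints * coord_dim, d_model)
        # Learned positional embeddings carry temporal order information.
        self.pos = nn.Parameter(torch.zeros(1, max_len, d_model))
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        # Attention-based temporal pooling: one learned score per frame.
        self.pool_score = nn.Linear(d_model, 1)

    def forward(self, poses):                         # poses: (B, T, K*C)
        T = poses.size(1)
        h = self.input_proj(poses) + self.pos[:, :T]  # add temporal positions
        h = self.encoder(h)                           # (B, T, d_model)
        w = torch.softmax(self.pool_score(h), dim=1)  # (B, T, 1) frame weights
        emb = (w * h).sum(dim=1)                      # weighted sum over time
        return F.normalize(emb, dim=-1)               # unit-norm embedding

def recall_at_k(query_emb, gallery_emb, query_labels, gallery_labels, k=5):
    """Fraction of queries whose top-k nearest gallery items share the label."""
    sims = query_emb @ gallery_emb.T                  # cosine sim (unit norm)
    topk = sims.topk(k, dim=1).indices                # (Q, k) gallery indices
    hits = (gallery_labels[topk] == query_labels.unsqueeze(1)).any(dim=1)
    return hits.float().mean().item()

With embeddings trained so that same-gloss signs cluster together (the role of the contrastive and ArcFace objectives in the paper), a "reverse dictionary" query reduces to embedding the user's video and ranking the gallery by similarity, with no gloss label required at query time.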
Cite as (ACL citation format):
Batyrbek Orazumbekov, Daniyal Bayanov, Aruzhan Kaltay, and Anara Sandygulova. 2026. A Video-Based Reverse Dictionary for Sign Language Using Gesture Similarity. In Proceedings of the LREC2026 12th Workshop on the Representation and Processing of Sign Languages: Language in Motion, pages 398–407, Palma, Mallorca, Spain. European Language Resources Association (ELRA).

BibTeX Export
@inproceedings{orazumbekov:26060:sign-lang:lrec,
author = {Orazumbekov, Batyrbek and Bayanov, Daniyal and Kaltay, Aruzhan and Sandygulova, Anara},
title = {A Video-Based Reverse Dictionary for Sign Language Using Gesture Similarity},
pages = {398--407},
editor = {Efthimiou, Eleni and Fotinea, Stavroula-Evita and Hanke, Thomas and Hochgesang, Julie A. and Mesch, Johanna and Schulder, Marc},
booktitle = {Proceedings of the {LREC2026} 12th Workshop on the Representation and Processing of Sign Languages: Language in Motion},
maintitle = {15th International Conference on Language Resources and Evaluation ({LREC} 2026)},
publisher = {{European Language Resources Association (ELRA)}},
address = {Palma, Mallorca, Spain},
day = {16},
month = may,
year = {2026},
isbn = {978-2-493814-82-1},
language = {english},
url = {https://www.sign-lang.uni-hamburg.de/lrec/pub/26060.html}
}