sign-lang@LREC Anthology

Introducing a Bangla Sentence–Gloss Pair Dataset for Bangla Sign Language Translation and Research

Saha, Neelavro | Shahriyar, Rafi | Roudra, Nafis Ashraf | Sakib, Saadman | Rasel, Annajiat Alim


Volume:
Proceedings of the 15th International Conference on Language Resources and Evaluation (LREC 2026)
Venue:
Palma, Mallorca, Spain
Date:
11 to 16 May 2026
Pages:
10457–10466
Publisher:
European Language Resources Association (ELRA)
Licence:
CC BY-NC 4.0
DOI:
10.63317/38qenrwzegr9
ISBN:
978-2-493814-49-4

Abstract

Bangla Sign Language (BdSL) translation is a low-resource NLP task because no large-scale datasets address sentence-level translation. As a result, existing research in this field has been limited to word-level and alphabet-level detection. In this work, we introduce Bangla-SGP, a novel parallel dataset of 1,000 human-annotated sentence–gloss pairs, augmented with around 3,000 synthetically generated pairs produced from syntactic and morphological rules through a rule-based Retrieval-Augmented Generation (RAG) pipeline. The gloss sequence of each spoken Bangla sentence is made up of individual glosses, which are Bangla sign-supported words and serve as an intermediate representation for continuous signing. The 1,000 high-quality Bangla sentences in the human-annotated portion were manually annotated into gloss sequences by a professional signer. The augmentation process incorporates rule-based linguistic strategies and prompt engineering techniques that we derived by critically analyzing our human-annotated sentence–gloss pairs and by working closely with our professional signer. Furthermore, we fine-tune several transformer-based models, including mBart50, Google mT5, and GPT4.1-nano, and evaluate their sentence-to-gloss translation performance using BLEU scores. Based on these scores, we compare the models' gloss-translation consistency across our dataset and the RWTH-PHOENIX-2014T benchmark.
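As an illustration of the BLEU-based evaluation mentioned in the abstract, the following is a minimal sketch of corpus-level BLEU over tokenized gloss sequences. It is not the authors' evaluation code: the function names `ngrams` and `corpus_bleu` and the uppercase English gloss tokens are hypothetical placeholders, and the sketch assumes a single reference per hypothesis with no smoothing.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU, one reference per hypothesis, no smoothing (assumption)."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clipped matches: a hypothesis n-gram counts at most as often
            # as it occurs in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    # Without smoothing, any n-gram order with zero matches gives BLEU = 0.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty discourages hypotheses shorter than their references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


# Hypothetical placeholder glosses, not taken from the Bangla-SGP dataset.
reference = [["TOMORROW", "I", "SCHOOL", "GO"]]
hypothesis = [["TOMORROW", "I", "GO", "SCHOOL"]]
score = corpus_bleu(hypothesis, reference, max_n=2)
print(f"BLEU-2: {score:.3f}")
```

Because glosses are discrete tokens, n-gram overlap is sensitive to gloss order, which is why the reordered hypothesis above scores below a perfect match. For reported results, an established implementation such as sacreBLEU is usually preferable to a hand-written one.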

Cite as

Citation in ACL Citation Format

Neelavro Saha, Rafi Shahriyar, Nafis Ashraf Roudra, Saadman Sakib, Annajiat Alim Rasel. 2026. Introducing a Bangla Sentence–Gloss Pair Dataset for Bangla Sign Language Translation and Research. In Proceedings of the 15th International Conference on Language Resources and Evaluation (LREC 2026), pages 10457–10466, Palma, Mallorca, Spain. European Language Resources Association (ELRA).

BibTeX Export

@inproceedings{saha-etal-2026-banglasl:lrec,
  author    = {Saha, Neelavro and Shahriyar, Rafi and Roudra, Nafis Ashraf and Sakib, Saadman and Rasel, Annajiat Alim},
  title     = {Introducing a {Bangla} {Sentence--Gloss} Pair Dataset for {Bangla} Sign Language Translation and Research},
  pages     = {10457--10466},
  editor    = {Piperidis, Stelios and Bel, N{\'u}ria and van den Heuvel, Henk and Ide, Nancy and Krek, Simon and Toral, Antonio},
  booktitle = {15th International Conference on Language Resources and Evaluation ({LREC} 2026)},
  publisher = {{European Language Resources Association (ELRA)}},
  address   = {Palma, Mallorca, Spain},
  day       = {11--16},
  month     = may,
  year      = {2026},
  isbn      = {978-2-493814-49-4},
  language  = {english},
  url       = {https://lrec.elra.info/lrec2026-main-820},
  doi       = {10.63317/38qenrwzegr9}
}