A Critical Study of Automatic Evaluation in Sign Language Translation

Yazdani, Shakib | Hamidullah, Yasser | España-Bonet, Cristina | Avramidis, Eleftherios | van Genabith, Josef

Volume:: Proceedings of the 15th International Conference on Language Resources and Evaluation (LREC 2026)
Venue:: Palma, Mallorca, Spain
Date:: 11 to 16 May 2026
Pages:: 9535–9548
Publisher:: European Language Resources Association (ELRA)
Licence:: CC BY-NC 4.0
DOI:: 10.63317/4n2sooe4fb2i
ISBN:: 978-2-493814-49-4

Abstract

Automatic evaluation metrics are crucial for advancing sign language translation (SLT). Current SLT evaluation metrics, such as BLEU and ROUGE, are purely text-based, and it remains unclear to what extent they can reliably capture the quality of SLT outputs. To address this gap, we investigate the limitations of text-based SLT evaluation by analyzing six metrics: BLEU, chrF, ROUGE, and BLEURT on the one hand, and large language model (LLM)-based evaluators such as G-Eval and GEMBA zero-shot direct assessment on the other. Specifically, we assess the consistency and robustness of these metrics under three controlled conditions: paraphrasing, hallucinations in model outputs, and variations in sentence length. Our analysis highlights the limitations of lexical overlap metrics and shows that LLM-based evaluators better capture semantic equivalence often missed by conventional metrics, but can also be biased toward LLM-paraphrased translations. Moreover, although all metrics are able to detect hallucinations, BLEU tends to be overly sensitive, whereas BLEURT and LLM-based evaluators are comparatively lenient toward subtle cases. These findings motivate multimodal evaluation frameworks that go beyond text-based metrics to enable a more holistic assessment of SLT outputs.
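The limitation of lexical overlap metrics under paraphrasing can be illustrated with a toy example. The sketch below uses a simplified ROUGE-1-style unigram F1 (an assumption for illustration only; it is not the implementation used in the paper or in any ROUGE package) and hypothetical sentences that do not come from the paper's data. A paraphrase that preserves the meaning of the reference shares almost no words with it and therefore scores low, while an identical copy scores 1.0.

```python
from collections import Counter


def unigram_f1(hypothesis: str, reference: str) -> float:
    """Simplified ROUGE-1-style F1 (illustrative assumption): clipped unigram
    overlap on lowercased whitespace tokens, with no stemming or punctuation handling."""
    hyp = Counter(hypothesis.lower().split())
    ref = Counter(reference.lower().split())
    # Counter '&' keeps the minimum count of each shared token, i.e. clipped matches.
    overlap = sum((hyp & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical sentences, not taken from the paper's data.
reference = "the weather will be rainy tomorrow"
paraphrase = "it is going to rain tomorrow"  # same meaning, almost no shared words
identical = reference

print(f"paraphrase: {unigram_f1(paraphrase, reference):.3f}")  # 0.167 (only "tomorrow" matches)
print(f"identical:  {unigram_f1(identical, reference):.3f}")   # 1.000
```

A meaning-aware evaluator would be expected to rate the paraphrase much higher; according to the abstract, LLM-based evaluators capture this kind of semantic equivalence better than conventional metrics.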

Cite as

Citation in ACL Citation Format

Shakib Yazdani, Yasser Hamidullah, Cristina España-Bonet, Eleftherios Avramidis, Josef van Genabith. 2026. A Critical Study of Automatic Evaluation in Sign Language Translation. In Proceedings of the 15th International Conference on Language Resources and Evaluation (LREC 2026), pages 9535–9548, Palma, Mallorca, Spain. European Language Resources Association (ELRA).

BibTeX Export

@inproceedings{yazdani-etal-2026-critical:lrec,
  author    = {Yazdani, Shakib and Hamidullah, Yasser and Espa{\~n}a-Bonet, Cristina and Avramidis, Eleftherios and van Genabith, Josef},
  title     = {A Critical Study of Automatic Evaluation in Sign Language Translation},
  pages     = {9535--9548},
  editor    = {Piperidis, Stelios and Bel, N{\'u}ria and van den Heuvel, Henk and Ide, Nancy and Krek, Simon and Toral, Antonio},
  booktitle = {15th International Conference on Language Resources and Evaluation ({LREC} 2026)},
  publisher = {{European Language Resources Association (ELRA)}},
  address   = {Palma, Mallorca, Spain},
  day       = {11--16},
  month     = may,
  year      = {2026},
  isbn      = {978-2-493814-49-4},
  language  = {english},
  url       = {https://lrec.elra.info/lrec2026-main-749},
  doi       = {10.63317/4n2sooe4fb2i}
}