A Critical Study of Automatic Evaluation in Sign Language Translation
Yazdani, Shakib | Hamidullah, Yasser | España-Bonet, Cristina | Avramidis, Eleftherios | van Genabith, Josef
- Volume: Proceedings of the 15th International Conference on Language Resources and Evaluation (LREC 2026)
- Venue: Palma, Mallorca, Spain
- Date: 11 to 16 May 2026
- Pages: 9535–9548
- Publisher: European Language Resources Association (ELRA)
- Licence: CC BY-NC 4.0
- DOI: 10.63317/4n2sooe4fb2i
- ISBN: 978-2-493814-49-4
Abstract
Automatic evaluation metrics are crucial for advancing sign language translation (SLT). Current SLT evaluation metrics, such as BLEU and ROUGE, are only text-based, and it remains unclear to what extent text-based metrics can reliably capture the quality of SLT outputs. To address this gap, we investigate the limitations of text-based SLT evaluation metrics by analyzing six metrics, including BLEU, chrF, and ROUGE, as well as BLEURT on the one hand, and large language model (LLM)-based evaluators such as G-Eval and GEMBA zero-shot direct assessment on the other hand. Specifically, we assess the consistency and robustness of these metrics under three controlled conditions: paraphrasing, hallucinations in model outputs, and variations in sentence length. Our analysis highlights the limitations of lexical overlap metrics and demonstrates that while LLM-based evaluators better capture semantic equivalence often missed by conventional metrics, they can also exhibit bias toward LLM-paraphrased translations. Moreover, although all metrics are able to detect hallucinations, BLEU tends to be overly sensitive, whereas BLEURT and LLM-based evaluators are comparatively lenient toward subtle cases. This motivates the need for multimodal evaluation frameworks that extend beyond text-based metrics to enable a more holistic assessment of SLT outputs.
Cite as
Citation in ACL Citation Format
Shakib Yazdani, Yasser Hamidullah, Cristina España-Bonet, Eleftherios Avramidis, Josef van Genabith. 2026. A Critical Study of Automatic Evaluation in Sign Language Translation. In Proceedings of the 15th International Conference on Language Resources and Evaluation (LREC 2026), pages 9535–9548, Palma, Mallorca, Spain. European Language Resources Association (ELRA).
BibTeX Export
@inproceedings{yazdani-etal-2026-critical:lrec,
author = {Yazdani, Shakib and Hamidullah, Yasser and Espa{\~n}a-Bonet, Cristina and Avramidis, Eleftherios and van Genabith, Josef},
title = {A Critical Study of Automatic Evaluation in Sign Language Translation},
pages = {9535--9548},
editor = {Piperidis, Stelios and Bel, N{\'u}ria and van den Heuvel, Henk and Ide, Nancy and Krek, Simon and Toral, Antonio},
booktitle = {15th International Conference on Language Resources and Evaluation ({LREC} 2026)},
publisher = {{European Language Resources Association (ELRA)}},
address = {Palma, Mallorca, Spain},
day = {11--16},
month = may,
year = {2026},
isbn = {978-2-493814-49-4},
language = {english},
url = {https://lrec.elra.info/lrec2026-main-749},
doi = {10.63317/4n2sooe4fb2i}
}