"A Sacred Bird Called the Phoenix". Auditing the most-used Parallel Corpus for German Sign Language Recognition and Translation
Czehmann, Vera | Yazdani, Shakib | Hamidullah, Yasser | Nunnari, Fabrizio | Avramidis, Eleftherios
- Volume: Proceedings of the LREC2026 12th Workshop on the Representation and Processing of Sign Languages: Language in Motion
- Venue: Palma, Mallorca, Spain
- Date: 16 May 2026
- Pages: 80–92
- Publisher: European Language Resources Association (ELRA)
- Licence: CC BY-NC 4.0
- sign-lang ID: 26064
- ISBN: 978-2-493814-82-1
Abstract
This paper presents an empirical audit of the widely used RWTH-PHOENIX-2014T corpus, examining its suitability as a benchmark for sign language recognition and translation. Through human annotation of the training set and extensive sign-to-text back translation of the test set, we provide detailed statistics that indicate substantial quality issues, including information loss and lexical errors. Automatic scores comparing human sign-to-text back translations to the original speech-transcribed references are remarkably low, suggesting strong translationese effects and substantial paraphrasing, and revealing limitations of lexical metrics in adequately scoring translation quality. Replacing the original speech-transcribed references with human sign-to-text back translations while scoring existing sign language translation systems reveals the lack of robustness of system evaluation with lexical metrics against this test set. Our findings highlight risks associated with relying on this corpus for model evaluation and call for more rigorous, linguistically grounded evaluation practices in sign language technology research. The back-translated test set and error annotations are made publicly available.
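The abstract's point about lexical metrics can be illustrated with a toy sketch. This is not the paper's evaluation setup: the metric (a clipped n-gram F1), the function names, and the three English sentences are all assumptions invented here for illustration, and none of the sentences come from the corpus.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_f1(hypothesis, reference, n=1):
    """Clipped n-gram F1 between two whitespace-tokenised, lowercased strings.

    A stand-in for a lexical metric; real evaluations typically use
    established metrics, which are not reproduced here.
    """
    hyp = ngram_counts(hypothesis.lower().split(), n)
    ref = ngram_counts(reference.lower().split(), n)
    # Counter intersection keeps the minimum count per n-gram (clipping).
    overlap = sum((hyp & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Invented toy sentences (hypothetical, not taken from any dataset).
reference = "tomorrow it will rain in the north"
paraphrase = "rain is expected up north the next day"   # same meaning, new wording
wrong_output = "tomorrow it will rain in the south"     # different meaning, same wording

print(ngram_f1(paraphrase, reference))     # faithful paraphrase
print(ngram_f1(wrong_output, reference))   # meaning-changing near-copy
```

In this toy case the faithful paraphrase scores 0.4 while the meaning-changing near-copy scores about 0.86, which is the kind of mismatch between lexical overlap and adequacy that the abstract describes. Swapping which sentence is treated as the reference changes the scores in the same way that replacing the references with back translations would.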
Cite as
Citation in ACL Citation Format
Vera Czehmann, Shakib Yazdani, Yasser Hamidullah, Fabrizio Nunnari, Eleftherios Avramidis. 2026. "A Sacred Bird Called the Phoenix". Auditing the most-used Parallel Corpus for German Sign Language Recognition and Translation. In Proceedings of the LREC2026 12th Workshop on the Representation and Processing of Sign Languages: Language in Motion, pages 80–92, Palma, Mallorca, Spain. European Language Resources Association (ELRA).
BibTeX Export
@inproceedings{czehmann:26064:sign-lang:lrec,
author = {Czehmann, Vera and Yazdani, Shakib and Hamidullah, Yasser and Nunnari, Fabrizio and Avramidis, Eleftherios},
title = {"A Sacred Bird Called the Phoenix". Auditing the most-used Parallel Corpus for {German} {Sign} {Language} Recognition and Translation},
pages = {80--92},
editor = {Efthimiou, Eleni and Fotinea, Stavroula-Evita and Hanke, Thomas and Hochgesang, Julie A. and Mesch, Johanna and Schulder, Marc},
booktitle = {Proceedings of the {LREC2026} 12th Workshop on the Representation and Processing of Sign Languages: Language in Motion},
maintitle = {15th International Conference on Language Resources and Evaluation ({LREC} 2026)},
publisher = {{European Language Resources Association (ELRA)}},
address = {Palma, Mallorca, Spain},
day = {16},
month = may,
year = {2026},
isbn = {978-2-493814-82-1},
language = {english},
url = {https://www.sign-lang.uni-hamburg.de/lrec/pub/26064.html}
}