sign-lang@LREC Anthology

"A Sacred Bird Called the Phoenix". Auditing the most-used Parallel Corpus for German Sign Language Recognition and Translation

Czehmann, Vera | Yazdani, Shakib | Hamidullah, Yasser | Nunnari, Fabrizio | Avramidis, Eleftherios


Volume:
Proceedings of the LREC2026 12th Workshop on the Representation and Processing of Sign Languages: Language in Motion
Venue:
Palma, Mallorca, Spain
Date:
16 May 2026
Pages:
80–92
Publisher:
European Language Resources Association (ELRA)
Licence:
CC BY-NC 4.0
sign-lang ID:
26064
ISBN:
978-2-493814-82-1

Abstract

This paper presents an empirical audit of the widely used RWTH-PHOENIX-2014T corpus, examining its suitability as a benchmark for sign language recognition and translation. Through human annotation of the training set and extensive sign-to-text back translation of the test set, we provide detailed statistics that indicate substantial quality issues, including information loss and lexical errors. Automatic scores comparing human sign-to-text back translations to the original speech-transcribed references are remarkably low, suggesting strong translationese effects and substantial paraphrasing and revealing limitations of lexical metrics in adequately scoring translation quality. Replacing the original speech-transcribed references with human sign-to-text back translations while scoring existing sign language translation systems reveals the lack of robustness of system evaluation with lexical metrics against this test set. Our findings highlight risks associated with relying on this corpus for model evaluation and call for more rigorous, linguistically grounded evaluation practices in sign language technology research. The back-translated test set and error annotations are made publicly available.

Cite as

Citation in ACL Citation Format

Vera Czehmann, Shakib Yazdani, Yasser Hamidullah, Fabrizio Nunnari, Eleftherios Avramidis. 2026. "A Sacred Bird Called the Phoenix". Auditing the most-used Parallel Corpus for German Sign Language Recognition and Translation. In Proceedings of the LREC2026 12th Workshop on the Representation and Processing of Sign Languages: Language in Motion, pages 80–92, Palma, Mallorca, Spain. European Language Resources Association (ELRA).

BibTeX Export

@inproceedings{czehmann:26064:sign-lang:lrec,
  author    = {Czehmann, Vera and Yazdani, Shakib and Hamidullah, Yasser and Nunnari, Fabrizio and Avramidis, Eleftherios},
  title     = {"A Sacred Bird Called the Phoenix". Auditing the most-used Parallel Corpus for {German} {Sign} {Language} Recognition and Translation},
  pages     = {80--92},
  editor    = {Efthimiou, Eleni and Fotinea, Stavroula-Evita and Hanke, Thomas and Hochgesang, Julie A. and Mesch, Johanna and Schulder, Marc},
  booktitle = {Proceedings of the {LREC2026} 12th Workshop on the Representation and Processing of Sign Languages: Language in Motion},
  maintitle = {15th International Conference on Language Resources and Evaluation ({LREC} 2026)},
  publisher = {{European Language Resources Association (ELRA)}},
  address   = {Palma, Mallorca, Spain},
  day       = {16},
  month     = may,
  year      = {2026},
  isbn      = {978-2-493814-82-1},
  language  = {english},
  url       = {https://www.sign-lang.uni-hamburg.de/lrec/pub/26064.html}
}