Machine Translation from Spoken Language to Sign Language using Pre-trained Language Model as Encoder

Miyazaki, Taro | Morita, Yusuke | Sano, Masanori

Volume:: Proceedings of the LREC2020 9th Workshop on the Representation and Processing of Sign Languages: Sign Language Resources in the Service of the Language Community, Technological Challenges and Application Perspectives
Venue:: Marseille, France
Date:: 16 May 2020
Pages:: 139–144
Publisher:: European Language Resources Association (ELRA)
License:: CC BY-NC 4.0
sign-lang ID:: 20002
ACL ID:: 2020.signlang-1.23
ISBN:: 979-10-95546-54-2

Content Categories

Languages:: Japanese Sign Language
Corpora:: JSL News Corpus

Abstract

Sign language is the first language for those who were born deaf or lost their hearing in early childhood, so such individuals require services provided in sign language. To achieve flexible open-domain services in sign language, machine translation into sign language is needed. Machine translation generally requires large-scale training corpora, but only small corpora exist for sign language. To overcome this data shortage, we developed a method that uses a pre-trained language model of spoken language as the initial model of the encoder of the machine translation model. We evaluated our method by comparing it to baseline methods, including phrase-based machine translation, using only 130,000 phrase pairs of training data. Our method outperformed the baseline methods, and we found that one cause of translation errors is pointing, which is a special feature used in sign language. We also conducted trials to improve the translation quality for pointing. The results were somewhat disappointing, so we believe that there is still room for improving translation quality, especially for pointing.

Keywords

Machine / Deep Learning – How to get along with the size of sign language resources actually existing
Use of (parallel) corpora and lexicons in translation studies and machine translation

Cite as

Citation in ACL Citation Format

Taro Miyazaki, Yusuke Morita, Masanori Sano. 2020. Machine Translation from Spoken Language to Sign Language using Pre-trained Language Model as Encoder. In Proceedings of the LREC2020 9th Workshop on the Representation and Processing of Sign Languages: Sign Language Resources in the Service of the Language Community, Technological Challenges and Application Perspectives, pages 139–144, Marseille, France. European Language Resources Association (ELRA).

BibTeX Export

@inproceedings{miyazaki:20002:sign-lang:lrec,
  author    = {Miyazaki, Taro and Morita, Yusuke and Sano, Masanori},
  title     = {Machine Translation from Spoken Language to Sign Language using Pre-trained Language Model as Encoder},
  pages     = {139--144},
  editor    = {Efthimiou, Eleni and Fotinea, Stavroula-Evita and Hanke, Thomas and Hochgesang, Julie A. and Kristoffersen, Jette and Mesch, Johanna},
  booktitle = {Proceedings of the {LREC2020} 9th Workshop on the Representation and Processing of Sign Languages: Sign Language Resources in the Service of the Language Community, Technological Challenges and Application Perspectives},
  maintitle = {12th International Conference on Language Resources and Evaluation ({LREC} 2020)},
  publisher = {{European Language Resources Association (ELRA)}},
  address   = {Marseille, France},
  day       = {16},
  month     = may,
  year      = {2020},
  isbn      = {979-10-95546-54-2},
  language  = {english},
  url       = {https://www.sign-lang.uni-hamburg.de/lrec/pub/20002.pdf}
}