sign-lang@LREC Anthology

Dealing with Sign Language Morphemes in Statistical Machine Translation

Massó, Guillem | Badia, Toni


Volume:
Proceedings of the LREC2010 4th Workshop on the Representation and Processing of Sign Languages: Corpora and Sign Language Technologies
Venue:
Valletta, Malta
Date:
22 and 23 May 2010
Pages:
154–157
Publisher:
European Language Resources Association (ELRA)
License:
CC BY-NC
sign-lang ID:
10004

Content Categories

Languages:
Catalan Sign Language

Abstract

The aim of this research is to establish the role of linguistic information in data-scarce statistical machine translation for sign languages using freely available tools. The main challenge in statistical machine translation is the scarcity of suitable data, and this problem becomes more pronounced in sign languages. The available corpora are small, usually not domain-specific, and their annotation conventions can vary considerably. Building our own corpus is very time-consuming, and the amount of data we can obtain that way is smaller still. Under these conditions, morpho-syntactic information helps to improve statistical machine translation results, but there are no linguistic processing tools for sign languages. We have managed to improve translations from Catalan to Catalan Sign Language by using factored models in an open source translation system with basic linguistic information such as the lemma or an annotation tier tag. Furthermore, this allows us to deal with sign language morphemes in a more systematic way.
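
A minimal sketch of how "basic linguistic information such as the lemma or an annotation tier tag" can be attached to each token as factors. It assumes a Moses-style convention in which the factors of a token are joined by a vertical bar. The abstract does not name the translation system or its input format, so the separator, the helper name `to_factored`, and the example glosses and tag values are illustrative assumptions, not material from the paper.

```python
from typing import Sequence


def to_factored(
    surface: Sequence[str],
    lemmas: Sequence[str],
    tags: Sequence[str],
    sep: str = "|",
) -> str:
    """Join parallel surface/lemma/tag sequences into one factored line.

    Each output token has the shape surface|lemma|tag (assumed format).
    """
    if not (len(surface) == len(lemmas) == len(tags)):
        raise ValueError("factor sequences must have the same length")

    factored_tokens = []
    for factors in zip(surface, lemmas, tags):
        for factor in factors:
            # A factor must be non-empty and must not contain the separator
            # or whitespace, otherwise the line cannot be split back apart.
            if not factor or sep in factor or any(c.isspace() for c in factor):
                raise ValueError(f"invalid factor: {factor!r}")
        factored_tokens.append(sep.join(factors))
    return " ".join(factored_tokens)


# Hypothetical placeholder glosses and tier tags, invented for illustration.
line = to_factored(
    ["HOUSES", "BIG"],
    ["HOUSE", "BIG"],
    ["manual", "manual"],
)
print(line)
```

Keeping the lemma and the tier tag as separate factors lets a factored model fall back on the more general factor when a full surface form is rare, which is the kind of benefit the abstract attributes to linguistic information in a data-scarce setting.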

BibTeX Export

@inproceedings{masso:10004:sign-lang:lrec,
  author    = {Mass{\'o}, Guillem and Badia, Toni},
  title     = {Dealing with Sign Language Morphemes in Statistical Machine Translation},
  pages     = {154--157},
  editor    = {Dreuw, Philippe and Efthimiou, Eleni and Hanke, Thomas and Johnston, Trevor and Mart{\'i}nez Ruiz, Gregorio and Schembri, Adam},
  booktitle = {Proceedings of the {LREC2010} 4th Workshop on the Representation and Processing of Sign Languages: Corpora and Sign Language Technologies},
  maintitle = {7th International Conference on Language Resources and Evaluation ({LREC} 2010)},
  publisher = {{European Language Resources Association (ELRA)}},
  address   = {Valletta, Malta},
  day       = {22--23},
  month     = may,
  year      = {2010},
  language  = {english},
  url       = {https://www.sign-lang.uni-hamburg.de/lrec/pub/10004.pdf}
}