sign-lang@LREC Anthology

Synthetic Corpora: A Synergy of Linguistics and Computer Animation

Schnepp, Jerry | Wolfe, Rosalee | McDonald, John C.

Proceedings of the LREC2010 4th Workshop on the Representation and Processing of Sign Languages: Corpora and Sign Language Technologies
Valletta, Malta
22 and 23 May 2010
European Language Resources Association (ELRA)

Content Categories

American Sign Language


Synthetic corpora are computer representations of linguistic phenomena. They enable the creation of computer-generated animations depicting sign languages and are the complement of corpora containing videotaped exemplars.
Synthetic corpora have the potential to serve multiple disciplines. They can aid in the automatic recognition of sign, because they contain the geometric data required for intelligent visual detection algorithms. Synthetic corpora can also provide visual depictions of abstract representations and act as a verification tool for data integrity and hypothesis testing.
Because the signs are synthesized, not retrieved, they can be modified as they are formed. This provides the flexibility to generate an endless variety of utterances not possible with recordings, thus opening possibilities for automatic translation efforts. Although how best to represent sign for this purpose is still an open question, a synthetic corpus has the potential to serve in this capacity. The flexibility of synthetically generated sign is also useful for the development of interpreter training software and self-directed learning tools for deaf children.
By necessity, linguistics and computer animation must play a role in the creation of such corpora, as any corpus will need to serve both disciplines. At first glance, the goals of these disciplines would appear to be at cross purposes. Linguistics researchers often use corpora to form hypotheses through queries on linguistic features. Thus the corpora must encode such general abstractions as handshape, position, motion, palm orientation and non-manual signals. In contrast, creating computer animations of sign requires voluminous and detailed data, as the resulting animations must be realistic enough to pass the scrutiny of fluent signers.
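As a minimal sketch of the two needs described above, one record could pair the abstract linguistic features with the detailed animation data for the same sign. Everything below is an illustrative assumption and not the authors' representation: the `SignEntry` class, its field names, the `query` helper, and the placeholder glosses and feature values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SignEntry:
    """Hypothetical corpus record: abstract features plus animation data."""
    gloss: str
    handshape: str
    location: str
    movement: str
    palm_orientation: str
    nonmanual: List[str] = field(default_factory=list)
    # Placeholder for the voluminous animation data (e.g. per-frame joint
    # rotations); left empty here because its real format is not specified.
    keyframes: List[Dict[str, float]] = field(default_factory=list)


def query(corpus: List[SignEntry], **features: str) -> List[SignEntry]:
    """Return entries whose named features equal all of the given values."""
    return [
        entry for entry in corpus
        if all(getattr(entry, name) == value for name, value in features.items())
    ]


# Placeholder entries; glosses and feature values are invented for illustration.
corpus = [
    SignEntry("EXAMPLE-A", "B", "chin", "arc", "up", ["brow-raise"]),
    SignEntry("EXAMPLE-B", "5", "chest", "tap", "in"),
    SignEntry("EXAMPLE-C", "B", "chest", "circle", "in"),
]

# A linguist-style query on an abstract feature.
matches = query(corpus, handshape="B")
print([entry.gloss for entry in matches])
```

Under these assumptions, a linguist would query only the abstract fields, while an animation system would read `keyframes` from the same record.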
In actuality, the fields of linguistics and computer animation create a mutually beneficial synergy. Having the detailed precision required for animation can facilitate the exploration of subtle interactions among linguistic phenomena. Likewise, animators need an abstract representation to organize, combine, and synthesize complex animation data.
Regardless of the animation technique, linguistic knowledge is necessary to produce any synthetic corpus. Animators who hand-transcribe need to work closely with linguists, so that the gloss is tagged correctly. Linguistic information guides the transcription artist's efforts to produce a natural exemplar that encapsulates the essential motions of a sign. With motion capture, the role of linguistics is no less central. Motion capture equipment generates massive amounts of data that must be cleaned to remove extraneous noise. The linguistic attributes of a sign give the cleanup artists precisely what they need to process and extract the desired motion.
Our work thus far has focused on the creation of detailed and accurate animations of sign. The data that drive these animations have similarities to abstractions created by linguists. Thus, linguistic research guides the creation of a framework for automated sign synthesis.
This presentation will examine several linguistic processes and discuss an approach to their representation in a synthetic corpus, including co-occurrence in non-manual signals. The presentation will include a demonstration of computer-generated American Sign Language.

BibTeX Export

  author    = {Schnepp, Jerry and Wolfe, Rosalee and McDonald, John C.},
  title     = {Synthetic Corpora: A Synergy of Linguistics and Computer Animation},
  pages     = {217--220},
  editor    = {Dreuw, Philippe and Efthimiou, Eleni and Hanke, Thomas and Johnston, Trevor and Mart{\'i}nez Ruiz, Gregorio and Schembri, Adam},
  booktitle = {Proceedings of the {LREC2010} 4th Workshop on the Representation and Processing of Sign Languages: Corpora and Sign Language Technologies},
  maintitle = {7th International Conference on Language Resources and Evaluation ({LREC} 2010)},
  publisher = {{European Language Resources Association (ELRA)}},
  address   = {Valletta, Malta},
  day       = {22--23},
  month     = may,
  year      = {2010},
  language  = {english},
  url       = {}