sign-lang@LREC Anthology

How Much Data Is Enough Data? A New Motion Capture Corpus for Probabilistic Sign Language Generation

Klezovich, Anna | Mesch, Johanna | Henter, Gustav Eje | Beskow, Jonas


Volume:
Proceedings of the 15th International Conference on Language Resources and Evaluation (LREC 2026)
Venue:
Palma, Mallorca, Spain
Date:
11 to 16 May 2026
Pages:
9549–9558
Publisher:
European Language Resources Association (ELRA)
Licence:
CC BY-NC 4.0
DOI:
10.63317/5pmyrs7f9o33
ISBN:
978-2-493814-49-4

Abstract

We present STS Mocap v1, a new 4.1-hour, high-quality motion capture dataset for Swedish Sign Language. The dataset contains high-quality multimodal data: body motion tracked with markers, finger motion tracked with Manus Quantum Metagloves, facial motion tracked with the iPhone LiveLink app in MetaHuman Animator mode, and corresponding sentence-level written translations into Swedish. Using this dataset, we show that four hours of motion capture data is enough for generative modeling of sign language conditioned on 2D pose. By contrast, training the same flow-matching model on only 30 minutes of this data (a typical size for sign language motion capture datasets) leads to a significant degradation in the quality of the synthesized motion.


Cite as

Citation in ACL Citation Format

Anna Klezovich, Johanna Mesch, Gustav Eje Henter, Jonas Beskow. 2026. How Much Data Is Enough Data? A New Motion Capture Corpus for Probabilistic Sign Language Generation. In Proceedings of the 15th International Conference on Language Resources and Evaluation (LREC 2026), pages 9549–9558, Palma, Mallorca, Spain. European Language Resources Association (ELRA).

BibTeX Export

@inproceedings{klezovich-etal-2026-enough:lrec,
  author    = {Klezovich, Anna and Mesch, Johanna and Henter, Gustav Eje and Beskow, Jonas},
  title     = {How Much Data Is Enough Data? A New Motion Capture Corpus for Probabilistic Sign Language Generation},
  pages     = {9549--9558},
  editor    = {Piperidis, Stelios and Bel, N{\'u}ria and van den Heuvel, Henk and Ide, Nancy and Krek, Simon and Toral, Antonio},
  booktitle = {15th International Conference on Language Resources and Evaluation ({LREC} 2026)},
  publisher = {{European Language Resources Association (ELRA)}},
  address   = {Palma, Mallorca, Spain},
  day       = {11--16},
  month     = may,
  year      = {2026},
  isbn      = {978-2-493814-49-4},
  language  = {english},
  url       = {https://lrec.elra.info/lrec2026-main-750},
  doi       = {10.63317/5pmyrs7f9o33}
}