Assisting Corpus Annotation: Automatic BIO-Tagging of Clause-Like Units in Polish Sign Language. A Pilot Study on Corpus Data

Mostowski, Piotr | Kuder, Anna | Wójcicka, Joanna


Volume: Proceedings of the LREC2026 12th Workshop on the Representation and Processing of Sign Languages: Language in Motion
Venue: Palma, Mallorca, Spain
Date: 16 May 2026
Pages: 351–360
Publisher: European Language Resources Association (ELRA)
Licence: CC BY-NC 4.0
sign-lang ID: 26053
ISBN: 978-2-493814-82-1

Abstract

The creation of large-scale sign language corpora is often bottlenecked by labour-intensive, multi-layered annotation that requires manual analysis. One of the annotation steps is the challenging and time-consuming task of segmenting continuous signing into clause-like units (CLUs). In this paper, we propose an automated segmentation framework for Polish Sign Language (PJM) designed to support manual annotation. To detect sentence boundaries, we adapt the Multi-Stage Temporal Convolutional Network (MS-TCN) architecture, enhanced with a Channel Attention mechanism, to fuse multimodal skeleton features (hands, body, and face) extracted via MediaPipe. We evaluate the model on a diverse subset of the PJM Corpus (40 video files, 25 signers) containing nearly 16,000 clauses that were manually annotated before this study began. The proposed method achieves a Segmental F1-score of 75.43% at IoU = 0.10 and 57.52% at IoU = 0.50, demonstrating a strong capability in localising sentence boundaries. Furthermore, ablation studies reveal that fusing manual kinematics with non-manual prosodic cues (face) yields a significant performance gain (+13.6 percentage points) over unimodal baselines, empirically confirming the linguistic necessity of incorporating both manual and non-manual articulators in sentence delimitation. The solution offers a viable means of reducing CLU annotation time by automatically generating high-quality clause boundary proposals.
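
The abstract reports a Segmental F1-score at two IoU thresholds. As a rough illustration of how such a score can be derived from frame-level BIO tags, the Python sketch below collapses tag sequences into spans and matches predicted spans to reference spans by intersection over union. It is not the authors' evaluation code: the function names, the half-open (start, end) frame convention, the handling of a stray "I" tag, and the greedy one-to-one matching are all assumptions, modelled on the way segmental F1 is commonly computed for temporal segmentation.

```python
from typing import List, Optional, Tuple

Segment = Tuple[int, int]  # (start_frame, end_frame_exclusive); assumed convention


def bio_to_segments(tags: List[str]) -> List[Segment]:
    """Collapse a per-frame B/I/O tag sequence into (start, end) spans."""
    segments: List[Segment] = []
    start: Optional[int] = None
    for i, tag in enumerate(tags):
        if tag == "B" or (tag == "I" and start is None):
            # "B" opens a new unit; a stray "I" after "O" is treated as an
            # opening as well (an assumption, not taken from the paper).
            if start is not None:
                segments.append((start, i))
            start = i
        elif tag == "O" and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(tags)))
    return segments


def iou(a: Segment, b: Segment) -> float:
    """Temporal intersection over union of two half-open spans."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def segmental_f1(pred_tags: List[str], gold_tags: List[str], threshold: float) -> float:
    """F1 over spans: a prediction counts as a true positive if its best-matching
    reference span has IoU >= threshold and has not been matched already."""
    pred = bio_to_segments(pred_tags)
    gold = bio_to_segments(gold_tags)
    matched = [False] * len(gold)
    tp = 0
    for p in pred:
        best_j, best_iou = -1, 0.0
        for j, g in enumerate(gold):
            score = iou(p, g)
            if score > best_iou:
                best_j, best_iou = j, score
        if best_j >= 0 and best_iou >= threshold and not matched[best_j]:
            matched[best_j] = True
            tp += 1
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy example (made-up tags, not corpus data): two reference units, two predictions.
gold = ["B", "I", "I", "I", "O", "O", "B", "I", "I", "I"]
pred = ["O", "O", "B", "I", "I", "O", "B", "I", "I", "I"]

print(segmental_f1(pred, gold, 0.10))  # 1.0: both predictions overlap enough
print(segmental_f1(pred, gold, 0.50))  # 0.5: the first prediction (IoU 0.4) no longer counts
```

The toy example shows why the score drops as the threshold rises: a loosely aligned boundary proposal still counts at IoU = 0.10 but is rejected at IoU = 0.50.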

Cite as

Citation in ACL Citation Format

Piotr Mostowski, Anna Kuder, Joanna Wójcicka. 2026. Assisting Corpus Annotation: Automatic BIO-Tagging of Clause-Like Units in Polish Sign Language. A Pilot Study on Corpus Data. In Proceedings of the LREC2026 12th Workshop on the Representation and Processing of Sign Languages: Language in Motion, pages 351–360, Palma, Mallorca, Spain. European Language Resources Association (ELRA).

BibTeX Export

@inproceedings{mostowski:26053:sign-lang:lrec,
  author    = {Mostowski, Piotr and Kuder, Anna and W{\'o}jcicka, Joanna},
  title     = {Assisting Corpus Annotation: Automatic {BIO-Tagging} of Clause-Like Units in {Polish} {Sign} {Language}. A Pilot Study on Corpus Data},
  pages     = {351--360},
  editor    = {Efthimiou, Eleni and Fotinea, Stavroula-Evita and Hanke, Thomas and Hochgesang, Julie A. and Mesch, Johanna and Schulder, Marc},
  booktitle = {Proceedings of the {LREC2026} 12th Workshop on the Representation and Processing of Sign Languages: Language in Motion},
  maintitle = {15th International Conference on Language Resources and Evaluation ({LREC} 2026)},
  publisher = {{European Language Resources Association (ELRA)}},
  address   = {Palma, Mallorca, Spain},
  day       = {16},
  month     = may,
  year      = {2026},
  isbn      = {978-2-493814-82-1},
  language  = {english},
  url       = {https://www.sign-lang.uni-hamburg.de/lrec/pub/26053.html}
}