Diffusion Models for Sign Language Video Anonymization

Xia, Zhaoyang | Zhou, Yang | Han, Ligong | Neidle, Carol | Metaxas, Dimitris

Volume:: Proceedings of the LREC-COLING 2024 11th Workshop on the Representation and Processing of Sign Languages: Evaluation of Sign Language Resources
Venue:: Torino, Italy
Date:: 25 May 2024
Pages:: 119–131
Publisher:: ELRA Language Resources Association (ELRA) and the International Committee on Computational Linguistics (ICCL)
Licence:: CC BY-NC 4.0
sign-lang ID:: 24014
ACL ID:: 2024.signlang-1.44
ISBN:: 978-2-493814-30-2

Content Categories

Languages:: American Sign Language
Corpora:: ASLLRP

Abstract

Since American Sign Language (ASL) has no standard written form, Deaf signers frequently share videos in order to communicate in their native language. However, this does not preserve privacy. Since critical linguistic information is transmitted through facial expressions, the face cannot be obscured. While signers have expressed interest, for a variety of applications, in sign language video anonymization that would effectively preserve linguistic content, attempts to develop such technology have had limited success and generally require pose estimation that cannot be readily carried out in the wild. To address current limitations, our research introduces DiffSLVA, a novel methodology that uses pre-trained large-scale diffusion models for text-guided sign language video anonymization. We incorporate ControlNet, which leverages low-level image features such as HED (Holistically-Nested Edge Detection) edges, to circumvent the need for pose estimation. Additionally, we develop a specialized module to capture linguistically essential facial expressions. We then combine the above methods to achieve anonymization that preserves the essential linguistic content of the original signer. This innovative methodology makes possible, for the first time, sign language video anonymization that could be used for real-world applications, which would offer significant benefits to the Deaf and Hard-of-Hearing communities.

Video Presentation

Language:: English
Subtitle:: English

Cite as

Citation in ACL Citation Format

Zhaoyang Xia, Yang Zhou, Ligong Han, Carol Neidle, Dimitris Metaxas. 2024. Diffusion Models for Sign Language Video Anonymization. In Proceedings of the LREC-COLING 2024 11th Workshop on the Representation and Processing of Sign Languages: Evaluation of Sign Language Resources, pages 119–131, Torino, Italy. ELRA Language Resources Association (ELRA) and the International Committee on Computational Linguistics (ICCL).

BibTeX Export

@inproceedings{xia:24014:sign-lang:lrec,
  author    = {Xia, Zhaoyang and Zhou, Yang and Han, Ligong and Neidle, Carol and Metaxas, Dimitris},
  title     = {Diffusion Models for Sign Language Video Anonymization},
  pages     = {119--131},
  editor    = {Efthimiou, Eleni and Fotinea, Stavroula-Evita and Hanke, Thomas and Hochgesang, Julie A. and Mesch, Johanna and Schulder, Marc},
  booktitle = {Proceedings of the {LREC-COLING} 2024 11th Workshop on the Representation and Processing of Sign Languages: Evaluation of Sign Language Resources},
  maintitle = {2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation ({LREC-COLING} 2024)},
  publisher = {{ELRA Language Resources Association (ELRA) and the International Committee on Computational Linguistics (ICCL)}},
  address   = {Torino, Italy},
  day       = {25},
  month     = may,
  year      = {2024},
  isbn      = {978-2-493814-30-2},
  language  = {english},
  url       = {https://www.sign-lang.uni-hamburg.de/lrec/pub/24014.html}
}