Fine-tuning of Convolutional Neural Networks for the Recognition of Facial Expressions in Sign Language Video Samples

Deshpande, Neha | Nunnari, Fabrizio | Avramidis, Eleftherios

Volume:: Proceedings of the 7th International Workshop on Sign Language Translation and Avatar Technology: The Junction of the Visual and the Textual: Challenges and Perspectives
Venue:: Marseille, France
Date:: 24 June 2022
Pages:: 29–38
Publisher:: European Language Resources Association (ELRA)
License:: CC BY-NC 4.0
ACL ID:: 2022.sltat-1.5
ISBN:: 979-10-95546-82-5

Content Categories

Projects:: EASIER, SocialWear
Languages:: German Sign Language
Corpora:: FePh

Abstract

In this paper, we investigate the capability of convolutional neural networks to recognize, in sign language video frames, the six basic Ekman facial expressions ('fear', 'disgust', 'surprise', 'sadness', 'happiness', 'anger') along with the 'neutral' class. Given the limited amount of annotated facial expression data for the sign language domain, we started from a model pre-trained on general-purpose facial expression datasets and applied various machine learning techniques, such as fine-tuning, data augmentation, class balancing, and image preprocessing, to reach a better accuracy. The models were evaluated using K-fold cross-validation to obtain more reliable conclusions. It is experimentally demonstrated that fine-tuning a pre-trained model, combined with data augmentation by horizontally flipping images and with image normalization, provides the best accuracy on the sign language dataset. The best setting achieves satisfactory classification accuracy, comparable to state-of-the-art systems in generic facial expression recognition. Experiments were performed using different combinations of the above-mentioned techniques based on two different architectures, namely MobileNet and EfficientNet, and it is deemed that both architectures are equally suitable for the purpose of fine-tuning, whereas class balancing is discouraged.
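As an illustration of three of the techniques named in the abstract (horizontal-flip augmentation, image normalization, and K-fold cross-validation), the following is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the nested-list image representation, the division-by-255 normalization, and the contiguous fold layout are all assumptions made for this example, and the abstract does not specify any of them.

```python
from typing import List, Tuple

# Assumption: a grayscale frame is a list of rows of pixel intensities.
Image = List[List[float]]


def hflip(image: Image) -> Image:
    """Mirror an image left-to-right by reversing every row."""
    return [row[::-1] for row in image]


def normalize(image: Image, max_value: float = 255.0) -> Image:
    """Scale pixel values into [0, 1].

    Assumption: simple division by the maximum intensity; the paper's
    actual normalization scheme is not stated in the abstract.
    """
    return [[pixel / max_value for pixel in row] for row in image]


def augment_with_flips(
    images: List[Image], labels: List[str]
) -> Tuple[List[Image], List[str]]:
    """Return the original samples followed by their mirrored copies.

    A mirrored face keeps its expression label, so labels are duplicated.
    """
    flipped = [hflip(img) for img in images]
    return images + flipped, labels + labels


def kfold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k contiguous folds.

    Returns one (train_indices, validation_indices) pair per fold.
    Fold sizes differ by at most one sample.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    base, extra = divmod(n_samples, k)
    splits = []
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        splits.append((train, val))
        start += size
    return splits
```

In a setup like this, one would typically apply `augment_with_flips` only to the training indices of each fold and leave the validation fold unaugmented, so that the reported accuracy reflects unmodified frames. Because neighbouring video frames are highly similar, grouping frames by video or signer before splitting may also matter; whether the paper does so is not stated in the abstract.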

Cite as

Citation in ACL Citation Format

Neha Deshpande, Fabrizio Nunnari, Eleftherios Avramidis. 2022. Fine-tuning of Convolutional Neural Networks for the Recognition of Facial Expressions in Sign Language Video Samples. In Proceedings of the 7th International Workshop on Sign Language Translation and Avatar Technology: The Junction of the Visual and the Textual: Challenges and Perspectives, pages 29–38, Marseille, France. European Language Resources Association (ELRA).

BibTeX Export

@inproceedings{deshpande:70008:sltat:lrec,
  author    = {Deshpande, Neha and Nunnari, Fabrizio and Avramidis, Eleftherios},
  title     = {Fine-tuning of Convolutional Neural Networks for the Recognition of Facial Expressions in Sign Language Video Samples},
  pages     = {29--38},
  editor    = {Efthimiou, Eleni and Fotinea, Stavroula-Evita and Hanke, Thomas and McDonald, John C. and Shterionov, Dimitar and Wolfe, Rosalee},
  booktitle = {Proceedings of the 7th International Workshop on Sign Language Translation and Avatar Technology: The Junction of the Visual and the Textual: Challenges and Perspectives},
  maintitle = {13th International Conference on Language Resources and Evaluation ({LREC} 2022)},
  publisher = {{European Language Resources Association (ELRA)}},
  address   = {Marseille, France},
  day       = {24},
  month     = jun,
  year      = {2022},
  isbn      = {979-10-95546-82-5},
  language  = {english},
  url       = {http://www.lrec-conf.org/proceedings/lrec2022/workshops/sltat/pdf/2022.sltat-1.5}
}