In this research project, computer vision techniques for the recognition and analysis of gestures and facial expressions from video will be developed and applied to the processing of sign language. This is a collaborative project between four partners: Helsinki University of Technology, University of Art and Design, University of Jyväskylä, and the Finnish Association of the Deaf. The project has several objectives, four of which are presented in more detail in this poster.

The first objective is to develop novel methods for the content-based processing and analysis of sign language video recorded with a single camera. The PicSOM retrieval system framework, developed at the Helsinki University of Technology for content-based analysis of multimedia data, will be adapted to continuous signing to facilitate automatic and semi-automatic analysis of sign language videos.

The second objective is to develop a computer system which can both (i) automatically indicate meaningful signs and other gesture-like sequences in a video signal containing natural sign language data, and (ii) disregard parts of the signal which do not count as such sequences. In other words, the goal is to develop an automated mechanism that can identify sign and gesture boundaries and indicate, in the video, the sequences that correspond to signs and gestures. The system is not expected to determine the meanings of these sequences. Automatic segmentation of recorded continuous signing is an important first step towards the automatic processing of sign language videos and towards online applications. Our hypothesis is that the temporal boundaries of different sign gestures can be detected, and signs and non-signs (inter-sign transitions, other movements) classified, using a combination of a hand motion detector, still-image multimodal analysis, facial expression analysis, and other non-manual signal recognition. The PicSOM system inherently supports such fusion of different features.

The third objective is linked to building an example-based corpus for Finnish Sign Language (FinSL). Increasing amounts of recorded video data of the language exist, but there are almost no means of utilizing the data efficiently because of missing indexing and the lack of methods for content-based access. The studied methods could facilitate a leap forward in establishing the corpus.

The fourth objective is a feasibility study on the implementation of mobile video access to sign language dictionaries and corpora. Currently, an existing dictionary can be searched by giving a rough description of the location, motion, and handform of a sign. The automatic content-based analysis methods could be applied to online mobile phone videos, thus enabling sign language access to dictionaries and corpora.
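The segmentation hypothesis can be illustrated with a minimal sketch: per-frame confidence scores from the individual analysers (hand motion, still-image analysis, facial expression) are fused and thresholded to obtain candidate sign spans, while short or low-scoring stretches are treated as inter-sign transitions. The function names, fusion weights, and threshold below are hypothetical illustrations of such late fusion, not the project's actual PicSOM implementation.

```python
# Illustrative sketch of fusion-based sign/non-sign segmentation.
# All detector scores are assumed to be per-frame confidences in [0, 1];
# the weights and thresholds are placeholder values, not PicSOM parameters.

from typing import List, Tuple


def fuse_frame_scores(hand_motion: float,
                      image_analysis: float,
                      facial_expression: float,
                      weights: Tuple[float, float, float] = (0.5, 0.3, 0.2)) -> float:
    """Late fusion of per-frame detector scores by weighted averaging."""
    w_hand, w_img, w_face = weights
    return w_hand * hand_motion + w_img * image_analysis + w_face * facial_expression


def segment_signs(frame_scores: List[float],
                  threshold: float = 0.5,
                  min_length: int = 5) -> List[Tuple[int, int]]:
    """Return (start_frame, end_frame) spans whose fused score stays above threshold.

    Spans shorter than min_length frames are discarded as likely inter-sign
    transitions or other non-sign movements.
    """
    spans = []
    start = None
    for i, score in enumerate(frame_scores):
        if score >= threshold and start is None:
            start = i                      # a candidate sign span begins
        elif score < threshold and start is not None:
            if i - start >= min_length:
                spans.append((start, i - 1))
            start = None                   # span ended (or was too short)
    if start is not None and len(frame_scores) - start >= min_length:
        spans.append((start, len(frame_scores) - 1))
    return spans


if __name__ == "__main__":
    # Toy example: 20 frames with a clear signing burst in the middle.
    hand = [0.1] * 5 + [0.9] * 8 + [0.2] * 7
    image = [0.2] * 5 + [0.8] * 8 + [0.1] * 7
    face = [0.3] * 5 + [0.7] * 8 + [0.3] * 7
    fused = [fuse_frame_scores(h, m, f) for h, m, f in zip(hand, image, face)]
    print(segment_signs(fused))  # -> [(5, 12)]
```

In practice the per-frame scores would come from the trained detectors and the fusion would be learned rather than fixed; the sketch only shows how combining several weak per-frame cues can yield temporal sign boundaries.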
@inproceedings{koskela:08002:sign-lang:lrec,
author = {Koskela, Markus and Laaksonen, Jorma and Jantunen, Tommi and Takkinen, Ritva and Rain{\`o}, P{\"a}ivi and Raike, Antti},
title = {Content-Based Video Analysis and Access for {Finnish} {Sign} {Language} -- A Multidisciplinary Research Project},
pages = {101--104},
editor = {Crasborn, Onno and Efthimiou, Eleni and Hanke, Thomas and Thoutenhoofd, Ernst D. and Zwitserlood, Inge},
booktitle = {Proceedings of the {LREC2008} 3rd Workshop on the Representation and Processing of Sign Languages: Construction and Exploitation of Sign Language Corpora},
maintitle = {6th International Conference on Language Resources and Evaluation ({LREC} 2008)},
publisher = {{European Language Resources Association (ELRA)}},
address = {Marrakech, Morocco},
day = {1},
month = jun,
year = {2008},
language = {english},
url = {https://www.sign-lang.uni-hamburg.de/lrec/pub/08002.pdf}
}