Annotation and Maintenance of the Greek Sign Language Corpus (GSLC)

Efthimiou, Eleni | Fotinea, Stavroula-Evita

Proceedings of the LREC2008 3rd Workshop on the Representation and Processing of Sign Languages: Construction and Exploitation of Sign Language Corpora
Marrakech, Morocco
1 June 2008
European Language Resources Association (ELRA)
sign-lang ID:

Content Categories

Greek Sign Language


This paper presents the design and development of a representative language corpus for the Greek Sign Language (GSL). The focus is on the annotation methodology adopted to capture linguistic information, and on the exploitation of the annotated corpus to extract a linguistic model intended to support HCI applications based on sign recognition.
The existence of an annotated corpus is a prerequisite for the creation of linguistic resources and for the development of NLP applications for any natural language, whether articulated orally or through signing. In the case of a sign language corpus, annotation performed on video sequences is intended to support exploitation of the linguistic information conveyed through various combinations of spatial-temporal parameters around the signer’s body.
The Greek Sign Language Corpus (GSLC) is being developed in the framework of the national project DIANOEMA (GSRT, M3.3, id 35), which aims at optical analysis and recognition of both static and dynamic signs, incorporating a GSL linguistic model in controlling robot motion. Since no previous GSL corpus is available to meet the requirements of multipurpose use in an HCI environment, the design of GSLC has taken into account annotation requirements as well as linguistic adequacy controls to ensure both corpus-based linguistic analysis and corpus re-usability. Linguistic analysis is a necessary component for the development of NLP tools that, in the case of signed languages, support deaf accessibility to IT content and services. To effectively support language-intensive operations of this kind, linguistic analysis has to derive from safe language data and also cover a range of linguistic phenomena sufficient for an adequate description of the language structure. In this context, safe data are defined as data commonly accepted by a specific language community. The design of GSLC content distinguishes three parts on the basis of the articulation units to be considered with respect to both linguistic analysis and the sign recognition process.
The first part comprises a list of lemmata that are representative of the use of handshapes as a primary sign formation component. This part of the corpus is developed on the basis of measurements of handshape frequency of use in sign morpheme formation, but it also takes into account the complete set of sign formation parameters. In this sense, in order to provide data for all sign articulation features of GSL, the corpus also includes characteristic lemmata with respect to all manual and non-manual features of the language. The second part of GSLC is composed of sets of controlled utterances, which form paradigms capable of exposing the mechanisms GSL uses to express specific core grammar phenomena. The grammar coverage that corresponds to this part of the corpus is representative enough to allow for a formal description of the main structural-semantic mechanisms of the language. Finally, the third part of GSLC contains free narration sequences, which are intended to provide data of spontaneous language production and to be used for machine learning purposes as regards sign recognition. With respect to data collection, all parts of the corpus have been performed by native signers under controlled conditions that guarantee absence of language interference from the spoken language of the signers’ environment. In addition, quality control mechanisms have been applied to ensure data integrity.
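As a rough illustration of how handshape frequency of use could guide the choice of lemmata for the first part of the corpus, the following Python sketch counts handshapes over a toy lexicon and picks sample lemmata for each one. The glosses, the handshape labels, and the function names are all hypothetical; the paper does not specify how the measurements were actually carried out.

```python
from collections import Counter

# Hypothetical toy lexicon: (gloss, dominant handshape label).
# Glosses and handshape labels are invented for illustration only.
lexicon = [
    ("HOUSE", "flat-B"),
    ("BOOK", "flat-B"),
    ("DRINK", "C"),
    ("CUP", "C"),
    ("THANK", "flat-B"),
    ("ONE", "index"),
]

def handshape_frequencies(entries):
    """Count how many lemmata use each handshape."""
    return Counter(handshape for _, handshape in entries)

def representative_lemmata(entries, per_handshape=1):
    """Pick up to `per_handshape` lemmata per handshape, most frequent handshape first."""
    freqs = handshape_frequencies(entries)
    selected = {}
    for handshape, _count in freqs.most_common():
        glosses = [gloss for gloss, shape in entries if shape == handshape]
        selected[handshape] = glosses[:per_handshape]
    return selected

freqs = handshape_frequencies(lexicon)
selection = representative_lemmata(lexicon)
```

A real selection would also have to balance the other manual and non-manual formation parameters mentioned above, which this sketch ignores.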
In the framework of the current research target, annotation of the GSLC involves, on the one hand, descriptions of the phonological structure of morphemes and, on the other hand, sentence-level markers. Sign phonology involves manual and non-manual features of sign formation. For the description of the phonological composition of sign morphemes, the HamNoSys coding set is used along with GSL-specific feature coding. Apart from sentence boundaries, sentence-level annotation involves phrase boundary marking and grammar information marking related to multi-layer indicators, such as topicalisation, nominal phrase formation, temporal indicators and sentential negation. Sentence-level annotation makes use of the ELAN annotator. Annotation integrity is subject to quality controls that involve both peer and external review by expert annotators.
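The multi-layer annotation described above can be pictured as a set of time-aligned tiers over the video signal. The Python sketch below is a minimal model of that idea, not the ELAN file format or its API; the tier names, glosses, and time values are hypothetical, and a phonological tier would hold HamNoSys strings in the same way as the gloss tier shown here.

```python
from dataclasses import dataclass, field

@dataclass
class Annotation:
    start_ms: int   # start time in the video, in milliseconds
    end_ms: int     # end time in the video, in milliseconds
    value: str      # label, gloss, or transcription string

@dataclass
class Tier:
    name: str
    annotations: list = field(default_factory=list)

    def add(self, start_ms, end_ms, value):
        if end_ms <= start_ms:
            raise ValueError("annotation must have positive duration")
        self.annotations.append(Annotation(start_ms, end_ms, value))

    def overlapping(self, start_ms, end_ms):
        """Return annotations whose time span intersects [start_ms, end_ms)."""
        return [a for a in self.annotations
                if a.start_ms < end_ms and a.end_ms > start_ms]

# Hypothetical tiers and values, for illustration only.
sentence = Tier("sentence")
phrase = Tier("phrase")
gloss = Tier("gloss")
negation = Tier("negation")

sentence.add(0, 2400, "S1")
phrase.add(0, 900, "NP")
phrase.add(900, 2400, "VP")
gloss.add(0, 500, "MAN")
gloss.add(500, 900, "OLD")
gloss.add(900, 2400, "COME")
negation.add(900, 2400, "neg")  # marker spanning the predicate

def glosses_in_scope(marker, gloss_tier):
    """List the glosses that fall under a sentence-level marker."""
    return [a.value for a in gloss_tier.overlapping(marker.start_ms, marker.end_ms)]

negated = glosses_in_scope(negation.annotations[0], gloss)
np_glosses = glosses_in_scope(phrase.annotations[0], gloss)
```

Keeping each kind of information on its own tier lets a marker such as negation span several signs without being tied to any single gloss.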
The current GSLC implementation provides for extensibility on all content levels as well as on annotation features, thus allowing for corpus re-usability in GSL research and HCI applications beyond the scope of a specific research project.
Indicative bibliography
Bowden, R., Windridge, D., Kadir, T., Zisserman, A. & Brady, M. (2004). «A Linguistic Feature Vector for the Visual Interpretation of Sign Language», In Tomas Pajdla, Jiri Matas (Eds), Proc. 8th European Conference on Computer Vision, ECCV04. LNCS 3022, Springer-Verlag, Volume 1, pp. 391-401.
Bellugi, U. & Fischer, S. (1972). «A comparison of Sign language and spoken language: rate and grammatical mechanisms», Cognition: International Journal of Cognitive Psychology, 1, 173-200.
Efthimiou, E., Sapountzaki, G., Karpouzis, C. & Fotinea, S-E. (2004). «Developing an e-Learning platform for the Greek Sign Language». Lecture Notes in Computer Science 3118: 1107-1113. Springer.
Efthimiou, E., Fotinea, S-E. & Sapountzaki, G. (2006). «Processing linguistic data for GSL structure representation», Proc. of the Workshop on the Representation and Processing of Sign Languages: Lexicographic matters and didactic scenarios, Satellite Workshop to LREC-2006 Conference, May 28, pp. 49-54.
ELAN annotator, Max Planck Institute for Psycholinguistics, available at:
Fotinea, S-E., Efthimiou, E., Karpouzis, K. & Caridakis, G. (2005). “Dynamic GSL synthesis to support access to e-content”, Proc. of the 3rd International Conference on Universal Access in Human-Computer Interaction (UAHCI 2005), 22-27 July 2005, Las Vegas, Nevada, USA.
HamNoSys Sign Language Notation System:
Karpouzis, K., Caridakis, G., Fotinea, S-E. & Efthimiou, E. (2005). “Educational Resources and Implementation of a Greek Sign Language Synthesis Architecture”, Computers and Education International Journal, Elsevier, in print, electronically available since Sept 05.
Kraiss, K.-F. (Ed.), (2006). Advanced Man-Machine Interaction - Fundamentals and Implementation. Series: Signals and Communication Technology, Springer.
Stokoe, W. (1978). Sign Language Structure (revised ed.). Silver Spring, MD: Linstok.

BibTeX Export

  author    = {Efthimiou, Eleni and Fotinea, Stavroula-Evita},
  title     = {Annotation and Maintenance of the {Greek} {Sign} {Language} Corpus ({GSLC})},
  pages     = {58--63},
  editor    = {Crasborn, Onno and Efthimiou, Eleni and Hanke, Thomas and Thoutenhoofd, Ernst D. and Zwitserlood, Inge},
  booktitle = {Proceedings of the {LREC2008} 3rd Workshop on the Representation and Processing of Sign Languages: Construction and Exploitation of Sign Language Corpora},
  maintitle = {6th International Conference on Language Resources and Evaluation ({LREC} 2008)},
  publisher = {{European Language Resources Association (ELRA)}},
  address   = {Marrakech, Morocco},
  day       = {1},
  month     = jun,
  year      = {2008},
  language  = {english},
  url       = {}