iLex - A Tool for Sign Language Lexicography and Corpus Analysis

Sign languages are the preferred means of communication for most Deaf people around the world. They use a number of visually distinct articulators (hands, facial expression, mouth, body) in parallel and fully exploit spatial and temporal relations to establish grammatical features.

It is therefore not surprising that sign language researchers were among the first to integrate digital video into tools for corpus analysis and lexicographic work. Most of these tools, however, served only the immediate needs of their developers, as they were small-scale by-products of sign language-related research projects. Only recently has the sign language research community started to fully explore the field of cross-linguistic studies, and the resulting research questions are well beyond the scope of the "home-brewed" tools available.

Now that a number of LR tools have multimedia capabilities and offer far more analysis functions than were available before, one option for sign language researchers is to make these tools usable for their purposes by defining appropriate coding conventions.

Our approach, however, is somewhat different. While we want to make tools available in the LR community usable within our environment, we consider modality-specific requirements so important that we go the other way round: we continue to develop our own environment, opening it up to other tools by a step-wise transition to an open architecture model. As a first step, we have implemented XML import and export facilities for the appropriate components. The import functionality can handle a number of timing models that are in use in the sign language research community, with user-selectable strategies for resolving ambiguities where necessary.
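To illustrate what a user-selectable strategy for resolving timing ambiguities might look like, the following is a minimal Python sketch and not the iLex implementation. The XML layout (a `tag` element with `tier`, `start` and optional `end` attributes), the function `import_tags` and the strategy names `until_next` and `point` are all assumptions made for this example. It covers one ambiguity only: annotations that carry a start time but no end time.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Tag:
    tier: str
    value: str
    start_ms: int
    end_ms: int


def import_tags(xml_text: str, open_end: str = "until_next",
                media_end_ms: Optional[int] = None) -> List[Tag]:
    """Read hypothetical <tag> elements and fill in missing end times.

    open_end selects how a tag without an end time is resolved:
      "until_next": it lasts until the next tag on the same tier starts
                    (the last tag on a tier lasts until media_end_ms)
      "point":      it is treated as a zero-length time point
    """
    if open_end not in ("until_next", "point"):
        raise ValueError(f"unknown strategy: {open_end}")

    raw = []
    for el in ET.fromstring(xml_text).iter("tag"):
        end = el.get("end")
        raw.append((el.get("tier", ""), int(el.get("start")),
                    None if end is None else int(end), el.text or ""))
    raw.sort(key=lambda r: (r[0], r[1]))  # group by tier, then by start time

    tags = []
    for i, (tier, start, end, value) in enumerate(raw):
        if end is None:
            if open_end == "point":
                end = start
            elif i + 1 < len(raw) and raw[i + 1][0] == tier:
                end = raw[i + 1][1]
            elif media_end_ms is not None:
                end = media_end_ms
            else:
                raise ValueError("last open-ended tag needs media_end_ms")
        tags.append(Tag(tier, value, start, end))
    return tags
```

Calling `import_tags(xml_text, "until_next", media_end_ms=...)` or `import_tags(xml_text, "point")` on the same file then yields two different but equally explicit interval readings of the same source data, which is the kind of choice left to the user.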

Some of the sign language-specific features of relevance here are:

Sign languages have no writing system. Most researchers therefore use glosses (spoken language words semantically overlapping with the sign to be identified, relying on the reader's knowledge of the target language) and/or phonetic transcription and, when in doubt, refer to the original data, i.e. the digital video, which is readily available. The notation systems in use are either alphabetic but non-Roman, or non-alphabetic. In either case, the large number of parameters and the complexity of describing simultaneity and sequentiality call for input support. The system described here supports HamNoSys input (Hamburg Notation System for Sign Language, one of the systems in wider use) at different levels of user expertise, from inline syntax checking to a graphical point-and-click interface. It also provides animation generated from the HamNoSys input string, which can be compared with the original data (currently under development). As HamNoSys is also used for the description of co-speech as well as emblematic gestures, this part of the system is also in use outside the sign language research community.
Iconicity is one of the key features of sign language. This does not mean, however, that the semantics of most signs can be derived from their form. Instead, it makes sense to work with a two-level description of lexical items: On the first level, we have the types, i.e. form-meaning pairs, as abstractions from the tokens in the data, very much comparable to lexical items in spoken language. On the second level, form-meaning pairs sharing the same underlying image are mapped onto one type. This level of analysis is very helpful in analysing the widespread productive use of signs. On both levels of description, form-related relations such as homophony as well as semantic relations can be maintained. (Due to deficiencies in the underlying notation system, which mirror the state of the art in that field, even homophony is only suggested by the system and can be overridden by the user.)
Typologically, the sign languages researched so far fall into the category of polycomponential languages. However, morphemes can be uttered not only sequentially, as in spoken languages, but also co-temporally. It is therefore essential to handle multi-tier representations with many tiers per signer involved. The timing granularities needed range from above to below the sign (word) level.
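The two-level description of lexical items and the overridable homophony suggestions mentioned above can be sketched as a small data structure. This is an illustrative Python sketch under stated assumptions, not the iLex data model: the names `Entry`, `Lexicon`, `sharing_image`, `suggest_homophones` and `reject` are hypothetical, the glosses are invented examples, and the `form` values are opaque placeholder strings rather than real HamNoSys. Homophony is approximated here as exact equality of the form string.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple


@dataclass(frozen=True)
class Entry:
    """First-level item: a form-meaning pair abstracted from tokens."""
    gloss: str   # label identifying the meaning
    form: str    # placeholder for a notation string (not real HamNoSys)
    image: str   # key of the second-level item (shared underlying image)


class Lexicon:
    def __init__(self) -> None:
        self.entries: Dict[str, Entry] = {}
        self.rejected: Set[FrozenSet[str]] = set()  # suggestions the user overrode

    def add(self, entry: Entry) -> None:
        self.entries[entry.gloss] = entry

    def sharing_image(self, image: str) -> List[str]:
        """Second level: all first-level entries with the same underlying image."""
        return sorted(e.gloss for e in self.entries.values() if e.image == image)

    def suggest_homophones(self) -> List[Tuple[str, str]]:
        """Propose pairs with identical form strings, minus rejected pairs."""
        by_form: Dict[str, List[str]] = {}
        for e in self.entries.values():
            by_form.setdefault(e.form, []).append(e.gloss)
        pairs = []
        for glosses in by_form.values():
            for a, b in combinations(sorted(glosses), 2):
                if frozenset((a, b)) not in self.rejected:
                    pairs.append((a, b))
        return sorted(pairs)

    def reject(self, a: str, b: str) -> None:
        """Record the user's decision that a suggested pair is not homophonous."""
        self.rejected.add(frozenset((a, b)))
```

Keeping rejected pairs separate from the entries means a suggestion is only ever a proposal: the form strings stay untouched, and the user's judgement is stored alongside them rather than overwriting them.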

The system described reflects our experience from a number of empirically based lexicographic and corpus analysis projects (on child language as well as longitudinal studies of adult signers) carried out over the last twenty years. Support for cross-linguistic research, especially in language mixing and switching situations, has only recently been added. First results, however, are very promising.