sign-lang@LREC Anthology

iLex – A Database Tool for Integrating Sign Language Corpus Linguistics and Sign Language Lexicography

Hanke, Thomas | Storz, Jakob


Volume:
Proceedings of the LREC2008 3rd Workshop on the Representation and Processing of Sign Languages: Construction and Exploitation of Sign Language Corpora
Venue:
Marrakech, Morocco
Date:
1 June 2008
Pages:
64–67
Publisher:
European Language Resources Association (ELRA)
License:
CC BY-NC
sign-lang ID:
08011

Content Categories

Editors:
iLex

Abstract

This poster presents iLex, a software tool targeted at both corpus linguistics and lexicography. It is now a shared belief in the LR community that lexicographic work on any language should be based on a corpus. Conversely, lemmatisation of a sign language corpus requires a lexicon to be built up in parallel.
For languages with a written form and orthography, lemmatisation is a more or less straightforward process. For sign languages, however, type-token matching is a major task in itself. Glossing or form-based transcription, e.g. with HamNoSys, may be sufficient for small single-transcriber projects. Consistency, however, cannot be guaranteed across multiple transcribers, large quantities of data, or longer periods of time.
iLex is therefore designed as a relational database linking tokens with their types. That means that the transcription process does not consist of assigning text tags to time intervals of the source video, but of tagging intervals with a reference to a type. The database then allows the user to review all tokens of a type at any point in time in order to verify that the intended type-token pair really fits the type’s definition and extension. Revisions of earlier decisions in the light of new data are as easy as dragging instances from one type to another. Beyond the support in the initial type-token matching, iLex gives its users views onto the transcribed data orthogonal to the transcription itself, and thereby helps to improve transcription quality. With its ability to support users working on different projects in one database, iLex allows synergies between projects as each project immediately profits from data entered by others.
The cost of these benefits is the need for a solid infrastructure: a database server needs to be installed, and ideally every user should have access to all videos, often requiring specialised video servers. For larger corpus projects, however, this should be taken for granted anyway.
For data exchange with other research groups, iLex supports a number of file formats, such as ELAN, SignStream, and syncWRITER for transcription data and IMDI for metadata. While exporting data from iLex into these formats, as well as into a couple of presentation formats such as HTML with thumbnails, is done with a simple menu command, importing data from other sources requires some additional steps by the researcher. As other data formats consist of text tags only, some matching operations are necessary to convert from text to tokens. The newest release of iLex supports the user in this procedure: by learning a mapping from imported glosses to iLex types from user actions, it can partially automate future imports from the same source.
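The type-token design and the learned import mapping described above can be illustrated with a minimal sketch. It uses Python with an in-memory SQLite database. All table names, column names, and function names here are hypothetical and chosen for illustration only; the abstract does not describe the actual iLex schema or its import logic, so this should not be read as the authors' implementation.

```python
import sqlite3

# Hypothetical schema: these names are assumptions, not the iLex schema.
SCHEMA = """
CREATE TABLE types (
    id    INTEGER PRIMARY KEY,
    gloss TEXT NOT NULL UNIQUE
);
CREATE TABLE tokens (
    id       INTEGER PRIMARY KEY,
    video    TEXT    NOT NULL,
    start_ms INTEGER NOT NULL,
    end_ms   INTEGER NOT NULL,
    type_id  INTEGER NOT NULL REFERENCES types(id)  -- a reference, not a text tag
);
CREATE TABLE gloss_map (
    source  TEXT    NOT NULL,  -- e.g. the name of an external project or file set
    gloss   TEXT    NOT NULL,  -- the text tag as it appears in the imported data
    type_id INTEGER NOT NULL REFERENCES types(id),
    PRIMARY KEY (source, gloss)
);
"""

def open_db():
    db = sqlite3.connect(":memory:")
    db.execute("PRAGMA foreign_keys = ON")
    db.executescript(SCHEMA)
    return db

def add_type(db, gloss):
    return db.execute("INSERT INTO types (gloss) VALUES (?)", (gloss,)).lastrowid

def add_token(db, video, start_ms, end_ms, type_id):
    # Tag a time interval of a video with a reference to a type.
    return db.execute(
        "INSERT INTO tokens (video, start_ms, end_ms, type_id) VALUES (?, ?, ?, ?)",
        (video, start_ms, end_ms, type_id),
    ).lastrowid

def tokens_of(db, type_id):
    # Review all tokens currently assigned to one type.
    rows = db.execute(
        "SELECT id FROM tokens WHERE type_id = ? ORDER BY id", (type_id,)
    )
    return [row[0] for row in rows]

def reassign(db, token_id, new_type_id):
    # Revising an earlier decision only changes the reference.
    db.execute("UPDATE tokens SET type_id = ? WHERE id = ?", (new_type_id, token_id))

def learn_mapping(db, source, gloss, type_id):
    # Remember how the user resolved an imported gloss for this source.
    db.execute(
        "INSERT OR REPLACE INTO gloss_map (source, gloss, type_id) VALUES (?, ?, ?)",
        (source, gloss, type_id),
    )

def import_tag(db, source, gloss, video, start_ms, end_ms):
    # Convert an imported text tag into a token if a mapping is known.
    # Returns None when the gloss still needs to be resolved by the user.
    row = db.execute(
        "SELECT type_id FROM gloss_map WHERE source = ? AND gloss = ?",
        (source, gloss),
    ).fetchone()
    if row is None:
        return None
    return add_token(db, video, start_ms, end_ms, row[0])

db = open_db()
house = add_type(db, "HOUSE")
roof = add_type(db, "ROOF")
first = add_token(db, "session01.mov", 1200, 1750, house)
second = add_token(db, "session01.mov", 4100, 4600, house)
reassign(db, second, roof)  # the second token turned out to be a different sign
```

In this sketch, a token never stores a gloss string of its own, so renaming a type or moving a token changes a single row. The `gloss_map` table stands in for the learned mapping: once a user has resolved an imported gloss for a given source, later calls to `import_tag` for the same source and gloss create tokens without further input.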
In addition to data exchange with other transcription tools and export to presentation formats, iLex integrates with a number of tools for rapid production of sign language teaching materials and for virtual signing by means of avatars.
On the lexicography side, iLex can host all the data necessary for the production of dictionaries. With its scripting language support, iLex is able to almost completely automate the production of a variety of formats including print, DVD, websites for computers, and websites for iPods/iPhones.


BibTeX Export

@inproceedings{hanke:08011:sign-lang:lrec,
  author    = {Hanke, Thomas and Storz, Jakob},
  title     = {{iLex} -- A Database Tool for Integrating Sign Language Corpus Linguistics and Sign Language Lexicography},
  pages     = {64--67},
  editor    = {Crasborn, Onno and Efthimiou, Eleni and Hanke, Thomas and Thoutenhoofd, Ernst D. and Zwitserlood, Inge},
  booktitle = {Proceedings of the {LREC2008} 3rd Workshop on the Representation and Processing of Sign Languages: Construction and Exploitation of Sign Language Corpora},
  maintitle = {6th International Conference on Language Resources and Evaluation ({LREC} 2008)},
  publisher = {{European Language Resources Association (ELRA)}},
  address   = {Marrakech, Morocco},
  day       = {1},
  month     = jun,
  year      = {2008},
  language  = {english},
  url       = {https://www.sign-lang.uni-hamburg.de/lrec/pub/08011.pdf}
}