Trevor Johnston, University of Newcastle, Newcastle, Australia

The lexical database of AUSLAN (Australian Sign Language)

1.0 Introduction

2.0 Description of the database

2.1 Structure of the database (relational or flat)

2.2 Number and type of fields

2.3 Types of information registered

2.3.1 Overview of windows

2.3.2 The phonology window

2.3.3 The core meanings window

2.3.4 The English search window

2.3.5 The lexical links window

2.3.6 The usage and grammar window

2.3.7 The semantic domains window

2.4 Visual data

2.5 Notation and transcription methods used

3.0 Data exchange: Compatibility with other databases

4.0 The future: Compatible databases or a "universal" database?

Appendix 1: Table of fields in Auslan lexical database

1.0 Introduction

The lexical database of Australian Sign Language (henceforth Auslan) was begun in 1984. Over a fifteen-year period, it has grown from a relatively small "database" of some hundreds of signs, stored first as paragraphs and then as tabular records in a word processing program (the earliest versions of Microsoft Word), to a relational database in FileMaker Pro 4.0 with over 7,000 records (each having hundreds of fields).

In effect, there are now three lexical databases of Auslan. The first is the source database of some 6,600 signs created for the Auslan dictionary project. Data was exported from this to create the Auslan dictionary, "Signs of Australia", in book form (Johnston 1998). The second database is a restricted subset of some 4,000 signs from the first. It was used as the data set for the CD-ROM "Signs of Australia" (Johnston 1997, 1998). The professional programmers who produced the commercial CD-ROM gave this restricted database an additional video field to complement the line graphic of each sign record, and designed a unique user interface for navigating through the database. In particular, the interface exploited the thousands of cross references that had been established between signs in the database (i.e., between numerous records and fields). The third Auslan lexical database has only recently been created (in 1999) by exporting from FoxPro 2.6 an updated and expanded form of the first database into a new FileMaker Pro 4.0 format. Not only does the new expanded database have more entries, it also has a new video field based on the CD-ROM data set.

Unless otherwise stated, the lexical database referred to in this paper is the full FoxPro database, as it existed in 1997 and early 1998, which served as the basis of the Auslan dictionaries.

A full description of a sign language lexical database must entail a discussion of the criteria according to which signs are selected for inclusion in the database and the status or ranking individual sign records are given within that database (e.g., as lexical signs, stems, variants, and so on). As mentioned above, I have produced two generalist dictionaries of Auslan in book format as well as a CD-ROM dictionary. These dictionaries have caused me to seriously rethink what linguists and sign lexicographers should and should not record in lexical databases (and, subsequently, dictionaries) and how this information should be recorded. These principles should accord with general lexicographic principles and be understandable and acceptable to any linguist and lexicographer. A background paper discussing this very issue was presented at the Hamburg Workshop on Multimedia Dictionaries and at the Intersign Workshop on Sign Language Lexical Databases (Hamburg University). It is now in press and I refer the reader to this paper.1

2.0 Description of the database

The Auslan lexical database is a FoxPro (version 2.6) document. It has approximately 6,600 records.

2.1 Structure of the database (relational or flat)

Though FoxPro is a relational database program, the Auslan database does not truly exploit this potential and thus is relatively "flat" and simple. There are, however, extensive cross references between most records in various fields. Indeed, virtually every record is referenced to at least one other record in at least one field, while hundreds of records are cross referenced to up to six other records in several fields.

2.2 Number and type of fields

There are 244 fields associated with each record, coding for information dealing with phonology, semantics, grammar, usage, iconicity, regional variation, notation, translation equivalents in English, and the educational and religious background of users of the sign. Fields are boolean, textual, numeric or "general" (a FoxPro description of a field for graphics and other visual data) according to the type of information registered. The fields found in the database are listed in Table 1 (see Appendix 1). The meaning of the field types is discussed in each of the following sections dealing with the various views onto the database.

2.3 Types of information registered

Given the number of fields in each record, six different portals or windows for entering and viewing data were designed to make the task manageable. The exact types of information stored and registered in the database are best dealt with by looking at each major window in turn and describing the number and kind of fields each displays.

2.3.1 Overview of windows

All of the windows, except for the one titled "English search", are designed around the field that registers the head of each record: a graphic (line drawing) representation of the sign itself. This field appears in each window on the database. The graphics or visual data field appears on the left of the window along with a text field that uniquely names and identifies the sign (the field is labeled "idgloss" in the database, for "I.D. gloss") and a text field that describes its form as notated in the Hamburg Notation System (HamNoSys) (called the "hns" field in the database). The identifying gloss is usually a single English word, often combined with distinguishing numbers and letters. For example, if two signs in the database are best referred to as "blue" then "blue.1" is the identifying gloss given to the first occurrence and "blue.2" is the one given to the second occurrence. If two signs are clearly related to each other as variants, the stem or base form is identified with the letter "a" and the variant(s) with the letter "b" ("c", "d", etc.). Thus "before.1b" is a variant of "before.1a", while "before.2" is a separate and unrelated sign. If no numbers or letters are attached to an identifying gloss then it simply means that that word or gloss is not used to identify any other sign in the database. For example, "snow" is the English word used in the identifying gloss of one, and only one, sign.
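The identifying gloss convention described above is regular enough to be parsed mechanically. The following is a minimal sketch in Python, assuming the form "word", "word.N" or "word.Nx" as in the examples given; the names `IdGloss`, `parse_idgloss` and `same_sign_family` are hypothetical and do not correspond to anything in the FoxPro database.

```python
import re
from typing import NamedTuple, Optional


class IdGloss(NamedTuple):
    word: str                 # English word used as the label
    number: Optional[int]     # separates unrelated signs that share a word
    variant: Optional[str]    # "a" = stem or base form; "b", "c", ... = variants


# Assumed pattern: a word, optionally followed by ".", digits and one letter.
_IDGLOSS = re.compile(r"^(?P<word>[^.]+)(?:\.(?P<number>\d+)(?P<variant>[a-z])?)?$")


def parse_idgloss(text: str) -> IdGloss:
    """Split an identifying gloss such as 'before.1b' into its parts."""
    match = _IDGLOSS.match(text)
    if match is None:
        raise ValueError(f"not a recognisable identifying gloss: {text!r}")
    number = match.group("number")
    return IdGloss(match.group("word"),
                   int(number) if number else None,
                   match.group("variant"))


def same_sign_family(a: str, b: str) -> bool:
    """True when two glosses share word and number (stem and variant)."""
    x, y = parse_idgloss(a), parse_idgloss(b)
    return x.number is not None and (x.word, x.number) == (y.word, y.number)
```

Under these assumptions, "before.1a" and "before.1b" are recognised as related forms, while "before.2" and "snow" are not grouped with them.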

Other related fields in the head of the record make explicit morphological information about the sign if that seems relevant or plausible (e.g., identifying the stem or base of the sign if the sign in question appears to be a modified form, and identifying the nature of the modification in terms of handshape, movement, etc.). These are coded in two text fields that parallel the "idgloss" and "hns" fields using notation and transcription conventions based on Johnston (1991). Naturally, each record in the database is numbered. This number appears in each window. See Figure 1.

Figure 1 Template for all data entry and viewing windows in the Auslan lexical database.

It is an unexceptional and expected feature of the units of any language (signs or words) that they will often have more than one meaning. Databases that record meanings, just like dictionaries, need to identify the various senses of any identified sign or word in some principled hierarchy from, say, core to peripheral or metaphorical senses (see 2.3.3 "Core meanings window"). However, there comes a point where, according to a whole range of criteria, a separate identifiable sense of a sign or word is indicative of the existence of two separate signs or words that simply share the same phonological form (homophones). When this occurs, a given sign head (line graphic and notation) is entered as two (or more) separate records, each with a separate and sequential homophone number (1, 2, 3, etc.). Naturally, more often than not, the morphological information about the signs, if relevant, will be different.

2.3.2 The phonology window

Information is coded for the formational features of a sign (handshape, location, orientation, movement, non-manual features). In the Auslan lexical database, this information has been coded not only in a dedicated notation field but also in a series of boolean and numerical fields that code for the presence or absence of specific sign parameters. In other words, phonological information has been double-coded for some sign parameters.

This double-coding was done to aid database management and enquiry. Separate coding for particular parameters was necessary for two reasons. First, sorting and search routines within FoxPro are fairly limited because of idiosyncrasies of the search rules themselves within text fields. For example, only characters at the end or beginning of a text field could be searched for, not intermediate characters. This naturally causes problems when searching for HamNoSys occurrences because the symbols specifying location and orientation are always inside the notation string, never at the beginning or end. For similar reasons the sorting of signs based on phonological features (as coded in the notation) follows an inherent ASCII order which cannot be changed and/or can only operate on strings from beginning to end, never from any other point of departure. Second, there are limitations on the use of a unique purpose-built font (such as HamNoSys) which has a potentially different inherent "sort order" to alphabetical and/or ASCII order. Therefore, sort order (alphabetical and/or ASCII) cannot be changed (though it can be reversed) to suit some sign language internal principle (e.g., handshape frequency, handshape markedness, locations high to low or left to right, and so on).

Consequently the Auslan lexical database has fields that separately code and tag for handshape, handedness (one, two, double), hand arrangement (symmetrical, alternating, parallel, or with the dominant hand moving only), and locations (primary and secondary). This allows for the searching, retrieval, cross-referencing and sorting of signs according to any of these criteria, singly or in combination (see Figure 2).
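Once a parameter such as handshape is held in its own numeric field, signs can be ordered by any researcher-defined principle instead of the fixed character order of a notation string. The sketch below is an illustration in Python, not a description of FoxPro; the records, handshape codes and ranking are invented for the example.

```python
# Hypothetical records: (idgloss, handshape code). The codes are invented.
records = [("snow", 12), ("blue.1", 3), ("before.1a", 7), ("before.2", 3)]

# A researcher-defined ranking, e.g. from least to most marked handshape.
# The listed order is the desired sort order; it need not be numeric order.
handshape_rank = {code: rank for rank, code in enumerate([7, 12, 3])}


def by_handshape(record):
    idgloss, handshape = record
    # Signs with the same handshape fall back to order by identifying gloss.
    return (handshape_rank[handshape], idgloss)


ordered = sorted(records, key=by_handshape)
```

Changing the sorting principle then means changing only the ranking table, not the coded data.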

Figure 2 The phonology window

As can be seen from Figure 2, there are at least ten phonological fields. Some fields are boolean, such as those registering the presence or absence of a feature like "two-handedness" or "symmetry", and some are numerical, such as those encoding handshape and location (the number uniquely identifies the handshape or location variable).
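A record of this kind, and a query combining several criteria, might be sketched as follows. This is an assumption-laden illustration in Python: the field names, the numeric codes and the `find` helper are hypothetical, and only a handful of the phonological fields are shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhonologyRecord:
    idgloss: str
    handshape: int           # numeric code identifying the handshape
    primary_location: int    # numeric code identifying the location
    two_handed: bool         # boolean: presence or absence of the feature
    symmetrical: bool


def find(signs, **criteria):
    """Return the idglosses of signs matching every given field value."""
    return [s.idgloss for s in signs
            if all(getattr(s, name) == value for name, value in criteria.items())]


# Invented sample data, for illustration only.
signs = [
    PhonologyRecord("sign.1", handshape=3, primary_location=5,
                    two_handed=True, symmetrical=True),
    PhonologyRecord("sign.2", handshape=3, primary_location=9,
                    two_handed=False, symmetrical=False),
    PhonologyRecord("sign.3", handshape=7, primary_location=5,
                    two_handed=True, symmetrical=False),
]

matches = find(signs, handshape=3, two_handed=True)
```

Because each criterion is an exact comparison on a dedicated field, no searching inside a notation string is required.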

2.3.3 The core meanings window

The two fundamental purposes of any sign language lexical database are (i) the identification and recording of individual signs as forms and (ii) the description of the meaning of these forms within the signed language in question. Thus fields which register and code for meaning are a vital feature of these databases.

Most databases simply rely on glosses and a "notes" field to explain the meaning of a sign. This database, however, actually defines signs (admittedly using basic English, not Auslan itself). Moreover, the definition also entails identifying sign class membership (i.e., part of speech). To this end, the Auslan database has no fewer than 24 text fields that record sign meaning and an equivalent number of boolean fields registering the presence or absence of these meanings. The 24 text fields are divided into eight classes or "parts of speech": nominals, verbals, interrogatives, modifiers and linkers ("adverbials"), interactives ("interjections"), plus the additional classes of deictics and "general signs". See Johnston & Schembri (1999) for a definition of these terms. All the parts of speech have distinct fields for at least three related senses (nominals and verbals have up to five). The boolean fields make it possible to query the database as to the part of speech that any particular sign belongs to in Auslan and the number of distinct, but related, senses any sign may have (see Figure 3).

Figure 3 The core meanings window

The sense definitions are in simple basic English (see Appendix 1), modeled on the Cobuild English Learner's restricted vocabulary. Single word English equivalents are only given if and when they are available in the translation language. Strictly speaking there is no "glossing" as such, at least with reference to meaning definition.
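The kind of query made possible by pairing sense fields with boolean flags can be sketched as below. This is a simplified Python illustration: the record layout, the class labels used as keys and the placeholder definitions are all assumptions, and the flags are derived from the sense lists here, whereas the database stores them as separate boolean fields.

```python
# Hypothetical record: sense definitions grouped by sign class.
# The definitions are placeholders, not content from the database.
record = {
    "idgloss": "example.1",
    "senses": {
        "verbal": ["definition of verbal sense 1"],
        "nominal": ["definition of nominal sense 1",
                    "definition of nominal sense 2"],
        "deictic": [],
    },
}


def sign_classes(rec):
    """Sign classes for which at least one sense is recorded."""
    return sorted(cls for cls, senses in rec["senses"].items() if senses)


def sense_count(rec, sign_class):
    """Number of distinct but related senses recorded for one class."""
    return len(rec["senses"].get(sign_class, []))
```

With this layout, asking which parts of speech a sign belongs to, and how many senses it has in each, reduces to two short lookups.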

2.3.4 The English search window

Though glossing is not used as a technique for defining signs, it is used as part of a "bilingual" tool for rapidly narrowing down sign searches where the point of entry is a meaning as expressed by a particular English word. In some cases, such a search may yield only one match. In most other cases, however, the match is usually one to many.

As can be seen from Figure 4, there are twelve fields in the "English search" or "English gloss" category which tag each sign entry with at least one English word commonly associated with that sign. A sign may have up to twelve English words as tags or "glosses". This enables two things. First, successful searches and matches where English codes different parts of speech differently ("introduce" and "introduction") without there being any formal distinction in the Auslan sign. Second, successful searches and matches where there is a many to one relationship lexically (e.g., "wet", "damp", "soft" are all applicable to one sign) or a one to many relationship (e.g., "before", which matches several signs).

Figure 4 The English search window

The "idgloss" is simply used to identify the sign uniquely in the database. It serves no other purpose and is not part of the sign definition.
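The English search tags amount to an index from English words to sign records. A minimal Python sketch follows, assuming each sign carries a short list of English tags; the function names are hypothetical, and the sample entries use words mentioned above together with assumed identifying glosses.

```python
from collections import defaultdict

MAX_GLOSSES = 12  # the database provides twelve English gloss fields per sign


def build_english_index(entries):
    """Map each English tag to the idglosses of the signs that carry it."""
    index = defaultdict(list)
    for idgloss, english_words in entries.items():
        if len(english_words) > MAX_GLOSSES:
            raise ValueError(f"{idgloss} has more than {MAX_GLOSSES} tags")
        for word in english_words:
            index[word.lower()].append(idgloss)
    return dict(index)


def search(index, english_word):
    """Return every sign tagged with the word (possibly none or several)."""
    return index.get(english_word.lower(), [])


# Illustrative entries; the idglosses on the left are assumptions.
entries = {
    "wet": ["wet", "damp", "soft"],
    "before.1a": ["before"],
    "before.2": ["before"],
    "introduce": ["introduce", "introduction"],
}
index = build_english_index(entries)
```

Here a search for "before" returns two signs, while "damp" and "introduction" each lead to a single sign whose identifying gloss is a different English word.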

2.3.5 The lexical links window

This window on the database is primarily concerned with the relationship of any given sign with other signs of the language in a system of meanings. The meaning of any lexical unit in a language (word or sign) is only partially captured by the attempt to state or explain its meaning either monolingually (using words or signs of the same language) or bilingually (using the words and equivalents of another language). The meaning (valeur) of any linguistic unit in a language is also a function of its place within the linguistic system of that language as a whole both paradigmatically and syntagmatically. It is also a function of its relationships of (near or actual) synonymy and antonymy in a semantic network as part of a paradigm. Consequently, the database is not just restricted to explaining the meaning of a sign using a definition. In addition, the database has 14 fields that register types of semantic relationships. These are visible on the lexical links window (see Figure 5).

Figure 5 The lexical links window

There are potentially three possible near or actual synonyms and antonyms that can be cross-referenced with each individual sign, three possible cross-references to signs that can be compared or contrasted semantically with the current record, and five possible cross-references that can be made to potential variant forms of a sign. The latter is semantically revealing in signed languages because of the inherent meaningfulness of the sign parameters (handshape, movement, etc.) which are the very substance of different sign forms. As with all fields that cross reference to other sign records, these are text fields that contain the unique identifying gloss ("idgloss") of the cross-referenced sign.
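Because cross-references are stored as plain text copies of an identifying gloss, nothing prevents a link from pointing at a record that does not exist. A consistency check of the following kind could be run over exported data. This is a hedged Python sketch: the relation names, the `LIMITS` table (read here as three synonyms, three antonyms, three comparisons and five variants, giving the 14 fields) and the sample links are assumptions.

```python
# Assumed reading of the 14 link fields: 3 + 3 + 3 + 5.
LIMITS = {"synonyms": 3, "antonyms": 3, "compare": 3, "variants": 5}


def dangling_links(links):
    """List (source, relation, target) where the target idgloss is unknown."""
    known = set(links)
    return [(idgloss, relation, target)
            for idgloss, relations in links.items()
            for relation, targets in relations.items()
            for target in targets
            if target not in known]


def over_limit(links):
    """List (source, relation) pairs holding more links than fields exist."""
    return [(idgloss, relation)
            for idgloss, relations in links.items()
            for relation, targets in relations.items()
            if len(targets) > LIMITS[relation]]


# Invented sample data; "after.1" is deliberately missing as a record.
links = {
    "before.1a": {"variants": ["before.1b"], "antonyms": ["after.1"]},
    "before.1b": {"variants": ["before.1a"]},
    "before.2": {"synonyms": ["before.1a"]},
}
```

In this sample the only problem reported is the antonym link from "before.1a" to the missing record "after.1".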

2.3.6 The usage and grammar window

The Auslan database includes information about the "lexical status" of signs, sociolinguistic variables such as region (e.g., northern or southern dialect, state-based signs), religion, school and register (e.g., crude, technical). In all, there are 32 fields that code for the presence or absence of these variables.

Lexical status is discussed in some detail in Johnston & Schembri (1999) and need not be repeated here. Suffice it to say that the full Auslan database is a database of signs which are primarily lexemes. (The reduced data set for the Signs of Australia CD-ROM dictionary is of lexemes only.) Moreover, the class of lexemes has itself been coded and ranked according to type and "degree" of lexicalization. For example, with respect to type, lexemes are coded, where relevant, as compounds, blends, initializations, borrowings and so on. With respect to degree of lexicalization, there are eight checkboxes that code for a range from "high" lexicalization ("Auslan lexical sign") through "mid" lexicalization ("restricted lexical sign") to "low" lexicalization ("doubtful lexical status").

The database also includes some specific grammatical information about the Auslan signs. Naturally, sign class membership is relevant here (these fields are also visible on the "usage and grammar window" as they are on the "core meanings window"). It also includes information about the potential of a sign to exploit spatial modifications. There are five checkboxes that code for the inflecting potential of a sign in terms of directionality and location.

In addition, four checkboxes code for notional degree of iconicity attributable to each sign. In descending order from iconic to arbitrary these are: transparent, translucent, opaque and obscure.

All these field types can be seen in Figure 6.

Figure 6 The usage and grammar window

2.3.7 The semantic domains window

The Auslan database includes information about semantics. This window simply tags each sign in the database according to the semantic areas it may be related to, where this is relevant and possible. Naturally, a semantic network like this is infinitely expandable and not every sign will be able to be categorized unless the categories are themselves so broad as to be meaningless or of little practical use.

Coding for semantic areas enables researchers to quickly isolate all signs relevant to major categories (e.g., colors, family relationships, weather, enumeration, and so on) that they may wish to compare cross-linguistically. All these field types can be seen in Figure 7.

Figure 7 The semantic domains window

2.4 Visual data

In the FoxPro Auslan lexical database, the head of each record is the graphics field that contains a line drawing of the sign for that record (in PICT format). This acts as a place marker and helps researchers visualize the entry sign and orientate themselves within the database. (Signs are organized visually, according to the phonological features of handshape, location, orientation etc., rather than alphabetically according to gloss.)

In the reduced form of the database that lies behind the Auslan CD-ROM, the individual line-drawing graphics have been replaced by digitized video clips of each sign. These video clips are stored separately in their own files and are retrieved instantly whenever a related record in the database is accessed. The first frame of the video clip acts as a place marker in the video field with the video only being played on request (in repeat, slow, or frame by frame modes).

If the head sign of any record is tagged as a variant or modified form of some other sign then a second graphics field is filled with an image of the "stem", "base" or "standard" sign it is deemed to be related to.

2.5 Notation and transcription methods used

Notation (the writing down of the form of individual signs, words, etc., of a language) and transcription (the writing down of phrases, sentences, and other stretches or fragments of the spoken or signed texts of a language) can be done using HamNoSys. As mentioned above in 2.3.1, this is the system used in the Auslan lexical database. Since the database is a repository of individual signs, what concerns us here most directly are notation systems rather than transcription systems, insofar as they can be distinguished and separated.

The use of HamNoSys for the notation of signs in the Auslan database is a consequence of the historical evolution of the database. When the database was begun in the early 1980s, there was no possibility of storing sign language data using digital video technology and, of course, ordinary video technology was cumbersome and totally inefficient for rapid retrieval of individual signs. A notation system was needed to make it possible to record, in some way, the form rather than the "meaning" of each sign (as in a gloss). Various systems were tried for transcribing Auslan data but, by the late 1980s, after HamNoSys became available for Apple computers, the researcher settled on this system, partly because it was the only one available for Apple computers (which he used) and partly because it avoided using any alphabetic symbols whatsoever (as in Stokoe notation).

All signs are notated in HamNoSys in a dedicated text field. This was true even after databases could relatively easily accommodate simple visual data in graphics fields because it could still be argued that a line drawing or a photograph was an inadequate representation and needed to be complemented by a dedicated notation. However, over the period of time that the database has evolved, the importance of notation and transcription systems (and the fields that contain this information) has diminished somewhat.

The most obvious development in sign language lexical databases has been the dramatic increase in the storage capacity of the computers normally available to researchers and the development of digital video technology. This has made it possible to store in the database a video clip of each sign record. A notation is no longer needed in order to simply identify the sign form of any particular entry. Thus one central rationale for a notation system in a sign language database has disappeared completely.

The second development in database design (at least in the Auslan lexical database) has been the provision of separate fields for registering discrete formational features of each sign. This phonological specification can be done at three levels: (i) for a restricted set of features (e.g., only those likely to be useful in searches and sorts); (ii) for each and every feature for which a symbol exists in a notation system (such as HamNoSys); or (iii) for an even larger set of features than exist in any single notation system. Consequently the phonological coding in the database can be poorer than, exactly equivalent to, or even superior to, the information encoded in a dedicated sign notation string.

The Auslan FoxPro database coded for a restricted set of phonological features (see section 2.3.2 above). The information encoded in such a phonological "matrix" (where the notation is simply the selection of the presence or absence of a handshape, location, etc.) can be used as the basis for any kind of sorting of or enquiry into the database much more readily than a transcription or notation system itself (such as HamNoSys). Furthermore, a notation for any sign can, in principle, be generated from the matrix given a relatively simple set of rules of selection, realization and ordering of symbols. This is very straightforward if the phonological matrix has a one to one correspondence with the notation system (i.e., phonological fields match the value of all available notation symbols). (Specifying these rules may be relatively straightforward, but actually writing and implementing them within a computer program is, of course, usually beyond the computing and programming expertise of most sign language researchers, including the author of this paper.)
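To make the idea of rules of selection, realization and ordering concrete, the following Python sketch generates a notation string from a small matrix. It is only an illustration of the principle: the symbol tokens are readable placeholders rather than HamNoSys characters, and the parameter order, codes and names are assumptions.

```python
# Ordering rule: the sequence in which parameters appear in the string.
PARAMETER_ORDER = ("handshape", "orientation", "location", "movement")

# Realization rule: feature value -> symbol. Placeholder tokens only;
# a real implementation would map to the characters of a notation font.
SYMBOLS = {
    "handshape": {1: "[flat]", 2: "[fist]"},
    "orientation": {1: "[palm-down]"},
    "location": {1: "[chest]", 2: "[chin]"},
    "movement": {1: "[down]"},
}


def generate_notation(matrix):
    """Build a notation string from a matrix of coded parameter values."""
    parts = []
    for parameter in PARAMETER_ORDER:
        value = matrix.get(parameter)
        if value is None:
            # Selection rule: parameters left uncoded contribute no symbol.
            continue
        parts.append(SYMBOLS[parameter][value])
    return "".join(parts)


notation = generate_notation({"movement": 1, "handshape": 1, "location": 2})
```

The order in which the matrix happens to list its fields is irrelevant; the ordering rule alone determines the shape of the resulting string.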

Nonetheless, there is still a place in sign language lexical databases for a dedicated notation field regardless of whether a notation is computer generated or individually and manually entered. A lexical database can be the source of data sets that match particular glosses to notation strings which could be exported and merged with other programs, especially computerized transcription programs that build automatic dictionaries (see section 4.0). Most importantly, these data sets can help guarantee the internal consistency of glossing, sign class allocation, and morpheme identification by the language researcher. In other words, the notation system has increasingly become an expression of a phonological theory of the language in the broadest sense. Its purpose is no longer narrowly phonetic (capturing the precise form of the token) since this can now be easily viewed on digital video. Its purpose is phonemic (describing the type) and, hence, analytic.

Finally, computer-based notation and transcription systems have also been invaluable resources for the production of written texts in the linguistic analysis of sign language data, even when accompanied by line drawings or photographs. However, even here, new multimedia technologies (CD-ROM, Internet) are making it increasingly possible to incorporate visual data into multimedia documents. Thus notation and transcription systems appear to be of on-going importance and relevance as tools of analysis, if increasingly less relevant in data presentation.

3.0 Data exchange: Compatibility with other databases

Commercially available relational databases (such as 4thDimension, FileMaker Pro, FoxPro, etc.) are perfectly adequate for the majority of storage and research needs at the lexical level, that is in lexical databases. Even if a database format becomes superseded and discontinued, as with FoxPro, most of the data in any of these types of relational databases can be easily exported into another, provided it is encoded in a form that facilitates this. If it is not, then this presents significant compatibility problems.

These problems manifest themselves on at least two levels of particular concern here—notation systems and relations within a database.

With respect to notation systems, the problem of compatibility of differing notation and transcription systems is rightly seen as a major problem. Information which has been painstakingly and laboriously coded within a dedicated sign language notation system is not easily exported into another database that uses a different notation system. In principle, automatic translation routines should be able to convert symbols from one system into the symbols of another. (Once again this may be outside the expertise of most sign linguists using these databases but it is nonetheless still straightforward.) However, problems arise with the over- and under-specification of features in differing systems. A conversion from one system to another can lead to a loss of information if the same differences are not discriminated in the second system as in the first. When the data is reconverted back into the first system (the so-called "round-trip fidelity problem") errors can become significant.
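The round-trip problem can be shown with two toy symbol inventories. In the Python sketch below, system A distinguishes two handshapes that system B merges into one; all symbol names and both mapping tables are invented for the illustration and do not represent any actual notation system.

```python
# System A makes a distinction that system B does not (invented symbols).
A_TO_B = {
    "a-flat-spread": "b-flat",
    "a-flat-closed": "b-flat",
    "a-fist": "b-fist",
}

# Converting back, B's single symbol can only map to one of A's two.
B_TO_A = {"b-flat": "a-flat-closed", "b-fist": "a-fist"}


def round_trip(symbols):
    """Convert A -> B -> A, symbol by symbol."""
    return [B_TO_A[A_TO_B[symbol]] for symbol in symbols]


def lost(symbols):
    """Symbols that do not survive the round trip unchanged."""
    return [original for original, returned
            in zip(symbols, round_trip(symbols))
            if original != returned]


original = ["a-flat-spread", "a-fist", "a-flat-closed"]
```

Every symbol converts without error in both directions, yet the first symbol returns as a different one; the loss is silent, which is what makes it troublesome.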

If a dedicated notation field is the only way in which phonological information about individual records is coded in a sign language lexical database then this does indeed raise serious concerns about the compatibility of databases. Any database that encodes sign parameters solely within a notation field will, of course, encounter problems in transferring that information into another notation or transcription system. There seems to be no simple solution to the problem, apart from all researchers agreeing to use one and only one notation system (the signed language equivalent of the IPA). This seems highly unlikely, if not premature (since phonemic and phonetic discrimination can only grow as more and more signed languages are studied and analyzed).

If, however, sign parameters are coded within a "phonological matrix" within a database (as a reduced set of features as in the Auslan lexical database, or as a comprehensive set of features as in its latest incarnation in FileMaker Pro) it is then a much simpler process to export the data from one signed language database into another (and back again). One could represent this in a notation system by using semi-automatic generation procedures if necessary, but unless the two notation systems make exactly the same discriminations there will be information loss at some point. However, much more importantly, one could do the kind of comparative work between the two different data sets that motivates the desire for exportability and compatibility in the first instance. Even if there is no perfect match of categories in both signed languages or databases, it will still be possible to engage in a whole raft of comparative inquiries based on any number and combination of these features.
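A comparison restricted to the features two data sets have in common might look like the following Python sketch. The two record lists, their field names and their values are invented; the point is only that an imperfect match of categories still leaves a usable intersection.

```python
def shared_fields(records_a, records_b):
    """Field names that occur in both data sets."""
    fields_a = set().union(*(record.keys() for record in records_a))
    fields_b = set().union(*(record.keys() for record in records_b))
    return fields_a & fields_b


def proportion(records, field):
    """Share of records, among those coding the field, where it is true."""
    values = [record[field] for record in records if field in record]
    return sum(1 for value in values if value) / len(values)


# Invented exports from two hypothetical signed language databases.
language_a = [
    {"idgloss": "x.1", "two_handed": True, "handshape": 3},
    {"idgloss": "x.2", "two_handed": False, "handshape": 7},
]
language_b = [
    {"idgloss": "y.1", "two_handed": True, "dialect": "north"},
    {"idgloss": "y.2", "two_handed": True, "dialect": "south"},
]

common = shared_fields(language_a, language_b)
```

Here only "idgloss" and "two_handed" are common to both exports, which is already enough to compare the proportion of two-handed signs in each.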

With respect to relations, one drawback of data importability and exportability is the current inability to transfer relations as well as the raw data itself. However, just as past problems of video compression and limits to information storage on CD-ROMs were solved (or at least vastly improved) after some sign language CD-ROM projects were already under way, similarly, provided researchers use commercially available, popular and widely used relational databases (such as FileMaker Pro), future import/export obstacles (including the relations themselves) should be overcome by the time they become a real necessity.

What is of greatest importance is guaranteeing that the kinds of fields and the way the data is encoded in each database at least facilitates comparison between databases and data sets. Preferably, database design should allow for the useful export and import of data. For example, at minimum, fields for formational features, dialect variants, semantic fields, basic grammatical class (e.g., nominal, verbal) and sign modifying class (e.g., directional, plain) should be available to facilitate cross linguistic studies. Despite there being some disagreement in the literature, the most widely accepted models in sign phonology specify five parameters. There is also widespread agreement on the major categories within each parameter. Not surprisingly, signed languages appear to have differing phonemic repertoires (especially in handshapes, but also in locations etc.). But there is, nonetheless, significant overlap. On the whole, there is widespread agreement and any phonological matrix from one language would have a large degree of overlap with another signed language. It is in the other fields coding for other aspects of the sign (part of speech, inflection type, variants, dialects) that there has been, and will continue to be, wide divergence.

At least by making explicit the categories used in the Auslan lexical database, and explaining their rationale, this description is a contribution to an attempt to "harmonize" future databases as much as is possible in this respect. Information I have coded for in my database may appear to be useful to other sign language researchers, just as a type of information coded for in their databases but currently not in the Auslan database may be something that I should give consideration to including.

4.0 The future: Compatible databases or a "universal" database?

In section 2.3.6, it was mentioned that the Auslan lexical database was just that, a database of lexemes in the language, not a database of Auslan signs. It was research into the lexicon of Auslan that inspired the discrimination between lexemes and signs (or more simply "lexicalized signs" and "non-lexicalized signs") in the analysis of the language (Johnston & Schembri 1999).

Now that the database has evolved and questions regarding database compatibility have surfaced, it is possible that there may be a place in signed language research for databases of signs, not just lexemes. A database of signs can be used for language internal analysis of phonology, lexicon and grammar. More importantly, sign databases can be used for comparing patterns of lexicalization and grammaticization across signed languages and families of signed languages. Though, of course, the distinction between the signs and lexemes still needs to be made and coded in such databases, there is no good reason why a modern signed language database should not be expanded to include tens, if not hundreds, of thousands of records listing all potential sign forms in the language.

One observes, for instance, that the possible simultaneous combination of the sign parameters of handshape, location, orientation and movement can generate, at absolute maximum, only a couple of million possible sign forms. Even if non-manual features were included at the level of the lexicon (though I believe, overall, they should not be) the number is simply very large (still in the millions), but certainly not innumerable or, for all practical purposes, infinite. On closer inspection, it appears that vast numbers of the potential sign forms represented by the theoretically possible combinations of sign parameters are simply never realized. In reality, they are impossible (e.g., it is a physical impossibility to achieve certain orientations at certain locations) or impractical (e.g., it is difficult or uncomfortable to perform the combination). They are not actually "well-formed" and only appear possible on paper, according to an abstract formula of potential parameter combinations. This reduces the number of possible sign forms dramatically from a couple of million to hundreds of thousands.
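The arithmetic behind this estimate can be sketched as follows. The inventory sizes are purely hypothetical (the text gives only the overall ceiling of "a couple of million"), and the toy inventory and the single well-formedness constraint are invented to show the shape of the calculation, not to describe Auslan.

```python
from itertools import product
from math import prod

# Hypothetical inventory sizes per parameter (assumptions, not Auslan figures).
INVENTORY_SIZES = {"handshape": 60, "location": 40, "orientation": 30, "movement": 30}

def upper_bound(sizes):
    """Ceiling on sign forms: the product of the parameter inventory sizes."""
    return prod(sizes.values())

# A toy inventory small enough to enumerate in full (values are illustrative).
PARAMETERS = ("handshape", "location", "orientation", "movement")
TOY_INVENTORY = {
    "handshape": ["flat", "fist", "point"],
    "location": ["head", "chest", "back_of_head"],
    "orientation": ["palm_in", "palm_out"],
    "movement": ["tap", "circle"],
}

def is_well_formed(form):
    """Hypothetical constraint standing in for physical impossibility."""
    handshape, location, orientation, movement = form
    return not (location == "back_of_head" and orientation == "palm_out")

def well_formed_forms(inventory):
    """Enumerate every parameter combination and keep the well-formed ones."""
    all_forms = product(*(inventory[name] for name in PARAMETERS))
    return [form for form in all_forms if is_well_formed(form)]

ceiling = upper_bound(INVENTORY_SIZES)           # 60 * 40 * 30 * 30
toy_total = upper_bound({k: len(v) for k, v in TOY_INVENTORY.items()})
toy_valid = len(well_formed_forms(TOY_INVENTORY))
```

With these assumed sizes the ceiling is 2,160,000 forms. In the toy inventory, one constraint removes 6 of the 36 combinations, which illustrates how each further constraint shrinks the set of realizable forms.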

It might be objected that affixation and compounding within any sign language could be expected to compensate for this reduction, increasing the numbers of sign entries significantly once again. However, two facts lead us not to expect this to be the case. First, these processes are relatively rare: affixation is unknown or extremely rare in signed languages observed to date, including Auslan; and compounding, though frequent, is limited. Second, both processes are a feature of lexicalization (i.e., they are associated with lexemes) and not of potential sign forms as such. The lexicons of some signed languages, such as Auslan, seem to have been fairly well documented. For example, Johnston & Schembri (1999) suggest that the Auslan lexicon consists of thousands, not tens of thousands or even hundreds of thousands, of lexemes. There appears not to be a large inventory of hitherto unrecognized lexemes in Auslan, let alone compounds or affixed forms.

Consequently, it may well be possible to list in a database all the potential sign forms available to a given signed language. In other words, a future signed language "lexical" database could operate as a vast template of potential sign forms with hundreds of thousands of pseudo-entries. When data justified the creation of a new sign entry, a potential and specific pre-existing slot (generated by the parameter formula for that record) would be activated and occupied by the attested form.
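One way to picture such a template is as a table keyed by parameter combination, in which every slot starts empty and is filled only when a form is attested. This is a minimal sketch under stated assumptions: the class and method names are hypothetical, the inventory values and gloss are placeholders, and no well-formedness filtering is applied here.

```python
from itertools import product

class SignTemplate:
    """Hypothetical template of potential sign forms (pseudo-entries)."""

    def __init__(self, inventory):
        # inventory maps each parameter name to its list of possible values.
        self.parameters = tuple(inventory)
        # Every potential form starts as an empty pseudo-entry (None).
        self.slots = {form: None for form in product(*inventory.values())}

    def activate(self, form, gloss):
        """Occupy a pre-existing slot with an attested form's gloss."""
        if form not in self.slots:
            raise KeyError(f"{form!r} is not a potential form in this template")
        self.slots[form] = gloss

    def attested(self):
        """Return only the slots that data has so far activated."""
        return {form: gloss for form, gloss in self.slots.items() if gloss is not None}

template = SignTemplate({
    "handshape": ["flat", "fist"],
    "location": ["chin", "chest"],
    "orientation": ["palm_in"],
    "movement": ["tap"],
})
template.activate(("flat", "chin", "palm_in", "tap"), "GLOSS-1")
```

A database with hundreds of thousands of pseudo-entries could equally compute slots on demand rather than storing them all; the dictionary here simply keeps the idea visible.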

Ultimately, it may well be possible to "superimpose" sign language databases on top of each other in order to reveal shared and divergent patterns of "signification" and lexicalization. Indeed, given the apparent high degree of overlap of many of the sign parameters in terms of both form and meaning, one may even entertain the possibility of a "universal sign language database" in which each sign language was, in fact, simply a language-specific window on the universal database of forms. Of course, future sign language research may actually reveal that the assumption of a high degree of shared parameter forms with similar iconic motivations across signed languages is actually false. A failed attempt to build such a universal database of sign forms would be good empirical evidence for such divergence.
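If each language's attested forms were keyed by the same parameter combinations, "superimposing" two databases would amount to comparing their keys. The sketch below assumes exactly that shared coding, which is the open question raised above; the function name, forms and glosses are all placeholders rather than data from any real signed language.

```python
def superimpose(db_a, db_b):
    """Compare two form-to-gloss mappings keyed by the same parameter tuples."""
    shared = db_a.keys() & db_b.keys()
    return {
        # Forms attested in both languages, with the gloss from each.
        "shared": {form: (db_a[form], db_b[form]) for form in shared},
        # Forms attested in only one of the two languages.
        "only_a": {form: db_a[form] for form in db_a.keys() - shared},
        "only_b": {form: db_b[form] for form in db_b.keys() - shared},
    }

# Placeholder data for two hypothetical languages A and B.
language_a = {
    ("flat", "chin", "palm_in", "tap"): "A-GLOSS-1",
    ("fist", "chest", "palm_in", "circle"): "A-GLOSS-2",
}
language_b = {
    ("flat", "chin", "palm_in", "tap"): "B-GLOSS-1",
    ("point", "head", "palm_out", "tap"): "B-GLOSS-2",
}
overlay = superimpose(language_a, language_b)
```

The "shared" entries are where one would look for similar or divergent meanings attached to the same form. A consistently small shared set would be the kind of evidence for divergence mentioned above.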

Regardless of whether sign language databases are lexeme-based or sign-based on the one hand, or language-specific or "universal" on the other, a second important aspect of future work with these databases involves their integration into other computer-based linguistic tools, for example, interlinear transcription programs and programs for the creation and analysis of language corpora. The immediate future task is to design a program that integrates these databases into transcription programs such as IT and Shoebox (used almost exclusively for annotating and transcribing spoken language texts) and/or SyncWriter and SignStream (designed and used for annotating and transcribing signed language texts in a multimedia format). Programs of the first type have the ability to build dictionaries of glossed and annotated forms, and they can import data sets of lexical information from other programs. Programs of the second type are able to align multi-tiered transcriptions with digitized video, but currently lack dictionaries or the ability to make them.

As mentioned above (section 2.5), a lexical database can be the source of data sets that match particular glosses to notation strings which could be exported and merged with other programs, especially computerized transcription programs that build automatic dictionaries. Most importantly, these data sets can help guarantee the internal consistency of glossing, sign class allocation, and morpheme identification by the language researcher. The original lexical database can itself be adjusted in response to the transcription process. Already data sets can be exported from the Auslan lexical database and imported into IT. The data set greatly speeds up, if not semi-automates, the processes of text transcription. This indirect manual integration of a database with IT is tedious and fraught with the possibility or likelihood of error at each stage of importing and exporting, but at least it can be done. Unfortunately, it seems that even this relatively simple procedure is not possible with sign language transcription programs, such as SyncWriter or SignStream.
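The export step described here can be sketched as writing a two-column, tab-delimited data set that pairs each gloss with its notation string, plus a simple check of transcript glosses against the database. This is an assumption-laden illustration: the tab-delimited layout is not claimed to be the import format of IT or any other program, the function names and the notation column are hypothetical, and the sample records are placeholders.

```python
import csv
import io

def export_gloss_notation(records, out):
    """Write idgloss/notation pairs as tab-delimited text, sorted by gloss."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["idgloss", "notation"])
    for record in sorted(records, key=lambda r: r["idgloss"]):
        writer.writerow([record["idgloss"], record["notation"]])

def unknown_glosses(transcript_glosses, records):
    """Return glosses used in a transcript that the database does not contain."""
    known = {record["idgloss"] for record in records}
    return [gloss for gloss in transcript_glosses if gloss not in known]

# Placeholder records; real notation strings would come from the database.
records = [
    {"idgloss": "GLOSS-B", "notation": "NOTATION-2"},
    {"idgloss": "GLOSS-A", "notation": "NOTATION-1"},
]
buffer = io.StringIO()
export_gloss_notation(records, buffer)
exported = buffer.getvalue()
missing = unknown_glosses(["GLOSS-A", "GLOSS-X", "GLOSS-B"], records)
```

A check of this kind is one way the data set could help keep glossing consistent: any gloss flagged as unknown is either a transcription slip or a candidate for a new database entry.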

Obviously, the immediate future task in this area is to streamline and integrate all capabilities into one program: a sign language annotation and transcription program that automatically builds dictionaries (data sets), aligns transcription to video, and can also import and export data sets from signed language lexical databases.

Appendix 1: Table of fields in Auslan lexical database

  Field name Explanation Field type Character width Decimal points
  alternate alternating movement Boolean    
  angcongtf anglican congregation Boolean    
  animaltf animals & plants Boolean    
  ant1 antonym 1 Text (= idgloss) 25
  ant1tf   Boolean
  ant2 antonym 2 Text (= idgloss) 25
  ant2tf   Boolean
  ant3 antonym 3 Text (= idgloss) 25
  ant3tf   Boolean
  artstf arts Boolean    
  aslloantf ASL loan sign Boolean    
  auslextf Auslan lexical sign Boolean    
  autostf cars & machines Boolean    
  begindirtf beginning directional Boolean    
  blend blend (= A + B) Text 50  
  blendtf   Boolean    
  bodyacttf action of the body Boolean    
  bodyloctf body locating Boolean    
  bodyparttf body parts Boolean    
  bslloantf BSL loan sign Boolean    
  catholictf catholic Boolean    
  cathschtf catholic school Boolean    
  cf1 compare with 1 Text (= idgloss) 25
  cf1tf   Boolean
  cf2 compare with 2 Text (= idgloss) 25
  cf2tf   Boolean
  cf3 compare with 3 Text (= idgloss) 25
  cf3tf   Boolean
  citytf Cities, countries & continents Boolean    
  clothestf Clothing & accessories Boolean    
  colorstf Colours Boolean    
  comp compound Text 50  
  comptf   Boolean    
  cookingtf Cooking Boolean    
  daystf Days & months Boolean    
  deafness Deafness Boolean    
  deictic1 Pronouns & pointing signs meaning 1 Text (= meaning) 254
  deictic2 Pronouns & pointing signs meaning 2 Text (= meaning) 254
  deictic3 Pronouns & pointing signs meaning 3 Text (= meaning) 254
  deictic4 Pronouns & pointing signs meaning 4 Text (= meaning) 254
  deictictf   Boolean    
  dirtf directional Boolean    
  domhndsh dominant handshape Numeric 5 1
  domonly dominant hand only moves Boolean    
  doublehnd double handed sign Boolean    
  drinkstf Drinking & eating Boolean    
  educatetf Education Boolean    
  enddirtf end directional Boolean    
  english1 English Keyword 1 Text 25  
  english10 English Keyword 10 Text 25  
  english11 English Keyword 11 Text 25  
  english12 English Keyword 12 Text 25  
  english2 English Keyword 2 Text 25  
  english3 English Keyword 3 Text 25  
  english4 English Keyword 4 Text 25  
  english5 English Keyword 5 Text 25  
  english6 English Keyword 6 Text 25  
  english7 English Keyword 7 Text 25  
  english8 English Keyword 8 Text 25  
  english9 English Keyword 9 Text 25  
  engtf1   Boolean    
  engtf10   Boolean    
  engtf11   Boolean    
  engtf12   Boolean    
  engtf2   Boolean    
  engtf3   Boolean    
  engtf4   Boolean    
  engtf5   Boolean    
  engtf6   Boolean    
  engtf7   Boolean    
  engtf8   Boolean    
  engtf9   Boolean    
  familytf Family Boolean    
  feelingstf Feelings & emotions Boolean    
  foodstf Foods Boolean    
  furntf Furniture & fixtures Boolean
  genmean Meaning of a "general sign" Text (= meaning) 254
  gensign General sign Boolean    
  govtf Government & politics Boolean    
  groomtf Grooming Boolean    
  healthtf Health & medicine Boolean    
  hobbiestf Recreation Boolean    
  idgloss unique sign name Text (= idgloss) 25
  inittf Initialisation Boolean    
  interj1 Interjection/interactive sign meaning 1 Text (= meaning) 254
  interj1tf   Boolean
  interj2 Interjection/interactive sign meaning 2 Text (= meaning) 254
  interj2tf   Boolean
  interj3 Interjection/interactive sign meaning 3 Text (= meaning) 254
  interj3tf   Boolean
  judgetf Judgements & attitudes Boolean    
  jwtf Jehovah's Witnesses Boolean
  law Law Boolean    
  lingacttf Actions using language Boolean    
  locdirtf Locations & directions Boolean    
  locprim primary location Numeric 5 0
  locsecond secondary location Numeric 5 0
  marginal marginal lexical sign Boolean    
  mathstf Arithmetic, maths & geometry Boolean    
  mattertf Materials Boolean    
  metalangtf Language about language Boolean    
  mindacttf Mind & thinking Boolean    
  moneytf Money Boolean    
  naturetf Geography & the natural world Boolean    
  nomlex1 Noun meaning 1 Text (= meaning) 254
  nomlex1tf   Boolean
  nomlex2 Noun meaning 2 Text (= meaning) 254
  nomlex2tf   Boolean
  nomlex3 Noun meaning 3 Text (= meaning) 254
  nomlex3tf   Boolean
  nomlex4 Noun meaning 4 Text (= meaning) 254
  nomlex4tf   Boolean
  nomlex5 Noun meaning 5 Text (= meaning) 254
  nomlex5tf   Boolean
  nthtf Northern dialect Boolean    
  obscuretf obscure Boolean    
  oldentry popular explanation of visual etymology Text 254
  onehand one handed sign Boolean    
  opaquetf opaque Boolean    
  ordertf Order & sequence Boolean    
  orienttf orientating Boolean    
  otherreltf other religions Boolean    
  para parallel movement Boolean    
  partlex1 adverbs & linkers meaning 1 Text (= meaning) 254
  partlex1tf   Boolean
  partlex2 adverbs & linkers meaning 2 Text (= meaning) 254
  partlex2tf   Boolean
  partlex3 adverbs & linkers meaning 3 Text (= meaning) 254
  partlex3tf   Boolean
  peopletf People (descriptions) topic Boolean    
  propnametf Proper Name lexical status Boolean    
  qldtf queensland dialect Boolean    
  qualitytf Quality, kind & condition Boolean    
  quantitytf Quantity, size & rate Boolean    
  queries notes Text 254  
  questle2tf   Boolean    
  questlex question sign meaning 1 Text (= meaning) 254
  questlex2 question sign meaning 2 Text (= meaning) 254
  questlextf   Boolean    
  reglextf regional lexical sign Boolean
  religiontf Religion Boolean    
  restrict restricted lexical sign Boolean    
  roomstf Rooms Boolean    
  salutation Greetings and leave-takings Boolean    
  satf South Australian dialect Boolean    
  seasonstf Weather Boolean    
  sense major entry "homophones": 1, 2, etc. Numeric 2  
  senseacttf Senses Boolean    
  sextf Sex Boolean    
  shapestf Shapes & patterns Boolean    
  shoptf Shopping & business Boolean    
  sn sign number Numeric 6 1
  sportstf Sport Boolean    
  stateschtf state school Boolean    
  sthtf southern dialect Boolean
  subhndsh subordinate handshape Numeric 5 1
  sym symmetrical movement Boolean    
  syn1 synonym 1 Text (= unique sign name) 25
  syn1tf   Boolean
  syn2 synonym 2 Text (= unique sign name) 25
  syn2tf   Boolean
  syn3 synonym 3 Text (= unique sign name) 25
  syn3tf   Boolean
  tastf Tasmanian dialect Boolean
  telecommun Electronics, telecommunications & computers Boolean    
  timetf Time Boolean    
  transltf translucent sign Boolean    
  transptf transparent sign Boolean    
  traveltf Travel & transport Boolean    
  twohand two-handed sign Boolean    
  utensilstf Gadgets, utensils & tools Boolean    
  verblex1 Verb & adjective meaning 1 Text (= meaning) 254
  verblex1tf   Boolean
  verblex2 Verb & adjective meaning 2 Text (= meaning) 254
  verblex2tf   Boolean
  verblex3 Verb & adjective meaning 3 Text (= meaning) 254
  verblex3tf   Boolean
  verblex4 Verb & adjective meaning 4 Text (= meaning) 254
  verblex4tf   Boolean
  verblex5 Verb & adjective meaning 5 Text (= meaning) 254
  verblex5tf   Boolean
  victf Victorian dialect Boolean    
  watf West Australian dialect Boolean    
  worktf Work & employment Boolean    

Bibliographical References

Johnston, Trevor: Transcription and glossing of sign language texts: Examples from AUSLAN (Australian Sign Language). In: International Journal of Sign Linguistics. 1 2 (1991) - pp. 3-28

Johnston, Trevor / Royal N.S.W. Institute for Deaf and Blind Children: Signs of Australia on CD-ROM : a dictionary of Auslan (Australian Sign Language). North Rocks, NSW : North Rocks Pr. 1997/1998 (Software)

Johnston, Trevor: Signs of Australia : a new dictionary of Auslan (the sign language of the Australian deaf community). rev. ed. North Rocks, NSW : North Rocks Pr. 1998 - 603 pp.

Johnston, Trevor / Schembri, Adam: On Defining Lexeme in a Signed Language. In: Sign Language & Linguistics. 2:1 (1999) (in press)


Posted: 9.12.99
