Trevor Johnston, University of Newcastle, Newcastle, Australia

The lexical database of AUSLAN (Australian Sign Language)

1.0 Introduction
2.0 Description of the database

2.1 Structure of the database (relational or flat)

2.2 Number and type of fields

2.3 Types of information registered

2.3.1 Overview of windows

2.3.2 The phonology window

2.3.3 The core meanings window

2.3.4 The English search window

2.3.5 The lexical links window

2.3.6 The usage and grammar window

2.3.7 The semantic domains window

2.4 Visual data

2.5 Notation and transcription methods used

3.0 Data exchange: Compatibility with other databases

4.0 The future: Compatible databases or a "universal" database?

Appendix 1: Table of fields in Auslan lexical database

1.0 Introduction

The lexical database of Australian Sign Language (henceforth Auslan) was begun in 1984. Over a fifteen year period, it has grown from a relatively small "database" of some hundreds of signs stored as paragraphs and then as tabular records in a word processing program (the earliest versions of Microsoft Word), to a relational database in FileMaker Pro 4.0 with over 7,000 records (each having hundreds of fields).

In effect, there are now three lexical databases of Auslan. The first is the source database of some 6,600 signs created for the Auslan dictionary project. Data was exported from this to create the Auslan dictionary, "Signs of Australia", in book form (Johnston 1998). The second database is a restricted subset of some 4,000 signs from the first. It was used as the data set for the CD-ROM "Signs of Australia" (Johnston 1997, 1998). The professional programmers who produced the commercial CD-ROM gave this restricted database an additional video field to complement the line graphic of each sign record, and designed a unique user interface to navigate through the database. In particular, the interface exploited the thousands of cross references that had been established between signs in the database (i.e., between numerous records and fields). The third Auslan lexical database has only recently been created (in 1999) by exporting from FoxPro 2.6 an updated and expanded form of the first database into a new FileMaker Pro 4.0 format. Not only does the new expanded database have more entries, it also has a new video field based on the CD-ROM data set.

Unless otherwise stated, the lexical database referred to in this paper is the full FoxPro database, as it existed in 1997 and early 1998, which served as the basis of the Auslan dictionaries.

A full description of a sign language lexical database must entail a discussion of the criteria according to which signs are selected for inclusion in the database and the status or ranking individual sign records are given within that database (e.g., as lexical signs, stems, variants, and so on). As mentioned above, I have produced two generalist dictionaries of Auslan in book format as well as a CD-ROM dictionary. These dictionaries have caused me to seriously rethink what linguists and sign lexicographers should and should not record in lexical databases (and, subsequently, dictionaries) and how this information should be recorded. These principles should accord with general lexicographic principles and be understandable and acceptable to any linguist and lexicographer. A background paper discussing this very issue was presented at the Hamburg Workshop on Multimedia Dictionaries and at the Intersign Workshop on Sign Language Lexical Databases (Hamburg University). It is now in press and I refer the reader to this paper.¹

2.0 Description of the database

The Auslan lexical database is a FoxPro (version 2.6) document. It has approximately 6,600 records.

2.1 Structure of the database (relational or flat)

Though FoxPro is a relational database program, the Auslan database does not truly exploit this potential and thus is relatively "flat" and simple. There are, however, extensive cross references between most records in various fields. Indeed, virtually every record is referenced to at least one other record in at least one field, while hundreds of records are cross referenced to up to six other records in several fields.

2.2 Number and type of fields

There are 244 fields associated with each record, coding for information dealing with phonology, semantics, grammar, usage, iconicity, regional variation, notation, translation equivalents in English, and the educational and religious background of users of the sign. Fields are boolean, textual, numeric or "general" (a FoxPro description of a field for graphics and other visual data) according to the type of information registered. The fields found in the database are listed in Table 1 (see Appendix 1). The meaning of the field types is discussed in each of the following sections dealing with the various views onto the database.

2.3 Types of information registered

Given the number of fields in each record, six different portals or windows for entering and viewing data were designed to make the task manageable. The exact types of information stored and registered in the database are best dealt with by looking at each major window in turn and describing the number and kind of fields each displays.

2.3.1 Overview of windows

All of the windows, except for the one titled "English search", are designed around the field that registers the head of each record—a graphic (line drawing) representation of the sign itself. This field appears in each window on the database. The graphics or visual data field appears on the left of the window along with a text field that uniquely names and identifies the sign (the field is labeled "idgloss" in the database, for "I.D. gloss") and a text field that describes its form as notated in the Hamburg Notation System (HamNoSys) (called the "hns" field in the database). The identifying gloss is usually a single English word, often combined with distinguishing numbers and letters. For example, if two signs in the database are best referred to as "blue" then "blue.1" is the identifying gloss given to the first occurrence and "blue.2" is the one given to the second occurrence. If two signs are clearly related to each other as variants the stem or base form is identified with the letter "a" and the variant(s) with the letter "b" ("c", "d", etc.). Thus "before.1b" is a variant of "before.1a", while "before.2" is a separate and unrelated sign. If no numbers or letters are attached to an identifying gloss then it simply means that that word or gloss is not used to identify any other sign in the database. For example, "snow" is the English word used in the identifying gloss of one, and only one, sign.
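The identifying-gloss convention just described is regular enough to be parsed mechanically. The following is a minimal sketch in Python and is not part of the original database: the names `IdGloss`, `parse_idgloss` and `is_variant_of`, and the regular expression itself, are assumptions made for illustration, and the sketch assumes that the gloss word never contains a full stop.

```python
import re
from typing import NamedTuple, Optional


class IdGloss(NamedTuple):
    word: str                # English word used in the identifying gloss
    number: Optional[int]    # separates unrelated signs that share a gloss word
    variant: Optional[str]   # "a" = stem or base form; "b", "c", ... = variants


# Assumed pattern: a word, optionally followed by ".", digits and one letter.
_IDGLOSS = re.compile(r"^(?P<word>[^.]+)(?:\.(?P<number>\d+)(?P<variant>[a-z])?)?$")


def parse_idgloss(idgloss: str) -> IdGloss:
    """Split an identifying gloss such as 'before.1b' into its parts."""
    match = _IDGLOSS.match(idgloss)
    if match is None:
        raise ValueError(f"not a recognised identifying gloss: {idgloss!r}")
    number = int(match.group("number")) if match.group("number") else None
    return IdGloss(match.group("word"), number, match.group("variant"))


def is_variant_of(candidate: str, stem: str) -> bool:
    """True if `candidate` carries a variant letter (b, c, ...) of the stem marked 'a'."""
    c, s = parse_idgloss(candidate), parse_idgloss(stem)
    return (c.word == s.word and c.number == s.number
            and s.variant == "a" and c.variant not in (None, "a"))
```

Under these assumptions, `parse_idgloss("snow")` yields a gloss with no number or letter, and `is_variant_of("before.1b", "before.1a")` holds while `is_variant_of("before.2", "before.1a")` does not.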

Other related fields in the head of the record make explicit morphological information about the sign if that seems relevant or plausible (e.g., identifying the stem or base of the sign if the sign in question appears to be a modified form, and identifying the nature of the modification in terms of handshape, movement, etc.). These are coded in two text fields that parallel the "idgloss" and "hns" fields using notation and transcription conventions based on Johnston (1991). Naturally, each record in the database is numbered. This number appears in each window. See Figure 1.

Figure 1 Template for all data entry and viewing windows in the Auslan lexical database.

It is an unexceptional and expected feature of the units of any language (signs or words) that they will often have more than one meaning. Databases that record meanings, just like dictionaries, need to identify the various senses of any identified sign or word in some principled hierarchy from, say, core to peripheral or metaphorical senses (see 2.3.3 "Core meanings window"). However, there comes a point where, according to a whole range of criteria, a separate identifiable sense of a sign or word is indicative of the existence of two separate signs or words that simply share the same phonological form (homophones). When this occurs, a given sign head (line graphic and notation) is entered as two (or more) separate records, each with a separate and sequential homophone number (1, 2, 3, etc.). Naturally, more often than not, the morphological information about the signs, if relevant, will be different.

2.3.2 The phonology window

Information is coded for the formational features of a sign (handshape, location, orientation, movement, non-manual features). In the Auslan lexical database, this information has been coded not only in a dedicated notation field but also in a series of boolean and numerical fields that code for the presence or absence of specific sign parameters. In other words, phonological information has been double-coded for some sign parameters.

This double-coding was done to aid database management and enquiry. Separate coding for particular parameters was necessary for two reasons. First, sorting and search routines within FoxPro are fairly limited because of idiosyncrasies of the search rules themselves within text fields. For example, only characters at the end or beginning of a text field could be searched for, not intermediate characters. This naturally causes problems when searching for HamNoSys occurrences because the symbols specifying location and orientation are always inside the notation string, never at the beginning or end. For similar reasons the sorting of signs based on phonological features (as coded in the notation) follows an inherent ASCII order which cannot be changed and/or can only operate on strings from beginning to end, never from any other point of departure. Second, there are limitations on the use of a unique purpose built font (such as HamNoSys) which has a potentially different inherent "sort order" to alphabetical and/or ASCII order. Therefore, sort order (alphabetical and/or ASCII) cannot be changed (though it can be reversed) to suit some sign language internal principle (e.g., handshape frequency, handshape markedness, locations high to low or left to right, and so on).

Consequently the Auslan lexical database has fields that separately code and tag for handshape, handedness (one, two, double), hand arrangement (symmetrical, alternating, parallel, or with the dominant hand moving only), and locations (primary and secondary). This allows for the searching, retrieval, cross-referencing and sorting of signs according to any of these criteria, singly or in combination (see Figure 2).

Figure 2 The phonology window

As can be seen from Figure 2, there are at least ten phonological fields. Some fields are boolean, such as those registering the presence or absence of a feature like "two-handedness" or "symmetry", and some are numerical, such as those encoding handshape and location (the number uniquely identifies the handshape or location variable).
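The benefit of separately coded boolean and numerical fields can be sketched as follows. This is an illustration only, written in Python rather than FoxPro: the field names, the numeric codes and the three sample records are all invented, and only a handful of the phonological fields are modelled. It shows a search on any combination of fields and a sort that follows a researcher-chosen handshape order instead of ASCII order.

```python
from dataclasses import dataclass
from typing import Any, List, Sequence


@dataclass(frozen=True)
class PhonologyRecord:
    idgloss: str            # unique identifying gloss
    handshape: int          # numeric handshape code (values invented)
    primary_location: int   # numeric location code (values invented)
    two_handed: bool        # boolean field: is the sign two-handed?
    symmetrical: bool       # boolean field: is the hand arrangement symmetrical?


def find_signs(records: Sequence[PhonologyRecord], **criteria: Any) -> List[PhonologyRecord]:
    """Return the records whose fields equal every criterion given."""
    return [r for r in records
            if all(getattr(r, name) == value for name, value in criteria.items())]


def sort_by_handshape(records: Sequence[PhonologyRecord],
                      handshape_order: Sequence[int]) -> List[PhonologyRecord]:
    """Sort by a chosen handshape order (e.g. by frequency), not by ASCII order."""
    rank = {code: position for position, code in enumerate(handshape_order)}
    return sorted(records, key=lambda r: rank[r.handshape])


# Invented sample records, for illustration only.
RECORDS = [
    PhonologyRecord("example.1", handshape=3, primary_location=10, two_handed=True, symmetrical=True),
    PhonologyRecord("example.2", handshape=1, primary_location=10, two_handed=False, symmetrical=False),
    PhonologyRecord("example.3", handshape=3, primary_location=12, two_handed=True, symmetrical=False),
]

matches = find_signs(RECORDS, handshape=3, two_handed=True)
ordered = sort_by_handshape(RECORDS, handshape_order=[3, 1])
```

Because each parameter lives in its own field, neither operation needs to look inside a notation string.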

2.3.3 The core meanings window

The two fundamental purposes of any sign language lexical database are (i) the identification and recording of individual signs as forms and (ii) the description of the meaning of these forms within the signed language in question. Thus fields which register and code for meaning are a vital feature of these databases.

Most databases simply rely on glosses and a "notes" field to explain the meaning of a sign. This database, however, actually defines signs (admittedly using basic English, not Auslan itself). Moreover, the definition also entails identifying sign class membership (i.e., part of speech). For instance, the Auslan database has no fewer than 24 text fields that record sign meaning and an equivalent number of boolean fields registering the presence or absence of these meanings. The 24 text fields are divided into eight classes or "parts of speech": nominals, verbals, interrogatives, modifiers and linkers ("adverbials"), interactives ("interjections"), plus the additional classes of deictics and "general signs". See Johnston & Schembri (1999) for a definition of these terms. All the parts of speech have distinct fields for at least three related senses (nominals and verbals have up to five). The boolean fields make it possible to query the database as to the part of speech that any particular sign belongs to in Auslan and the number of distinct but related senses any sign may have (see Figure 3).

Figure 3 The core meanings window

The sense definitions are in simple basic English (see Appendix 1), modeled on the Cobuild English Learner's restricted vocabulary. Single word English equivalents are only given if and when they are available in the translation language. Strictly speaking there is no "glossing" as such, at least with reference to meaning definition.

2.3.4 The English search window

Though glossing is not used as a technique for defining signs, it is used as part of a "bilingual" tool for rapidly narrowing down sign searches where the point of entry is a meaning as expressed by a particular English word. In some cases, such a search may yield only one match. In most other cases, however, the match is usually one to many.

As can be seen from Figure 4, there are twelve fields in the "English search" or "English gloss" category which tag each sign entry with at least one English word commonly associated with that sign. A sign may have up to twelve English words as tags or "glosses". This enables two things. First, it allows successful searches and matches where English codes different parts of speech differently ("introduce" and "introduction") without there being any formal distinction in the Auslan sign. Second, it allows successful searches and matches where there is a many to one relationship lexically (e.g., "wet", "damp", "soft" are all applicable to one sign) or a one to many relationship (e.g., "before", which matches several signs).

Figure 4 The English search window

The "idgloss" is simply used to identify the sign uniquely in the database. It serves no other purpose and is not part of the sign definition.
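The English search described in this section amounts to an inverted index from English words to identifying glosses. The sketch below, in Python, is an assumption about how such a lookup could be built and is not the FoxPro implementation; the function name and the sample tags are illustrative, and the identifying glosses "introduce" and "wet" are invented for the example.

```python
from collections import defaultdict
from typing import Dict, List

MAX_ENGLISH_TAGS = 12  # the database provides twelve English search fields per sign


def build_english_index(tags_by_sign: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each English search word to the idglosses of every sign tagged with it."""
    index: Dict[str, List[str]] = defaultdict(list)
    for idgloss, words in tags_by_sign.items():
        if len(words) > MAX_ENGLISH_TAGS:
            raise ValueError(f"{idgloss!r} has more than {MAX_ENGLISH_TAGS} English tags")
        for word in words:
            index[word.lower()].append(idgloss)
    return dict(index)


# Illustrative tags only; the idglosses "introduce" and "wet" are hypothetical.
TAGS = {
    "introduce": ["introduce", "introduction"],  # one sign, two English parts of speech
    "wet": ["wet", "damp", "soft"],              # several English words, one sign
    "before.1a": ["before"],                     # one English word, several signs
    "before.2": ["before"],
}

index = build_english_index(TAGS)
```

Looking up `index["before"]` then returns more than one sign, while `index["damp"]` and `index["soft"]` return the same single sign.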

2.3.5 The lexical links window

This window on the database is primarily concerned with the relationship of any given sign with other signs of the language in a system of meanings. The meaning of any lexical unit in a language (word or sign) is only partially captured by the attempt to state or explain its meaning either monolingually (using words or signs of the same language) or bilingually (using the words and equivalents of another language). The meaning (valeur) of any linguistic unit in a language is also a function of its place within the linguistic system of that language as a whole both paradigmatically and syntagmatically. It is also a function of its relationships of (near or actual) synonymy and antonymy in a semantic network as part of a paradigm. Consequently, the database is not just restricted to explaining the meaning of a sign using a definition. In addition, the database has 14 fields that register types of semantic relationships. These are visible on the lexical links window (see Figure 5).

Figure 5 The lexical links window

There are potentially three possible near or actual synonyms and antonyms that can be cross-referenced with each individual sign, three possible cross-references to signs that can be compared or contrasted semantically with the current record, and five possible cross-references that can be made to potential variant forms of a sign. The latter is semantically revealing in signed languages because of the inherent meaningfulness of the sign parameters (handshape, movement, etc.) which are the very substance of different sign forms. As with all fields that cross reference to other sign records, these are text fields that contain the unique identifying gloss ("idgloss") of the cross-referenced sign.
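Because the cross-reference fields hold plain text rather than enforced relations, a reference can point to an identifying gloss that no longer exists. A minimal sketch of a consistency check follows, in Python; the link-field names ("synonyms", "variants") and the sample records are hypothetical, and nothing here is claimed to be part of the original database.

```python
from typing import Dict, List

# Each record maps link-field names to the idglosses it cross-references.
LinkFields = Dict[str, List[str]]


def dangling_links(records: Dict[str, LinkFields]) -> Dict[str, List[str]]:
    """Return, per record, the cross-referenced idglosses that match no record."""
    known = set(records)
    missing: Dict[str, List[str]] = {}
    for idgloss, links in records.items():
        bad = sorted({target
                      for targets in links.values()
                      for target in targets
                      if target not in known})
        if bad:
            missing[idgloss] = bad
    return missing


# Hypothetical sample: "damp" is referenced but has no record of its own.
SAMPLE = {
    "before.1a": {"variants": ["before.1b"]},
    "before.1b": {"variants": ["before.1a"]},
    "wet": {"synonyms": ["damp"], "variants": []},
}

problems = dangling_links(SAMPLE)
```

A check of this kind is one way to keep text-based cross references trustworthy when records are renamed or removed.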

2.3.6 The usage and grammar window

The Auslan database includes information about the "lexical status" of signs, sociolinguistic variables such as region (e.g., northern or southern dialect, state-based signs), religion, school and register (e.g., crude, technical). In all, there are 32 fields that code for the presence or absence of these variables.

Lexical status is discussed in some detail in Johnston & Schembri (1999) and need not be repeated here. Suffice it to say that the full Auslan database is a database of signs which are primarily lexemes. (The reduced data set for the Signs of Australia CD-ROM dictionary is of lexemes only.) Moreover, the class of lexemes has itself been coded and ranked according to type and "degree" of lexicalization. For example, with respect to type, lexemes are coded, where relevant, as compounds, blends, initializations, borrowings and so on. With respect to degree of lexicalization, there are eight checkboxes that code for a range from "high" lexicalization ("Auslan lexical sign") through "mid" lexicalization ("restricted lexical sign") to "low" lexicalization ("doubtful lexical status").

The database also includes some specific grammatical information about the Auslan signs. Naturally, sign class membership is relevant here (these fields are also visible on the "usage and grammar window" as they are on the "core meanings window"). It also includes information about the potential of a sign to exploit spatial modifications. There are five checkboxes that code for the inflecting potential of a sign in terms of directionality and location.

In addition, four checkboxes code for notional degree of iconicity attributable to each sign. In descending order from iconic to arbitrary these are: transparent, translucent, opaque and obscure.

All these field types can be seen in Figure 6.

Figure 6 The usage and grammar window

2.3.7 The semantic domains window

The Auslan database includes information about semantics. This window simply tags each sign in the database according to the semantic areas it may be related to, where this is relevant and possible. Naturally, a semantic network like this is infinitely expandable and not every sign will be able to be categorized unless the categories are themselves so broad as to be meaningless or of little practical use.

Coding for semantic areas enables researchers to quickly isolate all signs relevant to major categories (e.g., colors, family relationships, weather, enumeration, and so on) that they may wish to compare cross-linguistically. All these field types can be seen in Figure 7.

Figure 7 The semantic domains window

2.4 Visual data

In the FoxPro Auslan lexical database, the head of each record is the graphics field that contains a line drawing of the sign for that record (in PICT format). This acts as a place marker and helps researchers visualize the entry sign and orientate themselves within the database. (Signs are organized visually, according to the phonological features of handshape, location, orientation etc., rather than alphabetically according to gloss.)

In the reduced form of the database that lies behind the Auslan CD-ROM, the individual line-drawing graphics have been replaced by digitized video clips of each sign. These video clips are stored separately in their own files and are retrieved instantly whenever a related record in the database is accessed. The first frame of the video clip acts as a place marker in the video field with the video only being played on request (in repeat, slow, or frame by frame modes).

If the head sign of any record is tagged as a variant or modified form of some other sign then a second graphics field is filled with an image of the "stem", "base" or "standard" sign it is deemed to be related to.

2.5 Notation and transcription methods used

Notation (the writing down of the form of individual signs, words, etc., of a language) and transcription (the writing down of phrases, sentences, and other stretches or fragments of the spoken or signed texts of a language) can be done using HamNoSys. As mentioned above in 2.3.1, this is the system used in the Auslan lexical database. Since the database is a repository of individual signs, what concerns us here most directly are notation systems rather than transcription systems, insofar as they can be distinguished and separated.

The use of HamNoSys for the notation of signs in the Auslan database is a consequence of the historical evolution of the database. When the database was begun in the early 1980s, there was no possibility of storing sign language data using digital video technology and, of course, ordinary video technology was cumbersome and totally inefficient for rapid retrieval of individual signs. A notation system was needed to make it possible to record, in some way, the form rather than the "meaning" of each sign (as in a gloss). Various systems were tried for transcribing Auslan data but, by the late 1980s, after HamNoSys became available for Apple computers, the researcher settled on this system, partly because it was the only one available for Apple computers (which he used) and partly because it avoided using any alphabetic symbols whatsoever (as in Stokoe notation).

All signs are notated in HamNoSys in a dedicated text field. This was true even after databases could relatively easily accommodate simple visual data in graphics fields because it could still be argued that a line drawing or a photograph was an inadequate representation and needed to be complemented by a dedicated notation. However, over the period of time that the database has evolved, the importance of notation and transcription systems (and the fields that contain this information) has diminished somewhat.

The most obvious development in sign language lexical databases has been the dramatic increase in the storage capacity of the computers normally available to researchers and the development of digital video technology. This has made it possible to store in the database a video clip of each sign record. A notation is no longer needed in order to simply identify the sign form of any particular entry. Thus one central rationale for a notation system in a sign language database has disappeared completely.

The second development in database design (at least in the Auslan lexical database) has been the provision of separate fields for registering discrete formational features of each sign. This phonological specification can be done at three levels: (i) for a restricted set of features (e.g., only those likely to be useful in searches and sorts); (ii) for each and every feature for which a symbol exists in a notation system (such as HamNoSys); or (iii) for an even larger set of features than exist in any single notation system. Consequently the phonological coding in the database can be poorer than, exactly equivalent to, or even superior to, the information encoded in a dedicated sign notation string.

The Auslan FoxPro database coded for a restricted set of phonological features (see section 2.3.2 above). The information encoded in such a phonological "matrix" (where the notation is simply the selection of the presence or absence of a handshape, location, etc.) can be used as the basis for any kind of sorting of or enquiry into the database much more readily than a transcription or notation system itself (such as HamNoSys). Furthermore, a notation for any sign can, in principle, be generated from the matrix given a relatively simple set of rules of selection, realization and ordering of symbols. This is very straightforward if the phonological matrix has a one to one correspondence with the notation system (i.e., phonological fields match the value of all available notation symbols). (Specifying these rules may be relatively straightforward, but actually writing and implementing them within a computer program is, of course, usually beyond the computing and programming expertise of most sign language researchers, including the author of this paper.)
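The idea of generating a notation string from a phonological matrix by rules of selection, realization and ordering can be sketched in a few lines. The Python below is an illustration under stated assumptions, not an implementation that exists for the Auslan database: the parameter order, the numeric codes and the ASCII placeholder tokens (such as "HS:flat") are all invented, and real HamNoSys symbols are deliberately not used.

```python
from typing import Dict, Optional

# Ordering rule (assumed): the sequence in which parameters appear in the string.
ORDER = ("handshape", "orientation", "location", "movement")

# Realization rule (assumed): each numeric field value maps to one placeholder token.
REALIZATION: Dict[str, Dict[int, str]] = {
    "handshape": {1: "HS:flat", 2: "HS:fist"},
    "orientation": {1: "OR:palm-up", 2: "OR:palm-down"},
    "location": {1: "LOC:chin", 2: "LOC:chest"},
    "movement": {1: "MOV:down", 2: "MOV:circle"},
}


def generate_notation(matrix: Dict[str, Optional[int]]) -> str:
    """Build a notation string from a one-to-one phonological matrix."""
    tokens = []
    for parameter in ORDER:                               # ordering
        value = matrix.get(parameter)                     # selection
        if value is None:
            continue                                      # parameter not coded for this sign
        tokens.append(REALIZATION[parameter][value])      # realization
    return " ".join(tokens)


notation = generate_notation({"movement": 1, "handshape": 2, "location": 1, "orientation": None})
```

The sketch depends on the one to one correspondence mentioned above: every coded value has exactly one token, so no further rules are needed to choose between alternatives.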

Nonetheless, there is still a place in sign language lexical databases for a dedicated notation field regardless of whether a notation is computer generated or individually and manually entered. A lexical database can be the source of data sets that match particular glosses to notation strings which could be exported and merged with other programs, especially computerized transcription programs that build automatic dictionaries (see section 4.0). Most importantly, these data sets can help guarantee the internal consistency of glossing, sign class allocation, and morpheme identification by the language researcher. In other words, the notation system has increasingly become an expression of a phonological theory of the language in the broadest sense. Its purpose is no longer narrowly phonetic (capturing the precise form of the token) since this can now be easily viewed on digital video. Its purpose is phonemic (describing the type) and, hence, analytic.

Finally, computer-based notation and transcription systems have also been invaluable resources for the production of written texts in the linguistic analysis of sign language data, even when accompanied by line drawings or photographs. However, even here, new multimedia technologies (CD-ROM, Internet) are making it increasingly possible to incorporate visual data into multimedia documents. Thus notation and transcription systems appear to be of on-going importance and relevance as tools of analysis, if increasingly less relevant in data presentation.

3.0 Data exchange: Compatibility with other databases

Commercially available relational databases (such as 4thDimension, FileMaker Pro, FoxPro, etc.) are perfectly adequate for the majority of storage and research needs at the lexical level, that is in lexical databases. Even if a database format becomes superseded and discontinued, as with FoxPro, most of the data in any of these types of relational databases can be easily exported into another, provided it is encoded in a form that facilitates this. If it is not, then this presents significant compatibility problems.

These problems manifest themselves on at least two levels of particular concern here—notation systems and relations within a database.

With respect to notation systems, the problem of compatibility of differing notation and transcription systems is rightly seen as a major problem. Information which has been painstakingly and laboriously coded within a dedicated sign language notation system is not easily exported into another database that uses a different notation system. In principle, automatic translation routines should be able to convert symbols from one system into the symbols of another. (Once again this may be outside the expertise of most sign linguists using these databases but it is nonetheless still straightforward.) However, problems arise with the over- and under-specification of features in differing systems. A conversion from one system to another can lead to a loss of information if the same differences are not discriminated in the second system as in the first. When reconverted back into the first system (the so-called "round-trip fidelity problem") errors can become significant.
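The round-trip problem can be made concrete with two toy notation systems. In the Python sketch below, system A distinguishes two flat handshapes that system B collapses into one; both symbol inventories and both conversion tables are invented for illustration and correspond to no real notation system.

```python
from typing import Dict, List, Tuple

# System A makes a distinction that system B does not (B under-specifies).
A_TO_B: Dict[str, str] = {"flat-spread": "flat", "flat-closed": "flat", "fist": "fist"}

# Converting back, B's single "flat" must be mapped to one arbitrary A symbol.
B_TO_A: Dict[str, str] = {"flat": "flat-closed", "fist": "fist"}


def convert(symbols: List[str], table: Dict[str, str]) -> List[str]:
    """Translate a notation string symbol by symbol."""
    return [table[symbol] for symbol in symbols]


def round_trip_errors(symbols: List[str]) -> List[Tuple[str, str]]:
    """Return (original, restored) pairs that differ after an A -> B -> A round trip."""
    restored = convert(convert(symbols, A_TO_B), B_TO_A)
    return [(before, after) for before, after in zip(symbols, restored) if before != after]


errors = round_trip_errors(["flat-spread", "fist", "flat-closed"])
```

Only the symbol whose distinction system B cannot express comes back changed, which is exactly the kind of silent loss that accumulates over repeated exchanges.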

If a dedicated notation field is the only way in which phonological information about individual records is coded in a sign language lexical database then this does indeed raise serious concerns about the compatibility of databases. Any database that encodes sign parameters solely within a notation field will, of course, encounter problems in transferring that information into another notation or transcription system. There seems to be no simple solution to the problem, apart from all researchers agreeing to use one and only one notation system (the signed language equivalent of the IPA). This seems highly unlikely, if not premature (since phonemic and phonetic discrimination can only grow as more and more signed languages are studied and analyzed).

If, however, sign parameters are coded within a "phonological matrix" within a database (as a reduced set of features as in the Auslan lexical database, or as a comprehensive set of features as in its latest incarnation in FileMaker Pro) it is then a much simpler process to export the data from one signed language database into another (and back again). One could represent this in a notation system by using semi-automatic generation procedures if necessary, but unless the two notation systems make exactly the same discriminations there will be information loss at some point. However, much more importantly, one could do the kind of comparative work between the two different data sets that motivates the desire for exportability and compatibility in the first instance. Even if there is no perfect match of categories in both signed languages or databases, it will still be possible to engage in a whole raft of comparative inquiries based on any number and combination of these features.

With respect to relations, one drawback of data importability and exportability is the current inability to transfer relations as well as the raw data itself. However, just as past problems of video compression and limits to information storage on CD-ROMs were solved (or at least vastly improved) after some sign language CD-ROM projects were already under way, similarly, provided researchers use commercially available, popular and widely used relational databases (such as FileMaker Pro), future import/export obstacles (including the relations themselves) should be overcome by the time they become a real necessity.

What is of greatest importance is guaranteeing that the kinds of fields and the way the data is encoded in each database at least facilitates comparison between databases and data sets. Preferably, database design should allow for the useful export and import of data. For example, at minimum, fields for formational features, dialect variants, semantic fields, basic grammatical class (e.g., nominal, verbal) and sign modifying class (e.g., directional, plain) should be available to facilitate cross linguistic studies. Despite there being some disagreement in the literature, the most widely accepted models in sign phonology specify five parameters. There is also widespread agreement on the major categories within each parameter. Not surprisingly, signed languages appear to have differing phonemic repertoires (especially in handshapes, but also in locations etc.). But there is, nonetheless, significant overlap. On the whole, there is widespread agreement and any phonological matrix from one language would have a large degree of overlap with another signed language. It is in the other fields coding for other aspects of the sign (part of speech, inflection type, variants, dialects) that there has been, and will continue to be, wide divergence.

At least by making explicit the categories used in the Auslan lexical database, and explaining their rationale, this description is a contribution to an attempt to "harmonize" future databases as much as is possible in this respect. Information I have coded for in my database may appear to be useful to other sign language researchers, just as a type of information coded for in their databases but currently not in the Auslan database may be something that I should give consideration to including.

4.0 The future: Compatible databases or a "universal" database?

In section 2.3.6, it was mentioned that the Auslan lexical database was just that, a database of lexemes in the language, not a database of Auslan signs. It was research into the lexicon of Auslan that inspired the discrimination between lexemes and signs (or more simply "lexicalized signs" and "non-lexicalized signs") in the analysis of the language (Johnston & Schembri 1999).

Now that the database has evolved and questions regarding database compatibility have surfaced, it is possible that there may be a place in signed language research for databases of signs, not just lexemes. A database of signs can be used for language internal analysis of phonology, lexicon and grammar. More importantly, sign databases can be used for comparing patterns of lexicalization and grammaticization across signed languages and families of signed languages. Though, of course, the distinction between signs and lexemes still needs to be made and coded in such databases, there is no good reason why a modern signed language database should not be expanded to include tens, if not hundreds, of thousands of records listing all potential sign forms in the language.

One observes, for instance, that the possible simultaneous combination of the sign parameters of handshape, location, orientation and movement can generate, at absolute maximum, only a couple of million possible sign forms. Even if non-manual features were included at the level of the lexicon (though I believe, overall, they should not be) the number is simply very large (still in the millions), but certainly not innumerable or, for all practical purposes, infinite. On closer inspection, it appears that vast numbers of the potential sign forms represented by the theoretically possible combinations of sign parameters are simply never realized. In reality, they are impossible (e.g., it is a physical impossibility to achieve certain orientations at certain locations) or impractical (e.g., it is difficult or uncomfortable to perform the combination). They are not actually "well-formed" and only appear possible on paper, according to an abstract formula of potential parameter combinations. This reduces the number of possible sign forms dramatically from a couple of million to hundreds of thousands.
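The counting argument above can be sketched in a few lines. The inventories below are hypothetical and deliberately tiny (they are not the Auslan inventories), and the single well-formedness rule is an invented stand-in for the physical and practical constraints just described. The point is only that the theoretical maximum is the product of the inventory sizes, and that a constraint filter removes part of it.

```python
from itertools import product

# Hypothetical, deliberately tiny inventories; real inventories are far larger.
HANDSHAPES = ["flat", "fist", "point"]
LOCATIONS = ["head", "chest", "neutral"]
ORIENTATIONS = ["palm_up", "palm_down", "palm_in"]
MOVEMENTS = ["straight", "circular"]


def is_well_formed(handshape, location, orientation, movement):
    """Toy constraint standing in for physically impossible or impractical forms."""
    # Invented rule for illustration: no palm-up orientation at the head.
    if location == "head" and orientation == "palm_up":
        return False
    return True


def count_forms():
    """Return (theoretical maximum, number of well-formed forms)."""
    all_forms = list(product(HANDSHAPES, LOCATIONS, ORIENTATIONS, MOVEMENTS))
    well_formed = [form for form in all_forms if is_well_formed(*form)]
    return len(all_forms), len(well_formed)


if __name__ == "__main__":
    maximum, realizable = count_forms()
    print(maximum, realizable)  # 3 * 3 * 3 * 2 = 54 in total, 48 after the filter
```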

It might be objected that affixation and compounding within any sign language could be expected to compensate for this reduction, increasing the numbers of sign entries significantly once again. However, two facts lead us not to expect this to be the case. First, these processes are relatively rare: affixation is unknown or extremely rare in signed languages observed to date, including Auslan; and compounding, though frequent, is limited. Second, both processes are a feature of lexicalization (i.e., they are associated with lexemes) and not of potential sign forms as such. The lexicons of some signed languages, such as Auslan, seem to have been fairly well documented. For example, Johnston & Schembri (1999) suggest that the Auslan lexicon consists of thousands, not tens of thousands or even hundreds of thousands, of lexemes. There appears not to be a large inventory of hitherto unrecognized lexemes in Auslan, let alone compounds or affixed forms.

Consequently, it may well be possible to list in a database all the potential sign forms available to a given signed language. In other words, a future signed language "lexical" database could operate as a vast template of potential sign forms with hundreds of thousands of pseudo-entries. When data justified the creation of a new sign entry, a potential and specific pre-existing slot (generated by the parameter formula for that record) would be activated and occupied by the attested form.
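A minimal sketch of such a template follows, assuming each potential form is simply a tuple of parameter values and that a slot is "activated" by attaching an idgloss to it. The Slot class, the function names and the sample values are all hypothetical; only the field name idgloss comes from the database itself.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional


@dataclass
class Slot:
    """One pseudo-entry: a potential sign form, attested or not."""
    form: tuple
    idgloss: Optional[str] = None  # None means the form is potential only

    @property
    def attested(self):
        return self.idgloss is not None


def build_template(*inventories):
    """Generate one empty slot per combination of parameter values."""
    return {form: Slot(form) for form in product(*inventories)}


def activate(template, form, idgloss):
    """Occupy the pre-existing slot for an attested form.

    Raises KeyError if the form is not one the parameter formula generates.
    """
    slot = template[form]
    slot.idgloss = idgloss
    return slot
```

For example, a template built from two handshapes, two locations, one orientation and one movement has four slots, and activating one of them leaves the other three as unoccupied pseudo-entries.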

Ultimately, it may well be possible to "superimpose" sign language databases on top of each other in order to reveal shared and divergent patterns of "signification" and lexicalization. Indeed, given the apparent high degree of overlap of many of the sign parameters in terms of both form and meaning, one may even entertain the possibility of a "universal sign language database" in which each sign language was, in fact, simply a language-specific window on the universal database of forms. Of course, future sign language research may actually reveal that the assumption of a high degree of shared parameter forms with similar iconic motivations across signed languages is actually false. A failed attempt to build such a universal database of sign forms would be good empirical evidence for such divergence.
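One way to picture "superimposing" two databases is as a comparison of the forms each language has occupied. The sketch below assumes both databases have already been reduced to a mapping from a form (a tuple of parameter values in a shared coding) to a gloss; that shared coding is exactly the harmonization this paper argues for, and it is assumed here, not given. The function name and the layout of the result are hypothetical.

```python
def superimpose(db_a, db_b):
    """Compare two databases keyed by sign form.

    Each argument maps a form (a tuple of parameter values) to a gloss.
    Returns the forms attested in both languages, with both glosses,
    and the forms attested in only one of them.
    """
    shared = db_a.keys() & db_b.keys()
    return {
        "shared": {form: (db_a[form], db_b[form]) for form in shared},
        "only_a": {form: db_a[form] for form in db_a.keys() - shared},
        "only_b": {form: db_b[form] for form in db_b.keys() - shared},
    }
```

Inspecting the "shared" part of the result shows where two languages use the same form for similar or for different meanings. A persistently small shared set would be the kind of evidence for divergence mentioned above.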

Regardless of whether sign language databases are lexeme-based or sign-based on the one hand, or language-specific or "universal" on the other, a second important aspect of future work with these databases involves their integration into other computer-based linguistic tools, for example, interlinear transcription programs and programs for the creation and analysis of language corpora. The immediate future task is to design a program that integrates these databases into transcription programs such as IT and Shoebox (used almost exclusively for annotating and transcribing spoken language texts) and/or SyncWriter and SignStream (designed and used for annotating and transcribing signed language texts in a multimedia format). Programs of the first type can build dictionaries of glossed and annotated forms and can import data sets of lexical information from other programs. Programs of the second type can align multi-tiered transcriptions with digitized video, but currently lack dictionaries or the ability to make them.

As mentioned above (section 2.5), a lexical database can be the source of data sets that match particular glosses to notation strings, which could be exported and merged with other programs, especially computerized transcription programs that build automatic dictionaries. Most importantly, these data sets can help guarantee the internal consistency of glossing, sign class allocation, and morpheme identification by the language researcher. The original lexical database can itself be adjusted in response to the transcription process. Data sets can already be exported from the Auslan lexical database and imported into IT. The data set greatly speeds up, if not semi-automates, the process of text transcription. This indirect manual integration of a database with IT is tedious and error-prone at each stage of importing and exporting, but at least it can be done. Unfortunately, it seems that even this relatively simple procedure is not possible with sign language transcription programs such as SyncWriter or SignStream.
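The kind of export described here can be sketched as a generic tab-delimited file pairing each gloss with its notation string. The import format that IT or Shoebox actually expects is not specified in this paper, so the column layout below is an assumption, as is the field name "notation" (idgloss is a real field name from Appendix 1). The duplicate check reflects the consistency role of the data set: each gloss should identify exactly one entry.

```python
import csv
import io


def export_gloss_notation(records, out):
    """Write one gloss-to-notation pair per line to a text stream.

    records: iterable of dicts with the keys "idgloss" and "notation".
    Raises ValueError if the same idgloss occurs twice.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["idgloss", "notation"])
    seen = set()
    for rec in records:
        gloss = rec["idgloss"]
        if gloss in seen:
            raise ValueError(f"duplicate idgloss: {gloss}")
        seen.add(gloss)
        writer.writerow([gloss, rec["notation"]])


def export_to_string(records):
    """Convenience wrapper that returns the exported data set as a string."""
    buffer = io.StringIO()
    export_gloss_notation(records, buffer)
    return buffer.getvalue()
```

Writing to any text stream, not only to a named file, keeps the sketch independent of where the receiving program expects to find its data.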

Obviously, the immediate future task in this area is to streamline and integrate all capabilities into one program: a sign language annotation and transcription program that automatically builds dictionaries (data sets), aligns transcription to video, and which also can import and export data sets from signed language lexical databases.

Appendix 1: Table of fields in Auslan lexical database

Field name	Explanation	Field type	Character width	Decimal points
alternate	alternating movement	Boolean
angcongtf	Anglican congregation	Boolean
animaltf	animals & plants	Boolean
ant1	antonym 1	Text (= idgloss)	25
ant1tf		Boolean
ant2	antonym 2	Text (= idgloss)	25
ant2tf		Boolean
ant3	antonym 3	Text (= idgloss)	25
ant3tf		Boolean
artstf	arts	Boolean
aslloantf	ASL loan sign	Boolean
auslextf	Auslan lexical sign	Boolean
autostf	cars & machines	Boolean
begindirtf	beginning directional	Boolean
blend	blend (= A + B)	Text	50
blendtf		Boolean
bodyacttf	action of the body	Boolean
bodyloctf	body locating	Boolean
bodyparttf	body parts	Boolean
bslloantf	BSL loan sign	Boolean
catholictf	Catholic	Boolean
cathschtf	Catholic school	Boolean
cf1	compare with 1	Text (= idgloss)	25
cf1tf		Boolean
cf2	compare with 2	Text (= idgloss)	25
cf2tf		Boolean
cf3	compare with 3	Text (= idgloss)	25
cf3tf		Boolean
citytf	Cities, countries & continents	Boolean
clothestf	Clothing & accessories	Boolean
colorstf	Colours	Boolean
comp	compound	Text	50
comptf		Boolean
cookingtf	Cooking	Boolean
daystf	Days & months	Boolean
deafness	Deafness	Boolean
deictic1	Pronouns & pointing signs meaning 1	Text (= meaning)	254
deictic2	Pronouns & pointing signs meaning 2	Text (= meaning)	254
deictic3	Pronouns & pointing signs meaning 3	Text (= meaning)	254
deictic4	Pronouns & pointing signs meaning 4	Text (= meaning)	254
deictictf		Boolean
dirtf	directional	Boolean
domhndsh	dominant handshape	Numeric	5	1
domonly	dominant hand only moves	Boolean
doublehnd	double handed sign	Boolean
drinkstf	Drinking & eating	Boolean
educatetf	Education	Boolean
enddirtf	end directional	Boolean
english1	English Keyword 1	Text	25
english10	English Keyword 10	Text	25
english11	English Keyword 11	Text	25
english12	English Keyword 12	Text	25
english2	English Keyword 2	Text	25
english3	English Keyword 3	Text	25
english4	English Keyword 4	Text	25
english5	English Keyword 5	Text	25
english6	English Keyword 6	Text	25
english7	English Keyword 7	Text	25
english8	English Keyword 8	Text	25
english9	English Keyword 9	Text	25
engtf1		Boolean
engtf10		Boolean
engtf11		Boolean
engtf12		Boolean
engtf2		Boolean
engtf3		Boolean
engtf4		Boolean
engtf5		Boolean
engtf6		Boolean
engtf7		Boolean
engtf8		Boolean
engtf9		Boolean
familytf	Family	Boolean
feelingstf	Feelings & emotions	Boolean
foodstf	Foods	Boolean
furntf	Furniture & fixtures	Boolean
genmean	Meaning of a "general sign"	Text (= meaning)	254
gensign	General sign	Boolean
govtf	Government & politics	Boolean
groomtf	Grooming	Boolean
healthtf	Health & medicine	Boolean
hobbiestf	Recreation	Boolean
idgloss	unique sign name	Text (= idgloss)	25
inittf	Initialisation	Boolean
interj1	Interjection/interactive sign meaning 1	Text (= meaning)	254
interj1tf		Boolean
interj2	Interjection/interactive sign meaning 2	Text (= meaning)	254
interj2tf		Boolean
interj3	Interjection/interactive sign meaning 3	Text (= meaning)	254
interj3tf		Boolean
judgetf	Judgements & attitudes	Boolean
jwtf	Jehovah's Witnesses	Boolean
law	Law	Boolean
lingacttf	Actions using language	Boolean
locdirtf	Locations & directions	Boolean
locprim	primary location	Numeric	5	0
locsecond	secondary location	Numeric	5	0
marginal	marginal lexical sign	Boolean
mathstf	Arithmetic, maths & geometry	Boolean
mattertf	Materials	Boolean
metalangtf	Language about language	Boolean
mindacttf	Mind & thinking	Boolean
moneytf	Money	Boolean
naturetf	Geography & the natural world	Boolean
nomlex1	Noun meaning 1	Text (= meaning)	254
nomlex1tf		Boolean
nomlex2	Noun meaning 2	Text (= meaning)	254
nomlex2tf		Boolean
nomlex3	Noun meaning 3	Text (= meaning)	254
nomlex3tf		Boolean
nomlex4	Noun meaning 4	Text (= meaning)	254
nomlex4tf		Boolean
nomlex5	Noun meaning 5	Text (= meaning)	254
nomlex5tf		Boolean
nthtf	Northern dialect	Boolean
obscuretf	obscure	Boolean
oldentry	popular explanation of visual etymology	Text	254
onehand	one handed sign	Boolean
opaquetf	opaque	Boolean
ordertf	Order & sequence	Boolean
orienttf	orientating	Boolean
otherreltf	other religions	Boolean
para	parallel movement	Boolean
partlex1	adverbs & linkers meaning 1	Text (= meaning)	254
partlex1tf		Boolean
partlex2	adverbs & linkers meaning 2	Text (= meaning)	254
partlex2tf		Boolean
partlex3	adverbs & linkers meaning 3	Text (= meaning)	254
partlex3tf		Boolean
peopletf	People (descriptions) topic	Boolean
propnametf	Proper Name lexical status	Boolean
qldtf	Queensland dialect	Boolean
qualitytf	Quality, kind & condition	Boolean
quantitytf	Quantity, size & rate	Boolean
queries	notes	Text	254
questle2tf		Boolean
questlex	question sign meaning 1	Text (= meaning)	254
questlex2	question sign meaning 2	Text (= meaning)	254
questlextf		Boolean
reglextf	regional lexical sign	Boolean
religiontf	Religion	Boolean
restrict	restricted lexical sign	Boolean
roomstf	Rooms	Boolean
salutation	Greetings and leave-takings	Boolean
satf	South Australian dialect	Boolean
seasonstf	Weather	Boolean
sense	major entry "homophones": 1, 2, etc.	Numeric	2
senseacttf	Senses	Boolean
sextf	Sex	Boolean
shapestf	Shapes & patterns	Boolean
shoptf	Shopping & business	Boolean
sn	sign number	Numeric	6	1
sportstf	Sport	Boolean
stateschtf	state school	Boolean
sthtf	Southern dialect	Boolean
subhndsh	subordinate handshape	Numeric	5	1
sym	symmetrical movement	Boolean
syn1	synonym 1	Text (= unique sign name)	25
syn1tf		Boolean
syn2	synonym 2	Text (= unique sign name)	25
syn2tf		Boolean
syn3	synonym 3	Text (= unique sign name)	25
syn3tf		Boolean
tastf	Tasmanian dialect	Boolean
telecommun	Electronics, telecommunications & computers	Boolean
timetf	Time	Boolean
transltf	translucent sign	Boolean
transptf	transparent sign	Boolean
traveltf	Travel & transport	Boolean
twohand	two-handed sign	Boolean
utensilstf	Gadgets, utensils & tools	Boolean
verblex1	Verb & adjective meaning 1	Text (= meaning)	254
verblex1tf		Boolean
verblex2	Verb & adjective meaning 2	Text (= meaning)	254
verblex2tf		Boolean
verblex3	Verb & adjective meaning 3	Text (= meaning)	254
verblex3tf		Boolean
verblex4	Verb & adjective meaning 4	Text (= meaning)	254
verblex4tf		Boolean
verblex5	Verb & adjective meaning 5	Text (= meaning)	254
verblex5tf		Boolean
victf	Victorian dialect	Boolean
watf	West Australian dialect	Boolean
worktf	Work & employment	Boolean

Bibliographical References

Johnston, Trevor: Transcription and glossing of sign language texts: Examples from AUSLAN (Australian Sign Language). In: International Journal of Sign Linguistics 1:2 (1991), pp. 3-28

Johnston, Trevor / Royal N.S.W. Institute for Deaf and Blind Children: Signs of Australia on CD-ROM: a dictionary of Auslan (Australian Sign Language). North Rocks, NSW: North Rocks Press, 1997/1998 (Software)

Johnston, Trevor: Signs of Australia: a new dictionary of Auslan (the sign language of the Australian deaf community). Rev. ed. North Rocks, NSW: North Rocks Press, 1998, 603 pp.

Johnston, Trevor & Adam Schembri: On Defining Lexeme in a Signed Language. In: Sign Language & Linguistics 2:1 (1999) (in press)

Posted: 9.12.99

