Ernst Thoutenhoofd1
Deaf Studies: Department of Education Studies, Social Studies & Combined Honours, University of Central Lancashire
thoutenhoofd@uclan.ac.uk

The development of a FileMaker Pro database for the morphemic analysis of productive forms in BSL

1. Project background
1.1 Analysis
1.2 The PLD in the making
2. The datamodel of the Productive Lexicon Database
3. The data layout of the Productive Lexicon Database
3.1 BSL file
3.2 English Dictionary file
3.3 Movies file
3.4 Signatures file
3.5 Morphemes file
3.6 Notations file
4. Current prospects for further development
4.1 Recent redevelopments
4.2 AUSLAN/BSL/English
5. Sign language dictionaries: revisiting issues and developments

6. Appendix 1
6.1 Overview of the Productive Lexicon Database’s categorical content
6.2 References
6.3 Footnotes

1. Project background

The project that required the development of the FileMaker Pro files discussed in this article sought to explore a particular type of lexical patterning in naturally occurring British Sign Language (BSL) discourse. The project aimed to analyse productive lexicon at the morphological level of surface structure. Fluent BSL users, it seems, exploit productive lexicon highly efficiently. Although some work has been carried out on these productive resources, there has been insufficient detailed work on the precise nature of the morpheme types which are used, and on the rules governing their combination (see also the contribution by Mary Brennan elsewhere in this volume). The database developed for storing BSL productive forms was expected to allow for detailed analysis of these two areas of productive sign formation.

Contextually, this database development follows hot on the heels of the SIGNBASE project, for which a team of researchers from the UK and the Netherlands developed a sign language repository in C++. The Productive Lexicon Database (or PLD) derives the basis for its database structure from the SIGNBASE developments (for a description of the SIGNBASE project see Brien et al. 1995). Most importantly, this basis includes a central ‘sign-record’, linked to various types of information via a number of pre-specified sign-to-X ‘relationships’. Where this newer database structure most crucially deviates from SIGNBASE is in the fact that neither the nature nor the number of these relationships is ‘hard-coded’, or authored, into the database programming. Instead, each ‘relationship’ is constructed by exploiting the software’s ability to link/unlink separate, independent files to a ‘core’ file (the file containing the sign records).

1.1 Analysis

As a more detailed categorisation of the nature of the research, the database design brief included reference to the following activities:

The phonological description involves describing the manual and non-manual articulators and their actions. These are then transcribed using the notation system developed for BSL signs and exploited in previous publications and multimedia productions of the DSRU.

The signs are analysed into their component morphemes, which may be arranged in both sequential and simultaneous ordered patterns. A major aim of the project was to discover whether specific types of morpheme co-occur with a limited range of other types or, for example, only in specific simultaneous or sequential pattern combinations. Once sufficient detailed data concerning sign structure would have been entered into the database, it would be possible to search for different types of morpheme combinations and thus elaborate the rules of co-occurrence.

Each individual morpheme within a given productive sign could then be analysed in terms of its category type. The main classes of morpheme type are those which have been established, for example, within the Dictionary of British Sign Language/English (Brien, 1992) and in the work of Brennan (1990a, 1990b, 1994). These include such forms as size and shape classifiers, handling classifiers, aspectual movement morphemes and metaphor morphemes. However, while some forms are fairly easily allocated to particular classes, some morphemes pose categorisation difficulties. It was therefore decided to keep a number of ‘open’ categories stored in the database; the data in these open categories could always be re-allocated to existing or newly defined closed categories as the research evolved. As a direct result of the exploratory nature of the research in hand, the PLD’s design brief has resulted in a new type of flexible, open-ended and modular database for the linguistic analysis of productive forms.
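The co-occurrence search described above can be illustrated outside FileMaker Pro. The following is a minimal sketch in Python, not the PLD's own scripting: the sample records, the tuple layout and the function name `simultaneous_pairs` are all assumptions made for illustration, and only the category names are taken from the text.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample data: (sign id, sequential slot, morpheme category).
# Morphemes sharing a sign id and a slot are treated as simultaneous.
MORPHEMES = [
    (1, 1, "handling classifier"),
    (1, 1, "aspectual movement morpheme"),
    (1, 2, "size and shape classifier"),
    (2, 1, "handling classifier"),
    (2, 1, "aspectual movement morpheme"),
    (3, 1, "metaphor morpheme"),
]


def simultaneous_pairs(records):
    """Count how often two categories occupy the same slot of the same sign."""
    by_slot = {}
    for sign_id, slot, category in records:
        by_slot.setdefault((sign_id, slot), []).append(category)
    counts = Counter()
    for categories in by_slot.values():
        # sorted() makes (a, b) and (b, a) count as the same pair
        for pair in combinations(sorted(set(categories)), 2):
            counts[pair] += 1
    return counts


pair_counts = simultaneous_pairs(MORPHEMES)
for pair, n in pair_counts.most_common():
    print(pair, n)
```

A tally of this kind only becomes informative once a substantial number of signs has been entered; with real data, pairs that never occur would be as interesting as those that occur often.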

1.2 The PLD in the making

Using the design brief to guide the structural layout, a relational database was developed in the commercially available and popular FileMaker Pro software (version 4.0). This development only proved necessary when the SignBase Administrator bespoke software (see Brien and Brennan 1995) from which the Productive Lexicon project intended to benefit turned out to be unsuitable as a research tool for this project. It was therefore agreed that a new database be designed using commercially available software, for two main reasons: to save project effort and cost, and to retain full command over structural specifications and development opportunities arising.

The development of this database took about two months, including a number of rigorous changes following pilot assessments. The general feeling among project members was that the PLD development work was a successful attempt (albeit an unsophisticated one at the programming level) at a dedicated form of sign linguistic database modelling. With the considerable benefit of three years’ experience in working through a content structure for the SignBase Administrator, the PLD demonstrates what can be achieved by marrying linguistic structure and database structure in all stages of development.2 The data-model on which the PLD is based clearly reflects our understanding of the morphology of productive sign lexicon as formulated at the outset of the project. Significantly, however, the PLD could be adapted and expanded on the fly: as our understanding grew, so too did the PLD’s capabilities.

2. The datamodel of the Productive Lexicon Database

The PLD effectively comprises a series of related records located in multiple database files which are ‘connected’ only by the core sign record’s auto-entered id number. The id number is an entirely arbitrary allocation, and merely makes it possible to change a sign’s label without breaking the links between the files. This information is stored in the central file, called the BSL file, and is the only information required to start a new entry. Since the PLD is a research tool and therefore subject to frequent redevelopment, the graphic interface is unremarkable, but all functions in the files are strongly colour-coded: the PLD groups functions relating to data maintenance, functions relating to data exploration, and navigation functions operating within a file and between files into separate, colour-coded menus.

The relations between the files that are a part of the current version of the PLD can quite simply be schematised as in figure 1 below.

figure 1: the PLD design model

The horizontal model describes the following set of central characteristics:

Relational links are achieved by storing the main lookup key data in records in each of the files separately: when a new sign record is entered, the PLD places a new record for that sign in each of the related files. Conversely, when a sign record is deleted, the PLD deletes all records concerning that sign in the other files. Apart from this set of core functions, the sign record itself contains mainly project-specific administrative information. All substantive content shown for a given sign in the BSL file is drawn from the other files through what FileMaker Pro calls ‘portal windows’; one portal window can display data from multiple records in another file. For example, the Signatures file operates like a lookup table linking sign entries to morpheme records, placing one record for each morpheme of a sign entry in the Morphemes file. In the ‘Morphemes’ section of the central BSL file, a portal window then displays the data in all of these morpheme records as a listing of morphemes in a single sign record. This has the added benefit that the BSL file need show only those data from the Morphemes file that are required for a particular project, since one can be selective about the information made available through the portals.
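The behaviour described above (an arbitrary id as the only link, records created and deleted in step across files, and a portal-like view from the core file) can be sketched in Python. This is an illustrative stand-in only and does not reproduce FileMaker Pro's scripting or API; the class `ToyPLD`, its method names and the choice of related files are assumptions.

```python
from itertools import count


class ToyPLD:
    """Toy model of a core file plus related files keyed by sign id."""

    def __init__(self, related_files=("Movies", "Signatures", "Notations")):
        self.bsl = {}  # core file: sign id -> label
        self.related = {name: {} for name in related_files}
        self._next_id = count(1)  # auto-entered, arbitrary id numbers

    def new_sign(self, label):
        """Create the core record and an empty record in each related file."""
        sign_id = next(self._next_id)
        self.bsl[sign_id] = label
        for records in self.related.values():
            records[sign_id] = {}
        return sign_id

    def rename(self, sign_id, label):
        """Change the label; links survive because they use the id only."""
        self.bsl[sign_id] = label

    def delete_sign(self, sign_id):
        """Remove the core record and every related record for that sign."""
        del self.bsl[sign_id]
        for records in self.related.values():
            records.pop(sign_id, None)

    def portal(self, file_name, sign_id, fields=None):
        """Show a related record, optionally restricted to chosen fields."""
        record = self.related[file_name].get(sign_id, {})
        if fields is None:
            return dict(record)
        return {k: v for k, v in record.items() if k in fields}


pld = ToyPLD()
sign = pld.new_sign("SIGN-A")
pld.related["Movies"][sign]["source_tape"] = "tape-01"  # hypothetical field
pld.rename(sign, "SIGN-B")
print(pld.portal("Movies", sign, fields={"source_tape"}))
```

The `fields` argument mirrors the point made about selectivity: the core file shows only those fields of a related record that a given project needs.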

An overview listing of the Productive Lexicon Database’s current content definitions precedes the bibliography (see the Appendix).

3. The data layout of the Productive Lexicon Database

3.1 BSL file

This file concentrates information concerning BSL signs as lexical items. It returns information concerning BSL grammar, BSL usage, English equivalence (i.e. bilingual dictionary functions), and a sign’s morphology, as well as offering (besides a characteristic still image) four different kinds of movie: BSL citation form, BSL definition, BSL example sentence, and BSL story. Any central sign record can have only one citation form but can have multiple definitions, examples and stories linked to it. Of course, it is perfectly likely that the file will contain separate records for multiple productions of one sign; in fact it does, and FileMaker scripting would offer the means to link these records. The BSL file nevertheless serves as a central reservoir of information: it has been constructed to reflect, from data in other files, only that which users wish to have access to given the nature of the project in hand.

The BSL file’s ‘Central Sign Record’.

The BSL file’s listing of morphemes. Note the morphemic signature in the bottom-right of the window: This sign has three sequential morphemes (horizontal row), with a further two being simultaneous with the second sequential morpheme (vertical row).

3.2 English Dictionary file

The English Dictionary file contains English vocabulary only in so far as it has been defined as equivalent to existing signs in the BSL file. That is, there can only ever be English words contained in the PLD for which there is a sign available in translation. The English Dictionary file furthermore stores information concerning English grammar, English usage, BSL equivalence (i.e. bilingual dictionary functions), as well as offering English definition(s) and English example sentence(s). Whatever information concerning English is given in the BSL file is maintained here.

The English Dictionary file’s main window.

3.3 Movies file

The Movies file is where all the Productive Lexicon Database’s movie references are stored. The file incorporates a separate ‘Layout’ for each of the four types of movie reference, and another one for the representative still image. Each movie reference is accompanied by an English translation field, which also makes the movie’s meaning available for text-string search functions. The Movies file can also contain information concerning the original source of the material (e.g. the name or reference number of the source tape) and the current location of the movie (e.g. disk location), and can include a time-code reference so that signs can be sorted according to their original location in the narrative that contained them. Whatever information concerning movies is given in the BSL file is maintained here.

The Movies file’s main window. In this case the movie has been defined as a ‘Citation’ derived from the Urostomy tape starting at 7 seconds and 9 frames into the movie.

3.4 Signatures file

The Signatures file mediates between the BSL file and the Morphemes file. It currently allows signs to be assigned up to eighty-one morphemes: nine sequential positions, each holding up to nine simultaneous morphemes. In assigning morphemes to a sign, the file auto-enters a label for the morphemic construct of the sign on the following basis: a sign necessarily has at least one morpheme, so by default a sign is labelled “simple (mono-morphemic) sign”. If one or more sequential morphemes are added, the sign is re-labelled as “complex (sequential morphemic) sign”. Extending this principle, the file can also define a sign as a complex, simultaneous morphemic sign, or as a complex, sequential-simultaneous morphemic sign. For every morpheme thus added to the signature record of a sign, the PLD creates a corresponding record in the Morphemes file containing not only the sign’s name and id number but also the location of that morpheme within the morphemic signature. Deleting a sign from the central BSL file causes a script there to erase all related morpheme records in the Morphemes file. The Signatures file can contain any amount of information which is relevant to the morphemic structure of a sign as a whole but which is sub-lexical to the sign’s information stored in the BSL file.
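The labelling rule can be written out as a short Python sketch. It is not the PLD's FileMaker script: the representation of a signature as a list of slot sizes, the function names and the reading of the eighty-one limit as a nine-by-nine grid are assumptions. The position labels follow the decimal dot notation used in the Morphemes file.

```python
MAX_SEQUENTIAL = 9    # assumed: nine sequential positions
MAX_SIMULTANEOUS = 9  # assumed: up to nine morphemes per position


def check(signature):
    """Validate a signature given as morphemes per sequential position."""
    if not signature or any(size < 1 for size in signature):
        raise ValueError("a sign needs at least one morpheme per position")
    if len(signature) > MAX_SEQUENTIAL or max(signature) > MAX_SIMULTANEOUS:
        raise ValueError("signature exceeds the available morpheme buttons")


def sign_type(signature):
    """Return the auto-entered label for a signature."""
    check(signature)
    sequential = len(signature) > 1
    simultaneous = any(size > 1 for size in signature)
    if sequential and simultaneous:
        return "complex, sequential-simultaneous morphemic sign"
    if sequential:
        return "complex (sequential morphemic) sign"
    if simultaneous:
        return "complex, simultaneous morphemic sign"
    return "simple (mono-morphemic) sign"


def positions(signature):
    """List one position label per morpheme, e.g. '2.3'."""
    check(signature)
    return [
        f"{seq}.{sim}"
        for seq, size in enumerate(signature, start=1)
        for sim in range(1, size + 1)
    ]


# Three sequential morphemes, two more simultaneous with the second.
example = [1, 3, 1]
print(sign_type(example))
print(positions(example))
```

Each label returned by `positions` would correspond to one new record in the Morphemes file.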

The Signatures file. Here the user simply records the number and ordering of the morphemes involved in the sign. The file then fills in the ‘Sign type’ information and prepares a new corresponding record for every morpheme in the Morphemes file.

The dark grey area of buttons represents the number of morphemes available under the first version of the Productive Lexicon Database. Once data analysis had begun, however, it seemed that a significantly larger number of morphemes was involved in the production of productive forms, and so a comfortable surplus of extra morpheme buttons was added. This modification, carried out in a matter of hours, demonstrates the flexibility of the FileMaker Pro software.

3.5 Morphemes file

The Morphemes file is a unique database file which organises records as discrete morphemes; that is, morphemes can be searched and sorted by sign parameter information or any other morphological-level category, as well as by sign label. The list of entries has as many records for any sign as there are morphemes in it. It lists those in the order defined in the Signatures file: each morpheme is followed first by the morphemes that are simultaneous with it, and only then by the next sequential morpheme. The ordering is based simply on a decimal dot schema, in which the number before the dot gives the sequential position and the number after it indicates simultaneity, e.g. 1.1 (initial morpheme), 1.2 (sim), 2.1 (seq) et cetera. Each morphemic record in this file is a discrete record containing the conventional parameter information of a sign’s production (location, handshape, orientation, movement, and non-manual feature) as well as information concerning the linguistic status of the morpheme and any additional information pertinent to the morphological level of sign description. Whatever information concerning discrete morphemes is given in the BSL file is maintained here.
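The decimal dot ordering can be reproduced with a two-part sort key. The sketch below is in Python and is an assumption about how such an ordering might be implemented, not a description of the PLD's own script; the record layout and the field names `sign_id`, `position` and `handshape` are hypothetical.

```python
def position_key(position):
    """Turn a label such as '2.1' into a sortable (sequential, simultaneous) pair."""
    seq, sim = position.split(".")
    return int(seq), int(sim)


# Hypothetical morpheme records, deliberately out of order.
records = [
    {"sign_id": 7, "position": "2.1", "handshape": "B"},
    {"sign_id": 7, "position": "1.2", "handshape": "A"},
    {"sign_id": 3, "position": "1.1", "handshape": "5"},
    {"sign_id": 7, "position": "1.1", "handshape": "G"},
]

# Sort by sign first, then by position within the sign.
ordered = sorted(
    records, key=lambda r: (r["sign_id"], position_key(r["position"]))
)
for record in ordered:
    print(record["sign_id"], record["position"], record["handshape"])
```

Comparing the two parts as integers rather than as text or as a decimal fraction keeps the ordering correct even if a position number were ever to exceed nine.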

The Morphemes file’s ‘List’ layout. Here the morphemic information can be searched, per sign, per group of signs (e.g. those belonging to one movie), or indeed across the entire database.

The morphemic information of a sign is stored as a record for every morpheme. The ‘Overview’ window here shows the first morpheme (1.1) of the sign, in this case a nonmanual morpheme.

In the above picture the same entry is seen from the ‘Nonmanual Features’ Layout. Here the information is stored through a choice of free text fields, or by choosing a text string or category from editable pop-up menus. Information can be stored regarding both the form and the function of the morpheme.

3.6 Notations file

The Notations file stores the BSL transcription of signs, using a transcription format based loosely on that developed by Stokoe et al. (1965) and later revised by Brennan et al. (1984), but adapted here again to better suit ASCII file characteristics. The separate idea of an ‘English translator’ has been raised, a scripted function which would propose a written English translation of the notations entered, and indeed it does seem possible to script an accurate English translator entirely within a FileMaker environment. In the current version, very basic FileMaker scripting generates a syntactically deficient and crudely punctuated English translation of the notations entered. Within narrowly defined limits, however, it is already possible to search for parameter descriptions through both the transcription form itself and the related English text-strings; furthermore, the English syntax, although deficient, reflects sufficient detail to allow users to ‘read back’ the notations entered as a double-check during data entry. As a direct result of these activities it is possible to entertain the notion of software driven BSL notation/written English interpreters in the medium-term future, on the lexical level at least. As with notation and data handling conventions and standards, however, this was never a development priority within the project’s lifetime, and it has therefore not been given much attention.

The Notations file showing the ‘Location’ Layout. Clicking any of the Location buttons places the corresponding character in the notation field, and an accompanying English translation in the field below that. There are additional Layouts for Handshape, Orientation, Hand Arrangement, and Movement.
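The button mechanism and the ‘read back’ check can be sketched as follows. This Python fragment is illustrative only: the symbols and English glosses in `BUTTONS` are invented placeholders and not the PLD's actual ASCII notation characters, and the class and method names are assumptions.

```python
# Hypothetical button tables: layout -> button name -> (symbol, English gloss).
# The symbols are placeholders, not the project's real notation characters.
BUTTONS = {
    "Handshape": {"flat": ("B", "flat hand"), "fist": ("A", "closed fist")},
    "Location": {"chin": ("c", "at the chin"), "chest": ("t", "at the chest")},
    "Movement": {"down": ("v", "moving down")},
}


class NotationRecord:
    """One notation field plus its accompanying English translation."""

    def __init__(self):
        self.notation = ""
        self.english = []

    def click(self, layout, button):
        """Append the button's symbol and its gloss, as a button click would."""
        symbol, gloss = BUTTONS[layout][button]
        self.notation += symbol
        self.english.append(gloss)

    def read_back(self):
        """Return the crude, comma-joined English rendering for checking."""
        return ", ".join(self.english)


entry = NotationRecord()
entry.click("Handshape", "flat")
entry.click("Location", "chin")
entry.click("Movement", "down")
print(entry.notation)
print(entry.read_back())
```

Because both the symbols and the glosses are stored as plain text, either field can be used for text-string searches, which is the dual search route mentioned in the section above.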

4. Current prospects for further development

FileMaker Pro allows for data manipulation through files, records, layouts and fields by anyone with the ability to compose scripts using FileMaker’s next-step-type scripting menu which lists pre-organised scripting statements. This excellent scripting facility provides FileMaker Pro with a remarkable data modelling flexibility which can be exploited to operate vertically, i.e. deeper into currently relevant categories of analyses, as well as horizontally, i.e. spreading out to include additional lexicographic (syntactic, usage, discourse, sociometric), or even encyclopaedic type data categories. The PLD can thus be expanded by adding new ‘layouts’ and/or fields to existing files (for storing so-called ‘flat’ information), or by adding more files (for storing so-called ‘relational’ information), each of which can have scripted—that is, particular—relations to any one, or more, or all, files associated with the Productive Lexicon Database.

The PLD appears to achieve significantly faster processing rates than did the bespoke system developed under the SignBase project, but perhaps more importantly, the accessibility of information fields to the user is directly suggested by the linguistic status of the data in these fields. This results from data-modelling guided by sign linguistics and a user-guided design approach; for instance, no ‘multiple window’ layering is necessarily visible in the interface, and actual sign movies are never more than a single mouse-click away from the information presented.3

The PLD benefits from an extensive range of FileMaker Pro search functions and capabilities; these can in due course be scripted to include syntactically more complex text-string, numerical, logical and field status search functions. More sophisticated scripting would make both qualitative and in-depth quantitative analyses of the dataset possible.

Finally, the PLD benefited from significant developmental work invested in the SignBase project by all the SignBase partners, but the PLD’s greater research-driven flexibility means that it is a more persuasive model for various next-generation applications. Because the PLD is built using commercial software, the model has contemporary networking and data exchange functions under both the MacOS and WindowsOS, features hosting functions under networking conditions and is ‘web-wise’. Import/export encodings include SYLK, DIF, WKS and BASIC, and the software can also access ClarisWorks, DBF and Excel files. Although current import and export procedures allow for some flexibility (for example in field ordering), data cannot be significantly reconstituted in the 4.0 version used to build the Productive Lexicon Database. Later versions (post-October 1998) of FileMaker Pro, however, have been SQL compliant and feature an ODBC dialogue box; these functional additions offer connectivity to large relational database engines such as Oracle, SQL Server and Sybase, placing FileMaker Pro in direct competition with Microsoft’s Access in particular.

The PLD reflects no expectations regarding sign language notation exchange mechanisms, and even less concerning data ‘standards’, since neither of these was on the list of priorities. Such advanced mechanisms and other IT issues aside, sign descriptions (HamNoSys sign descriptions, for instance) could simply be incorporated as part of the main lookup key arrangements, provided that the notation system used does not violate the basic organisation of western script (i.e. characters left to right in horizontal lines from top to bottom) and that the descriptions remain within ASCII conventions that keep them available to regular font design encoding (TrueType and PostScript) and use.

4.1 Recent redevelopments

The ability, through portals, to ‘borrow information’ between files also ensures the integrity of information stored in the separate files, and this in turn allows various files to be used in multiple, very different sorts of applications. For example, the entire BSL file could simply be copied and linked to other files containing sign records of another sign language. This kind of independent data integrity of the files that make up a database would allow for the relatively easy construction of bi- and multi-lingual dictionaries.

As an example, in a recent prototypical redevelopment of the PLD into a trilingual dictionary type database, the English file (word file) and BSL file (sign record file) were duplicated, and their scripted relations adapted to suit dictionary purposes—as suggested earlier, the conditions that were set for storing information matched those of the SIGNBASE repository. The BSL file was then copied again as a file containing ‘another’ sign language, resulting in the data model shown in figure 2.

figure 2: Elaborated dictionary model based on the Productive Lexicon Database’s files

As before, relations of one to one, one to many, or many to one can hold between record numbers in each of the parallel files, so where a sign in sign language (1) may only have one word associated with it in spoken language (1), it may have multiple signs associated with it in sign language (2). Or indeed, two roughly equivalent, associated signs in sign languages (1) and (2) will be linked to different definition movies and translations in their respective sign language, and these may reflect meanings that are not entirely overlapping. These signs may consequently be linked separately to similar, but not entirely overlapping spoken language word sets, reflecting the meaning differences between the three languages, and so on. Remember too that individual language files do not have to operate on the basis of any one overarching glossing or sign labelling convention, since record associations are established manually (portal windows offer indexed entry lists—one list for the contents of each language file—in each language file linked to the database) and stored by an arbitrary record number. Beyond glossing and id numbering, what is stored for each language individually depends wholly on the available data for each language, and the intended audience of the database. Therefore, even the glossing conventions between two sign languages can be different, with the relations between signs from the two languages entirely based on equivalences in meaning, or on form, rather than on glossing convention.
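The linking principle set out above, with manually established associations stored by arbitrary record number and independent of any glossing convention, can be sketched as a simple link table. The Python below is an illustration under assumptions and not the prototype's FileMaker implementation; the class `LinkTable`, its methods and the sample record numbers are hypothetical.

```python
from collections import defaultdict


class LinkTable:
    """Manual associations between record numbers in two language files."""

    def __init__(self):
        self._forward = defaultdict(set)   # left record -> right records
        self._backward = defaultdict(set)  # right record -> left records

    def link(self, left_id, right_id):
        self._forward[left_id].add(right_id)
        self._backward[right_id].add(left_id)

    def unlink(self, left_id, right_id):
        self._forward[left_id].discard(right_id)
        self._backward[right_id].discard(left_id)

    def right_of(self, left_id):
        """Records in the second file linked to a record in the first."""
        return sorted(self._forward.get(left_id, ()))

    def left_of(self, right_id):
        """Records in the first file linked to a record in the second."""
        return sorted(self._backward.get(right_id, ()))


# One table per pair of files; the record numbers carry no meaning.
signs1_to_words = LinkTable()
signs1_to_signs2 = LinkTable()

signs1_to_words.link(1, 100)    # one sign, one word
signs1_to_signs2.link(1, 500)   # the same sign, two signs in language (2)
signs1_to_signs2.link(1, 501)
signs1_to_signs2.link(2, 501)   # many to one also holds

print(signs1_to_words.right_of(1))
print(signs1_to_signs2.right_of(1))
print(signs1_to_signs2.left_of(501))
```

Since nothing but record numbers is stored in the links, each language file remains free to use its own glossing and notation conventions.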

4.2 AUSLAN/BSL/English

In the example introduced above, the second sign language file was defined as containing Australian Sign Language (AUSLAN); new fields were added to exactly match those of the FoxPro database used for the AUSLAN CD-ROM Dictionary (Johnston 1997), on the basis of a list of the required field definitions and descriptions kindly provided by Trevor Johnston, even including some idiosyncratic fields, for instance that in which ‘TJ’ can store his personal notes and observations. This new database also benefits from a much simplified set of navigation tools. Of course, the BSL file could simply be disconnected from the other files in the construction of a bilingual AUSLAN/English repository.

The AUSLAN Dictionary’s sign record file—note the ‘sign links’ heading under which the sign can be linked to other files, in this case a BSL file and an English file.

The grammar layout of the AUSLAN dictionary, detailing fields particular to Johnston’s lexicographic analysis of AUSLAN vocabulary.

The much simplified navigation structure of the trilingual BSL/AUSLAN/English dictionary has concentrated many of the navigation and maintenance functions on a separate ‘navigation-layout’. Note the blue buttons that would take the user to corresponding or linked records in either the BSL file or the English file.

This FileMaker Pro prototype is one step removed (albeit the largest one: the data input) from offering a tri-lingual AUSLAN/BSL/English dictionary based entirely on linguistic principles which may vary (including their notations) between the three languages it incorporates. In developments such as these, sign language lexicography is going down routes no spoken language lexicography project has gone before.

5. Sign language dictionaries: revisiting issues and developments

Following their assessment of a range of issues concerning sign lexicographic research and development, premised on the need for sign language dictionaries to offer a true reflection of the nature of sign language meaning and structure, Brien and Brennan conclude that the main contribution of new technology may lie in the fact that it

allows a sign language to be presented in its own terms, and through its own resources can at least provide the potential for Deaf people themselves to play a key role in dictionary making. (Brien and Brennan 1995:336)

The need for Deaf people to be centrally involved in sign language lexicography is undoubted, and probably indeed a clear invitation deriving from ICT as a research tool. But one would imagine that the role for computer technology is more incisive than this. In this respect, Brien and Brennan’s assessment is mostly one concerning the state of sign language lexicography content development and its principles, rather than an assessment of technological change and inventiveness in providing the tools and the ‘workspace’ for sign lexicographic activities.

The FileMaker Pro developments set out here are prototypical, especially in the extent to which the developments described mostly happened on a shoe-string and without the assistance of trained IT experts in database modelling and authoring. However, the PLD and the AUSLAN/BSL/English dictionary prototype also do suggest a flexible kind of lexicographical workspace unimagined by Brien and Brennan in their assessment now five years ago. Most notably, following their observation that (as is the case for spoken languages) the need exists for various kinds of ‘specialist’ sign language dictionaries such as dialect dictionaries, etymological dictionaries, thematic dictionaries, and thesauri, they conclude that

it will be too costly, in terms of both memory space and finance, to make everything available within a single application. (Brien and Brennan 1995:315)

Flexible, open data-modelling suggests an alternative to cumbersome all-encompassing single applications (although few solutions will ever overcome the problem of finance). When considered in modular terms, repositories storing different kinds of information may nevertheless be endlessly reconfigured and recombined into applications for varying needs and audiences. Clearly, this is playing to perhaps one of the few strengths of a couple of as yet untried and untested databases, but what counts is the principle.

In particular, taken in conjunction with Brien and Brennan’s (1995:321) logical caution that sign lexicographers should be wary of including information on categories which are still keenly debated and contested in the sign linguistic literature, there is very clear support here for the position that what is least needed in sign lexicography at the start of 2000 is an epistemological model of progress under which researchers and lexicographers ‘agree voluntarily’ on any set of principles or formats for storing sign lexicographic data which can at best be based on the occasionally shallow and frequently contradictory sign linguistic research of the later 20th Century. Even apart from the pretense involved in seeing such local unilateral agreements as a service to those perceived to be ‘followers’ in that the signatories project themselves, in some way or another, in the role of the avant-garde on the grounds of a presumed superior (because more advanced) knowledge of contemporary sign lexicography, what is objectionable in such an aim is the total disregard for one’s own ignorances and the lack of definition in the terrain of interest, and moreover the danger that the contributions of the most exciting developments may fail to challenge the research community—although this kind of development would indeed be typical of behaviour in any academic community, locating the idea itself as wholly conventional and predictable in academic terms (Kuhn 1970).

Rather, what is required, of those involved in designing sign language repositories and data models as well as of those authoring content, is to develop a grasp over the extent to which new technology allows for a kind of freedom in our lexicographic imagination which can take the field over and beyond the conventions used in the kinds of lexicography most have always used as guiding examples: spoken language dictionaries.

6. Appendix 1

6.1 Overview of the Productive Lexicon Database’s categorical content

BSL file (listing includes contents of Movies and Notations files)

English file

Signatures file

Morphemes file

6.2 References

Brennan, M., Colville, M.D., Lawson, L.K. and Hughes, G. (second edition, 1984): Words in Hand: A Structural Analysis of the Signs of British Sign Language. (1980). British Deaf Association, Carlisle, England.

Brennan, M. (1990a): Word Formation in British Sign Language. University of Stockholm, Stockholm, Sweden.

Brennan, M. (1990b): “Productive Morphology in British Sign Language”, in: Prillwitz, S. and Vollhaber, T. (Eds.) Current Trends in European Sign Language Research: Proceedings of the Third European Congress on Sign Language Research. Signum Press, Hamburg, Germany.

Brennan, M. (1992): “The Visual World of BSL: An Introduction”, in: Brien, D. (Ed.) Dictionary of British Sign Language/English. Faber and Faber, London, England.

Brennan, M. (1994): “Pragmatics and Productivity”, in: Ahlgren, I., Bergman, B. and Brennan, M. (Eds.) Perspectives on Sign Language Structure: Papers from the Fifth International Symposium on Sign Language Research (Volume 1). International Sign Linguistics Association, Durham, England.

Brien, D. and Brennan, M. (1995) Sign language dictionaries: issues and developments, in: Bos, H. and Schermer, T. (eds) Sign language Research 1994: proceedings of the Fourth European Congress on Sign Language Research in Munich. Hamburg, Germany: Signum Press.

Brien, D., Brennan, M., Schermer, T., Harder, R., and Bakker, R. (1995) Creating a sign language database: the SIGNBASE project, in: Bos, H. and Schermer, T. (eds) Sign language Research 1994: proceedings of the Fourth European Congress on Sign Language Research in Munich. Hamburg, Germany: Signum Press.

Johnston, T. and Royal N.S.W. Institute for Deaf and Blind Children (1997) Signs of Australia on CD-ROM: a dictionary of Auslan (Australian Sign Language). North Rocks, NSW: North Rocks Press. (Software)

Kuhn, T. (1970) The structure of scientific revolutions. Chicago, USA: University of Chicago Press.

Stokoe, W.C., Casterline, D.C. and Croneberg, C.G. (Revised edition, 1976): Dictionary of American Sign Language on Linguistic Principles. (1965). Linstok Press, Silver Spring, Maryland, USA.

6.3 Footnotes

1. The PLD development was carried out while the author was a member of the Deaf Studies Research Unit (DSRU), University of Durham.

2. As an example of what can be scripted easily by following such a content-driven model, the PLD makes the morphemic 'signature' of every sign entered explicit as a diagrammatic sketch (see the screen shots in section 3).

3. Although interface properties such as these should be considered an integral part of database design, our own experience suggests that such considerations may slip all too easily to the bottom of the agenda in database development.

