Corpus

What is a corpus?

A corpus is a scientifically edited collection of examples of how a language is commonly used. A corpus can, for example, contain a large number of written texts, or recorded or filmed conversations. Such data collections are used to explore how a language is used or to investigate its vocabulary and grammar.

Approach

One of the aims of the project is to develop a corpus that represents the everyday signing of competent Deaf German Sign Language (DGS) users. For this purpose, data were collected nationwide from 330 Deaf people at 12 locations in Germany. This procedure also captured regional differences in signs.
The data collection consisted of several parts: conversations or discussions between two participants on several topics, as well as tasks such as retelling a picture story or a film. The recordings contain about 500 hours of DGS usage. This amount of data corresponds to approximately 3 million occurrences of signs, which is comparable to corpora of spoken languages. Annotating the corpus is very time-consuming, as most of the work cannot be done automatically and has to be done by hand.

Usage

The corpus contains a variety of narrations that are of cultural interest to the Deaf community. These recordings are also potentially useful as educational material in schools for the Deaf or in DGS courses.

To make this valuable material accessible to everyone interested in DGS and sign language, mainly narrations and conversations were selected for publication as a public corpus. The portal Meine DGS (meine-dgs.de) [my DGS] contains more than 47 hours of these narrations and conversations in the form of DGS videos with their respective translations (subtitles). In addition, 88 selected jokes (approximately 2.5 hours) are presented there.

The research portal (ling.meine-dgs.de) of the public corpus is freely accessible and includes a further 100 minutes of DGS videos. This video material illustrates the different tasks used during the elicitation.

All videos on this website, with the exception of the jokes, are not only translated but also annotated with the corresponding glosses and mouthings. All written text, except for the mouthings, is also available in English. The corpus is meant to enable linguists to analyse different aspects of DGS on an empirical basis, even beyond the duration of the project itself.

Within the project, the corpus is being analysed according to the needs and requirements of developing the DGS – German dictionary.