What is a corpus?

A corpus is a scientifically edited collection of examples of how a language is commonly used. A corpus can, for example, contain a large number of written texts, or recorded or filmed conversations. Such data collections are used to explore how a language is used and to study its vocabulary and grammar.


One of the aims of the project is to develop a corpus that represents the everyday signing of competent Deaf German Sign Language (DGS) users. For this purpose, data were collected nationwide from 330 Deaf people at 12 locations in Germany. This approach also captured regional differences in signs.
The data collection consisted of several parts: conversations or discussions between two participants on several topics, as well as tasks such as retelling a picture story or a film. The recordings contain about 500 hours of DGS usage. This amount of data corresponds to approximately 3 million occurrences of signs, which is comparable to corpora of spoken languages. Annotating the corpus is very time-consuming, as most of the work cannot be automated and has to be done by hand.


The corpus will be analysed within the project to develop a DGS-German dictionary. It is also intended to let linguists analyse different aspects of DGS on an empirical basis, even beyond the duration of the project.

The corpus contains many narratives that will be of cultural interest to the deaf community and that can be used, for example, in schools for the deaf or in DGS courses. The signed stories and talks are also available to anyone who is interested in DGS and enjoys sign language.