Corpus
What is a corpus?
A corpus is a scientifically prepared collection of examples of how a language is commonly used. A corpus can, for example, contain a large number of written texts, or recorded or filmed conversations. Such data collections are used to explore how a language is used or to learn about its vocabulary and grammar.
Approach
The aim of our project is to develop a corpus that represents the everyday signing of competent deaf users of German Sign Language (DGS). For this purpose, a nationwide data collection with over 300 deaf people at 12 locations in Germany will take place. In this way, regional differences between signs will be captured as well. The data collection consists of several parts: among other things, conversations or discussions between two participants on several topics will be recorded, as well as different tasks such as retelling a picture story or a film. The recordings will contain about 400 hours of DGS usage. This amount of data corresponds to approximately 2.25 million instances of sign use, which is comparable in size to corpora of spoken languages. Preparing the corpus is very time-consuming, as most of the work cannot be done automatically and has to be done by hand.
Usage
Within the project, the corpus will be analysed in order to develop a DGS-German dictionary. The corpus is also intended to give linguists the opportunity to analyse different aspects of DGS on an empirical basis, even beyond the duration of the project.
The corpus will contain many explanations that will be of cultural interest to the deaf community and that can be used, for example, in schools for the deaf or in DGS courses. The signed stories and talks will also be available to everyone who is interested in DGS and enjoys sign language.