DOI: 10.25592/dgs.corpus-4.0

Welcome to the Public DGS Corpus Release 4!

In this portal you will find 50 hours of video materials from the DGS-Korpus project which have been made available along with annotations for research purposes.

If you want to download materials, please pay attention to the license conditions.

Experts in corpus query languages may want to check our MEINE DGS – ANNIS portal, which features almost the same dataset as this site but allows more complex searches than are possible here.

By clicking Types, you can view the list of all sign types occurring in the public corpus. Click on one of these types to see all corresponding tokens in the public corpus. Clicking on a token reference then takes you directly to the occurrence in the transcript.

By clicking Formats, you get a list of all elicitation formats used, each with the number of transcripts for this format in the Public DGS Corpus. Click on one format to see more details on the format as well as the list of all transcripts, sorted by regions and topics.

Keywords have been assigned to all transcripts in order to provide rough content-based access to the data. By clicking Keywords, you can view an index of all keywords and find the transcripts they have been assigned to.

Background Information

Data Elicitation Formats

A set of 20 different tasks for the participants was used. The formats ranged from story-retelling (with prompts in sign, picture, or movie) to discussions on a given topic as well as free conversations. With careful planning, it was possible to make the mix of formats diverting enough that most participants enjoyed the recording session despite a net length of 5 hours.

The set contains new formats as well as a number of tasks previously used in other corpus projects, on both spoken and sign languages, to lay a basis for cross-linguistic research. To keep the material suitable for future data collections, not all details of the newly developed elicitation materials are given in the publications. The materials are, however, available to other researchers upon request.

Data Collection Regions

Based on experience from earlier projects, one of the key decisions was to use a mobile studio that could be set up in different places across Germany. The idea was to create as much of a “local” spirit as possible, with the recording location in the region and all persons involved coming from that region, while still ensuring the high-quality recordings needed for transcription. Obviously, the number of locations selected for recordings had to be a compromise between localness in the above sense and the participants’ travel times on the one hand, and logistics on the other.

Our solution was to define thirteen data collection regions. In drawing them, we tried to respect the catchment areas of current and former Schools for the Deaf, state (Bundesland) borders, which determine educational settings among other things, and especially the former border between West and East Germany, as well as suspected dialectal borders. Practical considerations such as travel times to the recording locations also played a role. The regions were further subdivided into up to five sub-regions relevant for participant selection. Large metropolitan areas form their own sub-regions, in contrast to others with mixed or more rural structures.

Below you will find a map of Germany showing the data collection regions. For comparison, there is also a map of Germany showing the states (Bundesländer).