Welcome to the Public DGS Corpus!

In this portal you will find 50 hours of video material from the DGS-Korpus project, made available together with annotations for research purposes.

If you want to download materials, please pay attention to the license conditions.

By clicking Transcripts, you can view the available data sorted by either transcript name or elicitation format. Several download links are available. By clicking the transcript name, you can open an online preview of the transcript. If you want to browse the videos first without paying attention to the annotation and can read German, you will find the videos with subtitles in the MEINE DGS portal.

By clicking Types, you can view the list of all sign types occurring in the public corpus. Click on one of these types to see all corresponding tokens in the public corpus. Clicking on a token reference then takes you directly to the occurrence in the transcript.

For all transcripts, keywords have been assigned in order to provide a rough content-related access to the data. By clicking Topics, you can view an index of all keywords and find the transcripts they have been assigned to.

Background Information

Data Elicitation Formats

We used a set of 20 different tasks for the informants. The formats ranged from story retelling (with prompts in sign, picture, or movie) to discussions on a given topic as well as free conversations. With careful planning, it was possible to make the mix of formats diverting enough that most participants enjoyed the recording session despite a net length of 5 hours.

The set contains a number of tasks previously used in other corpus projects, on both spoken and sign languages, to lay a basis for cross-linguistic research, as well as new formats. Not all details of the newly developed elicitation materials are given in the publications, in order to keep the material suitable for future data collections. The materials are, however, available to other researchers upon request.

For more detail, please consult the following publications:

  • Hanke, Thomas / Hong, Sung-Eun / König, Susanne / Langer, Gabriele / Nishio, Rie / Rathmann, Christian (2010): “Designing Elicitation Stimuli and Tasks for the DGS Corpus Project”. Poster presented at the Theoretical Issues in Sign Language Research Conference (TISLR 10), Sept 30 – Oct 2, 2010, at Purdue University, Indiana, USA. [Poster]
  • Nishio, Rie / Hong, Sung-Eun / König, Susanne / Konrad, Reiner / Langer, Gabriele / Hanke, Thomas / Rathmann, Christian (2010): “Elicitation methods in the DGS (German Sign Language) Corpus Project”. Poster presented at the 4th Workshop on the Representation and Processing of Sign Languages: Corpora and Sign Language Technologies (W13), following the 2010 LREC Conference, May 22-23, 2010, Valletta, Malta. In: Workshop Proceedings. Paris: ELRA, pp. 178-185. [Paper] [Poster]

Data Collection Regions

Based on experience from earlier projects, one of the key decisions was to use a mobile studio that could be set up in different places across Germany. The idea was to create as much of a “local” spirit as possible, with the recording location in the region and all persons involved coming from that region, while still ensuring the high-quality recordings needed for transcription. Obviously, the number of recording locations had to be a compromise between localness in the above sense, which also matters for the informants’ travel times, and logistics.

Our solution was to define thirteen data collection regions. In doing so, we tried to respect the catchment areas of current and former Schools for the Deaf, state (Bundesland) borders, which determine educational settings among other things, and especially the former border between West and East Germany, suspected dialectal borders, but also practical considerations such as travel times to the recording locations. The regions were further subdivided into up to five sub-regions relevant for informant selection. Large metropolitan areas form their own sub-regions, in contrast to others with mixed or more rural structures.

Below on the left, you will find a map of Germany showing the data collection regions. For comparison, a map of Germany showing the states (Bundesländer) is on the right.

Data collection regions (map legend):

  • ber: Berlin
  • fra: Frankfurt
  • goe: Göttingen
  • hb: Bremen
  • hh: Hamburg
  • koe: Cologne
  • lei: Leipzig
  • mst: Münster
  • mue: Munich
  • mvp: Rostock
  • nue: Nuremberg
  • sh: Schleswig-Holstein
  • stu: Stuttgart

States (Bundesländer) with population (map legend):

  • Schleswig-Holstein: 2.81 million
  • Hamburg: 1.73 million
  • Lower Saxony: 7.78 million
  • Bremen: 0.65 million
  • North Rhine-Westphalia: 17.55 million
  • Hesse: 6.02 million
  • Rhineland-Palatinate: 3.99 million
  • Baden-Württemberg: 10.57 million
  • Bavaria: 12.52 million
  • Saarland: 0.99 million
  • Berlin: 3.38 million
  • Brandenburg: 2.45 million
  • Mecklenburg-West Pomerania: 1.60 million
  • Saxony: 4.05 million
  • Saxony-Anhalt: 2.26 million
  • Thuringia: 2.17 million

Informants

Due to the lack of census data on the Deaf population, the target number of informants per region was determined from the population figures of the community at large, with a weight of 2 for larger cities to reflect the (unproven) experience that Deaf people often prefer to live in larger cities. Together with a set minimum of 16 informants per region (to cover four age groups times two sexes with at least two informants each), this resulted in a target number of 328 participants. We actually filmed 330 participants.
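
The minimum of 16 follows directly from the stratification (4 age groups × 2 sexes × 2 informants). The general idea of a population-weighted allocation with a floor can be sketched in Python as follows. This is an illustration only: the function name, the base total, the rounding rule, and the example figures are assumptions made for this sketch and do not reproduce the project's actual procedure or data.

```python
AGE_GROUPS = ["18-30", "31-45", "46-60", "61+"]
SEXES = ["female", "male"]
MIN_PER_CELL = 2  # at least two informants per age group and sex

# 4 age groups x 2 sexes x 2 informants = 16
MIN_PER_REGION = len(AGE_GROUPS) * len(SEXES) * MIN_PER_CELL


def target_informants(regions, base_total, city_weight=2, minimum=MIN_PER_REGION):
    """Allocate informants in proportion to weighted population, with a floor.

    `regions` maps a region name to (city_population, other_population).
    City population counts `city_weight` times. All inputs are hypothetical.
    """
    weighted = {
        name: city_weight * city + other
        for name, (city, other) in regions.items()
    }
    total_weight = sum(weighted.values())
    return {
        name: max(minimum, round(base_total * weight / total_weight))
        for name, weight in weighted.items()
    }


# Hypothetical population figures in millions (NOT the corpus regions).
example = {
    "region_a": (3.0, 2.0),  # weighted population 8.0
    "region_b": (0.5, 1.5),  # weighted population 2.5
    "region_c": (1.0, 4.0),  # weighted population 6.0
}
targets = target_informants(example, base_total=100)
```

In this toy example the smallest region would receive fewer than 16 informants by proportion alone and is raised to the minimum, which is why a floor can push the overall target above the base total.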

In the map below, you will find the number of participants from each region, detailed by age group.

Regions and the areas they cover (map legend):

  • Berlin (Berlin, Brandenburg, partially Saxony-Anhalt)
  • Frankfurt (South Hesse, Saarland, partially Rhineland-Palatinate)
  • Göttingen (Hannover, South Lower Saxony, North Hesse)
  • Bremen (Bremen, North-West Lower Saxony)
  • Hamburg (Hamburg, North Lower Saxony)
  • Cologne (North Rhine, partially Rhineland-Palatinate)
  • Leipzig (Saxony, Thuringia, partially Saxony-Anhalt)
  • Münster (Westphalia, Osnabrück, County of Bentheim)
  • Munich (South Bavaria)
  • Rostock (Mecklenburg-Vorpommern)
  • Nuremberg (North Bavaria)
  • Schleswig-Holstein
  • Stuttgart (Baden-Württemberg)

Age groups: 18-30, 31-45, 46-60, 61+

Across regions, the participants are fairly evenly balanced with respect to age group, and perfectly balanced with respect to sex.

          18-30   31-45   46-60   61+   Total
Male         40      45      42    38     165
Female       41      46      41    37     165
Total        81      91      83    75     330

Annotation Conventions applied in the DGS-Korpus Public Data

The annotation conventions are described in the project note AP03-2018-01.

File Formats available for Download

If you use iLex, please download the iLex file and import it into your iLex database. You may want to download the A, B and C camera perspectives as well in order to have them available locally. This is not strictly necessary, as the iLex import file provides URLs to access the files via HTTP.

If you use ELAN, please download the ELAN file as well as the A, B and C movies, then open the ELAN file. When asked, point ELAN to the movie files just downloaded.
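
For scripted access outside ELAN, it helps to know that ELAN files (.eaf) are XML documents. The following Python sketch uses only the standard library and assumes the usual EAF layout (TIME_ORDER/TIME_SLOT for time stamps in milliseconds, TIER/ANNOTATION/ALIGNABLE_ANNOTATION for time-aligned annotations). The tier name and annotation values in the embedded sample are invented for illustration and are not taken from the corpus.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written sample in EAF style; tier name and values are hypothetical.
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="1000"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1500"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="2200"/>
  </TIME_ORDER>
  <TIER TIER_ID="example_tier">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>FIRST</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2" TIME_SLOT_REF1="ts2" TIME_SLOT_REF2="ts3">
        <ANNOTATION_VALUE>SECOND</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>
"""


def read_tier(root, tier_id):
    """Return (start_ms, end_ms, value) for each time-aligned annotation on a tier."""
    # Map time slot IDs to millisecond values; slots without a value are skipped.
    slots = {
        slot.get("TIME_SLOT_ID"): int(slot.get("TIME_VALUE"))
        for slot in root.iter("TIME_SLOT")
        if slot.get("TIME_VALUE") is not None
    }
    rows = []
    for tier in root.iter("TIER"):
        if tier.get("TIER_ID") != tier_id:
            continue
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            value = ann.findtext("ANNOTATION_VALUE", default="")
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            rows.append((start, end, value))
    return rows


root = ET.fromstring(SAMPLE_EAF)
annotations = read_tier(root, "example_tier")
```

For a downloaded file, `ET.parse(path).getroot()` would take the place of `ET.fromstring(SAMPLE_EAF)`; the tier names to query have to be looked up in the file itself.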

For other tools such as MaxQDA, it is often possible to import SRT (subtitle) files. Please note that the files linked differ between the English and the German pages. If the tool you are using can handle multiple video track files, download the A, B and C files. If the tool only accepts one file, you may want to use the AB movie file, which shows the B and A perspectives side by side.
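
If your tool has no SRT import, the files are simple enough to read with a short script. The Python sketch below assumes the common SRT layout (a counter line, a timing line of the form `HH:MM:SS,mmm --> HH:MM:SS,mmm`, then one or more text lines, with blocks separated by blank lines). The sample text is invented for illustration, and the function name is hypothetical.

```python
import re
from datetime import timedelta

# Hand-written sample in SRT layout; the text is hypothetical.
SAMPLE_SRT = """1
00:00:01,000 --> 00:00:02,500
First subtitle line

2
00:00:03,000 --> 00:00:04,250
Second subtitle,
continued on a new line
"""

TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def parse_srt(text):
    """Return a list of (start, end, text) tuples with timedelta time stamps."""
    cues = []
    # Blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                h1, m1, s1, ms1, h2, m2, s2, ms2 = map(int, match.groups())
                start = timedelta(hours=h1, minutes=m1, seconds=s1, milliseconds=ms1)
                end = timedelta(hours=h2, minutes=m2, seconds=s2, milliseconds=ms2)
                # Everything after the timing line is the subtitle text.
                cues.append((start, end, "\n".join(lines[i + 1:])))
                break
    return cues


cues = parse_srt(SAMPLE_SRT)
```

Each cue can then be matched against the video by its start and end time, for example via `cues[0][0].total_seconds()`.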