DOI: 10.25592/dgs.corpus-3.0

Welcome to the Public DGS Corpus Release 3!

In this portal, you will find 50 hours of video material from the DGS-Korpus project, made available together with annotations for research purposes.

If you want to download materials, please pay attention to the license conditions.

By clicking Transcripts, you can view the available data sorted either by transcript name or by elicitation format. Several download links are available (cf. below, File Formats available for Download). Clicking a transcript name opens an online preview of the transcript. If you read German and want to browse the videos first without the annotation, you will find them with subtitles in the MEINE DGS portal.

By clicking Types, you can view the list of all sign types occurring in the public corpus. Click on one of these types to see all corresponding tokens in the public corpus. Clicking on a token reference then takes you directly to its occurrence in the transcript.

Keywords have been assigned to all transcripts to provide rough, content-based access to the data. By clicking Keywords, you can view an index of all keywords and find the transcripts they have been assigned to.

Background Information

Data Elicitation Formats

We used a set of 20 different tasks for the informants. The formats ranged from story retelling (with prompts in sign, picture, or movie form) to discussions on a given topic and free conversation. With careful planning, it was possible to make the mix of formats varied and entertaining enough that most participants enjoyed the recording session despite its net length of five hours.

The set contains a number of tasks previously used in other corpus projects on both spoken and sign languages, providing a basis for cross-linguistic research, as well as newly developed formats. To keep the new elicitation materials suitable for future data collections, not all of their details are given in the publications. The materials are, however, available to other researchers upon request.

For more detail, please consult the following publications:

  • Hanke, Thomas / Hong, Sung-Eun / König, Susanne / Langer, Gabriele / Nishio, Rie / Rathmann, Christian (2010): “Designing Elicitation Stimuli and Tasks for the DGS Corpus Project”. Poster presented at the Theoretical Issues in Sign Language Research Conference (TISLR 10), Sept 30 – Oct 2, 2010 at Purdue University, Indiana, USA. [Poster]
  • Nishio, Rie / Hong, Sung-Eun / König, Susanne / Konrad, Reiner / Langer, Gabriele / Hanke, Thomas / Rathmann, Christian (2010): “Elicitation methods in the DGS (German Sign Language) Corpus Project”. Poster presented at the 4th Workshop on the Representation and Processing of Sign Languages: Corpora and Sign Language Technologies, following the 2010 LREC Conference in Malta, May 22-23, 2010. Paris: ELRA, pp. 178-185. [Paper] [Poster]

Data Collection Regions

Based on experience from earlier projects, one key decision was to use a mobile studio that could be set up in different places across Germany. The aim was to create as much of a “local” atmosphere as possible, with the recording location in the region and all persons involved coming from that region, while still ensuring the high-quality recordings needed for transcription. The number of recording locations therefore had to be a compromise between localness in this sense and the informants’ travel times on the one hand, and logistics on the other.

Our solution was to define thirteen data collection regions. In drawing them, we tried to respect the catchment areas of current and former Schools for the Deaf, state (Bundesland) borders (which, among other things, determine educational settings), especially the former border between West and East Germany, and suspected dialect borders, while also taking practical considerations such as travel times to the recording locations into account. Each region was further subdivided into up to five sub-regions relevant for informant selection. Large metropolitan areas form sub-regions of their own, in contrast to others with mixed or more rural structures.

Below is a map of Germany showing the data collection regions and, for comparison, a map showing the states (Bundesländer).

[Map 1: Data collection regions, with population in millions]
ber: Berlin 6.18 M; fra: Frankfurt 8.69 M; goe: Göttingen 5.53 M; hb: Bremen 3.28 M; hh: Hamburg 2.82 M; koe: Cologne 10.84 M; lei: Leipzig 8.72 M; mst: Münster 9.08 M; mue: Munich 7.26 M; mvp: Rostock 1.69 M; nue: Nuremberg 5.23 M; sh: Schleswig-Holstein 2.83 M; stu: Stuttgart 10.74 M

[Map 2: States (Bundesländer), with population in millions]
Schleswig-Holstein 2.83 M; Hamburg 1.75 M; Lower Saxony 8.62 M; Bremen 0.66 M; North Rhine-Westphalia 18.03 M; Hesse 6.08 M; Rhineland-Palatinate 4.05 M; Baden-Württemberg 10.74 M; Bavaria 12.50 M; Saarland 1.04 M; Berlin 3.40 M; Brandenburg 2.54 M; Mecklenburg-West Pomerania 1.69 M; Saxony 4.22 M; Saxony-Anhalt 2.43 M; Thuringia 2.31 M

Informants

Because no census data on the Deaf population are available, the target number of informants per region was derived from general population figures, with larger cities weighted by a factor of 2 to reflect the common (though unconfirmed) observation that Deaf people often prefer to live in larger cities. Combined with a minimum of 16 informants per region (four age groups times two sexes, with at least two informants each), this resulted in a target of 328 participants. We actually filmed 330 participants.
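The allocation rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual procedure: the split of each region's population into "larger cities" and the rest, the base sample size `base_total`, and the rounding to the nearest integer are all hypothetical, since the text does not specify them. Only the city weight of 2 and the minimum of 16 (4 × 2 × 2) come from the description.

```python
from dataclasses import dataclass

# Minimum per region as stated in the text:
# 4 age groups x 2 sexes x at least 2 informants per cell.
MIN_PER_REGION = 4 * 2 * 2  # 16
CITY_WEIGHT = 2  # population of larger cities counts double


@dataclass
class Region:
    name: str
    city_pop: float   # inhabitants of "larger cities" (hypothetical split)
    other_pop: float  # all remaining inhabitants

    def weighted_pop(self) -> float:
        return CITY_WEIGHT * self.city_pop + self.other_pop


def allocate(regions: list[Region], base_total: int) -> dict[str, int]:
    """Share base_total informants out in proportion to weighted population,
    then raise any region below the minimum up to MIN_PER_REGION.
    Rounding to the nearest integer is an assumption of this sketch."""
    total_weight = sum(r.weighted_pop() for r in regions)
    targets: dict[str, int] = {}
    for r in regions:
        share = base_total * r.weighted_pop() / total_weight
        targets[r.name] = max(MIN_PER_REGION, round(share))
    return targets


# Toy figures (millions), not the project's data:
demo = [Region("A", 3.0, 1.0), Region("B", 0.0, 4.0), Region("C", 0.0, 0.5)]
print(allocate(demo, 100))  # {'A': 61, 'B': 35, 'C': 16}
```

Raising small regions to the minimum pushes the sum above `base_total` (112 instead of 100 in the toy case); the description does not say how the project reconciled the proportional shares with the floor.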

The map below shows the number of participants from each region, broken down by age group.

[Map: Number of participants per data collection region, by age group (18-30, 31-45, 46-60, 61+)]
Berlin (Berlin, Brandenburg, partially Saxony-Anhalt)
Frankfurt (South Hesse, Saarland, partially Rhineland-Palatinate)
Göttingen (Hannover, South Lower Saxony, North Hesse)
Bremen (Bremen, North-West Lower Saxony)
Hamburg (Hamburg, North Lower Saxony)
Cologne (North Rhine, partially Rhineland-Palatinate)
Leipzig (Saxony, Thuringia, partially Saxony-Anhalt)
Münster (Westphalia, Osnabrück, County of Bentheim)
Munich (South Bavaria)
Rostock (Mecklenburg-Vorpommern)
Nuremberg (North Bavaria)
Schleswig-Holstein
Stuttgart (Baden-Württemberg)

Across all regions, the sample is fairly well balanced with respect to age group and perfectly balanced with respect to sex.

Participants by sex and age group

            18-30   31-45   46-60   61+   Total
male           40      45      42    38     165
female         41      46      41    37     165
total          81      91      83    75     330

Annotation Conventions applied in the DGS-Korpus Public Data

The annotation conventions are described in the project note AP03-2018-01.

File Formats available for Download

If you use iLex, please download the iLex file and import it into your iLex database. You may also want to download the A, B and C camera perspectives to have them available locally. This is not strictly necessary, as the iLex import file provides URLs for accessing the videos via HTTPS. In addition to the annotation, the iLex files contain metadata on the session as well as on the informants.

If you use ELAN, please download the ELAN file and, optionally, the A, B and C movies, then open the ELAN file. Downloading the movie files is not strictly necessary, as the ELAN file contains URLs for accessing the videos via HTTPS. However, local copies may improve performance when working in ELAN.
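For scripted access outside ELAN, the ELAN file (.eaf) is plain XML and can be read with Python's standard library. The sketch below lists the media URLs in the file header and extracts the time-aligned annotations of one tier. It relies only on the general EAF structure (`MEDIA_DESCRIPTOR`, `TIME_SLOT`, `TIER`, `ALIGNABLE_ANNOTATION`); the tier name `Gloss_A`, the URL and the gloss in the demo string are hypothetical placeholders, not the tier names or content of the DGS-Korpus files. For a downloaded file, obtain the root with `ET.parse("transcript.eaf").getroot()` (the filename is illustrative).

```python
import xml.etree.ElementTree as ET


def media_urls(root: ET.Element) -> list[str]:
    """MEDIA_URL of every MEDIA_DESCRIPTOR in the EAF header."""
    return [md.get("MEDIA_URL", "") for md in root.iter("MEDIA_DESCRIPTOR")]


def tier_annotations(root: ET.Element, tier_id: str) -> list[tuple[int, int, str]]:
    """(start_ms, end_ms, value) for each time-aligned annotation on one tier."""
    # TIME_ORDER maps slot ids to millisecond values; unaligned slots have none.
    slots = {
        ts.get("TIME_SLOT_ID"): int(ts.get("TIME_VALUE"))
        for ts in root.iter("TIME_SLOT")
        if ts.get("TIME_VALUE") is not None
    }
    result = []
    for tier in root.iter("TIER"):
        if tier.get("TIER_ID") != tier_id:
            continue
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            if start is not None and end is not None:
                value = ann.findtext("ANNOTATION_VALUE", default="")
                result.append((start, end, value))
    return result


# Minimal stand-in for a downloaded file (hypothetical tier name and values).
DEMO_EAF = """<ANNOTATION_DOCUMENT>
  <HEADER TIME_UNITS="milliseconds">
    <MEDIA_DESCRIPTOR MEDIA_URL="https://example.org/video_a.mp4" MIME_TYPE="video/mp4"/>
  </HEADER>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="480"/>
  </TIME_ORDER>
  <TIER TIER_ID="Gloss_A">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>EXAMPLE1A</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""

root = ET.fromstring(DEMO_EAF)
print(media_urls(root))                   # ['https://example.org/video_a.mp4']
print(tier_annotations(root, "Gloss_A"))  # [(0, 480, 'EXAMPLE1A')]
```

Symbolic (non-time-aligned) annotations, which ELAN stores as `REF_ANNOTATION`, are deliberately skipped in this sketch.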

For other tools such as MAXQDA, it is often possible to import SRT (subtitle) files. Please note that the linked files differ between the English and the German pages. If the tool you are using can handle multiple video tracks, download the A, B and C files. If it accepts only one video file, you may want to use the AB movie file, a side-by-side composite of the B and A perspectives.
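If your tool cannot import SRT files directly, they are simple enough to parse yourself. The sketch below reads the generic SRT layout (a numeric index line, an `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, then one or more text lines, with blocks separated by blank lines). It assumes nothing about the specific content of the DGS-Korpus subtitle files; reading them with `open(path, encoding="utf-8")` is likewise an assumption about their encoding.

```python
import re
from dataclasses import dataclass

# "HH:MM:SS,mmm --> HH:MM:SS,mmm" (a dot is accepted in place of the comma)
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2})[,.](\d{3})"
)


@dataclass
class Cue:
    index: int
    start_ms: int
    end_ms: int
    text: str


def _to_ms(h: str, m: str, s: str, ms: str) -> int:
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_srt(content: str) -> list[Cue]:
    """Split SRT text into cues: index line, timing line, then text lines."""
    # Drop a UTF-8 BOM, normalise line endings, then split on blank lines.
    content = content.lstrip("\ufeff").replace("\r\n", "\n").strip()
    cues = []
    for block in re.split(r"\n\s*\n", content):
        lines = block.split("\n")
        if len(lines) < 2 or not lines[0].strip().isdigit():
            continue  # skip malformed blocks
        match = TIMING.search(lines[1])
        if match is None:
            continue
        g = match.groups()
        cues.append(Cue(int(lines[0]), _to_ms(*g[:4]), _to_ms(*g[4:]), "\n".join(lines[2:])))
    return cues


sample = "1\n00:00:01,000 --> 00:00:02,500\nHello\n\n2\n00:00:03,000 --> 00:00:04,000\nLine one\nLine two\n"
for cue in parse_srt(sample):
    print(cue.index, cue.start_ms, cue.end_ms, repr(cue.text))
```

Times are converted to milliseconds so that cues can be aligned with the millisecond-based annotation timelines used by tools such as ELAN.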

We also make available the OpenPose analyses of the A and B camera perspectives and of the corresponding side views. Each download file contains the data for all four perspectives plus information on the spatial resolution of the input video (which differs from the resolution of the video files offered here for download). Where the video is anonymised, the OpenPose data contain empty coordinate arrays. To reduce their size, the files are zipped. For details on how the OpenPose data were processed, please see project note AP06-2019-01.
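The sketch below shows one way to unpack and work with such data. It assumes only the standard OpenPose per-person arrays (`pose_keypoints_2d`, `hand_left_keypoints_2d`, and so on, each a flat list of x, y, confidence triples). How the four perspectives, the frames and the resolution information are arranged inside the DGS-Korpus files is documented in AP06-2019-01 and is not assumed here, so `iter_zipped_json` simply yields every JSON member of the archive. The helper names are hypothetical, and `width`/`height` must come from the resolution information in the file, not from the downloadable videos.

```python
import json
import zipfile
from typing import Iterator, NamedTuple


class Keypoint(NamedTuple):
    x: float
    y: float
    confidence: float


def to_keypoints(flat: list[float]) -> list[Keypoint]:
    """Group a flat OpenPose array [x0, y0, c0, x1, y1, c1, ...] into triples.
    An empty array (as stored for anonymised video) gives an empty list."""
    if len(flat) % 3 != 0:
        raise ValueError(f"expected a multiple of 3 values, got {len(flat)}")
    return [Keypoint(*flat[i:i + 3]) for i in range(0, len(flat), 3)]


def person_keypoints(person: dict) -> dict[str, list[Keypoint]]:
    """Convert every '*_keypoints_2d' array of one OpenPose 'people' entry."""
    return {k: to_keypoints(v) for k, v in person.items() if k.endswith("_keypoints_2d")}


def normalise(kp: Keypoint, width: int, height: int) -> Keypoint:
    """Scale pixel coordinates to 0..1 using the resolution of the *input* video."""
    return Keypoint(kp.x / width, kp.y / height, kp.confidence)


def iter_zipped_json(path: str) -> Iterator[tuple[str, object]]:
    """Yield (member name, parsed JSON) for every .json member of a zip archive."""
    with zipfile.ZipFile(path) as zf:
        for name in zf.namelist():
            if name.endswith(".json"):
                yield name, json.loads(zf.read(name))


# Hypothetical single-person entry: one body keypoint, no left-hand data.
person = {"person_id": [-1], "pose_keypoints_2d": [960.0, 540.0, 0.8], "hand_left_keypoints_2d": []}
kps = person_keypoints(person)
print(normalise(kps["pose_keypoints_2d"][0], 1920, 1080))
```

Normalising against the input resolution keeps coordinates comparable across perspectives even though the downloadable videos are rendered at a different size.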

Finally, you can download a CMDI file containing metadata for the session as well as the participants.

How to cite

We ask you to cite the corresponding DGS-Korpus publications if you publish research based on this material.

If you want to cite the dataset itself, please find the citation data here. To cite individual transcripts or type data, please use the DOIs shown on the respective web pages. Clicking on any DOI gives you not only a list of all published versions of that transcript or type, but also a version-independent DOI that always refers to its latest published version.