Autumn School: Sign language data meets data science – data science meets sign linguistics

Workshop dates: 25–26 September 2023

Workshop place: Institute of German Sign Language and Communication of the Deaf, University of Hamburg

Organized by: The EU EASIER project in cooperation with the DGS-Korpus project

The goal of this Autumn School is to generate expertise for under-resourced sign languages in order to extend the scope of EASIER to more European sign languages. The idea is not only to support people who already work with sign languages and train them in technological approaches, but also to train people who already work in data science, language technologies, etc. in the handling of sign languages. Therefore, both groups will first receive a general introduction to the subject that is new to them.

The subsequent sessions will cover different topics needed to develop sign language processing. There will also be a session where participants can present their own work and exchange ideas in a more flexible setting.

Workshop languages are English and International Sign.

Please be aware that this is a tentative schedule. Individual sessions may change.

Financial Support: We also offer conference support to attend the workshop. More information can be found here: https://www.project-easier.eu/conference-support/

Tentative schedule

Time | Track A | Track B
DAY 1: Monday 25 Sept. 2023
09:00 – 09:30 | Registration (A0018)
09:30 – 10:15 | 101: Data science for sign language linguists (A0020) | 101: Sign linguistics for data scientists (C0059)
10:15 – 11:00 | First things first: Things you have to think about when collecting sign language data (A0020)
11:00 – 11:30 | Coffee break (C0048)
11:30 – 12:15 | Transcription: Improving the quality of annotation by using a lexical database (A0020) | Sequential and simultaneous: sign language morphology (C0059)
12:15 – 13:00 | Making your dictionary useful beyond looking up signs and making it fit for automatic processing (A0020) | From signing space to strings on paper: writing down sign languages phonetically (C0059)
13:00 – 13:45 | Lunch (C0048)
13:45 – 14:30 | Future technologies I: Using the power of wordnet for dictionary work (A0020) | Getting started with sign language data: Where do I find sign language data and how do I treat them – handling glosses, overfitting, and NLP tools (C0059)
14:30 – 15:15 | Future technologies II: Make your data searchable with keywords, HamNoSys and the Super Spotter (A0020) | Do you like it? Gathering Feedback in an accessible way (C0059)
15:15 – 15:45 | Coffee break (C0048)
15:45 – 18:00 | Hands-on Fair (everyone can present) (C1053)
DAY 2: Tuesday 26 Sept. 2023
09:00 – 09:45 | No MoCap, no problem – MediaPipe and OpenPose (C1053)
09:45 – 10:30 | From strings on paper to signing space: Animating sign languages (C1053)
10:30 – 11:00 | Coffee break (C0048)
11:00 – 11:45 | Not just the hands – full on sign language recognition (C1053)
11:45 – 12:30 | Ethical open data principles: Share and care for your data, and don’t forget to anonymise (C1053)
12:30 – 13:00 | Closing (C1053)

Syllabus

101: Data science for sign language linguists

Thomas Hanke, U Hamburg

As the amount of language data keeps growing, we need automatic processes to gain more insight into linguistic structures. In this class, participants will learn about different techniques from the fields of data science and natural language processing that might be of interest for their daily work. This course also lays a good foundation for the subsequent sessions and off-line discussions.

First things first: Things you have to think about when collecting sign language data

Johanna Mesch, Stockholm U

Sign languages can be captured in their true form only through video recordings. This class will look at different points that researchers and field workers have to consider when collecting sign language data. These include technical aspects, such as how many cameras are needed and where and how to store the data, as well as linguistic aspects, such as the best ways to elicit spontaneous, natural signing. The talk will show examples from a gold-standard corpus.

101: Sign linguistics for data scientists

Annika Herrmann

Sign languages are fully-fledged natural languages. This session gives a general introduction to sign language linguistics, covering key aspects that differentiate sign languages from spoken languages, such as the use of 3D space with multiple simultaneous articulators. The class will lay the groundwork for the subsequent sessions and off-line discussions.

Transcription: Improving the quality of annotation by using a lexical database

Kearsy Cormier, UC London

The session will look at different technologies used to transcribe and align sign language data and at how to organize one’s vocabulary. The key topic will be a workflow in which a lexical database is integrated into one’s annotation work. Different kinds of lexical databases will be shown to illustrate how much they can differ in richness (glosses only, glosses and keywords, or further information such as phonological transcriptions, translations, examples, etc.). The concept of ID-glosses is introduced and different annotation conventions will be discussed.

Sequential and simultaneous: sign language morphology

Cornelia Loos, U Hamburg

This session introduces sign language morphology and its key points: sequentiality and simultaneity.

Making your dictionary useful beyond looking up signs and making it fit for automatic processing

Sarah Ebling, U Zurich

Dictionaries are very useful tools for common language users. By adding some more content and offering a few more options (regarding download and license), a dictionary can become an even more valuable resource, useful not only for the common user but also for data scientists. This class will talk about the requirements for valuable dictionary resources and why we should take this aspect into account.

From signing space to strings on paper: writing down sign languages phonetically

Maria Kopf, U Hamburg

This session introduces sign language phonetics and explains how the smallest building blocks of sign languages can be written down in a phonetic writing system such as HamNoSys.

Future technologies I: Using the power of Wordnet for dictionary work

Sam Bigeard, U Hamburg

Wordnets are a powerful tool to represent the sense of a sign beyond providing keywords in a spoken language. They make dictionaries easier to explore, avoid mistranslation, and can augment your dictionary with images and translations in many languages. This session explains what a Wordnet is, how to use Wordnet, how to index your dictionary with Wordnet senses, and why you would want to go that extra mile.

Getting started with sign language data: Where do I find sign language data and how do I treat them – handling glosses, overfitting, and NLP tools

Thomas Hanke, U Hamburg

This session introduces different kinds of sign language data and defines the minimal requirements for usable sign language resources. Participants will learn what data already exist, what to consider when harvesting web data, and how to recognize good-quality data. The influence of the source language on the signed data will be discussed (interpreted spoken language vs. original signing).

Future technologies II: Make your data searchable with keywords, HamNoSys and the Super Spotter

Reiner Konrad & Maren Brumm, U Hamburg

Finding one’s way through large amounts of data can be difficult. This class will show how one can search a corpus with methods such as keywords and HamNoSys transcripts, and with future technologies such as sign spotting.

Do you like it? Gathering Feedback in an accessible way

Davy van Landuyt, European Union of the Deaf

Gathering feedback for sign language technology differs in many respects from gathering feedback for other technological tools. To really reach one’s target group, one has to think about the accessibility of common feedback systems. This class will show possible strategies to reach the signing community and offer possibilities for feedback in an accessible way, such as answering via video instead of text.

Hands-on Fair

(everybody)

The Hands-on Fair invites participants and lecturers as well as EASIER project members to present and demonstrate their data and technology. For more context we ask every interested person to provide a poster with background information on the data or tool they present.

No MoCap, no problem – MediaPipe and OpenPose

Amit Moryossef, U Zurich

Motion capture (MoCap) data is very expensive to produce, and far less MoCap data is available than plain video data. MediaPipe and OpenPose offer pose estimation for video data that can be used to process signing. This class will show how this technology works and what typical use cases are.

From strings on paper to signing space: Animating sign languages

Rosalee Wolfe, ATHENA RC

Displaying sign languages as output from MT systems is more difficult than displaying text of a spoken language because sign languages have no widely accepted written form. This presentation will cover sign language display strategies and discuss some of the interesting open questions regarding textual representations that promise the most effective support of sign language display.

Not just the hands – full on sign language recognition

Richard Bowden, U Surrey

Signed languages are not just produced by the hands: face and body convey grammar and other important information. This session shows how these features can be automatically recognized and what the common hurdles are.

Ethical open data principles: Share and care for your data, and don’t forget to anonymise

Marc Schulder, U Hamburg

This session addresses how data can be published in ways that are both open and ethical. It introduces the FAIR and CARE principles and how they apply to sign language data. This covers topics such as data access, usage licences, the importance of metadata and documentation, and persistent identifiers, but also accountability, working with and for sign language communities, and respecting participants’ rights through the use of informed consent and appropriate anonymisation of (meta)data.

Registration

There is no registration fee for the Autumn School, but registration is required. Registration is now closed.

Workshop location

Gorch-Fock-Wall 7, 20354 Hamburg

Public transport: U2 Gänsemarkt, U1 Stephansplatz, S11/S21/S31 Dammtor