Autumn School: Sign language data meets data science – data science meets sign linguistics
Workshop dates: 25–26 September 2023
Workshop place: Institute of German Sign Language and Communication of the Deaf, University of Hamburg
Organized by: The EU EASIER project in cooperation with the DGS-Korpus project
The goal of this Autumn School is to generate expertise for under-resourced sign languages in order to extend the scope of EASIER to more European sign languages. The idea is not only to support people who already work with sign languages and train them in technological approaches, but also to train people who already work in data science, language technologies, etc. in the handling of sign languages. Therefore, both groups will receive a general introduction at the beginning to the subject that is new to them.
The subsequent sessions will cover different topics needed to develop sign language processing. In addition, one session lets participants present their own work and exchange ideas in a more flexible setting.
Workshop languages are English and International Sign.
Please be aware that this is a tentative schedule. Individual sessions may change.
Financial Support: We also offer conference support to attend the workshop. More information can be found here: https://www.project-easier.eu/conference-support/
Tentative schedule
Syllabus
101: Data science for sign language linguists
Thomas Hanke, U Hamburg
As the amount of language data keeps growing, we need automatic processes to gain more insight into linguistic structures. In this class, participants will learn about different techniques from the fields of data science and natural language processing that might be of interest for their daily work. This course also lays a good basis for the subsequent sessions and off-line discussions.
First things first: Things you have to think about when collecting sign language data
Johanna Mesch, Stockholm U
Sign languages can be captured in their true form only through video recordings. This class will look at the different points that researchers and field workers have to consider when collecting sign language data. These include technical aspects, such as how many cameras are needed and where and how to store the data, as well as linguistic aspects, such as the best ways to elicit spontaneous, natural signing. The talk will show examples from a gold-standard corpus.
101: Sign linguistics for data scientists
Annika Herrmann
Sign languages are fully-fledged natural languages. This session gives a general introduction to sign language linguistics, covering key aspects that differentiate sign languages from spoken languages, such as the use of 3D space with multiple simultaneous articulators. The class will lay the groundwork for the subsequent sessions and off-line discussions.
Transcription: Improving the quality of annotation by using a lexical database
Kearsy Cormier, UC London
The session will look at different technologies used to transcribe and align sign language data and at how to organize one’s vocabulary. The key topic will be a workflow in which a lexical database is integrated into one’s annotation work. Different kinds of lexical databases will be shown to illustrate how much such resources can differ in richness (only glosses; glosses and keywords; other information such as phonological transcripts, translations, examples, etc.). The concept of ID-glosses is introduced and different annotation conventions will be discussed.
Sequential and simultaneous: sign language morphology
Cornelia Loos, U Hamburg
This session introduces sign language morphology and its key points: sequentiality and simultaneity.
Making your dictionary useful beyond looking up signs and making it fit for automatic processing
Sarah Ebling, U Zurich
Dictionaries are very useful tools for everyday language users. By adding some more content and offering a few more options (regarding download and license), a dictionary can become an even more valuable resource, useful not only for the everyday user but also for data scientists. This class will talk about the requirements for valuable dictionary resources and why we should take this aspect into account.
From signing space to strings on paper: writing down sign languages phonetically
Maria Kopf, U Hamburg
This session introduces sign language phonetics and explains how the smallest building blocks of sign languages can be written down in a phonetic writing system such as HamNoSys.
Future technologies I: Using the power of Wordnet for dictionary work
Sam Bigeard, U Hamburg
Wordnets are a powerful tool to represent the sense of a sign beyond providing keywords in a spoken language. They make dictionaries easier to explore, avoid mistranslation, and can augment your dictionary with images and translations in many languages. This session explains what a Wordnet is, how to use Wordnet, how to index your dictionary with Wordnet senses, and why you would want to go that extra mile.
Getting started with sign language data: Where do I find sign language data and how do I treat them – handling glosses, overfitting, and NLP tools
Thomas Hanke, U Hamburg
This session introduces different kinds of sign language data and defines the minimal requirements for usable sign language resources. Participants will learn what data already exist, what to consider when harvesting web data, and how to recognize good-quality data. The influence of the source language on the signed data will be discussed (interpreted spoken language vs. original signing).
Future technologies II: Make your data searchable with keywords, HamNoSys and the Super Spotter
Reiner Konrad & Maren Brumm, U Hamburg
Finding one’s way through large amounts of data can be difficult. This class will show how one can search a corpus with methods such as keywords, HamNoSys transcripts, and future technologies such as sign spotting.
Do you like it? Gathering Feedback in an accessible way
Davy van Landuyt, European Union of the Deaf
Gathering feedback on sign language technology differs in many respects from gathering feedback on other technological tools. To really reach one’s target group, one has to think about the accessibility of common feedback systems. This class will show possible strategies for reaching the signing community and for offering feedback options in an accessible way, such as answering via video instead of text.
Hands-on Fair
(everybody)
The Hands-on Fair invites participants and lecturers as well as EASIER project members to present and demonstrate their data and technology. For more context we ask every interested person to provide a poster with background information on the data or tool they present.
No MoCap, no problem – MediaPipe and OpenPose
Amit Moryossef, U Zurich
Motion capture (MoCap) data is very expensive to produce, and far less MoCap data is available than plain video data. MediaPipe and OpenPose offer pose estimation for video data that can be used to process signing. This class will show how this technology works and what typical use cases are.
From strings on paper to signing space: Animating sign languages
Rosalee Wolfe, ATHENA RC
Displaying sign languages as output from MT systems is more difficult than displaying text of a spoken language because sign languages have no widely accepted written form. This presentation will cover sign language display strategies and discuss some of the interesting open questions regarding textual representations that promise the most effective support of sign language display.
Not just the hands – full on sign language recognition
Richard Bowden, U Surrey
Signed languages are not just produced by the hands – face and body convey grammar and other important information. This session shows how these features can be automatically recognized and what common hurdles are.
Ethical open data principles: Share and care for your data, and don’t forget to anonymise
Marc Schulder, U Hamburg
This session addresses how data can be published in ways that are both open and ethical. It introduces the FAIR and CARE principles and how they apply to sign language data. This covers topics such as data access, usage licences, the importance of metadata and documentation, and persistent identifiers, but also accountability, working with and for sign language communities, and respecting participants’ rights through the use of informed consent and appropriate anonymisation of (meta)data.
Registration
There is no registration fee for the Autumn School, but registration is required. Registration is now closed.
Workshop location
Gorch-Fock-Wall 7, 20354 Hamburg
Public transport: U2 Gänsemarkt, U1 Stephansplatz, S11/S21/S31 Dammtor