Steven Gillis
University of Antwerp, Dept. of Linguistics, Center for Dutch Language and Speech
steven.gillis@uia.ua.ac.be

Working with Language Corpora: The Case of CHILDES

As more researchers turn to the analysis of spontaneous language samples, questions about the methodology of corpus work emerge. In this paper we give an overview of the problems that arise in the collection, transcription and analysis of spontaneous language samples, and we illustrate how the CHILDES system addresses them.

The topics that we will address are:

  1. data collection methods and procedures;
  2. transcription of spontaneous language samples;
  3. annotation of data: different levels of annotation, standards for annotation;
  4. using the database: generality, searchability and maintainability of the transcribed and annotated data.

In discussing these topics we will take CHILDES as a reference point, but we will also draw on the practical experience of research groups around the world in collecting, transcribing and annotating (spoken) language corpora. The main theme will be: how can we find a balance between methodological soundness and practical feasibility?