Eesti keele spontaanse kõne foneetiline korpus v.1.0.0 (Phonetic Corpus of Estonian Spontaneous Speech v.1.0.0)

The aim of the corpus is to compile a large collection of high-quality recordings of spontaneous Estonian speech and to segment them phonetically at several levels. The project started in autumn 2006.

The corpus comprises approximately 60 hours of speech from 100 speakers of different age groups and with different dialectal and social backgrounds. Speakers are invited to participate in person, and they are aware of the purpose of the recordings.

Most of the recordings are made in a recording studio, some during fieldwork. Each speaker's signal is recorded on a separate channel. The speakers are seated about 3 meters apart to reduce the effect of overlapping speech on each speaker's channel. For the fieldwork recordings, headset microphones are used. Recordings are saved as uncompressed PCM WAV files. Background information about each recording is collected in a text file. Segmentation and annotation files are saved as Praat TextGrid files and have the same filenames as the recordings they segment.
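Because each speaker is on a separate channel and each TextGrid shares its recording's filename, a per-speaker workflow can start by splitting a recording into mono files and locating its annotation. The sketch below uses only Python's standard `wave` module and assumes a plain PCM WAV header that `wave` can read, one channel per speaker, and a `.TextGrid` file extension. The function names `split_channels` and `matching_textgrid` and the `_ch1`, `_ch2` output naming are hypothetical, not part of the corpus distribution.

```python
import wave
from pathlib import Path


def split_channels(wav_path, out_dir):
    """Write each channel of a PCM WAV file to its own mono WAV file.

    Assumption: one speaker per channel, as described for the corpus.
    Sample bytes are copied unchanged, so any PCM sample width works.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with wave.open(str(wav_path), "rb") as src:
        n_channels = src.getnchannels()
        width = src.getsampwidth()
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())

    frame_size = n_channels * width  # bytes per interleaved frame
    outputs = []
    for ch in range(n_channels):
        # Take the bytes belonging to channel `ch` from every frame.
        data = b"".join(
            frames[i + ch * width : i + (ch + 1) * width]
            for i in range(0, len(frames), frame_size)
        )
        # Hypothetical naming scheme: <recording>_ch<N>.wav
        out_path = out_dir / f"{Path(wav_path).stem}_ch{ch + 1}.wav"
        with wave.open(str(out_path), "wb") as dst:
            dst.setnchannels(1)
            dst.setsampwidth(width)
            dst.setframerate(rate)
            dst.writeframes(data)
        outputs.append(out_path)
    return outputs


def matching_textgrid(wav_path):
    """Return the TextGrid path that shares the recording's base filename.

    Assumption: the extension is spelled ".TextGrid".
    """
    return Path(wav_path).with_suffix(".TextGrid")
```

The TextGrid still belongs to the original multichannel recording, so it is looked up from the source WAV rather than from the split files.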

Segmentation and annotation
Segmentation and annotation are done with the Praat program (www.praat.org). Recordings are segmented manually at different levels (an automatic segmentation program has also been developed and tested). The following tiers are used:
- Words (in orthographic spelling);
- Phonemes (transcribed in SAMPA adapted for Estonian);
- Syllables (short – long, open – closed);
- Prosodic feet;
- Intonation phrases or inter-pausal units;
- Changes in voice quality (e.g. creaky voice).
More information: http://www.keel.ut.ee/foneetikakorpus/

Identifier
PID http://hdl.handle.net/11297/1-00-0000-0000-0000-0003-1
Metadata Access https://metashare.ut.ee/oai_pmh/?verb=GetRecord&metadataPrefix=olac&identifier=fd53d03a5a5b11e2a6e4005056b4002422b8100905c64934aeda1f06b0c77fb8
Provenance
Creator Pärtel Lippus, partel.lippus[at]ut.ee, Tartu Ülikool; Pire Teras, pire.teras[at]ut.ee, Tartu Ülikool; Tuuli Tuisk, tuuli.tuisk[at]ut.ee, Tartu Ülikool; Nele Salveste, nele.salveste[at]ut.ee, Tartu Ülikool
Publisher CLARIN
Contributor Pärtel Lippus, partel.lippus[at]ut.ee, Tartu Ülikool
Publication Year 2022
Rights CLARIN_RES
OpenAccess true
Contact info(at)keeleressursid.ee
Representation
Language Estonian
Resource Type Text; Sound
Format wav
Size 450 000 words, 60 hours
Discipline Linguistics
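The Metadata Access link above is an OAI-PMH `GetRecord` request for this record in the OLAC metadata format. A minimal sketch of rebuilding that request and pulling the Dublin Core elements out of a response, using only the Python standard library: the endpoint and identifier are taken from the link, the function names are hypothetical, and the parser assumes the record uses the standard Dublin Core element namespace (elements in other namespaces, such as `dcterms`, are ignored).

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Endpoint taken from the Metadata Access URL of this record.
OAI_ENDPOINT = "https://metashare.ut.ee/oai_pmh/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def get_record_url(identifier, prefix="olac"):
    """Build an OAI-PMH GetRecord request URL for one record."""
    query = urlencode(
        {"verb": "GetRecord", "metadataPrefix": prefix, "identifier": identifier}
    )
    return f"{OAI_ENDPOINT}?{query}"


def dublin_core_fields(xml_text):
    """Collect Dublin Core elements (title, creator, ...) from a response.

    Accepts str or bytes; returns {element_name: [values, ...]}.
    """
    root = ET.fromstring(xml_text)
    prefix = "{" + DC_NS + "}"
    fields = {}
    for elem in root.iter():
        if isinstance(elem.tag, str) and elem.tag.startswith(prefix):
            name = elem.tag[len(prefix):]
            fields.setdefault(name, []).append((elem.text or "").strip())
    return fields
```

To fetch the live record, the URL can be opened with `urllib.request.urlopen` and the returned bytes passed to `dublin_core_fields`; which fields appear depends on the repository's actual response, which is not reproduced here.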