Dialogue act annotated spoken corpus GORDAN 1.0 (transcription)

Dataset

The GORDAN 1.0 corpus contains authentic spoken communication data, annotated for dialogue acts according to the GORDAN 1.0 dialogue act annotation scheme, which is included with the data. The corpus data were selected from existing Slovene speech corpora: GOS (http://hdl.handle.net/11356/1040), Gos Videolectures (http://hdl.handle.net/11356/1223) and BERTA. Four criteria were taken into account in the selection: public/non-public, interactive/monologic, channel and intention. The data total 1 hour of recordings (6,909 words). The selected data were annotated with the Event function of the Transcriber 1.5.1 tool. Annotation was based on multimodal data, that is, on listening to the audio or watching the video recording, where available.

This resource contains only the annotated transcriptions of the corpus; the audio and video recordings are available at http://hdl.handle.net/11356/1292.

Identifier
PID	http://hdl.handle.net/11356/1291
Metadata Access	http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/1291
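
The Metadata Access URL above is an OAI-PMH GetRecord request. As a minimal sketch, assuming the endpoint and identifier pattern are exactly as they appear in that URL, the request can be rebuilt from the PID as follows. The function names oai_get_record_url and handle_from_pid are hypothetical helpers introduced for illustration; they are not part of any CLARIN.SI tooling.

```python
from urllib.parse import urlencode, urlsplit

# Base endpoint copied from the Metadata Access URL in this record.
OAI_ENDPOINT = "http://www.clarin.si/repository/oai/request"


def handle_from_pid(pid: str) -> str:
    """Extract the handle (e.g. '11356/1291') from a hdl.handle.net PID URL."""
    return urlsplit(pid).path.lstrip("/")


def oai_get_record_url(handle: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH GetRecord URL for the given handle.

    Assumption: the identifier follows the 'oai:www.clarin.si:<handle>'
    pattern seen in this record's Metadata Access URL.
    """
    params = {
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        "identifier": f"oai:www.clarin.si:{handle}",
    }
    # safe=':/' leaves colons and slashes unescaped, as in the record's URL.
    return f"{OAI_ENDPOINT}?{urlencode(params, safe=':/')}"


if __name__ == "__main__":
    pid = "http://hdl.handle.net/11356/1291"
    print(oai_get_record_url(handle_from_pid(pid)))
```

The sketch only constructs the URL; fetching and parsing the returned Dublin Core XML is left to the caller.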

Provenance
Creator	Verdonik, Darinka
Publisher	Faculty of Electrical Engineering and Computer Science, University of Maribor
Publication Year	2020
Rights	Creative Commons - Attribution 4.0 International (CC BY 4.0); https://creativecommons.org/licenses/by/4.0/; PUB
OpenAccess	true
Contact	info(at)clarin.si

Representation
Language	Slovenian; Slovene
Resource Type	corpus
Format	application/zip; text/plain; charset=utf-8; downloadable_files_count: 1
Discipline	Linguistics