Prague DaTabase of Spoken Czech 1.0

Dataset

PDTSC 1.0 is a multi-purpose corpus of spoken language. It comprises 768,888 tokens, 73,374 sentences, and 7,324 minutes of spontaneous dialog speech, which have been recorded, transcribed, and edited in several interlinked layers: audio recordings, automatic and manual transcription, and manually reconstructed text.

PDTSC 1.0 is a delayed release of data annotated in 2012. It is an update of the Prague Dependency Treebank of Spoken Language (PDTSL) 0.5, published in 2009. In 2017, the Prague Dependency Treebank of Spoken Czech (PDTSC) 2.0 was published as an update of PDTSC 1.0.

Identifier
PID	http://hdl.handle.net/11234/1-2375
Related Identifier	http://hdl.handle.net/11234/1-3189
Related Identifier	http://hdl.handle.net/11234/1-3185
Related Identifier	https://ufal.mff.cuni.cz/pdtsc1.0/
Metadata Access	http://lindat.mff.cuni.cz/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:lindat.mff.cuni.cz:11234/1-2375

Provenance
Creator	Hajič, Jan; Pajas, Petr; Ircing, Pavel; Romportl, Jan; Peterek, Nino; Spousta, Miroslav; Mikulová, Marie; Grůber, Martin; Legát, Milan
Publisher	Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL); University of West Bohemia
Publication Year	2017
Rights	Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0); http://creativecommons.org/licenses/by-nc-sa/4.0/; PUB
OpenAccess	true
Contact	lindat-help(at)ufal.mff.cuni.cz

Representation
Language	Czech
Resource Type	corpus
Format	text/plain; charset=utf-8; text/html; application/zip; downloadable_files_count: 2
Discipline	Linguistics