Prague Dependency Treebank 2.0 - sample data

A small subset of PDT 2.0 made available under a permissive license.

Prague Dependency Treebank 2.0 (PDT 2.0) contains a large collection of Czech texts with complex and interlinked morphological (2 million words), syntactic (1.5 million words) and complex semantic (0.8 million words) annotation. In addition, certain properties of sentence information structure and coreference relations are annotated at the semantic level.

PDT 2.0 is based on the long-standing Praguian linguistic tradition, adapted to the needs of current computational linguistics research. The corpus itself uses the latest annotation technology. Software tools for corpus search, annotation and language analysis are included, along with extensive documentation in English.

Identifier
PID http://hdl.handle.net/11858/00-097C-0000-0001-B43E-6
Related Identifier http://hdl.handle.net/11858/00-097C-0000-0001-B098-5
Related Identifier http://ufal.mff.cuni.cz/pdt2.0/doc/pdt-guide/en/html/ch03.html#a-data-sample
Metadata Access http://lindat.mff.cuni.cz/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:lindat.mff.cuni.cz:11858/00-097C-0000-0001-B43E-6
Provenance
Creator Hajič, Jan; Panevová, Jarmila; Sgall, Petr; Pajas, Petr; Štěpánek, Jan; Havelka, Jiří; Mikulová, Marie; Žabokrtský, Zdeněk; Ševčíková-Razímová, Magda
Publisher Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Publication Year 2006
Rights Creative Commons - Attribution 3.0 Unported (CC BY 3.0); http://creativecommons.org/licenses/by/3.0/; PUB
OpenAccess true
Contact lindat-help(at)ufal.mff.cuni.cz
Representation
Language Czech
Resource Type corpus
Format application/zip; text/plain; charset=utf-8; downloadable_files_count: 1
Discipline Linguistics
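
The Metadata Access URL in the record above has the shape of an OAI-PMH GetRecord request: an endpoint plus the query parameters verb, metadataPrefix and identifier. As a minimal sketch, the URL can be reassembled from the handle as shown below. The helper name build_get_record_url is hypothetical, and the "oai:lindat.mff.cuni.cz:" identifier prefix is simply copied from this record rather than taken from any repository documentation. The sketch only builds and parses the URL; it makes no network request.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Endpoint and handle copied from the record above.
OAI_ENDPOINT = "http://lindat.mff.cuni.cz/repository/oai/request"
HANDLE = "11858/00-097C-0000-0001-B43E-6"


def build_get_record_url(handle: str, metadata_prefix: str = "oai_dc") -> str:
    """Assemble a GetRecord URL for a handle (hypothetical helper)."""
    params = {
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        # Assumption: identifier prefix as it appears in this record.
        "identifier": f"oai:lindat.mff.cuni.cz:{handle}",
    }
    # Leave ':' and '/' unescaped so the result matches the URL in the record.
    return f"{OAI_ENDPOINT}?{urlencode(params, safe=':/')}"


url = build_get_record_url(HANDLE)
query = parse_qs(urlsplit(url).query)

print(url)
print(query["identifier"][0])
```

Passing a different metadata_prefix would request another metadata format, provided the repository supports it; only oai_dc is confirmed by this record.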