Dataset - B2FIND

German Twitter Titling Corpus

The German Titling Twitter Corpus consists of 1904 stance-annotated tweets collected in June/July 2018 mentioning 24 German politicians with a doctoral degree. The Addendum...

WikiWarsDE Corpus

The WikiWarsDE corpus is a German corpus containing Wikipedia articles with annotations of temporal expressions. Its creation was motivated by the English WikiWars corpus (Mazur...

Czech RST Discourse Treebank 1.0

The Czech RST Discourse Treebank 1.0 (CzRST-DT 1.0) is a dataset of 54 Czech journalistic texts manually annotated using Rhetorical Structure Theory (RST). Each text...

KPWr annotation guidelines - phrase lemmatization

Annotation guidelines for manual phrase lemmatisation in KPWr (Polish Corpus of Wrocław University of Technology).

PELCRA EMO corpus

The corpus comprises 30 focused structured interviews (17 hours and ca. 200000 word tokens) centred on the topic of emotions. The corpus has bibliographic, morphosyntactic and...

KPWr annotation guidelines - keywords (1.0)

Annotation guidelines (first version) for keywords in KPWr (Polish Corpus of Wrocław University of Technology, https://clarin-pl.eu/dspace/handle/11321/270).

Polish Corpus of Wrocław University of Technology 1.3 Korpus Języka Polskieg...

KPWr (Polish Corpus of Wrocław University of Technology, pol. Korpus Języka Polskiego Politechniki Wrocławskiej) is a corpus of written and spoken documents available on the...

DiaBiz.Kom sample 1.0

DiaBiz.Kom sample is a sample of the DiaBiz.Kom corpus, a dialog corpus comprising transcriptions of phone-based customer-agent interactions in several key business domains...

Polish Spatial Texts (PST) 2.0

An extended version of the Polish Spatial Texts corpus. Texts derived from Polish travel blogs, manually annotated with spatial expressions. A spatial expression is a text fragment...

The Adventure of the Speckled Band 1.0 (manually tagged)

"The Adventure of the Speckled Band" (pol. "Sherlock Holmes i Pstrokata Opaska") by Arthur Conan Doyle - modern Polish translation manually tagged with morphological...

KPWr chunks 2021

357 documents from the KPWr corpus, manually annotated at the syntactic level (chunks). Please cite as: Oleksy, M., Walentynowicz, W., & Wieczorek, J. (2021). New approach to the...

Polish Spatial Texts (PST) 1.0

Texts derived from Polish travel blogs, manually annotated with spatial expressions. A spatial expression is a text fragment which describes a relative location of two or more...

STYX 1.0

STYX 1.0 is a corpus of Czech sentences selected from the Prague Dependency Treebank. The criterion for including sentences in STYX was their suitability for practicing Czech...

Etalon 1.0

Etalon is a manually annotated corpus of contemporary Czech. The corpus contains 1,885,589 words (2,265,722 tokens) and is annotated in the same way as SYN2020 of the Czech...

HetWiK: Heterogene Widerstandskulturen

The representative full-text digitalized HetWiK corpus is composed of 140 manually annotated texts of the German Resistance between 1933 and 1945. This includes both well-known...

KAMOKO: KAsseler MOrgenstern KOrpus (2021-02-09)

KAMOKO is a structured and commented French learner corpus. It addresses the central structures of the French language from a linguistic perspective (18 different courses). The...

KAMOKO: KAsseler MOrgenstern KOrpus

KAMOKO is a structured and commented French learner corpus. It addresses the central structures of the French language from a linguistic perspective (18 different courses). The...

Szeged Corpus 1.0

written, monolingual, general, manually POS annotated reference corpus; 1,247,546 tokens; MSD tagset, XML (TEIxLite) files

Szeged Corpus 2.0

written, monolingual, general, manually POS annotated reference corpus; 1,459,288 tokens; MSD tagset, XML (TEI P4) files

Czech Malach Cross-lingual Speech Retrieval Test Collection

The package contains Czech recordings from the Visual History Archive, which consists of interviews with Holocaust survivors. The archive consists of audio recordings, four...

27 datasets found