-
UFAL Speech Corpus of North Levantine Arabic 1.0 - Part 2
The corpus contains recordings by native speakers of North Levantine Arabic (apc) acquired during 2020, 2021, and 2023 in Prague, Paris, Kabardia, and St. Petersburg.... -
UFAL Speech Corpus of North Levantine Arabic 1.0 - Part 1
The corpus contains recordings by native speakers of North Levantine Arabic (apc) acquired during 2020, 2021, and 2023 in Prague, Paris, Kabardia, and St. Petersburg.... -
Hindi Visual Genome 1.1
Hindi Visual Genome 1.1 is an updated version of Hindi Visual Genome 1.0. The update concerns primarily the text part of Hindi Visual Genome, fixing translation issues... -
Prague Czech-English Dependency Treebank 2.0 Coref
The Prague Czech-English Dependency Treebank 2.0 Coref (PCEDT 2.0 Coref) is a parallel treebank building upon the original PCEDT 2.0 release and enriching it with the extended... -
Large-Scale Colloquial Persian 0.5
"Large Scale Colloquial Persian Dataset" (LSCP) is hierarchically organized in a semantic taxonomy that focuses on multi-task informal Persian language understanding as a... -
Multilingual corpus of juridical texts
International conventions and treaties arranged as a parallel corpus aligned at the paragraph level -
QTLeap WSD/NED corpus
This corpus is part of Deliverable 5.5 of the European Commission project QTLeap FP7-ICT-2013.4.1-610516 (http://qtleap.eu). The texts are Q&A interactions from the... -
BPEmb: Pre-trained Subword Embeddings in 275 Languages (LREC 2018)
BPEmb is a collection of pre-trained subword unit embeddings in 275 languages, based on Byte-Pair Encoding (BPE). In an evaluation using fine-grained entity typing as testbed,...