-
Corpus of scientific texts of contemporary Slovenian KZB 1.0
The Corpus of scientific texts of contemporary Slovenian consists of 25 million words from scientific monographs and scientific papers written mainly between 2000 and 2023. It... -
Corpus of Slovenian periodicals (1771-1914) sPeriodika 1.0
The sPeriodika corpus of Slovenian periodicals contains linguistically annotated periodicals published in the 18th, 19th, and early 20th centuries (1771-1914). The... -
The Sarajevo Corpus of SMS Messages in Bosnian 1.1
This corpus is specialized, static (i.e., no future growth is planned), diachronic and covers the period from 2002 to 2022. The SMS messages included in this corpus were... -
The Sarajevo Corpus of SMS Messages in Bosnian 1.0
This corpus is specialized, static (i.e., no future growth is planned), diachronic and covers the period from 2002 to 2022. The SMS messages included in this corpus were... -
Parallel Corpus (EN-FR-LT) of EU Financial Documents (ELEXIS)
The parallel corpus comprises 154 EU legislative documents (English documents and their translations into French and Lithuanian) related to various financial issues and... -
Parallel Corpus (EN-LT) of EUR-Lex Documents That Include Terms with the Adje...
Bilingual parallel corpus of English EU documents containing terms with the adjective 'green', together with their Lithuanian translations. The size of the corpus is 4,447,683 words in... -
Parallel Corpus (EN-LT-DA) of General Data Protection Regulation (ELEXIS)
Trilingual parallel corpus of the General Data Protection Regulation. The size of the corpus is 54,468 words in English, 42,566 words in Lithuanian, and 47,740 words in Danish. -
Monolingual Mining Corpus - RudKorP (ELEXIS)
RudKorP (Rudarski javno dostupan korpus, Serbian Public Mining Corpus) is a specialized corpus in the field of mining and mineral resource exploitation, containing research papers,... -
Bilingual Corpus of Underground Mining (ELEXIS)
PodzemniRadovi-sr-en, a bilingual aligned corpus of papers in the field of mining. Undeground-mining-sr-en: bilingual texts from the Underground Mining Engineering journal (55... -
Parallel Corpus (EN-LT-FR) of EUR-Lex Document Extracts That Include Terms wi...
Trilingual parallel corpus of EUR-Lex Document Extracts that include terms with colour names (black, white and grey). The size of the corpus is 23,198 words in English, 19,262... -
Concordance of Trubar's Gospel of St. Matthew (1555) (ELEXIS)
Konkordance Trubarjevega Evangelija sv. Matevža (1555). The 23,603 concordances represent a transcription of the book "Ta evangeli sv. Matevža" (1555) by Primož Trubar. See also:... -
Dataset of Slovene medical texts PoVeJMo-VeMo-Med 1.0
PoVeJMo-VeMo-Med is a dataset of Slovene medical texts. The bulk of it consists of instructions for use for various prescription drugs. The texts were extracted from... -
TED-ELH Parallel Corpus (ELEXIS)
The corpus contains aligned parallel scripts of TED Talks in English, Lithuanian, and Hebrew. It contains spoken language data. See also: http://hdl.handle.net/20.500.11821/34