MULTEXT-East "1984" annotated corpus 4.0
The novel "1984" by George Orwell is the central component of the MULTEXT-East corpus. This parallel, sentence-aligned corpus contains the novel in the English original...
Tourism English-Croatian Parallel Corpus 2.0
Sentence-aligned parallel corpus built by automatically crawling 25 websites in the tourism domain.
DSI-enriched ParaCrawl 9 en-nl corpus
This is a derivative work based on ParaCrawl release 9 English-Dutch (https://paracrawl.eu/). This version of the corpus includes a set of probabilities corresponding to the...
Parallel Corpus (EN-LT-DA) of General Data Protection Regulation (ELEXIS)
Trilingual parallel corpus on the General Data Protection Regulation. The corpus contains 54,468 words in English, 42,566 words in Lithuanian, and 47,740 words in Danish.
Parallel corpus EN-SL RSDO4 1.0
The RSDO4 parallel corpus of English-Slovene and Slovene-English translation pairs was collected as part of work package 4 of the Slovene in the Digital Environment project. It...
Bilingual Corpus of Underground Mining (ELEXIS)
PodzemniRadovi-sr-en: a bilingual aligned corpus of papers in the field of mining. Undeground-mining-sr-en: bilingual texts from the Underground Mining Engineering journal (55...
Parallel sense-annotated corpus ELEXIS-WSD 1.1
ELEXIS-WSD is a parallel sense-annotated corpus in which content words (nouns, adjectives, verbs, and adverbs) have been assigned senses. Version 1.1 contains sentences for 10...
Icelandic-English parallel corpus MaCoCu-is-en 1.0
The Icelandic-English parallel corpus MaCoCu-is-en 1.0 was built by crawling the ".is" internet top-level domain in 2021, extending the crawl dynamically to other domains as...
Parallel Corpus (EN-LT-FR) of EUR-Lex Document Extracts That Include Terms wi...
Trilingual parallel corpus of EUR-Lex Document Extracts that include terms with colour names (black, white and grey). The size of the corpus is 23,198 words in English, 19,262...
Albanian-English parallel corpus MaCoCu-sq-en 1.0
The Albanian-English parallel corpus MaCoCu-sq-en 1.0 was built by crawling the ".al" internet top-level domain in 2022, extending the crawl dynamically to other domains as...
Parallel corpus EN-SL RSDO4 2.0
The RSDO4 parallel corpus of English-Slovene and Slovene-English translation pairs was collected as part of work package 4 of the Slovene in the Digital Environment project. It...
Serbian-English parallel corpus srenWaC 1.0
The srenWaC corpus consists of sentence-aligned parallel Serbian-English texts crawled from the .rs top-level domain for Serbia. The corpus was built with Spidextor...
Croatian-English parallel corpus MaCoCu-hr-en 2.0
The Croatian-English parallel corpus MaCoCu-hr-en 2.0 was built by crawling the ".hr" internet top-level domain in 2021 and 2022, extending the crawl dynamically to other...
Greek-English parallel corpus MaCoCu-el-en 1.0
The Greek-English parallel corpus MaCoCu-el-en 1.0 was built by crawling the ".gr", ".ελ", ".cy" and ".eu" internet top-level domains in 2023, extending the crawl dynamically to...
Macedonian-English parallel corpus MaCoCu-mk-en 1.0
The Macedonian-English parallel corpus MaCoCu-mk-en 1.0 was built by crawling the ".mk" and ".мкд" internet top-level domains in 2021, extending the crawl dynamically to other...
Parallel corpus of idiomatic text ParaDiom 1.0
ParaDiom is a parallel corpus of sentences sampled from existing corpora. The corpus contains 1,000 Slovene sentences with their English translations and 1,000 English...
Bulgarian-English parallel corpus MaCoCu-bg-en 2.0
The Bulgarian-English parallel corpus MaCoCu-bg-en 2.0 was built by crawling the ".bg" and ".бг" internet top-level domains in 2021, extending the crawl dynamically to other...
Slovene-English parallel corpus slenWaC 1.0
The slenWaC corpus version 1.0 consists of parallel Slovene-English texts crawled from the .si top-level domain for Slovenia. The corpus was built with Spidextor...
Bulgarian-English parallel corpus MaCoCu-bg-en 1.0
The Bulgarian-English parallel corpus MaCoCu-bg-en 1.0 was built by crawling the ".bg" and ".бг" internet top-level domains in 2021, extending the crawl dynamically to other...
Bosnian-English parallel corpus MaCoCu-bs-en 1.0
The Bosnian-English parallel corpus MaCoCu-bs-en 1.0 was built by crawling the ".ba" internet top-level domain in 2021 and 2022, extending the crawl dynamically to other domains...