-
Ukrainian-English parallel corpus MaCoCu-uk-en 1.0
The Ukrainian-English parallel corpus MaCoCu-uk-en 1.0 was built by crawling the ".ua" and ".укр" internet top-level domains in 2022, extending the crawl dynamically to other... -
Catalan-English parallel corpus MaCoCu-ca-en 1.0
The Catalan-English parallel corpus MaCoCu-ca-en 1.0 was built by crawling the ".cat", ".es", ".ad", ".fr", ".it" and ".eu" internet top-level domains in 2022, extending the... -
Greek-English parallel corpus MaCoCu-el-en 1.0
The Greek-English parallel corpus MaCoCu-el-en 1.0 was built by crawling the ".gr", ".ελ", ".cy" and ".eu" internet top-level domains in 2023, extending the crawl dynamically to... -
JRC EU DGT Translation Memory Parsebank DGT-UD 1.0
DGT-UD is a 2-billion-word, 23-language, syntactically parsed parallel corpus that consists of the JRC DGT translation memory of European law, automatically annotated with... -
English-Montenegrin parallel corpus of subtitles Opus-MontenegrinSubs 1.0
This corpus contains parallel English-Montenegrin subtitles collected in the course of linguistic and translatological research conducted by Petar Božović for his PhD thesis... -
Paralela corpus and search engine
Paralela is an open-ended, opportunistic parallel corpus of Polish-English and English-Polish translations. It currently contains 262 million words in 10,877,000 translation... -
MULTEXT-East "1984" document corpus 4.0
The novel "1984" by George Orwell is the central component of the MULTEXT-East corpus. This parallel, sentence-aligned corpus contains the novel in the English original... -
TED-ELH Parallel Corpus (ELEXIS)
The corpus contains aligned parallel scripts of TED Talks in English, Lithuanian, and Hebrew. It contains spoken language data. See also: http://hdl.handle.net/20.500.11821/34 -
Parallel Corpus (EN-LT-DA) of General Data Protection Regulation (ELEXIS)
Trilingual parallel corpus of the General Data Protection Regulation. The size of the corpus is 54,468 words in English, 42,566 words in Lithuanian, and 47,740 words in Danish. -
Parallel sense-annotated corpus ELEXIS-WSD 1.1
ELEXIS-WSD is a parallel sense-annotated corpus in which content words (nouns, adjectives, verbs, and adverbs) have been assigned senses. Version 1.1 contains sentences for 10... -
Slovene-English parallel corpus MaCoCu-sl-en 1.0
The Slovene-English parallel corpus MaCoCu-sl-en 1.0 was built by crawling the ".si" internet top-level domain in 2021, extending the crawl dynamically to other domains as well.... -
Maltese-English parallel corpus MaCoCu-mt-en 1.0
The Maltese-English parallel corpus MaCoCu-mt-en 1.0 was built by crawling the ".mt" internet top-level domain in 2021, extending the crawl dynamically to other domains as well.... -
Tourism English-Croatian Parallel Corpus 2.0
Sentence-aligned parallel corpus built by automatically crawling 25 websites from the tourism domain. -
Croatian-English parallel corpus hrenWaC 2.0
The hrenWaC corpus version 2.0 consists of parallel Croatian-English texts crawled from the .hr top-level domain for Croatia. The corpus was built with Spidextor... -
Albanian-English parallel corpus MaCoCu-sq-en 1.0
The Albanian-English parallel corpus MaCoCu-sq-en 1.0 was built by crawling the “.al” internet top-level domain in 2022, extending the crawl dynamically to other domains as... -
MULTEXT-East "1984" annotated corpus 4.0
The novel "1984" by George Orwell is the central component of the MULTEXT-East corpus. This parallel, sentence-aligned corpus contains the novel in the English original... -
Bosnian-English parallel corpus MaCoCu-bs-en 1.0
The Bosnian-English parallel corpus MaCoCu-bs-en 1.0 was built by crawling the “.ba” internet top-level domain in 2021 and 2022, extending the crawl dynamically to other domains... -
Montenegrin-English parallel corpus MaCoCu-cnr-en 1.0
The Montenegrin-English parallel corpus MaCoCu-cnr-en 1.0 was built by crawling the “.me” internet top-level domain in 2021 and 2022, extending the crawl dynamically to other... -
Slovene-English parallel corpus MaCoCu-sl-en 2.0
The Slovene-English parallel corpus MaCoCu-sl-en 2.0 was built by crawling the “.si” internet top-level domain in 2021 and 2022, extending the crawl dynamically to other domains... -
Parallel sense-annotated corpus ELEXIS-WSD 1.0
ELEXIS-WSD is a parallel sense-annotated corpus in which content words (nouns, adjectives, verbs, and adverbs) have been assigned senses. Version 1.0 contains sentences for 10...