-
TED-ELH Parallel Corpus
The corpus contains aligned parallel scripts of TED Talks in English, Lithuanian, and Hebrew. It consists of spoken language data. -
English-Lithuanian Parallel Cybersecurity Corpus - DVITAS v2.0
English-Lithuanian parallel corpus DVITAS v2 includes original English texts on cybersecurity and their Lithuanian translations aligned on the sentence level. Version 1 of the... -
English-Lithuanian Parallel Cybersecurity Corpus - DVITAS
English-Lithuanian parallel corpus DVITAS includes original English texts on cybersecurity and their Lithuanian translations aligned on the sentence level. The corpus was... -
English-French-Lithuanian Parallel Corpus of EU Financial Documents
The corpus is comprised of 154 EU legislative documents (English documents and their translations into French and Lithuanian) related to various financial issues and enacted in... -
Paralela corpus and search engine
Paralela is an open-ended, opportunistic parallel corpus of Polish-English and English-Polish translations. It currently contains 262 million words in 10,877,000 translation... -
Post-edited and error annotated machine translation corpus PE²rr 1.0
The PE²rr corpus contains source language texts from different domains along with their automatically generated translations into several morphologically rich languages, their... -
Croatian-English parallel corpus MaCoCu-hr-en 1.0
The Croatian-English parallel corpus MaCoCu-hr-en 1.0 was built by crawling the ".hr" internet top-level domain in 2021, extending the crawl dynamically to other domains as... -
JRC EU DGT Translation Memory Parsebank DGT-UD 1.0
DGT-UD is a 2 billion word 23-language parallel syntactically parsed corpus, which consists of the JRC DGT translation memory of European law, automatically annotated with... -
Parallel sense-annotated corpus ELEXIS-WSD 1.0
ELEXIS-WSD is a parallel sense-annotated corpus in which content words (nouns, adjectives, verbs, and adverbs) have been assigned senses. Version 1.0 contains sentences for 10... -
DSI-enriched ParaCrawl 9 en-es corpus
This is a derivative work based on Paracrawl release 9 English-Spanish (https://paracrawl.eu/). This version of the corpus includes a set of probabilities corresponding to the... -
Serbian-English parallel corpus MaCoCu-sr-en 1.0
The Serbian-English parallel corpus MaCoCu-sr-en 1.0 was built by crawling the “.rs” and “.срб” internet top-level domains in 2021 and 2022, extending the crawl dynamically to... -
MULTEXT-East "1984" document corpus 4.0
The novel "1984" by George Orwell is the central component of the MULTEXT-East corpus. This parallel and sentence aligned corpus contains the novel in the English original... -
Turkish-English parallel corpus MaCoCu-tr-en 2.0
The Turkish-English parallel corpus MaCoCu-tr-en 2.0 was built by crawling the “.tr” and “.cy” internet top-level domains in 2021, extending the crawl dynamically to other... -
Croatian-English parallel corpus hrenWaC 2.0
The hrenWaC corpus version 2.0 consists of parallel Croatian-English texts crawled from the .hr top-level domain for Croatia. The corpus was built with Spidextor... -
Maltese-English parallel corpus MaCoCu-mt-en 1.0
The Maltese-English parallel corpus MaCoCu-mt-en 1.0 was built by crawling the ".mt" internet top-level domain in 2021, extending the crawl dynamically to other domains as well.... -
Ukrainian-English parallel corpus MaCoCu-uk-en 1.0
The Ukrainian-English parallel corpus MaCoCu-uk-en 1.0 was built by crawling the ".ua" and ".укр" internet top-level domains in 2022, extending the crawl dynamically to other... -
Parallel Corpus (EN-FR-LT) of EU Financial Documents (ELEXIS)
The parallel corpus is comprised of 154 EU legislative documents (English documents and their translations into French and Lithuanian) related to various financial issues and... -
Slovene-English parallel corpus MaCoCu-sl-en 1.0
The Slovene-English parallel corpus MaCoCu-sl-en 1.0 was built by crawling the ".si" internet top-level domain in 2021, extending the crawl dynamically to other domains as well.... -
Parallel Corpus (EN-LT) of EUR-Lex Documents That Include Terms with the Adje...
A bilingual parallel corpus of English EU documents containing terms with the adjective 'green' and their Lithuanian translations. The size of the corpus is 4,447,683 words in... -
English-Montenegrin parallel corpus of subtitles Opus-MontenegrinSubs 1.0
This corpus contains parallel English-Montenegrin subtitles collected as part of linguistic and translatological research conducted by Petar Božović for his PhD thesis...