85 datasets found

Keywords: parallel corpus

  • IDENTICv1.0-raw

    Raw Text
  • Additional German-Czech reference translations of the WMT'11 test set

    Three additional Czech reference translations of the whole WMT 2011 data set (http://www.statmt.org/wmt11/test.tgz), translated from the German originals. Original segmentation...
  • ParCorFull: A Parallel Corpus Annotated with Full Coreference

    ParCorFull is a parallel corpus annotated with full coreference chains that has been created to address an important problem that machine translation and other multilingual...
  • Czech-English Parallel Corpus 1.0 (CzEng 1.0)

    CzEng 1.0 is the fourth release of a sentence-parallel Czech-English corpus compiled at the Institute of Formal and Applied Linguistics (ÚFAL) freely available for...
  • Synthetic part of CzEng 2.0

    CzEng is a sentence-parallel Czech-English corpus compiled at the Institute of Formal and Applied Linguistics (ÚFAL). While the full CzEng 2.0 is freely available for...
  • Czech-English Manual Word Alignment

    Corpus of manually aligned Czech-English parallel sentences. It comprises 2500 parallel sentences from 7 different sources.
  • FAUST 0.5

    Syntactic (including deep-syntactic - tectogrammatical) annotation of user-generated noisy sentences. The annotation was made on Czech-English and English-Czech Faust Dev/Test...
  • English-Slovak Parallel Corpus

    English-Slovak parallel corpus consisting of several freely available corpora (Acquis [1], Europarl [2], Official Journal of the European Union [3] and part of the OPUS corpus [4] –...
  • Prague Czech-English Dependency Treebank 2.0

    Texts: The Prague Czech-English Dependency Treebank 2.0 (PCEDT 2.0) is a major update of the Prague Czech-English Dependency Treebank 1.0 (LDC2004T25). It is a manually parsed...
  • KonText Web Demo

    An interactive web demo for querying selected ÚFAL and LINDAT corpora. LINDAT/CLARIN KonText is a fork of ÚČNK KonText (https://github.com/czcorpus/kontext, maintained by Tomáš...
  • English-Urdu Religious Parallel Corpus

    The English-Urdu parallel corpus is a collection of religious texts (Quran, Bible) in English and Urdu with sentence alignments. The corpus can be used for experiments with...
  • Multilingual corpus of juridical texts

    International conventions and treaties arranged as a parallel corpus aligned at the paragraph level.
  • OdiEnCorp 2.0

    Data: We have collected English-Odia parallel data for NLP research on the Odia language. The data for the parallel corpus was extracted from existing parallel...
  • LongEval Test Collection

    The collection consists of queries and documents provided by the Qwant search engine (https://www.qwant.com). The queries, which were issued by the users of Qwant, are based on...
  • LongEval Train Collection

    The collection consists of queries and documents provided by the Qwant search engine (https://www.qwant.com). The queries, which were issued by the users of Qwant, are based on...
  • EnTam: An English-Tamil Parallel Corpus (EnTam v2.0)

    EnTam is a sentence-aligned English-Tamil bilingual corpus that we have collected from publicly available websites for NLP research involving Tamil. The standard set...
  • Czech and English abstracts of ÚFAL papers (2022-11-11)

    This is a parallel corpus of Czech and mostly English abstracts of scientific papers and presentations published by authors from the Institute of Formal and Applied Linguistics,...
  • IDENTICv1.0

    IDENTIC is an Indonesian-English parallel corpus for research purposes. The corpus is a bilingual corpus paired with English. The aim of this work is to build and provide...
  • HindEnCorp 0.5

    HindEnCorp parallel texts (sentence-aligned) come from the following sources: Tides, which contains 50K sentence pairs taken mainly from news articles. This dataset was...
  • Covert translation: Business Communication (new)

    Translation corpora of original texts with their translations and comparable texts from the genre of external business communication. Translation and comparison corpus with authentic...
You can also access this registry using the API (see API Docs).