-
Extended CLEF eHealth 2013-2015 IR Test Collection
This package contains an extended version of the test collection used in the CLEF eHealth Information Retrieval tasks in 2013--2015. Compared to the original version, it... -
Khresmoi Summary Translation Test Data 1.1
This package contains data sets for development and testing of machine translation of sentences from summaries of medical articles between Czech, English, French, and German. -
WMT17 Quality Estimation Shared Test Data
Test data for the WMT17 QE task. Training data can be downloaded from http://hdl.handle.net/11372/LRT-1974. This shared task will build on its previous five editions to further... -
MTMonkey
MTMonkey is a web service which handles and distributes JSON-encoded HTTP requests for machine translation (MT) among multiple machines running an MT system, including text pre-... -
Plain-Moses-Chimera
Statistical component of Chimera, a state-of-the-art MT system. -
QT21 Data
Post-editing and MQM annotations produced by the QT21 project. As described in @InProceedings{specia-etal_MTSummit:2017, author = {Specia, Lucia and Kim Harris and... -
DiscoMT 2016 Shared Task on Cross-lingual Pronoun Prediction
Files for the DiscoMT 2016 shared task on cross-lingual pronoun prediction -
WMT16 Tuning Shared Task Models (English-to-Czech)
This item contains models to tune for the WMT16 Tuning shared task for English-to-Czech. The CzEng 1.6pre corpus (http://ufal.mff.cuni.cz/czeng/czeng16pre) is used for the training... -
ParaCrawl Corpus version 1.0
The January 2018 release of ParaCrawl is the first version of the corpus. It contains parallel corpora for 11 languages paired with English, crawled from a large number of... -
WMT16 Tuning Shared Task Models (Czech-to-English)
This item contains models to tune for the WMT16 Tuning shared task for Czech-to-English. The CzEng 1.6pre corpus (http://ufal.mff.cuni.cz/czeng/czeng16pre) is used for the training... -
WMT18 Quality Estimation Shared Task Training and Development Data
Training and development data for the WMT18 QE task. Test data will be published as a separate item. This shared task will build on its previous six editions to further examine... -
WMT18 APE Shared Task: En-DE NMT Train and Dev Data
Training and development data for the WMT 2018 Automatic post-editing task. They consist of English-German triplets (source, target and post-edit) belonging to the information... -
Cesilko Web Service for Weblicht
Weblicht integration of Cesilko (http://hdl.handle.net/11858/00-097C-0000-0006-AAFE-A) -
WMT17 De-En APE Shared Task Data
Training and development data for the WMT 2017 Automatic post-editing task (the same data used for the Sentence-level Quality Estimation task). They consist of German-English... -
Hindi Visual Genome 1.0
Data for Hindi Visual Genome 1.0, a multimodal dataset consisting of text and images suitable for the English-to-Hindi multimodal machine translation task and multimodal research. We... -
Tensor2tensor Translation for Docker
This submission contains a Dockerfile for creating a Docker image with a compiled Tensor2tensor backend and compatible (TensorFlow Serving) models available in the Lindat... -
Manually Ranked Translation Outputs
Manually ranked outputs of Czech-Slovak translations. Three annotators manually ranked outputs of five MT systems (Česílko, Česílko2, Google Translate and two Moses setups) on... -
Czech image captioning, machine translation, and sentiment analysis (Neural M...
This submission contains trained end-to-end models for the Neural Monkey toolkit for Czech and English, solving three NLP tasks: machine translation, image captioning, and... -
WMT17 Quality Estimation Shared Task Training and Development Data
Training and development data for the WMT17 QE task. Test data will be published as a separate item. This shared task will build on its previous five editions to further examine... -
WMT16 APE Shared Task Data - Reference sentences
Training, development and test data consist of already tokenized German sentences from the IT domain. These sentences are the references of the data released for the...