Dataset - B2FIND

Test Data EN-DE MT_NMT APE Shared Task WMT18

Test data for the WMT 2018 Automatic post-editing task. They consist of English-German pairs (source and target) belonging to the information technology domain and already...

Automatic Paraphrases of Czech Reference Sentences for WMT11, 13 and 14

This dataset contains automatic paraphrases of Czech official reference translations for the Workshop on Statistical Machine Translation shared task. The data covers the years...

Machine Translation Testsuite for Gender-Consistent Translation

Document-level testsuite for evaluation of gender translation consistency. Our Document-Level test set consists of selected English documents from the WMT21 newstest annotated...

TMODS:ENG-CZE -- query translation

AMALACH project component TMODS:ENG-CZE; machine translation of queries from Czech to English. This archive contains models for the Moses decoder (binarized, pruned to allow for...

DiscoMT 2015 Shared Task on Pronoun Translation

The data set includes training, development and test data from the shared tasks on pronoun-focused machine translation and cross-lingual pronoun prediction from the EMNLP 2015...

APE Shared Task WMT17: Human Post-edits Test Data DE-EN

Human post-edited test sentences for the WMT 2017 Automatic post-editing task. This consists of 2,000 English sentences belonging to the IT domain and already tokenized. Source...

LiStr: Linguistic Structure Induction Toolkit

This toolkit comprises the tools and supporting scripts for unsupervised induction of dependency trees from raw texts or texts with already assigned part-of-speech tags. There...

Test Data EN-DE MT_PBSMT APE Shared Task WMT18

Test data for the WMT 2018 Automatic post-editing task. They consist of English-German pairs (source and target) belonging to the information technology domain and already...

Czech image captioning, machine translation, sentiment analysis and summariza...

This submission contains trained end-to-end models for the Neural Monkey toolkit for Czech and English, solving four NLP tasks: machine translation, image captioning, sentiment...

Large-Scale Colloquial Persian 0.5

"Large Scale Colloquial Persian Dataset" (LSCP) is hierarchically organized in a semantic taxonomy that focuses on multi-task informal Persian language understanding as a...

WMT16 APE Shared Task Data

Training, development and test data (the same used for the Sentence-level Quality Estimation task) consist of English-German triplets (source, target and post-edit) belonging to...

Manually Classified Errors in En->Sk Translation

Manual classification of errors of English-Slovak translation according to the classification introduced by Vilar et al. [1]. 50 sentences randomly selected from WMT 2011 test...

WMT16 Quality Estimation Shared Task Training and Development Data

Training and development data for the WMT16 QE task. Test data will be published as a separate item. This shared task will build on its previous four editions to further examine...

Moses Web Demo

An interactive web demo of selected ÚFAL MT systems.

74 datasets found