Dataset - B2FIND

AKCES-GEC Grammatical Error Correction Dataset for Czech

AKCES-GEC is a grammar error correction corpus for Czech generated from a subset of AKCES. It contains train, dev and test files annotated in M2 format. Note that in comparison...

Automatically generated spelling correction corpus for Czech (Czesl-SEC-AG)

Automatically generated spelling correction corpus for Czech (Czesl-SEC-AG) is a corpus containing text with automatically generated spelling errors. To create spelling errors, a...

Corpus for training and evaluating diacritics restoration systems

Corpus of texts in 12 languages. For each language, we provide one training, one development and one testing set acquired from Wikipedia articles. Moreover, each language...

CzeSL Grammatical Error Correction Dataset (CzeSL-GEC)

CzeSL-GEC is a corpus containing sentence pairs of original and corrected versions of Czech sentences collected from essays written by both non-native learners of Czech and...

4 datasets found