AKCES-GEC Grammatical Error Correction Dataset for Czech

Dataset

AKCES-GEC is a grammatical error correction corpus for Czech, generated from a subset of AKCES. It contains train, dev and test files annotated in M2 format.

Note that, compared with the CZESL-GEC dataset, this dataset provides separate edits together with their type annotations in M2 format, and it contains twice as many sentences.
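
To illustrate what reading M2 annotations typically involves, the following is a minimal Python sketch, not the dataset authors' tooling. It assumes the conventional M2 layout: an `S` line holding the tokenized source sentence, followed by `A` lines of the form `start end|||type|||correction|||REQUIRED|||-NONE-|||annotator`. The sample sentence, the error type name `HYPOTHETICAL_TYPE`, and the helper names `parse_m2` and `apply_edits` are invented for illustration and do not come from the corpus.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class Edit:
    start: int        # index of the first source token covered by the edit
    end: int          # index one past the last covered token
    error_type: str   # type annotation attached to the edit
    correction: str   # replacement text (empty string means deletion)
    annotator: int    # annotator id from the last field


@dataclass
class Sentence:
    tokens: List[str]
    edits: List[Edit] = field(default_factory=list)


def parse_m2(lines: Iterable[str]) -> List[Sentence]:
    """Group 'A' edit lines under the preceding 'S' sentence line."""
    sentences: List[Sentence] = []
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("S "):
            current = Sentence(tokens=line[2:].split())
            sentences.append(current)
        elif line.startswith("A ") and current is not None:
            fields = line[2:].split("|||")
            start, end = (int(x) for x in fields[0].split())
            current.edits.append(
                Edit(start, end, fields[1], fields[2], int(fields[-1]))
            )
    return sentences


def apply_edits(sentence: Sentence, annotator: int = 0) -> str:
    """Apply one annotator's edits to obtain the corrected sentence."""
    tokens = list(sentence.tokens)
    edits = [
        e for e in sentence.edits
        if e.annotator == annotator and e.start >= 0  # skip "noop" edits (-1 -1)
    ]
    # Work from right to left so earlier token offsets stay valid.
    for e in reversed(sorted(edits, key=lambda e: (e.start, e.end))):
        tokens[e.start:e.end] = e.correction.split()
    return " ".join(tokens)


# Invented sample, NOT taken from AKCES-GEC.
SAMPLE = """\
S Toto je věta s chyba .
A 4 5|||HYPOTHETICAL_TYPE|||chybou|||REQUIRED|||-NONE-|||0
"""

sentences = parse_m2(SAMPLE.splitlines())
print(apply_edits(sentences[0]))  # Toto je věta s chybou .
```

Because each edit keeps its own span and type, the edits can be filtered or counted per error type before they are applied.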

If you use this dataset, please use the following citation:

@article{naplava2019wnut,
title={Grammatical Error Correction in Low-Resource Scenarios},
author={N{\'a}plava, Jakub and Straka, Milan},
journal={arXiv preprint arXiv:1910.00353},
year={2019}
}

Identifier
PID	http://hdl.handle.net/11234/1-3057
Related Identifier	https://arxiv.org/abs/1910.00353
Related Identifier	http://hdl.handle.net/11234/1-2143
Metadata Access	http://lindat.mff.cuni.cz/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:lindat.mff.cuni.cz:11234/1-3057
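
The Metadata Access URL above is an OAI-PMH `GetRecord` request. As a small sketch of how it is put together, the Python snippet below rebuilds it from the handle. The function name `oai_get_record_url` is hypothetical, and the pattern `oai:lindat.mff.cuni.cz:<handle>` is inferred from this one record, so treating it as valid for other records is an assumption.

```python
from urllib.parse import urlencode

# Endpoint taken from the Metadata Access field of this record.
OAI_ENDPOINT = "http://lindat.mff.cuni.cz/repository/oai/request"


def oai_get_record_url(handle: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH GetRecord URL for a repository handle.

    Assumption: the OAI identifier is 'oai:lindat.mff.cuni.cz:' + handle,
    as seen in this record's Metadata Access URL.
    """
    params = {
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        "identifier": f"oai:lindat.mff.cuni.cz:{handle}",
    }
    # Keep ':' and '/' unescaped so the result matches the URL listed above.
    return f"{OAI_ENDPOINT}?{urlencode(params, safe=':/')}"


url = oai_get_record_url("11234/1-3057")
print(url)
```

The handle `11234/1-3057` is the path component of the PID listed in the Identifier block.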

Provenance
Creator	Šebesta, Karel; Bedřichová, Zuzanna; Šormová, Kateřina; Štindlová, Barbora; Hrdlička, Milan; Hrdličková, Tereza; Hana, Jiří; Petkevič, Vladimír; Jelínek, Tomáš; Škodová, Svatava; Janeš, Petr; Lundáková, Kateřina; Skoumalová, Hana; Sládek, Šimon; Pierscieniak, Piotr; Toufarová, Dagmar; Straka, Milan; Rosen, Alexandr; Náplava, Jakub; Poláčková, Marie
Publisher	Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Publication Year	2019
Rights	Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0); http://creativecommons.org/licenses/by-nc-sa/4.0/; PUB
OpenAccess	true
Contact	lindat-help(at)ufal.mff.cuni.cz

Representation
Language	Czech
Resource Type	corpus
Format	text/plain; charset=utf-8; application/zip; downloadable_files_count: 1
Discipline	Linguistics