CzeSL Grammatical Error Correction Dataset (CzeSL-GEC)

CzeSL-GEC is a corpus of sentence pairs, each consisting of an original Czech sentence and its corrected version. The sentences were collected from essays written by non-native learners of Czech and by Czech pupils with a Romani background. The corpus was built from the unreleased CzeSL-man corpus (http://utkl.ff.cuni.cz/learncorp/). All sentences in the corpus are word-tokenized.

Identifier
PID http://hdl.handle.net/11234/1-2143
Related Identifier http://hdl.handle.net/11234/1-3057
Metadata Access http://lindat.mff.cuni.cz/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:lindat.mff.cuni.cz:11234/1-2143
Provenance
Creator Šebesta, Karel; Bedřichová, Zuzanna; Šormová, Kateřina; Štindlová, Barbora; Hrdlička, Milan; Hrdličková, Tereza; Hana, Jiří; Petkevič, Vladimír; Jelínek, Tomáš; Škodová, Svatava; Janeš, Petr; Lundáková, Kateřina; Skoumalová, Hana; Sládek, Šimon; Pierscieniak, Piotr; Toufarová, Dagmar; Straka, Milan; Rosen, Alexandr; Náplava, Jakub; Poláčková, Marie
Publisher Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Publication Year 2017
Rights Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0); http://creativecommons.org/licenses/by-sa/3.0/; PUB
OpenAccess true
Contact lindat-help(at)ufal.mff.cuni.cz
Representation
Language Czech
Resource Type corpus
Format text/plain; charset=utf-8; application/zip; downloadable_files_count: 1
Discipline Linguistics
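
The Metadata Access URL listed under Identifier is an OAI-PMH GetRecord request. As a minimal sketch of how that URL is composed, the Python snippet below rebuilds it from its parts. The function name build_get_record_url and the constants BASE_URL and HANDLE are illustrative names, not part of the repository; the parameter values are copied from the URL in this record.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Values taken from the Metadata Access URL in this record.
BASE_URL = "http://lindat.mff.cuni.cz/repository/oai/request"
HANDLE = "11234/1-2143"


def build_get_record_url(handle, metadata_prefix="oai_dc"):
    """Compose an OAI-PMH GetRecord URL for a handle (illustrative helper)."""
    params = {
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        "identifier": f"oai:lindat.mff.cuni.cz:{handle}",
    }
    # safe=":/" leaves the colons and the slash in the identifier unescaped,
    # so the result matches the URL as printed in the record.
    return f"{BASE_URL}?{urlencode(params, safe=':/')}"


url = build_get_record_url(HANDLE)
query = parse_qs(urlsplit(url).query)
print(url)
print(query["identifier"][0])
```

The resulting URL could then be passed to an HTTP client such as urllib.request.urlopen; an OAI-PMH endpoint is expected to answer with an XML document, here in the oai_dc (Dublin Core) format.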