FASpell

Dataset

The FASpell dataset was developed for evaluating spell-checking algorithms. It contains pairs of misspelled Persian words and their corrected forms, similar to the ASpell dataset used for English.

The dataset consists of two parts:
a) faspell_main: a list of 5050 pairs collected from errors made by elementary school pupils and professional typists.
b) faspell_ocr: a list of 800 pairs collected from the output of a Farsi OCR system.
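
To illustrate how such pairs might be used to score a spell checker, here is a minimal sketch. It assumes each line holds a misspelled word and its correction separated by a tab, in that order. The delimiter, the column order, and the names `load_pairs` and `accuracy` are assumptions made for this example and are not taken from the dataset documentation, so inspect the downloaded files before relying on it.

```python
from typing import Callable, Iterable, List, Tuple

Pair = Tuple[str, str]  # (misspelled, corrected); column order is an assumption


def load_pairs(lines: Iterable[str], delimiter: str = "\t") -> List[Pair]:
    """Parse pairs from text lines, skipping blank, '#'-prefixed, and malformed lines."""
    pairs: List[Pair] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        fields = line.split(delimiter)
        if len(fields) < 2:
            continue  # not a pair; ignore rather than guess
        pairs.append((fields[0].strip(), fields[1].strip()))
    return pairs


def accuracy(pairs: List[Pair], correct: Callable[[str], str]) -> float:
    """Fraction of pairs for which the checker's single suggestion equals the gold form."""
    if not pairs:
        return 0.0
    hits = sum(1 for wrong, right in pairs if correct(wrong) == right)
    return hits / len(pairs)


if __name__ == "__main__":
    # Toy English lines for illustration only; they are NOT from the dataset.
    toy = ["teh\tthe", "recieve\treceive"]
    known = {"teh": "the"}
    print(accuracy(load_pairs(toy), lambda w: known.get(w, w)))  # prints 0.5
```

With a real file, something like `load_pairs(open(path, encoding="utf-8"))` would replace the toy list, where `path` points at one of the downloaded files. Exact-match accuracy on the top suggestion is only one possible metric; a checker that returns ranked candidates would need a different scoring function.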

Identifier
PID	http://hdl.handle.net/11372/LRT-1547
Related Identifier	http://pars.ie/lr/faspell_dataset
Metadata Access	http://lindat.mff.cuni.cz/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:lindat.mff.cuni.cz:11372/LRT-1547
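
The Metadata Access URL above is an OAI-PMH `GetRecord` request that asks for Dublin Core (`oai_dc`) metadata. As a hedged sketch, the snippet below rebuilds that URL from its parts and extracts Dublin Core fields from a response body. The helper names `get_record_url` and `dc_fields` are hypothetical, and the code does not perform the network request itself.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List
from urllib.parse import urlencode

# Standard Dublin Core element namespace used inside oai_dc records.
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def get_record_url(base_url: str, identifier: str, prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH GetRecord URL; ':' and '/' are left unescaped for readability."""
    query = urlencode(
        {"verb": "GetRecord", "metadataPrefix": prefix, "identifier": identifier},
        safe=":/",
    )
    return f"{base_url}?{query}"


def dc_fields(xml_text: str) -> Dict[str, List[str]]:
    """Collect every non-empty Dublin Core element as {local name: [values]}."""
    fields: Dict[str, List[str]] = {}
    for element in ET.fromstring(xml_text).iter():
        text = (element.text or "").strip()
        if element.tag.startswith(DC_NS) and text:
            fields.setdefault(element.tag[len(DC_NS):], []).append(text)
    return fields


if __name__ == "__main__":
    print(get_record_url(
        "http://lindat.mff.cuni.cz/repository/oai/request",
        "oai:lindat.mff.cuni.cz:11372/LRT-1547",
    ))
```

The response body could be obtained with `urllib.request.urlopen(url).read()` and passed to `dc_fields`; whether the endpoint is still reachable at this address is not something this record confirms.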

Provenance
Creator	QasemiZadeh, Behrang
Publisher	Behrang-QasemiZadeh
Publication Year	2015
Rights	Creative Commons - Attribution 4.0 International (CC BY 4.0); http://creativecommons.org/licenses/by/4.0/; PUB
OpenAccess	true
Contact	lindat-help(at)ufal.mff.cuni.cz

Representation
Language	Persian; Farsi
Resource Type	lexicalConceptualResource
Format	application/octet-stream; text/plain; text/plain; charset=utf-8; downloadable_files_count: 4
Discipline	Linguistics