KrdWrd CANOLA Corpus 1.1 - Dataset - B2FIND

KrdWrd CANOLA Corpus 1.1

The CANOLA Corpus is a visually annotated English web corpus for training classification engines to remove boilerplate from unseen web pages. It was harvested, annotated and evaluated with the tools and infrastructure of the KrdWrd Project.

Identifier
PID	http://hdl.handle.net/20.500.12124/9
Related Identifier	https://github.com/krdwrd/data/releases/tag/v1.1
Related Identifier	https://github.com/krdwrd/doc_CANOLA/releases/tag/v1.1
Related Identifier	https://www.sigwac.org.uk/raw-attachment/wiki/WAC5/WAC5_proceedings.pdf
Related Identifier	http://hdl.handle.net/20.500.12124/8
Related Identifier	https://krdwrd.github.io
Metadata Access	http://clarin.eurac.edu/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:clarin.eurac.edu:20.500.12124/9
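
The "Metadata Access" URL above is an OAI-PMH GetRecord request for this record in Dublin Core (oai_dc) form. The following is a minimal sketch of how such a request could be built and its response parsed. It assumes the endpoint returns a standard OAI-PMH 2.0 response using the usual oai_dc and dc namespaces; the helper names get_record_url, parse_dc and fetch_dc are hypothetical and are not part of any published KrdWrd or CLARIN tooling.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

# Endpoint and identifier taken from the "Metadata Access" URL of this record.
OAI_BASE = "http://clarin.eurac.edu/repository/oai/request"
RECORD_ID = "oai:clarin.eurac.edu:20.500.12124/9"

# Namespaces assumed to follow the OAI-PMH 2.0 and Dublin Core conventions.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def get_record_url(identifier: str, prefix: str = "oai_dc") -> str:
    """Hypothetical helper: build an OAI-PMH GetRecord URL for one record."""
    query = urlencode(
        {"verb": "GetRecord", "metadataPrefix": prefix, "identifier": identifier}
    )
    return f"{OAI_BASE}?{query}"


def parse_dc(xml_text: str) -> dict[str, list[str]]:
    """Hypothetical helper: collect Dublin Core elements, keyed by local name."""
    root = ET.fromstring(xml_text)
    fields: dict[str, list[str]] = {}
    dc_root = root.find(".//oai_dc:dc", NS)
    if dc_root is None:
        # No oai_dc payload (e.g. an OAI-PMH error response).
        return fields
    for elem in dc_root:
        # Tags look like "{http://purl.org/dc/elements/1.1/}title"; keep "title".
        name = elem.tag.split("}", 1)[-1]
        fields.setdefault(name, []).append((elem.text or "").strip())
    return fields


def fetch_dc(identifier: str = RECORD_ID) -> dict[str, list[str]]:
    """Hypothetical helper: fetch and parse a record (needs network access)."""
    with urlopen(get_record_url(identifier), timeout=30) as resp:
        return parse_dc(resp.read().decode("utf-8"))
```

Calling fetch_dc() would return fields such as "title" and "creator" as lists, since Dublin Core elements may repeat (this record, for instance, lists two creators). The request and the parsing are kept separate so that parse_dc can also be applied to responses that were saved earlier.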

Provenance
Creator	Stemle, Egon W.; Steger, Johannes M.
Publisher	Institute for Applied Linguistics, Eurac Research
Publication Year	2010
Rights	Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0); https://creativecommons.org/licenses/by-sa/4.0/; PUB
OpenAccess	true
Contact	clarin(at)eurac.edu

Representation
Language	English
Resource Type	corpus
Format	application/pdf; application/gzip; text/plain (charset=utf-8); text/plain
Downloadable files	2
Discipline	Linguistics