Developmental corpus (without language corrections) Šolar 2.0 Clear

Šolar 2.0 Clear is an adapted version of the Šolar 2.0 corpus, cf. http://hdl.handle.net/11356/1214.

The Šolar 2.0 Clear corpus consists of texts written by students in Slovene primary and secondary schools. School essays form the majority of the corpus; the remaining material includes texts created during lessons, such as text recapitulations, descriptions, and examples of formal applications. For each text, information is provided on the school type (elementary or secondary), subject, level (grade or year), text type, region, and date of production.

Unlike the original Šolar 2.0 corpus (http://hdl.handle.net/11356/1214), Šolar 2.0 Clear includes student texts only: error annotations and other types of teacher feedback have been removed. The corpus can thus be used for processing tasks where the inclusion of corrections would hinder or complicate the procedure (e.g. comparative data extraction, training of language models, etc.).

Identifier
PID http://hdl.handle.net/11356/1219
Related Identifier http://hdl.handle.net/11356/1150
Related Identifier http://hdl.handle.net/11356/1589
Related Identifier https://solar.trojina.si/
Metadata Access http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/1219
Provenance
Creator Kosem, Iztok; Arhar Holdt, Špela; Stritar Kučuk, Mojca; Krek, Simon; Krapš Vodopivec, Irena; Stabej, Marko; Kocjančič, Polonca; Laskowski, Cyprian; Klemenc, Bojan; Pori, Eva; Rozman, Tadeja
Publisher Trojina, Institute for Applied Slovene Studies; Centre for Language Resources and Technologies, University of Ljubljana
Publication Year 2019
Rights Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0); https://creativecommons.org/licenses/by-nc-sa/4.0/; PUB
OpenAccess true
Contact info(at)clarin.si
Representation
Language Slovenian; Slovene
Resource Type corpus
Format application/zip; text/plain; charset=utf-8; downloadable_files_count: 2
Discipline Linguistics