News comment corpus Janes-News 1.0

Janes-News is an annotated corpus of comments on online news articles from the websites rtvslo.si, mladina.si, and reporter.si, covering the period from 2007-03 to 2015-01. The corpus is structured into individual texts, each containing the comments on one news article together with their metadata. The texts are tokenised, sentence-segmented, word-normalised, morphosyntactically tagged, lemmatised, and annotated with named entities. To protect privacy, usernames are not included in the metadata, and named entities of the types 'person' and 'person derivative' have been removed from the texts.

Identifier
PID http://hdl.handle.net/11356/1140
Related Identifier https://doi.org/10.4312/slo2.0.2016.2.67-99
Related Identifier https://nl.ijs.si/janes/viri/avtomatsko-oznaceni-korpusi/#Janes-News
Related Identifier https://doi.org/10.1007/s10579-018-9425-z
Related Identifier http://nl.ijs.si/janes/
Metadata Access http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/1140
Provenance
Creator Erjavec, Tomaž; Ljubešić, Nikola; Fišer, Darja
Publisher Jožef Stefan Institute
Publication Year 2017
Rights Creative Commons - Attribution 4.0 International (CC BY 4.0); https://creativecommons.org/licenses/by/4.0/; PUB
OpenAccess true
Contact info(at)clarin.si
Representation
Language Slovenian; Slovene
Resource Type corpus
Format application/zip; text/plain; charset=utf-8; downloadable_files_count: 2
Discipline Linguistics
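
The Metadata Access field above is an OAI-PMH GetRecord request that returns this record as Dublin Core (oai_dc). The following is a minimal sketch of how such a request URL might be assembled and how the Dublin Core fields might be read from a response. The function names and the SAMPLE_RESPONSE string are hypothetical: the sample is a hand-written, heavily abbreviated illustration built from the values listed in this record, not actual server output, and the sketch makes no network call.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Base URL and identifier are taken from the Metadata Access field of this record.
OAI_BASE = "http://www.clarin.si/repository/oai/request"
RECORD_ID = "oai:www.clarin.si:11356/1140"

# Standard namespace of the Dublin Core element set used by oai_dc.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical, abbreviated response used only for illustration.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <GetRecord><record><metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>News comment corpus Janes-News 1.0</dc:title>
      <dc:creator>Erjavec, Tomaž</dc:creator>
      <dc:creator>Ljubešić, Nikola</dc:creator>
      <dc:creator>Fišer, Darja</dc:creator>
      <dc:identifier>http://hdl.handle.net/11356/1140</dc:identifier>
    </oai_dc:dc>
  </metadata></record></GetRecord>
</OAI-PMH>
"""


def get_record_url(identifier, prefix="oai_dc", base=OAI_BASE):
    """Build an OAI-PMH GetRecord URL for one record identifier."""
    params = {"verb": "GetRecord", "metadataPrefix": prefix, "identifier": identifier}
    # Keep ':' and '/' unescaped so the URL matches the form shown in the record.
    return base + "?" + urlencode(params, safe=":/")


def dublin_core_fields(xml_text):
    """Collect all Dublin Core elements into a dict of name -> list of values."""
    root = ET.fromstring(xml_text)
    fields = {}
    for element in root.iter():
        if element.tag.startswith("{" + DC_NS + "}"):
            name = element.tag.split("}", 1)[1]
            fields.setdefault(name, []).append((element.text or "").strip())
    return fields


if __name__ == "__main__":
    print(get_record_url(RECORD_ID))
    for name, values in dublin_core_fields(SAMPLE_RESPONSE).items():
        print(name, "=", "; ".join(values))
```

To work with the live record, the text fetched from the URL returned by get_record_url could be passed to dublin_core_fields in place of SAMPLE_RESPONSE. Repeated elements such as the creators are kept as lists so that their order is preserved.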