Wikipedia talk corpus Janes-Wiki 1.0

Janes-Wiki is an annotated corpus of discussion (talk) pages from the Slovene Wikipedia, covering the period from 2003-08 to 2017-06. It contains both page talk and user talk pages and is structured into individual pages and their comments, together with their metadata. The texts are tokenised, segmented into sentences, normalised at the word level, morphosyntactically tagged, lemmatised, and annotated with named entities.

Identifier
PID http://hdl.handle.net/11356/1137
Related Identifier https://revije.ff.uni-lj.si/slovenscina2/article/view/7003
Related Identifier http://nl.ijs.si/janes/viri/avtomatsko-oznaceni-korpusi/#Janes-Wiki
Related Identifier https://doi.org/10.1007/s10579-018-9425-z
Related Identifier http://nl.ijs.si/janes/
Metadata Access http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/1137
Provenance
Creator Ljubešić, Nikola; Erjavec, Tomaž; Fišer, Darja
Publisher Jožef Stefan Institute
Publication Year 2017
Rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0); https://creativecommons.org/licenses/by-sa/4.0/; PUB
OpenAccess true
Contact info(at)clarin.si
Representation
Language Slovenian; Slovene
Resource Type corpus
Format application/zip; text/plain; charset=utf-8; downloadable_files_count: 2
Discipline Linguistics
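
The Metadata Access entry above is an OAI-PMH GetRecord request for the Dublin Core (oai_dc) form of this record. As a minimal sketch, the Python below rebuilds that URL from the handle and pulls Dublin Core fields out of a response body. The helper names get_record_url and dc_fields are hypothetical and not part of any CLARIN.SI tooling. The sketch assumes the endpoint returns a standard oai_dc payload; it makes no network call itself.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Endpoint and identifier prefix are taken from the Metadata Access URL in this record.
OAI_ENDPOINT = "http://www.clarin.si/repository/oai/request"
OAI_ID_PREFIX = "oai:www.clarin.si:"
# Namespace of the Dublin Core element set used by the oai_dc metadata format.
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def get_record_url(handle, metadata_prefix="oai_dc"):
    """Build an OAI-PMH GetRecord URL for a handle such as '11356/1137'."""
    params = {
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        "identifier": OAI_ID_PREFIX + handle,
    }
    # Keep ':' and '/' unescaped so the URL matches the form shown in the record.
    return OAI_ENDPOINT + "?" + urlencode(params, safe=":/")


def dc_fields(xml_text):
    """Collect Dublin Core elements from an OAI-PMH response into a dict of lists."""
    fields = {}
    for element in ET.fromstring(xml_text).iter():
        if element.tag.startswith(DC_NS) and element.text and element.text.strip():
            name = element.tag[len(DC_NS):]
            fields.setdefault(name, []).append(element.text.strip())
    return fields


print(get_record_url("11356/1137"))
```

The printed URL should match the Metadata Access entry. To inspect the live record, the response body could be fetched with urllib.request.urlopen and passed to dc_fields, assuming the endpoint is still reachable at that address.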