Slovenian Twitter dataset 2018-2020 1.0

The dataset covers Twitter production in Slovenian from 2018 to 2020. It consists of tweet IDs, retweet IDs, pseudo-anonymized user IDs, publication dates, and hate speech labels (acceptable, inappropriate, offensive, violent) assigned automatically with the model at https://huggingface.co/IMSyPP/hate_speech_slo.

The dataset is the basis for the following two papers:
- "Retweet communities reveal the main source of hate speech", https://arxiv.org/pdf/2105.14898.pdf
- "Community evolution in retweet networks", https://arxiv.org/pdf/2105.06214.pdf

Identifier
PID http://hdl.handle.net/11356/1423
Related Identifier https://arxiv.org/pdf/2105.14898.pdf
Related Identifier https://arxiv.org/pdf/2105.06214.pdf
Related Identifier http://imsypp.ijs.si
Metadata Access http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/1423
Provenance
Creator Evkoski, Bojan; Pelicon, Andraž; Mozetič, Igor; Ljubešić, Nikola; Kralj Novak, Petra
Publisher Jožef Stefan Institute
Publication Year 2021
Rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0); https://creativecommons.org/licenses/by-sa/4.0/; PUB
OpenAccess true
Contact info(at)clarin.si
Representation
Language Slovenian; Slovene
Resource Type corpus
Format text/plain; charset=utf-8; application/zip; downloadable_files_count: 1
Discipline Linguistics
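
The Metadata Access URL listed above is an OAI-PMH GetRecord request with the oai_dc (Dublin Core) metadata prefix, so the record can also be read programmatically. The following is a minimal sketch, not an official client: the function names `parse_dublin_core` and `fetch_record` are hypothetical, the `SAMPLE` document is hand-written for illustration and is not the real server response, and the sketch assumes the repository returns a standard OAI-PMH response with Dublin Core elements.

```python
import urllib.request
import xml.etree.ElementTree as ET

# URL copied from the "Metadata Access" field of this record.
OAI_URL = (
    "http://www.clarin.si/repository/oai/request"
    "?verb=GetRecord&metadataPrefix=oai_dc"
    "&identifier=oai:www.clarin.si:11356/1423"
)

# Standard Dublin Core element namespace used in oai_dc records.
DC_NS = "http://purl.org/dc/elements/1.1/"


def parse_dublin_core(xml_data):
    """Collect every Dublin Core element in an OAI-PMH response.

    Accepts str or bytes and returns a dict mapping element names
    (e.g. "title") to lists of their non-empty text values.
    """
    root = ET.fromstring(xml_data)
    prefix = "{" + DC_NS + "}"
    fields = {}
    for elem in root.iter():
        if elem.tag.startswith(prefix):
            name = elem.tag[len(prefix):]
            text = (elem.text or "").strip()
            if text:
                fields.setdefault(name, []).append(text)
    return fields


def fetch_record(url=OAI_URL, timeout=30):
    """Download and parse the record (hypothetical helper; needs network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_dublin_core(response.read())


# Hand-written sample, NOT the real server response; it only lets the
# parser be exercised offline. Values are taken from this record.
SAMPLE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <GetRecord><record><metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>Slovenian Twitter dataset 2018-2020 1.0</dc:title>
      <dc:identifier>http://hdl.handle.net/11356/1423</dc:identifier>
    </oai_dc:dc>
  </metadata></record></GetRecord>
</OAI-PMH>
"""

if __name__ == "__main__":
    for name, values in parse_dublin_core(SAMPLE).items():
        print(name, values)
```

Calling `fetch_record()` would request the live record; which Dublin Core fields the server actually returns is not stated in this record, so the result should be inspected rather than assumed.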