The dataset covers tweets posted in Slovenian from 2018 to 2020. It consists of tweet IDs, retweet IDs, pseudo-anonymized user IDs, publication dates, and hate labels (acceptable, inappropriate, offensive, violent) assigned automatically by the IMSyPP/hate_speech_slo model: https://huggingface.co/IMSyPP/hate_speech_slo.
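
As a rough illustration of working with the fields listed above, the following minimal sketch tallies how often each of the four labels occurs. The tab-separated layout, the column names (`tweet_id`, `retweet_id`, `user_id`, `date`, `label`), and the sample rows are all assumptions made for this example; they are not taken from the dataset files, so adjust them to the actual file format.

```python
import csv
import io
from collections import Counter

# The four hate labels named in the dataset description.
LABELS = ("acceptable", "inappropriate", "offensive", "violent")


def count_labels(rows, label_field="label"):
    """Tally labels across rows; raise on any value outside the four above.

    `label_field` is a hypothetical column name, not one confirmed by the dataset.
    """
    counts = Counter({label: 0 for label in LABELS})
    for row in rows:
        label = row[label_field].strip().lower()
        if label not in counts:
            raise ValueError(f"unexpected label: {label!r}")
        counts[label] += 1
    return counts


# Made-up sample rows in an assumed tab-separated layout (illustration only).
SAMPLE = (
    "tweet_id\tretweet_id\tuser_id\tdate\tlabel\n"
    "1001\t\tu1\t2018-03-01\tacceptable\n"
    "1002\t1001\tu2\t2018-03-02\tacceptable\n"
    "1003\t\tu3\t2019-07-15\toffensive\n"
    "1004\t\tu1\t2020-11-30\tinappropriate\n"
)

rows = csv.DictReader(io.StringIO(SAMPLE), delimiter="\t")
label_counts = count_labels(rows)

for label in LABELS:
    print(f"{label}: {label_counts[label]}")
```

To run this on a real file, replace `io.StringIO(SAMPLE)` with an opened file handle and set `label_field` and the delimiter to match the actual columns.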
The dataset is the basis for the following two papers:
- "Retweet communities reveal the main source of hate speech" - https://arxiv.org/pdf/2105.14898.pdf
- "Community evolution in retweet networks" - https://arxiv.org/pdf/2105.06214.pdf