Dataset - B2FIND

GermEval-2018 Corpus (DE)

This dataset comprises the training and test data (German tweets) from the GermEval 2018 Shared Task on Offensive Language Detection.

Ekspress user comment dataset 1.0

This dataset is an archive of reader comments on the Ekspress Meedia news site from 2009-2019, containing approximately 31M comments, mostly in the Estonian language, with some...

English YouTube Hate Speech Corpus

We present an English YouTube dataset manually annotated for hate speech types and targets. The comments to be annotated were sampled from the English YouTube comments on videos...

24sata news comment dataset 1.0

A dataset of user comments, provided for research purposes to EMBEDDIA, a Horizon 2020 project, and extracted from the database of user comments from the 24sata.hr news...

Latvian user comment dataset 1.0

The dataset is an archive of reader comments from the Delfi news site from 2014-2019, containing approximately 12M comments, mostly in the Latvian language, with some in...

Slovenian Twitter hate speech dataset IMSyPP-sl

A hand-labeled training (50,000 tweets labeled twice) and evaluation set (10,000 tweets labeled twice) for hate speech on Slovenian Twitter. The data files contain tweet IDs,...

Offensive language dataset of Croatian, English and Slovenian comments FRENK 1.0

The FRENK dataset consists of comments on Facebook posts (news articles) by mainstream media outlets from Croatia, Great Britain, and Slovenia, on the topics of migrants and...

Offensive language dataset of Croatian, English and Slovenian comments FRENK 1.1

The FRENK dataset consists of comments on Facebook posts (news articles) by mainstream media outlets from Croatia, Great Britain, and Slovenia, on the topics of migrants and...

8 datasets found