StudEmo - corpus of consumer reviews annotated with emotions - Dataset

Human emotional perception is subjective by nature: different individuals may express different emotions in response to the same textual content. Existing datasets for emotion analysis commonly rely on a single ground truth per data sample, derived by majority voting or by averaging the opinions of all annotators. We introduce a new non-aggregated dataset, StudEmo, that contains 5,182 customer reviews, each annotated by 25 people with intensities of eight emotions from Plutchik's model, extended with valence and arousal. We also propose three personalized models that use not only the textual content but also the individual human perspective, each taking a different approach to learning human representations. The experiments were carried out as multitask classification on two datasets: our StudEmo dataset and the GoEmotions dataset, which contains 28 emotional categories. The proposed personalized methods significantly improve prediction results, especially for emotions that have low inter-annotator agreement.
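
The difference between aggregated and non-aggregated labels described above can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the toy annotations, the 0-3 intensity scale, and the function names are assumptions made for illustration and do not reflect the actual StudEmo file layout or the authors' code.

```python
from collections import Counter
from statistics import mean

# Hypothetical toy annotations: (review_id, annotator_id, joy_intensity).
# The values and the 0-3 scale are illustrative assumptions, not StudEmo data.
annotations = [
    ("r1", "a1", 0), ("r1", "a2", 3), ("r1", "a3", 3),
    ("r2", "a1", 1), ("r2", "a2", 1), ("r2", "a3", 2),
]


def group_by_review(rows):
    """Collect all intensities given to each review, dropping annotator identity."""
    grouped = {}
    for review_id, _annotator_id, intensity in rows:
        grouped.setdefault(review_id, []).append(intensity)
    return grouped


def majority_vote(rows):
    """Aggregated label: the most frequent intensity per review.

    Ties would be resolved by first occurrence; the toy data has none.
    """
    return {
        review_id: Counter(values).most_common(1)[0][0]
        for review_id, values in group_by_review(rows).items()
    }


def average(rows):
    """Aggregated label: the mean intensity per review."""
    return {
        review_id: mean(values)
        for review_id, values in group_by_review(rows).items()
    }


def non_aggregated(rows):
    """Non-aggregated labels: one target per (review, annotator) pair."""
    return {
        (review_id, annotator_id): intensity
        for review_id, annotator_id, intensity in rows
    }


if __name__ == "__main__":
    print("majority vote: ", majority_vote(annotations))
    print("average:       ", average(annotations))
    print("non-aggregated:", non_aggregated(annotations))
```

In this toy example, both aggregation strategies hide the fact that annotator `a1` perceived no joy at all in review `r1`, whereas the non-aggregated view keeps that individual judgment available as a training target for a personalized model.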

Identifier
PID	http://hdl.handle.net/11321/895
Related Identifier	https://github.com/CLARIN-PL/personalized-nlp/tree/nlperspectives
Metadata Access	https://clarin-pl.eu/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:clarin-pl.eu:11321/895

Provenance
Creator	Ngo, Anh; Candri, Argi; Ferdinan, Teddy; Kocoń, Jan; Korczyński, Wojciech
Publisher	Wrocław University of Science and Technology
Publication Year	2022
Rights	Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0); https://creativecommons.org/licenses/by-nc-nd/4.0/; PUB
OpenAccess	true
Contact	clarin-pl(at)pwr.edu.pl

Representation
Language	English
Resource Type	corpus
Format	text/plain; charset=utf-8; application/zip; downloadable_files_count: 1
Discipline	Linguistics