MultiEmo: Multilingual, Multilevel, Multidomain Sentiment Analysis Corpus of Consumer Reviews

MultiEmo is a new benchmark data set for multilingual sentiment analysis, covering 11 languages. The collection contains consumer reviews from four domains: medicine, hotels, products and university. The original Polish collection comprises 8,216 reviews totalling 57,466 sentences. The reviews were manually annotated with sentiment at both the document level and the sentence level, with 3 annotators per element. We achieved high Positive Specific Agreement: 0.91 for texts and 0.88 for sentences. The collection was then automatically translated into English, Chinese, Italian, Japanese, Russian, German, Spanish, French, Dutch and Portuguese. MultiEmo is publicly available under a Creative Commons Attribution 4.0 International Licence.
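The agreement figures above can be illustrated with a minimal sketch. For two annotators and one category, Positive Specific Agreement is commonly defined as 2a / (2a + b + c), where a counts items both annotators assigned to the category and b and c count items only one of them did. Averaging over annotator pairs, the function names and the toy labels below are assumptions made for illustration; the record does not state how the published values were aggregated across the 3 annotators.

```python
from itertools import combinations


def positive_specific_agreement(labels_a, labels_b, category):
    """PSA for one category between two annotators: 2a / (2a + b + c)."""
    pairs = list(zip(labels_a, labels_b))
    both = sum(1 for x, y in pairs if x == category and y == category)
    only_a = sum(1 for x, y in pairs if x == category and y != category)
    only_b = sum(1 for x, y in pairs if x != category and y == category)
    denominator = 2 * both + only_a + only_b
    # Undefined when neither annotator ever used the category.
    return 2 * both / denominator if denominator else float("nan")


def mean_pairwise_psa(annotations, category):
    """Average PSA over all annotator pairs (an assumed aggregation)."""
    scores = [
        positive_specific_agreement(a, b, category)
        for a, b in combinations(annotations, 2)
    ]
    return sum(scores) / len(scores)


# Hypothetical labels from three annotators for four items.
annotator_1 = ["pos", "neg", "pos", "neu"]
annotator_2 = ["pos", "neg", "neu", "neu"]
annotator_3 = ["pos", "pos", "pos", "neu"]

psa_pos = mean_pairwise_psa([annotator_1, annotator_2, annotator_3], "pos")
print(round(psa_pos, 3))
```

The same function can be applied per category at the document level and at the sentence level, which is how two separate figures such as those reported above would arise.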

More information: https://github.com/CLARIN-PL/multiemo

Citation:
@inproceedings{kocon2021multiemo,
  title={Multiemo: Multilingual, multilevel, multidomain sentiment analysis corpus of consumer reviews},
  author={Koco{\'n}, Jan and Mi{\l}kowski, Piotr and Kanclerz, Kamil},
  booktitle={International Conference on Computational Science},
  pages={297--312},
  year={2021},
  organization={Springer}
}

Identifier
PID http://hdl.handle.net/11321/798
Related Identifier https://github.com/CLARIN-PL/multiemo
Metadata Access https://clarin-pl.eu/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:clarin-pl.eu:11321/798
Provenance
Creator Kocoń, Jan; Miłkowski, Piotr; Kanclerz, Kamil
Publisher Wrocław University of Science and Technology
Publication Year 2021
Rights The MIT License; https://opensource.org/licenses/MIT; PUB
OpenAccess true
Contact clarin-pl(at)pwr.edu.pl
Representation
Language Polish; English; Chinese; Italian; Japanese; Russian; German; Spanish (Castilian); French; Dutch (Flemish); Portuguese
Resource Type corpus
Format text/plain; charset=utf-8; application/octet-stream; application/zip; downloadable_files_count: 2
Discipline Linguistics
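
The Metadata Access URL listed in this record is an OAI-PMH GetRecord request. As a minimal sketch, the snippet below rebuilds that URL from its parts using only the Python standard library; the helper name build_get_record_url is hypothetical, and the endpoint, verb, metadata prefix and identifier are taken from the record itself.

```python
from urllib.parse import urlencode

# Endpoint as it appears in the Metadata Access field of this record.
OAI_ENDPOINT = "https://clarin-pl.eu/oai/request"


def build_get_record_url(identifier, metadata_prefix="oai_dc"):
    """Compose an OAI-PMH GetRecord URL (hypothetical helper)."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "metadataPrefix": metadata_prefix,
            "identifier": identifier,
        },
        safe=":/",  # keep the identifier readable, as in the record
    )
    return f"{OAI_ENDPOINT}?{query}"


url = build_get_record_url("oai:clarin-pl.eu:11321/798")
print(url)
```

The resulting URL can be passed to urllib.request.urlopen to retrieve the Dublin Core XML for the record; the network call is left out here so the sketch runs offline.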