PolEmo 1.0 + MultiEmo-Test 1.0 Multilingual Sentiment Analysis Dataset for KES2020

Dataset

PolEmo 1.0 + MultiEmo-Test 1.0: Corpus of Multi-Domain Consumer Reviews. The PolEmo 1.0 test set was translated into eight languages: Dutch, English, French, German, Italian, Portuguese, Russian, and Spanish.

Citation:
@article{KANCLERZ2020128,
  title = {Cross-lingual deep neural transfer learning in sentiment analysis},
  journal = {Procedia Computer Science},
  volume = {176},
  pages = {128-137},
  year = {2020},
  note = {Knowledge-Based and Intelligent Information & Engineering Systems: Proceedings of the 24th International Conference KES2020},
  issn = {1877-0509},
  doi = {https://doi.org/10.1016/j.procs.2020.08.014},
  url = {https://www.sciencedirect.com/science/article/pii/S187705092031838X},
  author = {Kamil Kanclerz and Piotr Miłkowski and Jan Kocoń},
  keywords = {natural language processing, sentiment analysis, polarity recognition, transfer learning, deep learning, multilingual approach},
  abstract = {In this article, we present a novel technique for the use of language-agnostic sentence representations to adapt the model trained on texts in Polish (as a low-resource language) to recognize polarity in texts in other (high-resource) languages. The first model focuses on the creation of a language-agnostic representation of each sentence. The second one aims to predict the sentiment of the text based on these sentence representations. Besides models evaluation on PolEmo 1.0 Sentiment Corpus, we also conduct a proof of concept for using a deep neural network model trained only on language-agnostic embeddings of texts in Polish to predict the sentiment of the texts in MultiEmo-Test 1.0 Sentiment Corpus, containing PolEmo 1.0 test datasets translated into eight different languages: Dutch, English, French, German, Italian, Portuguese, Russian and Spanish. Both corpora are publicly available under a Creative Commons copyright license.}
}

Identifier
PID	http://hdl.handle.net/11321/737
Metadata Access	https://clarin-pl.eu/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:clarin-pl.eu:11321/737
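
The Metadata Access link above is an OAI-PMH GetRecord request that returns the record as Dublin Core (oai_dc) XML. The following is a minimal sketch of how such a record could be retrieved and read with the Python standard library. The function names (build_get_record_url, parse_dublin_core, fetch_record) are illustrative assumptions, not part of any CLARIN-PL tooling, and which Dublin Core fields the server actually returns for this record is not verified here.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Endpoint and identifier taken from the Metadata Access URL of this record.
OAI_ENDPOINT = "https://clarin-pl.eu/oai/request"
RECORD_ID = "oai:clarin-pl.eu:11321/737"

# Standard Dublin Core element namespace used inside oai_dc payloads.
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def build_get_record_url(endpoint=OAI_ENDPOINT, identifier=RECORD_ID):
    """Build an OAI-PMH GetRecord URL requesting the oai_dc format."""
    query = urllib.parse.urlencode(
        {"verb": "GetRecord", "metadataPrefix": "oai_dc", "identifier": identifier},
        safe=":/",  # keep the identifier readable, as in the original URL
    )
    return f"{endpoint}?{query}"


def parse_dublin_core(xml_payload):
    """Collect Dublin Core elements from an OAI-PMH response.

    Accepts str or bytes and returns a dict mapping each element name
    (e.g. "title", "language") to the list of its non-empty values.
    """
    root = ET.fromstring(xml_payload)
    fields = {}
    for element in root.iter():
        tag = element.tag
        if isinstance(tag, str) and tag.startswith(DC_NS):
            text = (element.text or "").strip()
            if text:
                fields.setdefault(tag[len(DC_NS):], []).append(text)
    return fields


def fetch_record(url=None, timeout=30):
    """Download the record over HTTP and parse it (requires network access)."""
    with urllib.request.urlopen(url or build_get_record_url(), timeout=timeout) as response:
        return parse_dublin_core(response.read())
```

Calling fetch_record() performs the network request; parse_dublin_core can also be applied to a response saved to disk. Multi-valued fields such as language or creator are returned as lists because Dublin Core allows elements to repeat.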

Provenance
Creator	Kocoń, Jan; Kanclerz, Kamil; Miłkowski, Piotr; Bojanowski, Bartosz; Zaśko-Zielińska, Monika
Publisher	Wrocław University of Science and Technology
Publication Year	2020
Rights	Creative Commons - Attribution 4.0 International (CC BY 4.0); https://creativecommons.org/licenses/by/4.0/; CC
OpenAccess	true
Contact	clarin-pl(at)pwr.edu.pl

Representation
Language	Polish; English; Dutch (Flemish); French; German; Italian; Portuguese; Russian; Spanish (Castilian)
Resource Type	corpus
Format	text/plain; charset=utf-8; application/octet-stream; application/zip; downloadable_files_count: 2
Discipline	Linguistics