PolEmo 1.0 + MultiEmo-Test 1.0 Multilingual Sentiment Analysis Dataset for KES2020


PolEmo 1.0 + MultiEmo-Test 1.0: Corpus of Multi-Domain Consumer Reviews. The test dataset from PolEmo 1.0 was translated into eight languages: Dutch, English, French, German, Italian, Portuguese, Russian and Spanish.

Citation:
@article{KANCLERZ2020128,
  title = {Cross-lingual deep neural transfer learning in sentiment analysis},
  journal = {Procedia Computer Science},
  volume = {176},
  pages = {128-137},
  year = {2020},
  note = {Knowledge-Based and Intelligent Information & Engineering Systems: Proceedings of the 24th International Conference KES2020},
  issn = {1877-0509},
  doi = {https://doi.org/10.1016/j.procs.2020.08.014},
  url = {https://www.sciencedirect.com/science/article/pii/S187705092031838X},
  author = {Kamil Kanclerz and Piotr Miłkowski and Jan Kocoń},
  keywords = {natural language processing, sentiment analysis, polarity recognition, transfer learning, deep learning, multilingual approach},
  abstract = {In this article, we present a novel technique for the use of language-agnostic sentence representations to adapt the model trained on texts in Polish (as a low-resource language) to recognize polarity in texts in other (high-resource) languages. The first model focuses on the creation of a language-agnostic representation of each sentence. The second one aims to predict the sentiment of the text based on these sentence representations. Besides models evaluation on PolEmo 1.0 Sentiment Corpus, we also conduct a proof of concept for using a deep neural network model trained only on language-agnostic embeddings of texts in Polish to predict the sentiment of the texts in MultiEmo-Test 1.0 Sentiment Corpus, containing PolEmo 1.0 test datasets translated into eight different languages: Dutch, English, French, German, Italian, Portuguese, Russian and Spanish. Both corpora are publicly available under a Creative Commons copyright license.}
}

Creator Kocoń, Jan; Kanclerz, Kamil; Miłkowski, Piotr; Bojanowski, Bartosz; Zaśko-Zielińska, Monika
Publisher Wrocław University of Science and Technology
Publication Year 2020
Rights Creative Commons - Attribution 4.0 International (CC BY 4.0); https://creativecommons.org/licenses/by/4.0/; CC
OpenAccess true
Language Polish; English; Dutch; Flemish; French; German; Italian; Portuguese; Russian; Spanish; Castilian
Resource Type corpus
Discipline Linguistics