PolEmo 1.0 + MultiEmo-Test 1.0 Multilingual Sentiment Analysis Dataset for KES2020

PolEmo 1.0 + MultiEmo-Test 1.0: Corpus of Multi-Domain Consumer Reviews. The test dataset from PolEmo 1.0 was translated into eight languages: Dutch, English, French, German, Italian, Portuguese, Russian, and Spanish.

Citation:
@article{KANCLERZ2020128,
  title = {Cross-lingual deep neural transfer learning in sentiment analysis},
  journal = {Procedia Computer Science},
  volume = {176},
  pages = {128-137},
  year = {2020},
  note = {Knowledge-Based and Intelligent Information & Engineering Systems: Proceedings of the 24th International Conference KES2020},
  issn = {1877-0509},
  doi = {https://doi.org/10.1016/j.procs.2020.08.014},
  url = {https://www.sciencedirect.com/science/article/pii/S187705092031838X},
  author = {Kamil Kanclerz and Piotr Miłkowski and Jan Kocoń},
  keywords = {natural language processing, sentiment analysis, polarity recognition, transfer learning, deep learning, multilingual approach},
  abstract = {In this article, we present a novel technique for the use of language-agnostic sentence representations to adapt the model trained on texts in Polish (as a low-resource language) to recognize polarity in texts in other (high-resource) languages. The first model focuses on the creation of a language-agnostic representation of each sentence. The second one aims to predict the sentiment of the text based on these sentence representations. Besides models evaluation on PolEmo 1.0 Sentiment Corpus, we also conduct a proof of concept for using a deep neural network model trained only on language-agnostic embeddings of texts in Polish to predict the sentiment of the texts in MultiEmo-Test 1.0 Sentiment Corpus, containing PolEmo 1.0 test datasets translated into eight different languages: Dutch, English, French, German, Italian, Portuguese, Russian and Spanish. Both corpora are publicly available under a Creative Commons copyright license.}
}

Identifier
PID http://hdl.handle.net/11321/737
Metadata Access https://clarin-pl.eu/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:clarin-pl.eu:11321/737
Provenance
Creator Kocoń, Jan; Kanclerz, Kamil; Miłkowski, Piotr; Bojanowski, Bartosz; Zaśko-Zielińska, Monika
Publisher Wrocław University of Science and Technology
Publication Year 2020
Rights Creative Commons - Attribution 4.0 International (CC BY 4.0); https://creativecommons.org/licenses/by/4.0/; CC
OpenAccess true
Contact clarin-pl(at)pwr.edu.pl
Representation
Language Polish; English; Dutch; French; German; Italian; Portuguese; Russian; Spanish
Resource Type corpus
Format text/plain; charset=utf-8; application/octet-stream; application/zip; downloadable_files_count: 2
Discipline Linguistics
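
The Metadata Access field above is an OAI-PMH GetRecord request that asks for the record in the oai_dc (Dublin Core) format. A minimal Python sketch of how such a request URL could be assembled and how Dublin Core fields could be read from the response is given below. The endpoint and identifier are copied from this record; the helper names build_get_record_url and dc_values are hypothetical, and the sketch assumes the response uses the standard Dublin Core element namespace.

```python
from typing import List
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Endpoint and identifier copied from the "Metadata Access" field of this record.
OAI_ENDPOINT = "https://clarin-pl.eu/oai/request"
RECORD_ID = "oai:clarin-pl.eu:11321/737"

# Standard namespace of unqualified Dublin Core elements (assumed for the response).
DC_NS = "http://purl.org/dc/elements/1.1/"


def build_get_record_url(endpoint: str, identifier: str, prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH GetRecord request URL (hypothetical helper)."""
    query = urlencode(
        {"verb": "GetRecord", "metadataPrefix": prefix, "identifier": identifier},
        safe=":/",  # keep ':' and '/' readable inside the identifier
    )
    return f"{endpoint}?{query}"


def dc_values(xml_text: str, element: str) -> List[str]:
    """Return the text of every Dublin Core <element> in an XML response (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    return [node.text or "" for node in root.iter(f"{{{DC_NS}}}{element}")]
```

The response body can be fetched with any HTTP client, for example urllib.request.urlopen(build_get_record_url(OAI_ENDPOINT, RECORD_ID)), and the decoded text passed to dc_values with an element name such as "language" or "creator".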