Large-Scale Colloquial Persian 0.5

"Large Scale Colloquial Persian Dataset" (LSCP) is hierarchically organized in a semantic taxonomy that treats multi-task informal Persian language understanding as a comprehensive problem. LSCP includes 120M sentences from 27M casual Persian tweets, annotated with syntactic dependency relations, part-of-speech tags, and sentiment polarity, together with automatic translations of the original Persian sentences into five languages (EN, CS, DE, IT, HI).

Identifier
PID http://hdl.handle.net/11234/1-3195
Related Identifier https://arxiv.org/abs/2003.06499
Related Identifier https://iasbs.ac.ir/~ansari/lscp/
Metadata Access http://lindat.mff.cuni.cz/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:lindat.mff.cuni.cz:11234/1-3195
Provenance
Creator Abdi Khojasteh, Hadi; Ansari, Ebrahim; Bohlouli, Mahdi
Publisher Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL); Institute for Advanced Studies in Basic Sciences (IASBS)
Publication Year 2020
Rights Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0); http://creativecommons.org/licenses/by-nc-nd/4.0/; PUB
OpenAccess true
Contact lindat-help(at)ufal.mff.cuni.cz
Representation
Language Persian; Farsi; English; German; Czech; Italian; Hindi
Resource Type corpus
Format text/plain; charset=utf-8; application/octet-stream; downloadable_files_count: 9
Discipline Linguistics
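
The Metadata Access link above is an OAI-PMH GetRecord request for this record in Dublin Core (oai_dc) form. The following is a minimal sketch of how such a request could be built and its response read, not an official client. The helper names (build_get_record_url, fetch_record, dc_fields) are hypothetical, and SAMPLE is a hand-written, trimmed stand-in for a response, populated only with values from this record, not actual server output.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

# Endpoint and identifier copied from the "Metadata Access" line of this record.
OAI_ENDPOINT = "http://lindat.mff.cuni.cz/repository/oai/request"
RECORD_ID = "oai:lindat.mff.cuni.cz:11234/1-3195"

# Namespace of the Dublin Core elements carried inside an oai_dc payload.
DC_NS = "http://purl.org/dc/elements/1.1/"


def build_get_record_url(identifier=RECORD_ID, prefix="oai_dc"):
    """Assemble the GetRecord URL (hypothetical helper)."""
    query = urlencode(
        {"verb": "GetRecord", "metadataPrefix": prefix, "identifier": identifier},
        safe=":/",  # keep ':' and '/' readable, as in the record's own link
    )
    return f"{OAI_ENDPOINT}?{query}"


def fetch_record(url):
    """Download the XML response as text (needs network access; not called here)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def dc_fields(xml_text):
    """Collect Dublin Core elements into a dict of name -> list of values."""
    fields = {}
    for elem in ET.fromstring(xml_text).iter():
        if elem.tag.startswith("{" + DC_NS + "}"):
            name = elem.tag.split("}", 1)[1]
            fields.setdefault(name, []).append((elem.text or "").strip())
    return fields


# Illustrative stand-in for a response; the real payload will contain more fields.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <GetRecord><record><metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>Large-Scale Colloquial Persian 0.5</dc:title>
      <dc:creator>Abdi Khojasteh, Hadi</dc:creator>
      <dc:creator>Ansari, Ebrahim</dc:creator>
      <dc:creator>Bohlouli, Mahdi</dc:creator>
    </oai_dc:dc>
  </metadata></record></GetRecord>
</OAI-PMH>"""

if __name__ == "__main__":
    print(build_get_record_url())
    print(dc_fields(SAMPLE))
```

To query the live repository, pass the result of build_get_record_url() to fetch_record() and feed the returned text to dc_fields().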