Annotated corpora and tools of the PARSEME Shared Task on Semi-Supervised Identification of Verbal Multiword Expressions (edition 1.2)

This multilingual resource contains corpora in which verbal multiword expressions (VMWEs) have been manually annotated, gathered on the occasion of edition 1.2 of the PARSEME Shared Task on Semi-Supervised Identification of Verbal MWEs (2020). VMWEs include idioms (let the cat out of the bag), light-verb constructions (make a decision), verb-particle constructions (give up), inherently reflexive verbs (help oneself), and multi-verb constructions (make do). For edition 1.2 of the shared task, the data covers 14 languages, for which VMWEs were annotated according to the universal guidelines. The corpora are provided in the cupt format, inspired by the CoNLL-U format. Morphological and syntactic information (not necessarily using UD tagsets), including parts of speech, lemmas, morphological features and/or syntactic dependencies, is also provided. Depending on the language, this information comes from treebanks (e.g., Universal Dependencies) or from automatic parsers trained on treebanks (e.g., UDPipe). This item contains training, development and test data, as well as the evaluation tools used in the PARSEME Shared Task 1.2 (2020). The annotation guidelines are available online: http://parsemefr.lif.univ-mrs.fr/parseme-st-guidelines/1.2
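
As a rough illustration of how VMWE annotations in a cupt file might be read, here is a minimal Python sketch. It rests on assumptions not stated in this record: that each token line is tab-separated, that the VMWE annotation is the last column (named PARSEME:MWE in the header), that `*` marks a token outside any VMWE and `_` an unannotated token, and that codes look like `1:LVC.full` on the first token of an expression and `1` on later tokens, with `;` separating multiple codes. The sample sentence and the function name `read_vmwes` are invented for illustration and are not taken from the corpora or the official tools.

```python
from collections import defaultdict

# Invented sample in an assumed cupt layout: ten CoNLL-U-style columns
# followed by a PARSEME:MWE column. Not taken from the released corpora.
SAMPLE = """\
# global.columns = ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC PARSEME:MWE
# text = She made a decision
1\tShe\tshe\tPRON\t_\t_\t2\tnsubj\t_\t_\t*
2\tmade\tmake\tVERB\t_\t_\t0\troot\t_\t_\t1:LVC.full
3\ta\ta\tDET\t_\t_\t4\tdet\t_\t_\t*
4\tdecision\tdecision\tNOUN\t_\t_\t2\tobj\t_\t_\t1
"""


def read_vmwes(text):
    """Return one dict per sentence: VMWE id -> (category, list of word forms)."""
    sentences = []
    categories, tokens, seen_token = {}, defaultdict(list), False
    # The extra empty string flushes the last sentence if the text
    # does not end with a blank line.
    for line in text.splitlines() + [""]:
        if not line.strip():
            if seen_token:
                sentences.append(
                    {i: (categories.get(i), forms) for i, forms in tokens.items()}
                )
            categories, tokens, seen_token = {}, defaultdict(list), False
            continue
        if line.startswith("#"):
            continue  # metadata/comment line
        cols = line.split("\t")
        seen_token = True
        form, mwe = cols[1], cols[-1]
        if mwe in ("*", "_"):
            continue  # assumed: no VMWE / not annotated
        for code in mwe.split(";"):
            mwe_id, _, category = code.partition(":")
            tokens[mwe_id].append(form)
            if category:  # assumed: category appears only on the first token
                categories[mwe_id] = category
    return sentences


if __name__ == "__main__":
    for sentence in read_vmwes(SAMPLE):
        for mwe_id, (category, forms) in sentence.items():
            print(mwe_id, category, " ".join(forms))
```

Keying expressions by identifier rather than by adjacency lets the sketch handle discontinuous expressions such as "made ... decision". For real use, the evaluation tools distributed with this item and the format description in the guidelines should be treated as authoritative.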

Identifier
PID http://hdl.handle.net/11234/1-3367
Related Identifier http://hdl.handle.net/11372/LRT-2842
Related Identifier http://hdl.handle.net/11372/LRT-5124
Related Identifier http://multiword.sf.net/sharedtask2020
Metadata Access http://lindat.mff.cuni.cz/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:lindat.mff.cuni.cz:11234/1-3367
Provenance
Creator Ramisch, Carlos; Guillaume, Bruno; Savary, Agata; Waszczuk, Jakub; Candito, Marie; Vaidya, Ashwini; Barbu Mititelu, Verginica; Bhatia, Archna; Iñurrieta, Uxoa; Giouli, Voula; Güngör, Tunga; Jiang, Menghan; Lichte, Timm; Liebeskind, Chaya; Monti, Johanna; Ramisch, Renata; Walsh, Abigail; Xu, Hongzhi; Palka-Binkiewicz, Emilia; Ehren, Rafael; Stymne, Sara; Constant, Matthieu; Pasquer, Caroline; Parmentier, Yannick; Antoine, Jean-Yves; Carlino, Carola; Caruso, Valeria; Di Buono, Maria Pia; Pascucci, Antonio; Raffone, Annalisa; Riccio, Anna; Sangati, Federico; Speranza, Giulia; Cordeiro, Silvio Ricardo; de Medeiros Caseli, Helena; Miranda, Isaac; Rademaker, Alexandre; Vale, Oto; Villavicencio, Aline; Wick Pedro, Gabriela; Wilkens, Rodrigo; Zilio, Leonardo; Rizea, Monica-Mihaela; Ionescu, Mihaela; Onofrei, Mihaela; Chen, Jia; Ge, Xiaomin; Hu, Fangyuan; Hu, Sha; Li, Minli; Liu, Siyuan; Qin, Zhenzhen; Sun, Ruilong; Wang, Chenweng; Xiao, Huangyang; Yan, Peiyi; Yih, Tsy; Yu, Ke; Yu, Songping; Zeng, Si; Zhang, Yongchen; Zhao, Yun; Foufi, Vassiliki; Fotopoulou, Aggeliki; Markantonatou, Stella; Papadelli, Stella; Louizou, Sevasti; Aduriz, Itziar; Estarrona, Ainara; Gonzalez, Itziar; Gurrutxaga, Antton; Uria, Larraitz; Urizar, Ruben; Foster, Jennifer; Lynn, Teresa; Elyovitch, Hevi; Ha-Cohen Kerner, Yaakov; Malka, Ruth; Jain, Kanishka; Puri, Vandana; Ratori, Shraddha; Shukla, Vishakha; Srivastava, Shubham; Berk, Gozde; Erden, Berna; Yirmibeşoğlu, Zeynep
Publisher PARSEME
Publication Year 2020
Rights PARSEME Shared Task Data (v. 1.2) Agreement; https://lindat.mff.cuni.cz/repository/xmlui/page/licence-mwe-1.2; PUB
OpenAccess true
Contact lindat-help(at)ufal.mff.cuni.cz
Representation
Language German; Greek, Modern (1453-); Basque; French; Irish; Hebrew; Hindi; Italian; Polish; Portuguese; Romanian; Swedish; Turkish; Chinese
Resource Type corpus
Format text/plain; charset=utf-8; application/octet-stream; application/x-gzip; downloadable_files_count: 17
Discipline Linguistics