Multilingual corpus of literal occurrences of multiword expressions


The corpus contains sentences with idiomatic, literal and coincidental occurrences of verbal multiword expressions (VMWEs) in Basque, German, Greek, Polish and Portuguese. The source corpus is the PARSEME multilingual corpus of VMWEs, version 1.1 (cf. http://hdl.handle.net/11372/LRT-2842). Sentences containing annotated VMWEs were extracted from the source corpus. Potential co-occurrences of the same lexemes were then automatically extracted from the same corpus. Native-speaker experts manually annotated these candidates into six classes. The classes cover literal and coincidental occurrences as well as various types of annotation errors.
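The candidate-extraction step can be sketched as a simple lemma co-occurrence filter. This is a minimal illustration under stated assumptions, not the authors' actual procedure, which may apply further criteria. It assumes that each sentence is a list of lemmas plus the set of token indices already annotated as VMWEs, and that a VMWE is represented by the tuple of its lexicalized component lemmas. The names `contains_all_lemmas` and `find_candidates` are hypothetical.

```python
from collections import Counter
from typing import Iterable

# Hypothetical representation: (lemmas of the sentence, indices of tokens
# already annotated as belonging to some VMWE).
Sentence = tuple[list[str], set[int]]


def contains_all_lemmas(lemmas: list[str], vmwe: tuple[str, ...]) -> bool:
    """True if every component lemma of the VMWE occurs among `lemmas`,
    respecting multiplicity (a lemma needed twice must appear twice)."""
    available = Counter(lemmas)
    needed = Counter(vmwe)
    return all(available[lemma] >= count for lemma, count in needed.items())


def find_candidates(sentences: Iterable[Sentence], vmwe: tuple[str, ...]) -> list[int]:
    """Return indices of sentences where all lemmas of `vmwe` co-occur on
    tokens NOT annotated as a VMWE. These are potential literal or
    coincidental occurrences that would still need manual classification."""
    result = []
    for i, (lemmas, annotated) in enumerate(sentences):
        # Ignore tokens that are already part of an annotated VMWE.
        free = [lemma for j, lemma in enumerate(lemmas) if j not in annotated]
        if contains_all_lemmas(free, vmwe):
            result.append(i)
    return result


if __name__ == "__main__":
    sentences = [
        (["he", "kick", "the", "bucket"], {1, 3}),          # annotated, idiomatic
        (["she", "kick", "the", "bucket", "over"], set()),  # unannotated co-occurrence
        (["the", "bucket", "be", "empty"], set()),          # "kick" is missing
    ]
    print(find_candidates(sentences, ("kick", "bucket")))
```

Matching on lemma multisets alone over-generates by design. In this sketch every unannotated co-occurrence becomes a candidate, and the manual annotation step then decides whether it is literal, coincidental, or an annotation error.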

The construction of the corpus is described in the following publication: Agata Savary, Silvio Ricardo Cordeiro, Timm Lichte, Carlos Ramisch, Uxoa Iñurrieta, Voula Giouli (forthcoming). "Literal occurrences of multiword expressions: Rare birds that cause a stir". To appear in the Prague Bulletin of Mathematical Linguistics.

Identifier
PID http://hdl.handle.net/11372/LRT-2966
Metadata Access http://lindat.mff.cuni.cz/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:lindat.mff.cuni.cz:11372/LRT-2966
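The metadata access URL above is an OAI-PMH `GetRecord` request for the Dublin Core (`oai_dc`) format. The following sketch rebuilds that request and extracts Dublin Core fields from a response. It assumes the standard OAI-PMH 2.0 and Dublin Core XML namespaces; the exact structure of this repository's response has not been checked. The helper names `get_record_url`, `fetch_record` and `dc_fields` are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

OAI_ENDPOINT = "http://lindat.mff.cuni.cz/repository/oai/request"

# Standard OAI-PMH 2.0 / Dublin Core namespaces (assumed to be used here).
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def get_record_url(identifier: str) -> str:
    """Build a GetRecord request; ':' and '/' are left unescaped so the
    result matches the URL given in the record."""
    params = {"verb": "GetRecord", "metadataPrefix": "oai_dc", "identifier": identifier}
    return OAI_ENDPOINT + "?" + urlencode(params, safe=":/")


def fetch_record(identifier: str) -> str:
    """Download the raw XML response (requires network access)."""
    with urlopen(get_record_url(identifier), timeout=30) as resp:
        return resp.read().decode("utf-8")


def dc_fields(xml_text: str, field: str) -> list[str]:
    """Return the text of every dc:<field> element, e.g. 'title' or 'creator'."""
    root = ET.fromstring(xml_text)
    return [el.text or "" for el in root.iterfind(f".//oai_dc:dc/dc:{field}", NS)]
```

A possible usage, if network access is available, is `dc_fields(fetch_record("oai:lindat.mff.cuni.cz:11372/LRT-2966"), "creator")`. With the assumed namespaces, this should list the creators recorded in the repository.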
Provenance
Creator Savary, Agata; Cordeiro, Silvio Ricardo; Lichte, Timm; Ramisch, Carlos; Iñurrieta, Uxoa; Giouli, Voula
Publisher PARSEME
Publication Year 2019
Rights License agreement for The Multilingual corpus of literal occurrences of multiword expressions; https://lindat.mff.cuni.cz/repository/xmlui/page/licence-mwe-literal; PUB
OpenAccess true
Contact lindat-help(at)ufal.mff.cuni.cz
Representation
Language Basque; German; Greek, Modern (1453-); Greek; Polish; Portuguese
Resource Type corpus
Format text/plain; charset=utf-8; application/x-gzip; downloadable_files_count: 5
Discipline Linguistics