Multiword Expressions lexicon extracted from the Gigafida 2.1 corpus

PID

The MWE lexicon was extracted from the Gigafida 2.1 Corpus of Written Standard Slovene (https://www.clarin.si/noske/run.cgi/corp_info?corpname=gfida21) using specialized scripts for extracting data from corpora containing syntactic dependency annotations. The lexicon contains 5,242 Multiword Expressions with 12,358 examples from Gigafida 2.1. Each MWE entry (or sense) contains at least one and up to three extracted examples.

MWEs were analysed using the JOS dependency parser system (http://nl.ijs.si/jos/bib/jos-skladnja-navodila.pdf) and were assigned matching syntactic structure IDs. The corpus sentences containing the MWE components and matching syntactic structure features were identified in the corpus and assigned to the corresponding headword or sense.

MWEs variants (or variant senses) are linked with the "senseKey" attribute values, forming a MWE cluster of related variants or variant senses. A sample of MWE headwords also contains manually created sense division with descriptions of meaning for each sense.

Identifier
PID http://hdl.handle.net/11356/1421
Related Identifier http://slovnica.ijs.si/
Metadata Access http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/1421
Provenance
Creator Krek, Simon; Gantar, Apolonija; Laskowski, Cyprian; Krsnik, Luka; Kosem, Iztok; Brank, Janez; Dobrovoljc, Kaja; Arhar Holdt, Špela; Čibej, Jaka; Robnik-Šikonja, Marko; Klemenc, Bojan; Gorjanc, Vojko
Publisher Centre for Language Resources and Technologies, University of Ljubljana
Publication Year 2021
Rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0); https://creativecommons.org/licenses/by-sa/4.0/; PUB
OpenAccess true
Contact info(at)clarin.si
Representation
Language Slovenian; Slovene
Resource Type lexicalConceptualResource
Format application/zip; text/plain; charset=utf-8; downloadable_files_count: 1
Discipline Linguistics