List of word relations from the Sloleks 2.0 lexicon 1.0

This entry consists of a TSV file containing a list of 66,347 Slovene word pairs from the Sloleks Morphological Lexicon of Slovene (v2.0; http://hdl.handle.net/11356/1230) that have been automatically identified as morphologically related according to a number of manually designed morphological relation rules (e.g. "dež" -> "deževen", "pisati" -> "pisatelj", "prijatelj" -> "prijateljica").

Each line in the list contains the following columns:
  - original lemma (e.g. "pisati"),
  - related lemma (e.g. "pisatelj"),
  - original lemma, automatically deconstructed into individual word parts (e.g. "pis_ati"),
  - related lemma, automatically deconstructed into individual word parts (e.g. "pis_at_elj"),
  - MTE-6 lexical features of the original lemma (e.g. "G"),
  - MTE-6 lexical features of the related lemma (e.g. "Som"),
  - ID of the original lemma from Sloleks 2.0,
  - ID of the related lemma from Sloleks 2.0,
  - the overlapping or central part (common to both the original and the related lemmas; e.g. "pis"),
  - the ID of the morphological relation rule used to identify the relation (e.g. "G.Som.5.2.1"),
  - the morphological relation rule (e.g. "[G]_ati -> [G]_at_elj").

  • MTE-6 refers to MULTEXT-East Version 6 morphosyntactic specifications for Slovenian, available at http://nl.ijs.si/ME/V6/
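The column layout described above could be read along the following lines. This is a minimal sketch and not part of the resource: it assumes the file is tab-separated with exactly the eleven columns in the listed order and no header row. The names WordRelation and read_relations are hypothetical, and the two IDs in the sample row are placeholders, not real Sloleks identifiers.

```python
import csv
import io
from typing import Iterator, NamedTuple, TextIO


class WordRelation(NamedTuple):
    """One row of the list; field names are illustrative, not official."""
    original_lemma: str
    related_lemma: str
    original_parts: str   # e.g. "pis_ati"
    related_parts: str    # e.g. "pis_at_elj"
    original_msd: str     # MTE-6 lexical features, e.g. "G"
    related_msd: str      # MTE-6 lexical features, e.g. "Som"
    original_id: str
    related_id: str
    common_part: str      # e.g. "pis"
    rule_id: str          # e.g. "G.Som.5.2.1"
    rule: str             # e.g. "[G]_ati -> [G]_at_elj"


def read_relations(handle: TextIO) -> Iterator[WordRelation]:
    """Yield one WordRelation per well-formed line (assumed: no header row)."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip lines that do not have the assumed eleven columns.
        if len(row) != len(WordRelation._fields):
            continue
        yield WordRelation(*row)


# Sample row built from the examples in the description; the IDs are placeholders.
sample = (
    "pisati\tpisatelj\tpis_ati\tpis_at_elj\tG\tSom\t"
    "ID_A\tID_B\tpis\tG.Som.5.2.1\t[G]_ati -> [G]_at_elj\n"
)
relations = list(read_relations(io.StringIO(sample)))
print(relations[0].related_lemma)
```

For the actual file, the same function could be given a handle opened with encoding="utf-8" and newline="", in line with the charset stated in the record.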

Each rule constitutes a pattern to form a morphological relation. For instance, "[G]_ati -> [G]_at_elj" indicates that a verb (G) ending with the word part "ati" is related to the lemma formed by replacing "_ati" with "_at_elj".
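The way such a rule maps one deconstructed lemma onto another can be illustrated with a short sketch. It is not the extraction process used to build the list. It assumes every rule has the shape "[TAG]ending -> [TAG]ending", that the bracketed tag is matched against the start of the MTE-6 features of the original lemma, and that word parts are joined with underscores. The function name apply_rule is hypothetical.

```python
from typing import Optional


def apply_rule(rule: str, parts: str, msd: str) -> Optional[str]:
    """Apply a rule such as "[G]_ati -> [G]_at_elj" to a deconstructed lemma.

    Returns the deconstructed related lemma, or None if the rule does not
    match. The rule syntax handled here is an assumption based on the single
    example given in the description.
    """
    source, target = (side.strip() for side in rule.split("->"))
    # "[G]_ati" -> tag "G", ending "_ati"
    tag, source_ending = source[1:].split("]", 1)
    _, target_ending = target[1:].split("]", 1)
    if not msd.startswith(tag) or not parts.endswith(source_ending):
        return None
    # Keep everything before the matched ending and attach the new ending.
    stem = parts[: len(parts) - len(source_ending)]
    return stem + target_ending


related_parts = apply_rule("[G]_ati -> [G]_at_elj", "pis_ati", "G")
print(related_parts)                   # deconstructed related lemma
print(related_parts.replace("_", ""))  # lemma without part boundaries
```

Rules that involve irregular changes in the overlapping word part would not be covered by plain ending replacement of this kind.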

Note that the list contains no proper nouns. It also contains no relations for 38 morphological rules that are included in the hierarchy of rules (listed in the accompanying file nssss_sloleks_word_relation_rules.tsv) but depend on additional rules that have not yet been implemented in the current version of the extraction process (such as irregular conversions in overlapping word parts: "gri_sti" - "griz_enj_e", "sneg" - "snež_ak").

Identifier
PID http://hdl.handle.net/11356/1386
Related Identifier http://slovnica.ijs.si/
Metadata Access http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/1386
Provenance
Creator Čibej, Jaka; Arhar Holdt, Špela; Krek, Simon
Publisher Centre for Language Resources and Technologies, University of Ljubljana; Jožef Stefan Institute
Publication Year 2020
Rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0); https://creativecommons.org/licenses/by-sa/4.0/; PUB
OpenAccess true
Contact info(at)clarin.si
Representation
Language Slovenian; Slovene
Resource Type lexicalConceptualResource
Format text/plain; charset=utf-8; application/zip; downloadable_files_count: 1
Discipline Linguistics