Background Data for: The complexity principle and the morphosyntactic alternation between case affixes and postpositions in Estonian

Manually annotated dataset of 3,000 uses of exterior locative constructions (specifically cases and postpositions) in present-day Estonian. The data is extracted from the Estonian National Corpus (ENC 2017; 1.1 billion words, mainly web-based texts) and includes 500 uses of each of the following six constructions: allative, adessive, ablative, peale, peal, pealt. The data sampling procedure and further details about the dataset are given in Klavan & Schützler (to appear in Cognitive Linguistics). The data is annotated for 9 variables: postpos (outcome variable: case, postposition), position (post, pre), complexity (simple, compound), length (length in syllables of landmark phrase), frequency (raw frequency of landmark form in association with the respective semantic relation), function (adverbial, modifier), verb_lemma (224 levels for lative, 279 levels for locative, 252 levels for separative), lm_lemma (592 levels for lative, 438 levels for locative, 528 levels for separative), sem_rel (lative, locative, separative). The dataset was collected by the PI of the project PUT1358 "The Making and Breaking of Models: Experimentally Validating Classification Models in Linguistics" (1.01.2017−31.12.2020), funded by the Estonian Research Council.
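The variable list above can be turned into a simple consistency check. The following is a minimal sketch, not part of the dataset or its documentation. It assumes that the data file is a tab-delimited text table with a header row, that the column names match the nine variable names listed above, and that the categorical values are spelled exactly as in the description. The function name validate_rows is hypothetical.

```python
import csv

# Categorical levels as listed in the dataset description.
# Assumption: the data file spells the values exactly this way.
EXPECTED_LEVELS = {
    "postpos": {"case", "postposition"},
    "position": {"post", "pre"},
    "complexity": {"simple", "compound"},
    "function": {"adverbial", "modifier"},
    "sem_rel": {"lative", "locative", "separative"},
}
# Syllable count and raw frequency are expected to be non-negative integers.
NUMERIC_COLUMNS = ("length", "frequency")
# Lemma columns have hundreds of levels, so only their presence is checked.
LEMMA_COLUMNS = ("verb_lemma", "lm_lemma")
ALL_COLUMNS = tuple(EXPECTED_LEVELS) + NUMERIC_COLUMNS + LEMMA_COLUMNS


def validate_rows(handle, delimiter="\t"):
    """Return a list of (row_number, message) problems found in the table.

    Assumption: `handle` is an open text file with a header row;
    the tab delimiter is a guess and can be overridden.
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    fieldnames = reader.fieldnames or []
    missing = [name for name in ALL_COLUMNS if name not in fieldnames]
    if missing:
        # Row number 0 marks a problem with the header itself.
        return [(0, "missing columns: " + ", ".join(missing))]

    problems = []
    for number, row in enumerate(reader, start=1):
        for column, levels in EXPECTED_LEVELS.items():
            if row[column] not in levels:
                problems.append((number, f"{column}={row[column]!r} is not an expected level"))
        for column in NUMERIC_COLUMNS:
            value = row[column] or ""
            if not value.isdigit():
                problems.append((number, f"{column}={value!r} is not a non-negative integer"))
    return problems
```

Calling validate_rows on an open file handle for the downloaded data file returns an empty list if every row matches the description. If the file uses a different delimiter or different column names, the header check reports this as row 0 rather than raising an error.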

Sketch Engine, 2.36.5

Identifier
DOI https://doi.org/10.18710/KDSZEP
Related Identifier https://doi.org/10.1515/cog-2021-0114
Metadata Access https://dataverse.no/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.18710/KDSZEP
Provenance
Creator Klavan, Jane
Publisher DataverseNO
Contributor Klavan, Jane; The Tromsø Repository of Language and Linguistics
Publication Year 2023
Rights info:eu-repo/semantics/openAccess
OpenAccess true
Contact Klavan, Jane (University of Tartu)
Representation
Resource Type Manually annotated corpus data; Dataset
Format text/plain
Size 7552; 215658
Version 1.1
Discipline Humanities
Spatial Coverage (23.330W, 59.610S, 28.130E, 59.610N)