Background Data for: The complexity principle and the morphosyntactic alternation between case affixes and postpositions in Estonian

Dataset

Manually annotated dataset of 3,000 uses of exterior locative constructions (specifically cases and postpositions) in present-day Estonian. The data is extracted from the Estonian National Corpus (ENC 2017; 1.1 billion words, mainly web-based texts) and includes 500 uses of each of the following constructions: allative, adessive, ablative, peale, peal, pealt. The sampling procedure and further details about the dataset are given in Klavan & Schützler (to appear in Cognitive Linguistics). The data is annotated for nine variables: postpos (outcome variable: case, postposition), position (post, pre), complexity (simple, compound), length (length of the landmark phrase in syllables), frequency (raw frequency of the landmark form in association with the respective semantic relation), function (adverbial, modifier), verb_lemma (224 levels for lative, 279 levels for locative, 252 levels for separative), lm_lemma (592 levels for lative, 438 levels for locative, 528 levels for separative), sem_rel (lative, locative, separative). The dataset was collected by the PI of the project PUT1358 "The Making and Breaking of Models: Experimentally Validating Classification Models in Linguistics" (1.01.2017−31.12.2020), funded by the Estonian Research Council.
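The sampling design described above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the deposited materials. The pairing of each construction with a postpos level and a sem_rel level is an assumption read off the description (three case forms, three postpositions, one of each per semantic relation), and the names `CONSTRUCTIONS` and `design_counts` are hypothetical.

```python
from collections import Counter

# 500 sampled uses per construction, as stated in the description.
USES_PER_CONSTRUCTION = 500

# Assumed pairing (not spelled out in the record):
# construction -> (postpos level, sem_rel level)
CONSTRUCTIONS = {
    "allative": ("case", "lative"),
    "adessive": ("case", "locative"),
    "ablative": ("case", "separative"),
    "peale": ("postposition", "lative"),
    "peal": ("postposition", "locative"),
    "pealt": ("postposition", "separative"),
}


def design_counts():
    """Return the total sample size and counts per postpos and sem_rel level."""
    by_outcome = Counter()
    by_relation = Counter()
    for outcome, relation in CONSTRUCTIONS.values():
        by_outcome[outcome] += USES_PER_CONSTRUCTION
        by_relation[relation] += USES_PER_CONSTRUCTION
    total = USES_PER_CONSTRUCTION * len(CONSTRUCTIONS)
    return total, dict(by_outcome), dict(by_relation)


total, by_outcome, by_relation = design_counts()
print(total)
print(by_outcome)
print(by_relation)
```

Under this reading, the outcome variable is balanced overall and within each semantic relation, which is worth confirming against the data files before modelling.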

Sketch Engine, 2.36.5

Identifier
DOI	https://doi.org/10.18710/KDSZEP
Related Identifier	https://doi.org/10.1515/cog-2021-0114
Metadata Access	https://dataverse.no/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.18710/KDSZEP
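The Metadata Access address above is an OAI-PMH GetRecord request. As a hedged illustration of how its parts fit together, the sketch below rebuilds the same URL from the endpoint, the metadata prefix, and the DOI using only the Python standard library. The function name `oai_record_url` is hypothetical, and the sketch only constructs the address without sending a request.

```python
from urllib.parse import urlencode

# Endpoint and identifier copied from the record above.
OAI_ENDPOINT = "https://dataverse.no/oai"
DATASET_DOI = "10.18710/KDSZEP"


def oai_record_url(doi, metadata_prefix="oai_datacite"):
    """Build an OAI-PMH GetRecord URL for a DOI-identified record."""
    query = {
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        "identifier": f"doi:{doi}",
    }
    # Keep ':' and '/' unescaped so the result matches the form shown in the record.
    return f"{OAI_ENDPOINT}?{urlencode(query, safe=':/')}"


url = oai_record_url(DATASET_DOI)
print(url)
```

The response to such a request is an XML document, so any client fetching it would need an XML parser to read the individual metadata fields.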

Provenance
Creator	Klavan, Jane
Publisher	DataverseNO
Contributor	Klavan, Jane; The Tromsø Repository of Language and Linguistics
Publication Year	2023
Rights	info:eu-repo/semantics/openAccess
OpenAccess	true
Contact	Klavan, Jane (University of Tartu)

Representation
Resource Type	Manually annotated corpus data; Dataset
Format	text/plain
Size	7552; 215658
Version	1.1
Discipline	Humanities
Spatial Coverage	(23.330W, 59.610S, 28.130E, 59.610N)