Foneetikakorpuse sagedussõnastik

Dataset

Eesti keele spontaanse kõne foneetilise korpuse sagedussõnastik on koostatud korpuse v.1.0.5 (20.06.2019, doi:10.15155/1-00-0000-0000-0000-001A3L) versiooni põhjal, kui korpuses oli märgendatud 685 750 sõna (89 tundi ja 18 minutit kõnet). Vt korpuse kohta lähemalt https://www.keel.ut.ee/et/foneetikakorpus

Korpus lemmatiseeriti ESTMORF morfoloogilise analüsaatoriga (https://www.filosoft.ee/html_morf_et/morfoutinfo.html -- vt ka sõnaliikide loendit).

Tabelis EKSKFK_sagedussonastik_2019-06-20.txt on esitatud 1000 sagedasema sõna lemma, sõnaliik ning sagedus.

The frequency table of the 1000 most frequent words in the Phonetic Corpus of Estonian Spontaneous Speech is based on version 1.0.5 of the corpus (20.06.2019, doi:10.15155/1-00-0000-0000-0000-001A3L), which at that time contained 685 750 annotated words (89 hours and 18 minutes of speech). For more information about the corpus, see https://www.keel.ut.ee/en/languages-resourceslanguages-resources/phonetic-corpus-estonian-spontaneous-speech

The corpus was lemmatized using the ESTMORF morphological analyzer (see https://www.filosoft.ee/html_morf_et/morfoutinfo.html for more information, including the list of word classes).

The table in the file EKSKFK_sagedussonastik_2019-06-20.txt lists the 1000 most frequent lemmas together with their word class and frequency.
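
A table of this kind could be read with a few lines of Python. The sketch below is only an illustration under stated assumptions: the record does not describe the file layout, so the tab delimiter, the column order (lemma, word class, frequency) and the possible header row are guesses, and the names `Entry`, `parse_frequency_table` and `per_million` are hypothetical. The sample rows are invented for the example and are not taken from the dataset.

```python
import csv
import io
from typing import List, NamedTuple

# Corpus size stated in the description above (annotated words in v.1.0.5).
CORPUS_SIZE = 685_750


class Entry(NamedTuple):
    lemma: str
    word_class: str
    frequency: int


def parse_frequency_table(text: str, delimiter: str = "\t") -> List[Entry]:
    """Parse rows of (lemma, word class, frequency).

    Assumption: three delimiter-separated columns in this order.
    Rows with fewer columns or a non-integer frequency (for example
    a header row) are skipped rather than treated as errors.
    """
    entries = []
    reader = csv.reader(io.StringIO(text), delimiter=delimiter,
                        quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 3:
            continue
        lemma, word_class, raw_freq = (field.strip() for field in row[:3])
        try:
            frequency = int(raw_freq)
        except ValueError:
            continue  # header or malformed row
        entries.append(Entry(lemma, word_class, frequency))
    return entries


def per_million(entry: Entry, corpus_size: int = CORPUS_SIZE) -> float:
    """Relative frequency of a lemma per million corpus words."""
    return entry.frequency / corpus_size * 1_000_000


# Invented sample rows, for illustration only (not real corpus counts).
SAMPLE = "lemma\tword_class\tfrequency\nolema\tV\t1000\nja\tJ\t500\n"
entries = parse_frequency_table(SAMPLE)
```

To apply the same function to the published file, its contents could be passed in as a string, for instance via `open("EKSKFK_sagedussonastik_2019-06-20.txt", encoding="utf-8").read()`; the UTF-8 encoding is likewise an assumption and should be checked against the actual file.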

Identifier
DOI	http://datadoi.ee/handle/33/93
Related Identifier	https://doi.org/10.15155/1-00-0000-0000-0000-001A3L
Metadata Access	https://datadoi.ee/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:datadoi.ee:33/93

Provenance
Creator	Lippus, Pärtel
Publisher	University of Tartu
Publication Year	2019
Rights	info:eu-repo/semantics/openAccess
OpenAccess	true

Representation
Resource Type	info:eu-repo/semantics/dataset; word frequency table
Format	text/plain
Discipline	Other