Wordlist of the Contemporary Corpus of Lithuanian language

Dabartinės lietuvių kalbos tekstyno žodžių formų dažniniai sąrašai / Wordlists of Wordforms of the Contemporary Corpus of Lithuanian language

Tekstyno struktūra/Corpus Structure

Patekstynis/Subcorpus            Words (millions)   Proportion
Grožinė lit./Fiction             15.54              12.6%
Negrožinė lit./Non-fiction       19.99              16.2%
Administracinė lit./Documents    11.19              9.1%
Periodika/Periodicals            76.24              61.8%
Sakytinė kalba/Speech Corpus     0.49               0.4%
Visas/Total                      123.45             100%

Tinklalapiai/Website: tekstynas.vdu.lt, corpus.vdu.lt

Data/Date: 2016.10.17; 2022.11.15*
* Upgraded method of handling punctuation and format.

Metodas/Method:
    sed -e 's/]>//g' .txt | tr q'[:punct:]' ' ' | tr -s ' ' | tr ' ' '\n' | tr '[:upper:]' '[:lower:]' | grep -v '[^a-z]' | grep -v "^\s*$" | sort | uniq -c | sort -rn > freq-visas.txt
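
The core of the method (split on punctuation and spaces, lowercase, count, sort by frequency) can be illustrated on a tiny sample. This is a simplified sketch and not the exact command used to build the lists: the function name count_wordforms is hypothetical, the sed clean-up and the letter filter are left out, and the sample is plain ASCII English because some byte-oriented tr implementations may not lowercase letters with diacritics.

```shell
# Hypothetical helper (not part of the resource): count word forms
# in text read from standard input and print "count<TAB>word" lines.
count_wordforms() {
  tr '[:punct:]' ' ' |                      # punctuation -> spaces
    tr -s ' ' '\n' |                        # one token per line, repeats squeezed
    tr '[:upper:]' '[:lower:]' |            # fold case so "The" and "the" merge
    grep -v '^$' |                          # drop any empty lines
    LC_ALL=C sort |                         # group identical word forms
    uniq -c |                               # prefix each word form with its count
    LC_ALL=C sort -k1,1nr -k2,2 |           # highest count first, ties alphabetical
    awk '{ print $1 "\t" $2 }'              # normalise to count<TAB>word
}

printf 'The cat saw the dog. The dog ran!\n' | count_wordforms
# Prints:
# 3	the
# 2	dog
# 1	cat
# 1	ran
# 1	saw
```

The downloadable lists come from a pipeline ending in uniq -c and sort -rn, so they should likewise hold one count and one word form per line, most frequent first; the tab separator and the alphabetical tie-break here are choices made for this sketch only.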

Kaip cituoti/Reference: Rimkutė E., Kovalevskaitė J., Melninkaitė V., Utka A., Vitkutė-Adžgauskienė D. 2010: Corpus of Contemporary Lithuanian Language – the Standardised Way. Proceedings of the Fourth International Conference Human Language Technologies – The Baltic Perspective, 154–160.

Licencija/Licence: CLARIN-LT PUB

Identifier
PID http://hdl.handle.net/20.500.11821/8
Metadata Access https://clarin.vdu.lt/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:clarin.vdu.lt:20.500.11821/8
Provenance
Creator Utka, Andrius
Publisher Vytautas Magnus University
Publication Year 2016
Rights PUB_CLARIN-LT_End-User-Licence-Agreement_EN-LT; https://clarin.vdu.lt/licenses/eula/PUB_CLARIN-LT_End-User-Licence-Agreement_EN-LT.htm; PUB
OpenAccess true
Contact info(at)clarin.vdu.lt
Representation
Language Lithuanian
Resource Type lexicalConceptualResource
Format application/zip; text/plain; text/plain; charset=utf-8; downloadable_files_count: 2
Discipline Linguistics