Lithuanian morphologically annotated corpus - MATAS v3.0

PID

MATAS corpus (version 3.0)

DESCRIPTION Updated, manually checked, morphologically annotated corpus MATAS

LANGUAGE Lithuanian

PREVIOUS VERSIONS
1. MATAS v0.2 (http://hdl.handle.net/20.500.11821/9)
2. MATAS v1.0 (http://hdl.handle.net/20.500.11821/33)

FORMATS, STANDARDS
1. CoNLL-U (https://universaldependencies.org/format.html)
2. JABLONSKIS tagset v2 (https://sitti.vdu.lt/jablonskis-en/)
3. MULTEXT-East tagset (http://nl.ijs.si/ME/V4/msd/html/index.html)
4. UTF-8

SIZE
Tokens (incl. punctuation): 2,137,287
Words: 1,694,819
Sentences: 144,047
Documents: 1,234
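
Because the corpus is distributed in CoNLL-U, figures like the ones above can in principle be recomputed from the files. The following is a minimal sketch, not the method the corpus authors used: the function names `is_punctuation` and `count_conllu` are hypothetical, the sample sentences are invented for illustration, and treating a "word" as any token whose form is not made up entirely of Unicode punctuation characters is an assumption that may not match the definition behind the published word count.

```python
import unicodedata


def is_punctuation(form):
    """Return True if every character of the form is Unicode punctuation (category P*)."""
    return bool(form) and all(
        unicodedata.category(ch).startswith("P") for ch in form
    )


def count_conllu(lines):
    """Count sentences, tokens, and non-punctuation words in CoNLL-U lines."""
    sentences = tokens = words = 0
    in_sentence = False
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if in_sentence:
                sentences += 1
                in_sentence = False
            continue
        if line.startswith("#"):
            # Sentence-level comment or metadata line.
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue
        token_id, form = fields[0], fields[1]
        # Skip multiword-token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
        if not token_id.isdigit():
            continue
        in_sentence = True
        tokens += 1
        if not is_punctuation(form):
            words += 1
    if in_sentence:
        # The last sentence may not be followed by a blank line.
        sentences += 1
    return {"sentences": sentences, "tokens": tokens, "words": words}


# Invented two-sentence sample; annotation columns are left as "_" on purpose.
SAMPLE = (
    "# text = Labas rytas.\n"
    "1\tLabas\t_\t_\t_\t_\t_\t_\t_\t_\n"
    "2\trytas\t_\t_\t_\t_\t_\t_\t_\t_\n"
    "3\t.\t_\t_\t_\t_\t_\t_\t_\t_\n"
    "\n"
    "# text = Ačiū!\n"
    "1\tAčiū\t_\t_\t_\t_\t_\t_\t_\t_\n"
    "2\t!\t_\t_\t_\t_\t_\t_\t_\t_\n"
)

print(count_conllu(SAMPLE.splitlines()))
```

For a real file, the same function can be given an open file handle (opened with `encoding="utf-8"`), since it only iterates over lines. Tokens consisting of mathematical or currency symbols are counted as words under this assumption.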

GENRES Contains 5 genres: Documents (14%), Fiction (19%), Periodicals (36%), Scientific texts (24%), Transcripts (7%)

PUBLISHER Institute of Digital Resources and Interdisciplinary Research (SITTI), Vytautas Magnus University

Identifier
PID http://hdl.handle.net/20.500.11821/61
Metadata Access https://clarin.vdu.lt/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:clarin.vdu.lt:20.500.11821/61
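
The Metadata Access URL above is an OAI-PMH `GetRecord` request. As a small sketch, it can be assembled from its parts as shown below; the constant and function names are hypothetical, and the code only builds the URL string without contacting the server.

```python
from urllib.parse import urlencode

# Values taken from the record above; the names are illustrative only.
OAI_ENDPOINT = "https://clarin.vdu.lt/oai/request"
HANDLE = "20.500.11821/61"


def build_getrecord_url(endpoint, handle, metadata_prefix="oai_dc"):
    """Build an OAI-PMH GetRecord URL for a CLARIN-LT handle."""
    query = {
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        "identifier": f"oai:clarin.vdu.lt:{handle}",
    }
    # Keep ":" and "/" unescaped so the result matches the form shown in the record.
    return f"{endpoint}?{urlencode(query, safe=':/')}"


url = build_getrecord_url(OAI_ENDPOINT, HANDLE)
print(url)
```

The resulting string can be passed to any HTTP client; the response is expected to be an XML document in the requested metadata format.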
Provenance
Creator Rimkutė, Erika; Bielinskienė, Agnė; Dadurkevičius, Virginijus; Kovalevskaitė, Jolanta; Utka, Andrius; Boizou, Loïc
Publisher Institute of Digital Resources and Interdisciplinary Research (SITTI), Vytautas Magnus University
Publication Year 2024
Rights PUB_CLARIN-LT_End-User-Licence-Agreement_EN-LT; https://clarin.vdu.lt/licenses/eula/PUB_CLARIN-LT_End-User-Licence-Agreement_EN-LT.htm; PUB
OpenAccess true
Contact info(at)clarin.vdu.lt
Representation
Language Lithuanian
Resource Type corpus
Format text/plain; charset=utf-8; application/zip; text/plain; downloadable_files_count: 3
Discipline Linguistics