ORVELIT v3


ORVELIT v3 (Lith. Originalios ir Vertimų Lietuvių Kalbos Tekstynas) is a comparable monolingual corpus of original and translated Lithuanian. It consists of four sub-corpora of original and translated fiction and popular science literature (approx. 1m words each).

Detailed information on the composition and on the lexical and morphological features of the raw (ORVELIT v1) and morphologically annotated (ORVELIT v2) versions of the corpus can be found in:

Vaičenonienė, Jurgita, Kovalevskaitė, Jolanta, and Ringailienė, Teresė. 2017. Tekstynais paremti vertimų kalbos tyrimai ir šaltiniai. Kalbų studijos / Studies about Languages, Nr. 30, pp. 42-55. https://www.vdu.lt/cris/handle/20.500.12259/56648?mode=simple

Vaičenonienė, Jurgita, and Kovalevskaitė, Jolanta. 2019. Leksinės ir morfologinės vertimų kalbos ypatybės. Darnioji daugiakalbystė / Sustainable Multilingualism, Nr. 14, pp. 208-235. https://www.vdu.lt/cris/handle/20.500.12259/98861

ORVELIT v3 has been modified by deleting the title, content, bibliographical lists, indexes and author(s) of the texts, and by mixing the individual texts at paragraph level. Cases when some other information was deleted were marked as . The corpus encoding is UTF-8. ORVELIT v3 includes a raw (ORVELIT v3_raw) and a morphologically annotated (ORVELIT v3_annotated) version of the corpus. The corpus was morphologically annotated automatically with the Semantika.lt analyser.

Identifier
PID http://hdl.handle.net/20.500.11821/40
Metadata Access https://clarin.vdu.lt/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:clarin.vdu.lt:20.500.11821/40
Provenance
Creator Vaičenonienė, Jurgita; Kovalevskaitė, Jolanta; Boizou, Loïc
Publisher Vytautas Magnus University
Publication Year 2020
Rights ACA_CLARIN-LT_End-User-Licence-Agreement_EN-LT; https://clarin.vdu.lt/licenses/eula/ACA_CLARIN-LT_End-User-Licence-Agreement_EN-LT.htm; ACA
OpenAccess true
Contact info(at)clarin.vdu.lt
Representation
Language Lithuanian
Resource Type corpus
Format application/zip; application/octet-stream; text/plain; charset=utf-8; downloadable_files_count: 2
Discipline Linguistics