DELFI.lt corpus

The DELFI.lt corpus consists of articles published by the news portal DELFI.lt from March 2014 to November 2016. Metadata were collected along with the articles: author, title, date, source, link, category, and number of words. The corpus contains 190 000 news articles from 12 thematic categories: DELFI Faces (DELFI Veidai), Projects (Projektai), DELFI Science (DELFI Mokslas), DELFI Auto, Unidentified category, Sport, DELFI Life (DELFI Gyvenimas), DELFI People (DELFI Žmonės), DELFI Citizen (DELFI Pilietis), Business (Verslas), DELFI FIT, and DELFI News (DELFI Žinios). In total, the DELFI.lt corpus comprises 70 million words. The corpus is morphologically annotated with Universal Dependencies tags and is freely accessible for online search at http://tekstynas.mwe.lt/.

Identifier
PID http://hdl.handle.net/20.500.11821/30
Related Identifier http://mwe.lt/
Metadata Access https://clarin.vdu.lt/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:clarin.vdu.lt:20.500.11821/30
Provenance
Creator Bielinskienė, Agnė; Boizou, Loïc; Bumbulienė, Ieva; Kovalevskaitė, Jolanta; Krilavičius, Tomas; Mandravickaitė, Justina; Rimkutė, Erika; Vilkaitė-Lozdienė, Laura
Publisher Baltic Institute of Advanced Technology; Vytautas Magnus University
Publication Year 2019
OpenAccess true
Contact info(at)clarin.vdu.lt
Representation
Language Lithuanian
Resource Type corpus
Format downloadable_files_count: 0
Discipline Linguistics
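
The Metadata Access link above is an OAI-PMH GetRecord request for the oai_dc (Dublin Core) format. As a minimal sketch of how this record could be retrieved programmatically, the Python below rebuilds that URL and collects the Dublin Core fields from the response. The function names (build_get_record_url, parse_dublin_core, fetch_record) are hypothetical helpers introduced for illustration, and the set of fields the server actually returns is an assumption that should be checked against a live response.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

# Endpoint and identifier are taken from the Metadata Access link in this record.
OAI_ENDPOINT = "https://clarin.vdu.lt/oai/request"
RECORD_ID = "oai:clarin.vdu.lt:20.500.11821/30"

# Standard Dublin Core element namespace used by the oai_dc format.
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def build_get_record_url(endpoint=OAI_ENDPOINT, identifier=RECORD_ID):
    """Assemble an OAI-PMH GetRecord URL for the oai_dc metadata format."""
    params = {
        "verb": "GetRecord",
        "metadataPrefix": "oai_dc",
        "identifier": identifier,
    }
    # Keep ':' and '/' unescaped so the URL matches the link listed in the record.
    return f"{endpoint}?{urlencode(params, safe=':/')}"


def parse_dublin_core(xml_data):
    """Collect every Dublin Core element in the response as {name: [values]}."""
    fields = {}
    for element in ET.fromstring(xml_data).iter():
        text = (element.text or "").strip()
        if element.tag.startswith(DC_NS) and text:
            fields.setdefault(element.tag[len(DC_NS):], []).append(text)
    return fields


def fetch_record(url=None, timeout=30):
    """Download the record (requires network access) and parse it."""
    with urlopen(url or build_get_record_url(), timeout=timeout) as response:
        return parse_dublin_core(response.read())
```

Calling fetch_record() would return a dictionary keyed by Dublin Core element name (for example "title" or "creator"), each mapped to a list of values, since elements such as creator may repeat.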