Lithuanian Parliament Corpus for Authorship Attribution

The 23.9-million-word Lithuanian Parliament corpus is designed specifically for the authorship attribution task. It consists of 111 thousand samples of speech transcripts by 147 parliamentarians of the Lithuanian Seimas and covers the period from March 1990 to December 2013. Each line in a corpus file contains a different text feature that can be used in the authorship attribution task (Kapočiūtė-Dzikienė et al. 2014).

References:
Kapočiūtė-Dzikienė, Jurgita; Utka, Andrius; Šarkutė, Ligita. 2014. Feature exploration for authorship attribution of Lithuanian parliamentary speeches. Text, speech and dialogue: 17th international conference, TSD 2014, Brno, Czech Republic, September 8-12, 2014: proceedings, pp. 93-100.
Kapočiūtė-Dzikienė, Jurgita; Nivre, Joakim; Krupavičius, Algis. 2013. Lithuanian Dependency Parsing with Rich Morphological Features. Empirical Methods in Natural Language Processing - 4th Workshop on Statistical Parsing of Morphologically Rich Languages (SPMRL'2013), pp. 12-21.
Zinkevičius, Vytautas. 2000. Lemuoklis - morfologinei analizei. Gudaitis, L. (ed.) Darbai ir Dienos, 24: 246-273.

Identifier
PID http://hdl.handle.net/20.500.11821/17
Related Identifier http://dangus.vdu.lt/~jkd/eng/
Metadata Access https://clarin.vdu.lt/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:clarin.vdu.lt:20.500.11821/17
Provenance
Creator Kapočiūtė-Dzikienė, Jurgita; Šarkutė, Ligita; Utka, Andrius
Publisher Vytautas Magnus University
Publication Year 2017
Rights PUB_CLARIN-LT_End-User-Licence-Agreement_EN-LT; https://clarin.vdu.lt/licenses/eula/PUB_CLARIN-LT_End-User-Licence-Agreement_EN-LT.htm; PUB
OpenAccess true
Contact info(at)clarin.vdu.lt
Representation
Language Lithuanian
Resource Type corpus
Format application/pdf; application/zip; text/plain; charset=utf-8; downloadable_files_count: 4
Discipline Linguistics
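
The Metadata Access URL listed under Identifier is an OAI-PMH GetRecord request that returns this record as Dublin Core XML. A minimal sketch of reading it with the Python standard library follows. The function names (build_get_record_url, parse_dublin_core, fetch_record) are hypothetical helpers introduced for illustration, and the sketch assumes the endpoint returns standard oai_dc elements in the http://purl.org/dc/elements/1.1/ namespace.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint and identifier are taken from the Metadata Access field of this record.
BASE_URL = "https://clarin.vdu.lt/oai/request"
RECORD_ID = "oai:clarin.vdu.lt:20.500.11821/17"

# Standard Dublin Core element namespace (assumed to be what the endpoint returns).
DC_NS = "http://purl.org/dc/elements/1.1/"


def build_get_record_url(identifier, metadata_prefix="oai_dc"):
    """Build an OAI-PMH GetRecord URL (hypothetical helper)."""
    query = urlencode({
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        "identifier": identifier,
    })
    return f"{BASE_URL}?{query}"


def parse_dublin_core(xml_data):
    """Collect Dublin Core fields from an OAI-PMH response (str or bytes).

    Returns a dict mapping element names (e.g. "title", "creator")
    to lists of values, since Dublin Core fields may repeat.
    """
    root = ET.fromstring(xml_data)
    prefix = "{" + DC_NS + "}"
    fields = {}
    for elem in root.iter():
        if elem.tag.startswith(prefix):
            name = elem.tag[len(prefix):]
            text = (elem.text or "").strip()
            if text:
                fields.setdefault(name, []).append(text)
    return fields


def fetch_record(identifier=RECORD_ID):
    """Download and parse one record; requires network access."""
    with urlopen(build_get_record_url(identifier), timeout=30) as resp:
        return parse_dublin_core(resp.read())
```

Calling fetch_record() should return a dictionary whose keys mirror the fields shown in this record (creator, publisher, rights, and so on), although the exact set of elements depends on what the repository exposes.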