-
Corpus of Academic Slovene (BSc/BA theses) KAS-dipl 1.0
The KAS-dipl corpus of Slovene BSc/BA theses consists of almost 65,000 texts (3,5 million pages or 1,1 billion tokens) written 2000 - 2018 and gathered from the digital... -
Corpus of academic Slovene KAS 1.0
The KAS corpus of Slovene academic writing consists of almost 65,000 BSc/BA, 16,000 MSc/MA and 1,600 PhD theses (82 thousand texts, 5 million pages or 1,7 billion tokens)... -
Abstracts from the KAS corpus KAS-Abs 2.0
The KAS-abs 2.0 corpus contains 125,202 automatically identified Slovenian and/or English abstracts from BSc/BA, MSc/MA, and PhD theses included in the KAS Corpus of Academic... -
English-Slovene term candidates KAS-biterm 1.0
KAS-biterm is an automatically generated glossary of English terms with their translations into Slovene. The pairs, possibly with their English and Slovene acronyms, were... -
Abstracts from the KAS corpus KAS-Abs 1.0
The KAS-abs corpus contains 108,254 automatically identified Slovenian and/or English abstracts (30 million words) from 62,000 BSc/BA, MSc/MA, and PhD theses included in the KAS... -
Summarization datasets from the KAS corpus KAS-Sum 1.0
Summarization datasets were created from the text bodies in the KAS 2.0 corpus (http://hdl.handle.net/11356/1448) and the abstracts from the KAS-Abs 2.0 corpus... -
Machine Translation datasets from the KAS corpus KAS-MT 1.0
The Machine Translation datasets KAS-MT 1.0 contain automatically sentence-aligned Slovene and English plain-text abstracts from KAS-Abs 2.0 (http://hdl.handle.net/11356/1449)... -
Corpus of academic Slovene KAS 2.0
The KAS corpus of Slovene academic writing consists of almost 65,000 BSc/BA, 16,000 MSc/MA and 1,600 PhD theses (82 thousand texts, 5 million pages or 1,5 billion tokens)...