-
Abstracts from the KAS corpus KAS-Abs 2.0
The KAS-Abs 2.0 corpus contains 125,202 automatically identified Slovenian and/or English abstracts from BSc/BA, MSc/MA, and PhD theses included in the KAS Corpus of Academic... -
Corpus of Slovene linguistic scientific writing JezKor
JezKor is a collection of linguistic scientific writing in the Slovenian language. It consists of 43 monographs published between 2009 and 2022 by the Fran Ramovš Institute of... -
Machine Translation datasets from the KAS corpus KAS-MT 1.0
The Machine Translation datasets KAS-MT 1.0 contain automatically sentence-aligned Slovene and English plain-text abstracts from KAS-Abs 2.0 (http://hdl.handle.net/11356/1449)... -
Corpus of Academic Slovene (BSc/BA theses) KAS-dipl 1.0
The KAS-dipl corpus of Slovene BSc/BA theses consists of almost 65,000 texts (3.5 million pages or 1.1 billion tokens) written between 2000 and 2018 and gathered from the digital... -
English-Slovene term candidates KAS-biterm 1.0
KAS-biterm is an automatically generated glossary of English terms with their translations into Slovene. The pairs, possibly with their English and Slovene acronyms, were... -
Corpus of academic Slovene KAS 2.0
The KAS corpus of Slovene academic writing consists of almost 65,000 BSc/BA, 16,000 MSc/MA and 1,600 PhD theses (82 thousand texts, 5 million pages or 1.5 billion tokens)... -
Summarization datasets from the KAS corpus KAS-Sum 1.0
Summarization datasets were created from the text bodies in the KAS 2.0 corpus (http://hdl.handle.net/11356/1448) and the abstracts from the KAS-Abs 2.0 corpus... -
Corpus of Academic Slovene (MSc/MA theses) KAS-mag 1.0
The KAS-mag corpus of Slovene MSc/MA theses consists of almost 16,000 texts (1.36 million pages or 500 million tokens) written between 2000 and 2018 and gathered from the digital... -
Corpus of Academic Slovene (PhD theses) KAS-dr 1.0
The KAS-dr corpus of Slovene PhD theses consists of almost 1,600 texts (266 thousand pages or 100 million tokens) written between 2000 and 2018 and gathered from the digital libraries of... -
Corpus of Romanian Academic Genres ROGER
The corpus contains academic papers from eight disciplines, written by Romanian students in their native Romanian and in L2 English. The corpus was collected over a three-year period... -
Corpus of academic Slovene KAS 1.0
The KAS corpus of Slovene academic writing consists of almost 65,000 BSc/BA, 16,000 MSc/MA and 1,600 PhD theses (82 thousand texts, 5 million pages or 1.7 billion tokens)... -
Czech and English abstracts of ÚFAL papers
This is a document-aligned parallel corpus of English and Czech abstracts of scientific papers published by authors from the Institute of Formal and Applied Linguistics, Charles... -
Czech and English abstracts of ÚFAL papers (2022-11-11)
This is a parallel corpus of Czech and mostly English abstracts of scientific papers and presentations published by authors from the Institute of Formal and Applied Linguistics,...