-
Representation in Wikipedia: Intersectional Insights on Gender and Diversity ...
This dataset shows values taken from biography articles that have appeared in the "From today's featured article", "Did you know..." and "On this day" sections of the Front... -
Gender and Intersectional Disparities in Biographies on English and Spanish W...
The following dataset contains two folders with different data, which include: The dataset in the folder named "Gender" provides the gender distribution of the... -
Bibliographic data for Wikipedia gender gap: a scoping review
The following dataset corresponds to the bibliographic data analysed in the publication titled 'Wikipedia Gender Gap: A Scoping Review'. The data is used for a scoping review... -
The Online conversation threads repository
This repository contains datasets with online conversation threads collected and analyzed by different researchers. Currently, you can find datasets from different news... -
Evolution of Wikipedia Categories
Knowledge Space Lab: Design versus Emergence. Comparison between the structure and evolution of categories in the Wikipedia and the Universal Decimal Classification. 2009-2011.... -
Wikipedia Discussion Corpora
Various annotated Wikipedia resources -
Wikipedia Edit Category Corpus
For the corpus itself, please refer to/cite: Johannes Daxenberger and Iryna Gurevych (2012). "A Corpus-Based Study of Edit Categories in Featured and Non-Featured Wikipedia... -
Wikipedia Edit-Turn-Pairs
Corresponding and Non-Corresponding Edit-Turn-Pairs from the English Wikipedia. The ETP-gold corpus is based on article edits and discussion page turns from the English... -
Comparable corpora of South-Slavic Wikipedias CLASSLA-Wikipedia 1.0
This comparable corpus collection consists of Wikipedia dumps of the Bosnian, Croatian, Macedonian, Montenegrin, Serbian, Serbo-Croatian and Slovenian Wikipedia, harvested on... -
Wikipedia talk corpus Janes-Wiki 1.0
Janes-Wiki is an annotated corpus of discussion pages from the Slovene Wikipedia from the period 2003-08 to 2017-06. The corpus contains page and user talks and is structured... -
Slovene corpus for general relation extraction SloREL 1.0
The SloREL corpus contains annotations for training relation extraction models on Slovene documents. It contains documents from Slovene Wikipedia with annotated entities and... -
Slovene corpus for general relation extraction SloREL 1.1
The SloREL corpus contains annotations for training relation extraction models on Slovene documents. It contains documents from Slovene Wikipedia with annotated entities and... -
Slovenian Definition Extraction training dataset DF_NDF_wiki_slo 1.0
The Slovenian definition extraction training dataset DF_NDF_wiki_slo contains 38613 sentences extracted from the Slovenian Wikipedia. The first sentence of a term's description... -
python-g419wikitools-1.0
A set of Python scripts for generating a dictionary of phrase inflections based on Wikipedia's internal links. Analysing a Wikipedia dump produces a set of files containing:... -
CorpusExplorer
Software for corpus linguists and text/data mining enthusiasts. The CorpusExplorer combines over 45 interactive visualizations under a user-friendly interface. Routine tasks... -
English-Czech Corpus from Wikipedia
Sentence-parallel corpus made from English and Czech Wikipedias based on translated articles from English into Czech. The work done is described in the paper: ŠTROMAJEROVÁ,... -
Plaintext Wikipedia dump 2018
Wikipedia plain text data obtained from Wikipedia dumps with WikiExtractor in February 2018. The data come from all Wikipedias for which dumps could be downloaded at...