-
BWS Argument Similarity Corpus
The BWS Argument Similarity Corpus includes 3,400 sentence pairs across 8 controversial topics, with 425 argument pairs per topic. Each argument pair was annotated via... -
Re-rating Studies
A Reflective View on Text Similarity -
Predictive Whittle Networks for Time Series
Dataset for the paper "Predictive Whittle Networks for Time Series". Use with the code at: https://github.com/ml-research/PWN -
OSS-Net trained models
Trained OSS-Net models of the paper "OSS-Net: Memory Efficient High Resolution Semantic Segmentation of 3D Medical Data". -
Wikipedia Article Feedback
The corpus lists article IDs of biographies of living and dead people, rated as above average or below average along four categories (trustworthy, objective, well written,... -
Text Reuse Annotations
Text Reuse Detection Using a Composition of Text Similarity Measures -
German-English Modality Verbclasses
This is a semantic classification of more than 600 German lexical verbs and their English translations, introduced in the paper: Judith Eckle-Kohler. Verbs Taking Clausal and... -
Turk Bootstrap Word Sense Inventory (TWSI) 2.0
Turk Bootstrap Word Sense Inventory (TWSI) 2.0. This lexical resource, created by a crowdsourcing process using Amazon Mechanical Turk (http://www.mturk.com), encompasses a... -
EUR-Lex Dataset
The EUR-Lex text collection is a collection of documents about European Union law. It contains many different types of documents, including treaties, legislation, case-law and... -
Wikipedia Edit Category Corpus
For the corpus itself, please refer to/cite: Johannes Daxenberger and Iryna Gurevych (2012). "A Corpus-Based Study of Edit Categories in Featured and Non-Featured Wikipedia... -
Sense Similarity
Sense and Similarity: A Study of Sense-level Similarity Measures -
Domain-specific context-sensitive semantic verb relations
This is a data set of semantic verb relations in English from the domain of everyday educational topics. The data set consists of 12403 pairs of propositions which have been... -
CLEVR-Hans7
A compositionally complex data set for investigating confounders and explainability. -
Hierarchy Identification
The page list data sets and experiments presented in the paper Hierarchy Identification for Automatically Generating Table-of-Contents. -
Quality Flaw Prediction in Wikipedia
Dataset to extract reliable training instances from Wikipedia -
German Relatedness Datasets
The datasets on this page were obtained by asking human subjects to assign a similarity or relatedness judgment to a number of German word pairs. The datasets have been used to... -
Difficulty Prediction for Language Tests
This collection includes various resources for predicting the difficulty of language proficiency tests. -
Context-Aware Representations for Knowledge Base Relation Extraction
We provide a subcorpus of Wikipedia that was annotated with Wikidata relations using a distant supervision procedure. The corpus contains two types of annotations: entities and... -
Insufficiently Supported Arguments in Argumentative Essays
This corpus includes 1029 arguments taken from argumentative essays. Each argument is annotated as “insufficient” if its premises do not provide enough evidence for accepting or... -
Supplementary materials: Mining Legal Arguments in Court Decisions
Pre-trained transformer models; accompanying materials to the paper and its GitHub repository