Dataset - B2FIND

Verb Sense Labelling

Vocabulary used for the creation of sense patterns:

Whittle Networks datasets

Datasets for the paper "Whittle Networks: A Deep Likelihood Model for Time Series". Paper at http://proceedings.mlr.press/v139/yu21c.html Code at...

Visual Feature Track Dataset

This dataset contains 282 visual feature tracks. A visual feature track is a sequence of feature observations of the same real 3D-landmark in consecutive image frames. These...

WWW 2019 X-Ling Question Retrieval Data v1

This repository contains the data and code to reproduce the results of our paper "Improved Cross-Lingual Question Retrieval for Community Question Answering"...

Personality Profiling of Fictional Characters using Sense-Level Links between...

This dataset contains the personality gold standard of 298 book characters annotated for their MBTI traits, gathered manually from the http://mbti-databank.com/ website and...

Football Coreference Corpus

This script generates: the original sentence-level Football Coreference Corpus (FCC), a version of the sentence-level FCC which was cleaned and updated after manual review,...

Forum Post Quality Dataset

The dataset has been compiled from Nabble.com. It has been used and is described in the papers listed below. The data can be obtained on request.

Fine-tuned model weights for Stance Detection Benchmark System

This collection includes model weights (BERT-based), fine-tuned in a multi-task setting on 10 heterogeneous stance detection datasets. For more information, please refer to the...

Cognate pairs for several languages

Cognates for the following language pairs can be used for research purposes: en-es, en-de, en-ru, en-el, en-fa, de-cz. Includes: * The training and test data for the en-es...

UKP Convincing Arguments v1

Corpus content: UKPConvArg1-full-XML. This is the full corpus as referred to in the article (Table 2, UKPConvArgAll). It contains 32 XML files, each file corresponding to one...

RWSE Wikipedia Revision Dataset

Real-word spelling error datasets mined from the Wikipedia revision history. Each instance consists of the original sentence with an error and the sentence where the error has...

Inverted Polarity Bigram Lexicons

Sentiment prediction from Twitter is of the utmost interest for research and commercial organizations. Systems usually use lexicons, where each word is positive or...

COALA StackExchange Answer Selection Datasets v1

This resource contains the five non-factoid answer selection datasets based on StackExchange of our paper "COALA: A Neural Coverage-Based Approach for Long Answer Selection with...

Wikipedia Discussion Corpora

Various annotated Wikipedia resources

Lexical Substitution Dataset for German.

This article describes a lexical substitution dataset for German. The whole dataset contains 2,040 sentences from the German Wikipedia, with one target word in each sentence....

Survey data for "NFDI4Ing - Rückmeldung aus den Forschungscommunities" (NFDI4Ing: Feedback from the Research Communities)

Online survey "NFDI4Ing - Rückmeldung aus den Forschungscommunities" by the NFDI4Ing consortium (Nationale Forschungsdateninfrastruktur für die Ingenieurwissenschaften, National Research Data Infrastructure for the Engineering Sciences,...

Opposing Arguments in Persuasive Essays

This corpus includes 402 persuasive essays. Each essay is annotated as “positive” if it includes an opposing argument and “negative” if it includes only arguments supporting the...

Elementary Concept Reasoning (ECR)

Elementary Concept Reasoning (ECR) is a novel dataset focusing on object-centric visual concept learning. It contains RGB images (64×64×3) of 2D geometric objects on a constant...

GLASS

A newly sense-annotated version of the German lexical substitution data set.

Germeval 2017 Embeddings

Word embeddings for our paper and CoNLL-converted data from the shared task

87 datasets found