Dataset - B2FIND

Constrained C-Test Generation via Mixed-Integer Programming (Supplementary Ma...

This work proposes a novel method to generate C-Tests, a variant of cloze tests (a gap-filling exercise) in which only the last part of a word is turned into a gap. In...

Dataset for color terms, 2012

This dataset comprises adjective-noun phrases with color terms.

AMR parse quality prediction [Source Code]

Accuracy prediction for AMR parsing predicts 33 accuracy metrics for a given sentence and its (automatic) AMR parse. Abstract (Opitz and Frank, 2019): Semantic proto-role...

NLP in Diagnostic Texts from Nephropathology [Research Data]

This data set contains all annotated topic word tables from the work "NLP in Diagnostic Texts from Nephropathology", as well as all pre-processed and tf-idf-vectorized text...

WikiEvents Dataset from January 2020 to December 2022

WikiEvents is a knowledge-graph-based dataset for NLP and event-related machine learning tasks. This dataset includes RDF data in JSON-LD about events between January 2020 and...

Propositional Claim Detection (NLP Dataset)

This is a natural language processing (NLP) training dataset. Models trained on this data are intended to classify claims that...

Combining text and vision in compound semantics: Towards a cognitively plausi...

In the current state-of-the-art distributional semantics model of the meaning of noun-noun compounds (such as chainsaw, butterfly, home phone), CAOSS (Marelli...

Exploring Jiu-Jitsu Argumentation for Writing Peer Review Rebuttals

This is the resource for the dataset and models released as a part of our EMNLP 2023 paper "Exploring Jiu-Jitsu Argumentation for Writing Peer Review Rebuttals"

Data Linking Workshop 2023: Computer Vision and Natural Language Processing –...

The humanities meet computer science to create new synergies using computer vision and natural language processing. Aim & Scope: Historians are increasingly using...

Annotation Curricula to Implicitly Train Non-Expert Annotators

Annotation studies often require annotators to familiarize themselves with the task, its annotation scheme, and the data domain. This can be overwhelming in the beginning,...

Wikipedia Discussion Corpora

Various annotated Wikipedia resources

Opinion Mining Corpus on German Tweets about the Covid-19 Pandemic

The UKP Covid-19 Twitter Corpus includes 2,785 tweets annotated by student annotators and 200 expert-annotated tweets in German. Each tweet was annotated as either a supporting...

From Zero to Hero: Human-In-The-Loop Entity Linking in Low Resource Domains

This dataset has no description

ChunkRel WS

ChunkRel-WS is a prototype service for recognition of three syntactic relations between chunks. The service may be run against plain text (input format: text), then the...

Cinderella - tool for Clustering and Classifications of Texts in Polish

System for clustering and classification of texts in Polish. Source code.

WebStylo

Web-based, open stylometry system based on multilevel text analysis. Runs the CLUTO and stylo (R package) clustering methods. Based on Natural Language Processing Workflow...

Chunker WS

Chunker-WS provides shallow parsing of Polish. The parser may be run against plain text (input format: text, then it runs WCRFT for tagging) or already-tagged input (other input...

Movie Title Puns

Context: The data is based on the following paper on pun generation: Hämäläinen, M., & Alnajjar, K. (2019). Modelling the Socialization of Creative Agents in a...

CorpusExplorer

Software for corpus linguists and text/data mining enthusiasts. The CorpusExplorer combines over 45 interactive visualizations under a user-friendly interface. Routine tasks...

29 datasets found