-
NCSE v2.0: A Dataset of OCR-Processed 19th Century English Newspapers
NCSE v2.0 Dataset Repository: This repository contains the NCSE v2.0 dataset and associated supporting data used in the paper "Reading the unreadable: Creating a dataset of 19th... -
Dataset for color terms, 2012
This dataset comprises adjective-noun phrases with color terms. -
AMR parse quality prediction [Source Code]
Accuracy prediction for AMR parsing predicts 33 accuracy metrics for a given sentence and its (automatic) AMR parse. Abstract (Opitz and Frank, 2019): Semantic proto-role... -
NLP in Diagnostic Texts from Nephropathology [Research Data]
This data set contains all annotated topic word tables from the work "NLP in Diagnostic Texts from Nephropathology", as well as all pre-processed and tf-idf-vectorized text... -
Movie Title Puns
Context: The data is based on the following paper on pun generation: Hämäläinen, M., & Alnajjar, K. (2019). Modelling the Socialization of Creative Agents in a... -
WebStylo
Web-based, open stylometry system based on Multilevel Text Analysis. Runs cluto and stylo (R system) clustering methods. Based on Natural Language Processing Workflow... -
Cinderella - tool for Clustering and Classifications of Texts in Polish
System for clustering and classification of texts in Polish. Source code. -
Chunker WS
Chunker-WS provides shallow parsing of Polish. The parser may be run against plain text (input format: text, then it runs WCRFT for tagging) or already tagged input (other input... -
ChunkRel WS
ChunkRel-WS is a prototype service for recognition of three syntactic relations between chunks. The service may be run against plain text (input format: text), then the... -
OpenLegalData (2022 - Corpus)
OpenLegalData is a free and open platform that makes legal documents and information available to the public. The aim of this platform is to improve the transparency of... -
CorpusExplorer
Software for corpus linguists and text/data mining enthusiasts. The CorpusExplorer combines over 45 interactive visualizations under a user-friendly interface. Routine tasks... -
MSTperl parser
MSTperl is a Perl reimplementation of the MST parser of Ryan McDonald (http://www.seas.upenn.edu/~strctlrn/MSTParser/MSTParser.html). MST parser (Maximum Spanning Tree parser)... -
MSTperl parser (2015-05-19)
MSTperl is a Perl reimplementation of the MST parser of Ryan McDonald (http://www.seas.upenn.edu/~strctlrn/MSTParser/MSTParser.html). MST parser (Maximum Spanning Tree parser)... -
DZ Interset
DZ Interset is a means of converting among various tag sets in natural language processing. The core idea is similar to interlingua-based machine translation. DZ Interset... -
Transcribed newspaper articles from the NCSE collection
CLOCR-C: Transcribed newspaper articles from the NCSE collection. This dataset contains 91 pairs of newspaper articles from the Nineteenth Century Serials Edition (NCSE). The... -
Scrambled text: training Language Models to correct OCR errors using syntheti...
This data repository contains the key datasets required to reproduce the paper "Scrambled text: training Language Models to correct OCR errors using synthetic data". In addition... -
Combining text and vision in compound semantics: Towards a cognitively plausi...
In the current state-of-the-art distributional semantics model of the meaning of noun-noun compounds (such as chainsaw, butterfly, home phone), CAOSS (Marelli... -
Propositional Claim Detection (NLP Datensatz)
This is a natural language processing (NLP) training dataset. Models trained on this data are intended to classify claims that... -
Evidence - Computer-assisted Interactive Extraction of Dictionary Examples fr...
Anonymized models from the expert and lay-user studies conducted in the project Evidence. Each model was trained for 50-60 iterations on a specific word class (adjective, verb,... -
Re3: A Holistic Framework and Dataset for Modeling Collaborative Document Rev...
A dataset of aligned scientific paper revisions manually labeled according to their action and intent, and supplemented with the respective peer reviews and human-written edit...