Dataset - B2FIND

German causal language annotations and lexicon (verbs, nouns, prepositions) (DE)

Annotations of causal verbs, nouns and prepositions in context and lexicon file for causal verbs, nouns and prepositions.

CITADEL: Computational Investigation of the Topographical and Architectural D...

The data found in this repository contain the basis for the historical, architectural, and geo-spatial analyses discussed in the dissertation entitled: CITADEL – Computation...

Accompanying Code and Models for Chapter 5 of the PhD Thesis "Global Inferenc...

This release contains the source code used for Chapter 5 of the PhD thesis "Global Inference and Local Syntax Representations for Event Extraction". It contains the...

Negative Sampling for Learning Knowledge Graph Embeddings

Reimplementation of four KG factorization methods and six negative sampling methods. Abstract: Knowledge graphs are large, useful, but incomplete knowledge repositories. They...

Topological Field Labeler for German

This resource contains the code of the topological labeler used in the paper: Do and Rehbein (2020). "Parsers Know Best: German PP Attachment Revisited". For this tool, labeling...

Genre-sensitive Neural Situation Entity classifier (DE, EN)

This is a classifier for situation entity types as described in Becker et al., 2017. These clause types depend on a combination of syntactic-semantic and contextual features. We...

Pre-trained POS tagging models for German social media

Pre-trained POS tagging models for the HunPos tagger (Halácsy et al. 2007), the biLSTM-char-CRF tagger (Reimers & Gurevych 2017), and Online-Flors (Yin et al. 2015)....

ACL word segmentation correction

The data in this collection consists of two parallel directories, one ("raw") containing the raw text of 18850 articles from the ACL 2013/02 collection, the other...

BPEmb: Pre-trained Subword Embeddings in 275 Languages (LREC 2018)

BPEmb is a collection of pre-trained subword unit embeddings in 275 languages, based on Byte-Pair Encoding (BPE). In an evaluation using fine-grained entity typing as testbed,...

6D Object Pose Estimation using 3D Object Coordinates [Data]

Supplementary training data and binaries for 6D object pose estimation, particularly a dataset of 20 objects under various lighting conditions with RGB-D images, ground truth...

ErKon3D - Quantifying Deformation in Aegean Sealing Practices [Dataset]

In Bronze Age Aegean society, seals played an important role by authenticating, securing and marking. The study of the seals and their engraved motifs provides valuable insight into...

Abstract Anaphora Resolution [Source Code]

Abstract Anaphora Resolution (AAR) aims to find the interpretation of nominal expressions (e.g., this result, those two actions) and pronominal expressions (e.g., this, that,...

Encoder-Decoder Model for Semantic Role Labeling

Abstract (Daza & Frank 2019): We propose a Cross-lingual Encoder-Decoder model that simultaneously translates and generates sentences with Semantic Role Labeling annotations...

LibriVoxDeEn - A Corpus for German-to-English Speech Translation and Speech R...

This dataset is a corpus of sentence-aligned triples of German audio, German text, and English translation, based on German audio books. The corpus consists of over 100 hours of...

Test data for the Pooch library

Pooch is an open-source Python library for data download. This archive contains testing data for Pooch's DataVerse download functionality.

MACE-AL-TREE

A method for detecting noise in automatically annotated dependency parse trees, combining MACE (Hovy et al. 2013) with Active Learning.

AMR parse quality prediction [Source Code]

Accuracy prediction for AMR parsing predicts 33 accuracy metrics for a given sentence and its (automatic) AMR parse. Abstract (Opitz and Frank, 2019): Semantic proto-role...

CO-NNECT

This repository contains our path generation framework Co-NNECT, in which we combine two models for establishing knowledge relations and paths between concepts from sentences,...

tweeDe

A German UD Twitter treebank with more than 12,000 tokens from 519 tweets, annotated in the Universal Dependencies framework.

IKAT-EN

A corpus consisting of high-quality human annotations of missing and implied information in argumentative texts (English version). The data is further annotated with semantic...

3,524 datasets found