-
German causal language annotations and lexicon (verbs, nouns, prepositions) (DE)
Annotations of causal verbs, nouns and prepositions in context and lexicon file for causal verbs, nouns and prepositions. -
CITADEL: Computational Investigation of the Topographical and Architectural D...
The data found in this repository contain the basis for the historical, architectural, and geo-spatial analyses discussed in the dissertation entitled: CITADEL – Computation... -
Accompanying Code and Models for Chapter 5 of the PhD Thesis "Global Inferenc...
This release contains the source code used for Chapter 5 of the PhD thesis "Global Inference and Local Syntax Representations for Event Extraction". It contains the... -
Negative Sampling for Learning Knowledge Graph Embeddings
Reimplementation of four KG factorization methods and six negative sampling methods. Abstract: Knowledge graphs are large, useful, but incomplete knowledge repositories. They... -
Topological Field Labeler for German
This resource contains the code of the topological labeler used in the paper: Do and Rehbein (2020). "Parsers Know Best: German PP Attachment Revisited". For this tool, labeling... -
Genre-sensitive Neural Situation Entity classifier (DE, EN)
This is a classifier for situation entity types as described in Becker et al. (2017). These clause types depend on a combination of syntactic-semantic and contextual features. We... -
Pre-trained POS tagging models for German social media
Pre-trained POS tagging models for the HunPos tagger (Halácsy et al. 2007), the biLSTM-char-CRF tagger (Reimers & Gurevych 2017), and Online-Flors (Yin et al. 2015).... -
ACL word segmentation correction
The data in this collection consists of two parallel directories, one ("raw") containing the raw text of 18850 articles from the ACL 2013/02 collection, the other... -
BPEmb: Pre-trained Subword Embeddings in 275 Languages (LREC 2018)
BPEmb is a collection of pre-trained subword unit embeddings in 275 languages, based on Byte-Pair Encoding (BPE). In an evaluation using fine-grained entity typing as testbed,... -
6D Object Pose Estimation using 3D Object Coordinates [Data]
Supplementary training data and binaries for 6D object pose estimation, particularly a dataset of 20 objects under various lighting conditions with RGB-D images, ground truth... -
ErKon3D - Quantifying Deformation in Aegean Sealing Practices [Dataset]
In Bronze Age Aegean society, seals played an important role by authenticating, securing and marking. The study of the seals and their engraved motifs provides valuable insight into... -
Abstract Anaphora Resolution [Source Code]
Abstract Anaphora Resolution (AAR) aims to find the interpretation of nominal expressions (e.g., this result, those two actions) and pronominal expressions (e.g., this, that,... -
Encoder-Decoder Model for Semantic Role Labeling
Abstract (Daza & Frank 2019): We propose a Cross-lingual Encoder-Decoder model that simultaneously translates and generates sentences with Semantic Role Labeling annotations... -
LibriVoxDeEn - A Corpus for German-to-English Speech Translation and Speech R...
This dataset is a corpus of sentence-aligned triples of German audio, German text, and English translation, based on German audio books. The corpus consists of over 100 hours of... -
Test data for the Pooch library
Pooch is an open-source Python library for data download. This archive contains testing data for Pooch's DataVerse download functionality. -
MACE-AL-TREE
A method for detecting noise in automatically annotated dependency parse trees, combining MACE (Hovy et al. 2013) with Active Learning. -
AMR parse quality prediction [Source Code]
Accuracy prediction for AMR parsing: predicts 33 accuracy metrics for a given sentence and its (automatic) AMR parse. Abstract (Opitz and Frank, 2019): Semantic proto-role... -
CO-NNECT
This repository contains our path generation framework Co-NNECT, in which we combine two models for establishing knowledge relations and paths between concepts from sentences,... -
tweeDe
A German UD Twitter treebank, with >12,000 tokens from 519 tweets, annotated in the Universal Dependencies framework -
IKAT-EN
A corpus consisting of high-quality human annotations of missing and implied information in argumentative texts (English version). The data is further annotated with semantic...