German causal language annotations and lexicon (verbs, nouns, prepositions) (DE)
Annotations of causal verbs, nouns and prepositions in context, and a lexicon file for causal verbs, nouns and prepositions.
Genre-sensitive Neural Situation Entity classifier (DE, EN)
This is a classifier for situation entity types as described in Becker et al., 2017. These clause types depend on a combination of syntactic-semantic and contextual features. We...
Pre-trained POS tagging models for German social media
Pre-trained POS tagging models for the HunPos tagger (Halácsy et al. 2007), the biLSTM-char-CRF tagger (Reimers & Gurevych 2017) and Online-Flors (Yin et al. 2015)....
tweeDe
A German UD Twitter treebank, with >12,000 tokens from 519 tweets, annotated in the Universal Dependencies framework.
Affixoid Dataset (DE)
The dataset contains the manual annotations for the COLING 2018 submission "Distinguishing affixoid formations from compounds" by Josef Ruppenhofer, Michael Wiegand, Rebecca...
Sentiment Compound Data (DE)
This dataset contains gold standards that are required for building a classifier that automatically extracts opinion (noun) compounds.
A harmonised testsuite for social media POS tagging (DE)
A harmonised POS testsuite of web data, CMC and Twitter microtext, with word forms and STTS POS tags (plus some additional CMC-specific tags). UD POS tags have been automatically...
GER_SET: Situation Entity Type labelled corpus for German
Semantic clause types, also called Situation Entity (SE) types (Smith, 2003), are linguistic characterizations of aspectual properties shown to be useful for tasks like...
GermEval-2018 Corpus (DE)
This dataset comprises the training and test data (German tweets) from the GermEval 2018 Shared Task on Offensive Language Detection.
Replication Data for: A corpus-based analysis of the Dat-Nom/Nom-Dat alternat...
The dataset includes an annotated sample of N = 13292 written German sentences with a Nominative and a Dative argument. The sentences comprise 76 different...
Preliminary investigation of materials from the Hamburger Rotes Stadtsbuch 11...
Dataset of the preliminary investigation of materials from the Hamburger Rotes Stadtsbuch 111-1_RSH. Devices used: Elio (Bruker/XGLab): 40 kV and 80 µA, 60s...
German in the Netherlands
The project Deutsch in den Niederlanden [German in the Netherlands] is a student-led research project conducted at Leiden University in 2024-2025. Its main aim is to explore how...
Wittgenstein Archives at the University of Bergen (WAB): WiTTLex - The WiTTFi...
WiTTLex - The WiTTFind Lexicon of Wittgenstein’s Philosophical Nachlass, with Frequency Lists and Indication of the Words’ Sources in the Nachlass WiTTLex is an electronic...
Parallel Corpus of documents from the Technical Regulations Information Syste...
Specialized Spanish-German parallel corpus (ES-ES, DE-AT and DE-DE) of texts from the European Commission from 1997 to 2010. The texts are technical regulations in a variety of...
OpenEDGeS (2021-05-24)
The public license subset of the EDGeS Diachronic Bible Corpus, a diachronically and synchronically parallel corpus of Bible translations in Dutch, English, German and Swedish,...
Europarl – Swedish-German (2013-11-18)
Part of the European Parliament Proceedings Parallel Corpus.
Khresmoi Summary Translation Test Data 2.0
This package contains data sets for development (Section dev) and testing (Section test) of machine translation of sentences from summaries of medical articles between Czech,...
Word-final /s/ durations in spoken German
German has various homophonous sibilant fricatives of phonemic or morphemic nature that can appear in word-final position. In English, the functional status of a word-final /s/...
Khresmoi Summary Translation Test Data 1.1
This package contains data sets for development and testing of machine translation of sentences from summaries of medical articles between Czech, English, French, and German.
Khresmoi Query Translation Test Data 2.0
This package contains data sets for development and testing of machine translation of medical queries between Czech, English, French, German, Hungarian, Polish, Spanish and...