-
Khresmoi Summary Translation Test Data 1.1
This package contains data sets for development and testing of machine translation of sentences from summaries of medical articles between Czech, English, French, and German. -
Khresmoi Query Translation Test Data 2.0
This package contains data sets for development and testing of machine translation of medical queries between Czech, English, French, German, Hungarian, Polish, Spanish ans... -
Dictionary of Bavarian Dialects
The database offers access to over 6 million pieces of dialectal linguistic evidence from the project "Dictionary of Bavarian Dialects" (German: Das Bayerische Wörterbuch) as image... -
The Franconian Dictionary
The database currently contains about 1 million pieces of dialectal linguistic evidence from the project "The Franconian Dictionary" (German: Das Fränkische Wörterbuch), each of which... -
CEHugeWebCorpus
This corpus was originally created for performance testing of the CorpusExplorer server infrastructure (see diskurslinguistik.net / diskursmonitor.de). It includes the filtered... -
HWC2023 – Hamburg.de Website Corpus 2023
A petition for a referendum (called: "Schluss mit Gendersprache in Verwaltung und Bildung" / eng.: "abolition of gender language in administration and education") was formed in... -
MT@BZ translation corpus v1.0
MT@BZ is a translation corpus consisting of 52 decrees published by the Autonomous Province of Bolzano (South Tyrol), aligned with their machine-translated versions. More... -
MT@BZ annotation guidelines v1.0
The MT@BZ annotation guidelines are guidelines for assessing the quality of legal Italian-German machine translation. In particular, they cover the South Tyrolean German variety. They... -
Multispektraler Datensatz zu der Handschrift Zentralbibliothek Zürich, RP 3 "...
The manuscript RP3 of the Zentralbibliothek in Zurich contains six love letters, the ‘Zürcher Liebesbriefe’ (‘Zurich Love Letters’) and one... -
Background data for: Sprachliches Place-Making. Eine sprachwissenschaftliche ...
This dataset contains corpus statistical calculations that were used to investigate patterns of linguistic place-making in the German language. Patterns are defined here... -
Fachcurricula der Primar- und Sekundarstufe in Deutschland [Curricula for pri...
+++ English version below +++ These data form an excerpt of curricula (Lehrpläne / Bildungspläne / Rahmenpläne; hereinafter referred to as “Curricula”... -
Multispectral Imaging Data of Manuscripts Ms. Bos. q. 19, Ms. El. f. 83, Ms. ...
Multispectral imaging data of the following objects owned by the Thüringer Universitäts- und Landesbibliothek (ThULB) in Jena: Manuscripts: Ms. Bos. q. 19 Ms.... -
Zwei-Wellen-Panel Hamburger Primarschulstudierender in der Bachelorphase Two...
The project examines the introduction of the new degree programmes for teaching at primary schools (Lehramt an Grundschulen, LAGS) and for special needs education with a primary school profile (Lehramt Sonderpädagogik Profil Grundschule, LAS-G) at the University... -
Biblia Pauperum-Transcriptions. A Pilot
This presentation introduces the conceptual framework behind Biblia pauperum-Transcriptions, a browser-based viewer for manuscript transcriptions and digital facsimiles. This... -
German causal language annotations and lexicon (verbs, nouns, prepositions) (DE)
Annotations of causal verbs, nouns, and prepositions in context, together with a lexicon file for causal verbs, nouns, and prepositions. -
Genre-sensitive Neural Situation Entity classifier (DE, EN)
This is a classifier for situation entity types as described in Becker et al., 2017. These clause types depend on a combination of syntactic-semantic and contextual features. We... -
Pre-trained POS tagging models for German social media
Pre-trained POS tagging models for the HunPos tagger (Halácsy et al. 2007), the biLSTM-char-CRF tagger (Reimers & Gurevych 2017), Online-Flors (Yin et al. 2015).... -
tweeDe
A German UD Twitter treebank with more than 12,000 tokens from 519 tweets, annotated in the Universal Dependencies framework. -
Affixoid Dataset (DE)
The dataset contains the manual annotations for the COLING 2018 submission "Distinguishing affixoid formations from compounds" by Josef Ruppenhofer, Michael Wiegand, Rebecca... -
Sentiment Compound Data (DE)
This dataset contains gold standards that are required for building a classifier that automatically extracts opinion (noun) compounds.