PANACEA Environment Corpus n-grams FR (French)
This data set contains French word n-grams and French word/tag/lemma n-grams in the "Environment" (ENV) domain. N-grams are accompanied by their observed frequency counts. The...
Replication data for: CHIR99021 causes inactivation of Tyrosine Hydroxylase a...
CHIR99021, also known as laduviglusib or CT99021, is a glycogen synthase kinase 3β (GSK3β) inhibitor that has been reported as a promising drug for cardiomyocyte regeneration...
Database of Catalan Adjectives
The database contains 2,296 alphabetically ordered adjective lemmata (rows) and 45 columns with various types of linguistic information about each lemma. The adjectives...
Corpus de les construccions comparatives intensificadores de la lletjor en ca... (Corpus of intensifying comparative constructions of ugliness in Ca...)
Corpus of intensifying comparative constructions in Catalan, Spanish, English and French. The occurrences that make up each of the corpora have been extracted from...
PANACEA Environment Corpus n-grams ES (Spanish)
This data set contains Spanish word n-grams and Spanish word/tag/lemma n-grams in the "Environment" (ENV) domain. N-grams are accompanied by their observed frequency counts. The...
PANACEA Labour Legislation Corpus n-grams EN (English)
This data set contains English word n-grams and English word/tag/lemma n-grams in the "Labour Legislation" (LAB) domain. N-grams are accompanied by their observed frequency...
PANACEA Labour Legislation Corpus n-grams FR (French)
This data set contains French word n-grams and French word/tag/lemma n-grams in the "Labour" (LAB) domain. N-grams are accompanied by their observed frequency counts. The length...
PANACEA Environment Corpus n-grams EN (English)
This data set contains English word n-grams and English word/tag/lemma n-grams in the "Environment" (ENV) domain. N-grams are accompanied by their observed frequency counts....
PANACEA Environment Corpus n-grams IT (Italian)
This data set contains Italian word n-grams and Italian word/tag/lemma n-grams in the "Environment" (ENV) domain. N-grams are accompanied by their observed frequency counts. The...
PANACEA Labour Legislation Corpus n-grams IT (Italian)
This data set contains Italian word n-grams and Italian word/tag/lemma n-grams in the "Labour" (LAB) domain. N-grams are accompanied by their observed frequency counts. The...
Pre-trained POS tagging models for German social media
Pre-trained POS tagging models for the HunPos tagger (Halácsy et al. 2007), the biLSTM-char-CRF tagger (Reimers & Gurevych 2017), Online-Flors (Yin et al. 2015)....
Replication Data for: “Threat” in Russian – A Linguistic Perspective
The dataset contains examples of the use of groza and ugroza from the Russian National Corpus (RNC). The dataset covers the period from 1700 to 2020 and consists of 4858...
Corpus_Sienkiewicz_Novels
Sienkiewicz Novels
Parallel Corpus of documents from the Technical Regulations Information Syste...
Specialized Spanish-German parallel corpus (ES-ES, DE-AT and DE-DE) of texts from the European Commission from 1997 to 2010. The texts are technical regulations in a variety of...
Parallel Corpus of documents from the Technical Regulations Information Syste...
TRIS Spanish-German parallel corpus (v0.3). Specialized Spanish-German parallel corpus (ES-ES, DE-AT and DE-DE) of texts from the European Commission from 1997 to 2010. The texts...
8 SIDOR (2017-10-16)
News articles from 8 SIDOR. The material is sentence-scrambled.
Af Soomaali 1971-79 (2017-10-16)
Af Soomaali 1971-79. The material is sentence-scrambled.
Laws of 1734 (1734 års lag) (2017-10-16)
The Swedish Laws of 1734 (1734 års lag).
Af-Soomaali 2001 (2017-10-16)
Af-Soomaali 2001. The material is sentence-scrambled.