- Serbian Twitter training corpus ReLDI-NormTagNER-sr 2.0
ReLDI-NormTagNER-sr 2.0 is a manually annotated corpus of Serbian tweets. It is meant as a gold-standard training and testing dataset for tokenisation, sentence segmentation,...
- Croatian Twitter training corpus ReLDI-NormTagNER-hr 3.0
ReLDI-NormTagNER-hr 3.0 is a manually annotated corpus of Croatian tweets. It is meant as a gold-standard training and testing dataset for tokenisation, sentence segmentation,...
- Deep Sequoia corpus - PARSEME-FR corpus - FrSemCor
The Sequoia corpus is a set of 3,099 linguistically-annotated French sentences, originating from four sources (Europarl, European Agency Reports, French regional journal L'Est...
- Parallel Global Voices, Czech-English NER+NEL
Annotation of named entities to the existing source Parallel Global Voices, ces-eng language pair. The named entity annotations distinguish four classes: Person, Organization,...
- Czech Legal Text Treebank 2.0
The Czech Legal Text Treebank 2.0 (CLTT 2.0) annotates the same texts as the CLTT 1.0. These texts come from the legal domain and they are manually syntactically annotated. The...
- Czech Court Decisions Dataset
We present the Czech Court Decisions Dataset (CCDD) -- a dataset of 300 manually annotated court decisions published by The Supreme Court of the Czech Republic and the...
- ACTER (Annotated Corpora for Term Extraction Research) v1.3
The ACTER (Annotated Corpora for Term Extraction Research) is an annotated dataset for term extraction. Terms and Named Entities have been manually annotated in specialised...
- ACTER (Annotated Corpora for Term Extraction Research) v1.4
The ACTER (Annotated Corpora for Term Extraction Research) is an annotated dataset for term extraction. Terms and Named Entities have been manually annotated in specialised...
- ACTER (Annotated Corpora for Term Extraction Research) v1.5
ACTER (Annotated Corpora for Term Extraction Research) is a manually annotated dataset for term extraction, covering 3 languages (English, French, and Dutch), and 4 domains...
- Archaeological entities and timespans extracted from all archaeology document...
We trained a BERT language model for Dutch Archaeology, and fine-tuned it to perform Named Entity Recognition for 6 categories of entity: artefacts, materials, time periods,...