Archaeological entities and timespans extracted from all archaeology documents available in DANS EASY in 2017

We trained a BERT language model for Dutch archaeology and fine-tuned it to perform Named Entity Recognition (NER) for six categories of entities: artefacts, materials, time periods, places, contexts and species.
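Token-level NER models like this one are commonly trained with BIO tags, where `B-` marks the first token of an entity and `I-` marks a continuation. The sketch below shows how such tags can be grouped back into entity spans for the six categories. It is an illustration only: the BIO scheme and the tag names `ART`, `MAT`, `PER`, `LOC`, `CON` and `SPE` are assumptions, not the labels documented for this dataset.

```python
from typing import List, Optional, Tuple

# Hypothetical tag names for the six categories (artefact, material,
# time period, place, context, species); the real model may use others.
CATEGORIES = {"ART", "MAT", "PER", "LOC", "CON", "SPE"}


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (category, entity text) pairs."""
    spans: List[Tuple[str, str]] = []
    current_label: Optional[str] = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Close the open entity (if any) and store it as a span.
        nonlocal current_label, current_tokens
        if current_label is not None:
            spans.append((current_label, " ".join(current_tokens)))
        current_label, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B" and label in CATEGORIES:
            flush()
            current_label, current_tokens = label, [token]
        elif prefix == "I" and label in CATEGORIES:
            if label == current_label:
                current_tokens.append(token)
            else:
                # Lenient decoding: an I- tag with no matching open span
                # starts a new entity instead of being dropped.
                flush()
                current_label, current_tokens = label, [token]
        else:
            # "O" or an unknown label ends the current entity.
            flush()
    flush()
    return spans


tokens = ["Een", "bronzen", "bijl", "uit", "de", "Romeinse", "tijd", "bij", "Leiden"]
tags = ["O", "B-MAT", "B-ART", "O", "O", "B-PER", "I-PER", "O", "B-LOC"]
print(bio_to_spans(tokens, tags))
# [('MAT', 'bronzen'), ('ART', 'bijl'), ('PER', 'Romeinse tijd'), ('LOC', 'Leiden')]
```

The lenient handling of orphan `I-` tags is a design choice; a strict decoder would discard them instead.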

For each document, we extracted all entities and translated the time periods into year ranges. This information is stored, together with DANS metadata such as author and title, in one JSON file per document.
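Because time periods are stored as year ranges, documents can be filtered by date with a simple interval-overlap test. The sketch below assumes a hypothetical JSON layout with a `title` field and a `timespans` list of `[start, end]` pairs, with years BC written as negative integers; the actual field names and conventions in these files may differ, so check a sample file before reusing it.

```python
import json
from typing import Iterable, List, Tuple


def overlaps(a: Tuple[int, int], b: Tuple[int, int]) -> bool:
    """Closed year ranges overlap when each starts no later than the other ends."""
    return a[0] <= b[1] and b[0] <= a[1]


def matching_titles(documents: Iterable[str], query: Tuple[int, int]) -> List[str]:
    """Return titles of documents with at least one timespan overlapping the query."""
    titles: List[str] = []
    for raw in documents:
        record = json.loads(raw)
        # "timespans" and "title" are hypothetical field names.
        spans = [(int(s[0]), int(s[1])) for s in record.get("timespans", [])]
        if any(overlaps(span, query) for span in spans):
            titles.append(record.get("title", ""))
    return titles


# Toy records for illustration only, not taken from the dataset.
docs = [
    json.dumps({"title": "Rapport A", "timespans": [[-12, 450]]}),
    json.dumps({"title": "Rapport B", "timespans": [[1500, 1650]]}),
]
print(matching_titles(docs, (100, 300)))  # ['Rapport A']
```

To run it over real files, pass something like `(p.read_text(encoding="utf-8") for p in Path("data").glob("*.json"))`, where `data` is a placeholder for wherever the files were unpacked.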

This dataset was produced as part of Alex Brandsen's PhD research for the AGNES search engine project.

Identifier
DOI https://doi.org/10.17026/dans-zcs-7b72
PID https://nbn-resolving.org/urn:nbn:nl:ui:13-ic-d8mp
Metadata Access https://easy.dans.knaw.nl/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=oai:easy.dans.knaw.nl:easy-dataset:215805
Provenance
Creator Brandsen, A
Publisher Data Archiving and Networked Services (DANS)
Contributor Brandsen, A; A Brandsen (Leiden University)
Publication Year 2021
Rights info:eu-repo/semantics/openAccess; License: http://creativecommons.org/licenses/by-nc/4.0/
OpenAccess true
Representation
Language Dutch; Flemish
Resource Type Dataset
Format text/json
Discipline Ancient Cultures; Archaeology; Humanities
Spatial Coverage Netherlands