Catalogus Epistularum Neerlandicarum (CEN): Letter and person metadata (1270-1820) curated by the SKILLNET project

A Jupyter notebook for exploring, querying, filtering, visualizing, or exporting the data is available for direct use here: https://edu.nl/bn93d.

This dataset contains curated files derived from cleaning a slice of the Catalogus Epistularum Neerlandicarum (CEN) metadata. It can be used by researchers interested in the correspondence exchanged during the early modern period, especially correspondence involving Dutch learned men and women of the time.

The CEN is the Dutch national letter catalog. Since the 1980s it has aggregated letter metadata from various universities in the Netherlands and from the National Library of the Netherlands (KB), among others. Since January 2020, the CEN can be consulted via WorldCat (https://picarta.on.worldcat.org, last accessed November 9, 2022). The entire database contains more than a million records (according to the KB website, consulted on November 9, 2022: https://www.kb.nl/over-ons/diensten/cen). The database is curated by the KB, which owns the rights together with OCLC.

An XML data dump of the CEN database was obtained by Ingeborg van Vugt (https://orcid.org/0000-0002-7703-1791) from the KB and OCLC in October 2019. The data were sliced to the years 1200 to 1820 and cleaned in two phases: (a) manually, during Ingeborg van Vugt's PhD research, and (b) semi-automatically, during the SKILLNET project. The second phase was carried out jointly by data manager Liliana Melgar Estrada (https://orcid.org/0000-0003-2003-4200) and Ingeborg van Vugt, with input from members of the SKILLNET team, student assistant Rosalie Versmissen (https://orcid.org/0000-0001-9558-8510), and several external collaborators.

This dataset offers the curated version of that slice. It includes the letters dated between 1270 and 1820 plus some undated letters, which are of interest for the study of early modern correspondence. The XML file was converted to a .csv file with the support of the Digital Humanities Center at Utrecht University; the original XML data dump is not included in this dataset. The curated data has more than 500,000 rows, each representing either a single letter or a group of letters. The cleaning process was non-destructive: the original metadata and links to the original source are kept alongside the curated data.
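
To illustrate what "non-destructive" means in practice, the minimal Python sketch below leaves each raw value untouched and writes the cleaned value to a separate column. The column names (`date_original`, `date_curated`) and the year-extraction rule are hypothetical and are not taken from the deposited CSV files.

```python
import re

def curate_date(record: dict) -> dict:
    """Return a copy of `record` with a harmonized year added.

    Non-destructive: the raw value in the hypothetical 'date_original'
    column is never modified; the cleaned value goes to 'date_curated'.
    """
    curated = dict(record)  # copy, so the input row stays unchanged
    # Look for a standalone four-digit year between 1200 and 1899
    match = re.search(r"\b(1[2-8]\d\d)\b", record.get("date_original", ""))
    # Store the year if one was found, otherwise None (e.g. undated letters)
    curated["date_curated"] = int(match.group(1)) if match else None
    return curated

print(curate_date({"date_original": "12 maart 1651"}))
# {'date_original': '12 maart 1651', 'date_curated': 1651}
```

Because the raw string survives next to the parsed year, any cleaning decision can later be checked against the original CEN value.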

The semi-automatic and manual cleaning resulted in two files, both deposited in this dataset: one containing the letter metadata, and another containing metadata about unique persons (which also includes the mappings, i.e., identifiers, to other datasets).
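
Because person details are stored in a separate file, a typical first step is to look up each letter's correspondents in the person table. The sketch below does this with Python's standard `csv` module. The column names (`letter_id`, `sender_id`, `person_id`, `name`) and the inline sample rows are hypothetical placeholders, so check the headers of the deposited CSV files before adapting it.

```python
import csv
import io

# Hypothetical sample rows standing in for the two deposited CSV files
LETTERS_CSV = """letter_id,sender_id,year
L1,P1,1650
L2,P2,1700
L3,P9,1710
"""
PERSONS_CSV = """person_id,name
P1,Correspondent A
P2,Correspondent B
"""

def load_rows(text: str) -> list[dict]:
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))

def attach_sender_names(letters: list[dict], persons: list[dict]) -> list[dict]:
    """Add a 'sender_name' field to each letter (None if the id is unknown)."""
    names = {p["person_id"]: p["name"] for p in persons}
    return [{**letter, "sender_name": names.get(letter["sender_id"])} for letter in letters]

letters = attach_sender_names(load_rows(LETTERS_CSV), load_rows(PERSONS_CSV))
for letter in letters:
    print(letter["letter_id"], letter["sender_name"])
# L1 Correspondent A
# L2 Correspondent B
# L3 None
```

Unmatched identifiers are kept with a `None` name rather than dropped, so gaps in the person file remain visible.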

The curation consisted of data wrangling operations (parsing, harmonization), the addition of missing metadata and correction of inaccuracies (correspondents' dates of birth and death, letter dates), validation checks (consistency between dates of birth/death and letter dates), and partial reconciliation (adding external identifiers from other letter databases).
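
The consistency check between lifespans and letter dates can be expressed as a simple plausibility test. The sketch below is a hypothetical illustration, not the procedure applied by SKILLNET: it flags a letter dated after its sender's death, or earlier than an assumed minimum writing age (`min_age`, 10 years here, chosen only for illustration) after the sender's birth.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Person:
    name: str
    birth_year: Optional[int]  # None when unknown
    death_year: Optional[int]  # None when unknown

def date_conflicts(letter_year: int, sender: Person, min_age: int = 10) -> list[str]:
    """Return reasons why a letter year is implausible for its sender.

    An empty list means no conflict was found. Unknown birth or death
    years are skipped rather than treated as errors. `min_age` is an
    assumed threshold for illustration only.
    """
    problems = []
    if sender.birth_year is not None and letter_year < sender.birth_year + min_age:
        problems.append("letter predates sender's plausible writing age")
    if sender.death_year is not None and letter_year > sender.death_year:
        problems.append("letter postdates sender's death")
    return problems

sender = Person("Correspondent A", birth_year=1600, death_year=1660)
print(date_conflicts(1650, sender))  # []
print(date_conflicts(1665, sender))  # ["letter postdates sender's death"]
```

Flagged rows are candidates for manual review rather than automatic correction, since either the letter date or the biographical dates may be the inaccurate value.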

The CEN data is also freely accessible online via Picarta (https://picarta.oclc.org/psi/xslt/DB=3.23/) and WorldCat (https://www.worldcat.org/). Some records in the online versions may have been updated since 2020 but, to the best of our knowledge, such updates are rare for the period (subset) cleaned by SKILLNET. In any case, if you use this dataset, please cite it properly, including the version number and date.

The datasets provided by the SKILLNET project have been curated (i.e., cleaned, harmonized, reconciled) using manual and semi-automatic methods. Although considerable effort went into curating the datasets provided here, some errors, inaccuracies, and missing data remain.

Until January 2023, the files will be updated regularly with more cleaned data and mappings. Users of the dataset should therefore always include the version number in their reports, or wait until January 2023, when the latest version will be deposited.

Identifier
DOI	https://doi.org/10.34894/G8XQI0
Related Identifier	https://doi.org/10.5281/zenodo.7309977
Metadata Access	https://dataverse.nl/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.34894/G8XQI0

Provenance
Creator	SKILLNET project
Publisher	DataverseNL
Contributor	Liliana Melgar; Dirk van Miert; Koninklijke Bibliotheek (KB); OCLC; Ingeborg van Vugt; Liliana Melgar-Estrada; Rosalie Versmissen
Publication Year	2022
Funding Reference	ERC, ERC 2016 COG
Rights	CC-BY-4.0; info:eu-repo/semantics/openAccess; http://creativecommons.org/licenses/by/4.0
OpenAccess	true
Contact	Liliana Melgar (Utrecht University); Dirk van Miert (Utrecht University)

Representation
Resource Type	Letter and person metadata; Dataset
Format	application/pdf; text/csv
Size	195,742; 44,429,235; 14,209,542 bytes
Version	4.1
Discipline	Humanities