-
NCSE v2.0: A Dataset of OCR-Processed 19th Century English Newspapers
NCSE v2.0 Dataset Repository. This repository contains the NCSE v2.0 dataset and associated supporting data used in the paper "Reading the unreadable: Creating a dataset of 19th... -
Early Bodleian Donations Online
The dataset contains information regarding donations of books, manuscripts, and money made to the Bodleian Library of the University of Oxford in the period 1600–1620. The data... -
Love Data Week 2025: The Research Data Management team in numbers
Roses are red, violets are blue; it’s Love Data Week 2025 and it’s time for the RDM review! In UCL tradition, we kickstart Love Data Week by publishing the annual Library... -
Transcribed newspaper articles from the NCSE collection
CLOCR-C: Transcribed newspaper articles from the NCSE collection. This dataset contains 91 pairs of newspaper articles from the Nineteenth Century Serials Edition (NCSE). The... -
Scrambled text: training Language Models to correct OCR errors using syntheti...
This data repository contains the key datasets required to reproduce the paper "Scrambled text: training Language Models to correct OCR errors using synthetic data". In addition... -
Bibliometric methods for identifying AI-assisted papers
Presentation given to the LIS-Bibliometrics conference, September 2024. Abstract: AI-generated text is increasingly common in scholarly publications, with the community divided... -
LLM related keywords in Dimensions - search counts
This dataset gives counts for the number of papers matching given full-text search terms in the Dimensions database. The associated preprint discusses how these terms are likely...