3-digit occupation code images from the Norwegian census of 1950 - Manual review dataset


This dataset consists of images of handwritten 3-digit occupation codes from the Norwegian population census of 1950. Statistics Norway added the occupation codes to the census sheets after the census was concluded, in order to produce aggregated occupational statistics for the entire population. According to Statistics Norway’s official publications (https://www.ssb.no/historisk-statistikk/folketellinger/folketellingen-1950, booklet 4, page 81), the coding standard used in the 1950 census is very similar to the standard used in the 1920 census; cf. the 13th booklet published for the 1920 census (https://www.ssb.no/historisk-statistikk/folketellinger/folketellingen-1920; note that this booklet is only available in Norwegian).

In short, an occupation code is a 3-digit number that corresponds to a given occupation or type of occupation. According to the official list of occupation codes provided by Statistics Norway, there are 339 unique codes. The codes are not, in general, sequential or hierarchical, although some subgroupings are. The list can be found under Files.

The images were extracted algorithmically from the original census sheet images. This process was not flawless and led to some additional images being extracted; these may contain written occupation titles or be entirely blank.

The dataset consists of 90,000 unique images plus 9,000 images that were randomly selected and copied from the unique images (99,000 images in total). All of them were used in a research project (preprint: https://doi.org/10.48550/arXiv.2306.16126; the author list can be found in the preprint) in which we tried to find a more efficient way of reviewing and correcting the classification results from a Machine Learning model that did not pass a pre-set confidence threshold.
That project was a follow-up to our previous article, which describes the initial project and the creation of our model in more detail (“Lessons Learned Developing and Using a Machine Learning Model to Automatically Transcribe 2.3 Million Handwritten Occupation Codes”, https://doi.org/10.51964/hlcs11331).
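
The confidence-threshold step mentioned above can be illustrated with a minimal Python sketch. It is not the project's actual code: the function name `split_by_confidence`, the tuple layout, the file names, the example codes, and the threshold value of 0.9 are all hypothetical and chosen only for illustration. The idea is simply that predictions at or above the threshold are accepted, while the rest are set aside for manual review.

```python
def split_by_confidence(predictions, threshold=0.9):
    """Split (image_name, code, confidence) tuples into two lists.

    Predictions with confidence >= threshold are accepted as-is;
    all others are set aside for manual review.
    The default threshold is an assumption, not the project's value.
    """
    accepted, needs_review = [], []
    for image_name, code, confidence in predictions:
        if confidence >= threshold:
            accepted.append((image_name, code, confidence))
        else:
            needs_review.append((image_name, code, confidence))
    return accepted, needs_review


# Hypothetical model output, made up for illustration only.
example_predictions = [
    ("img_0001.jpg", "123", 0.98),
    ("img_0002.jpg", "456", 0.41),
    ("img_0003.jpg", "789", 0.90),
]

accepted, needs_review = split_by_confidence(example_predictions, threshold=0.9)
print("accepted:", [name for name, _, _ in accepted])
print("manual review:", [name for name, _, _ in needs_review])
```

In this sketch a prediction exactly at the threshold counts as accepted; whether the boundary should be inclusive is a design choice that the source does not specify.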

Python, 3.5+

Statistics Norway (https://www.ssb.no/en). The rules and practices for gathering the data are covered exhaustively in booklets 3 and 4 (https://www.ssb.no/historisk-statistikk/folketellinger/folketellingen-1950).

Identifier
DOI https://doi.org/10.18710/LYXKN1
Related Identifier https://doi.org/10.51964/hlcs11331
Related Identifier https://doi.org/10.18710/7JWAZX
Metadata Access https://dataverse.no/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.18710/LYXKN1
Provenance
Creator The Norwegian Historical Data Centre
Publisher DataverseNO
Contributor The Norwegian Historical Data Centre; UiT The Arctic University of Norway; Sommerseth, Hilde; Pedersen, Bjørn-Richard; Andersen, Trygve; Langholz, Petja; Bjørklund, Bente; Torsetnes, Elin; Foshaug, Eva; Kjosnes, Line Silja; Strand, Toril; UiT Open Research Data
Publication Year 2023
Funding Reference The Research Council of Norway, 322231; UiT The Arctic University of Norway, interdisciplinary strategic project High North Population Studies, 970422528
Rights CC0 1.0; info:eu-repo/semantics/openAccess; http://creativecommons.org/publicdomain/zero/1.0
OpenAccess true
Contact The Norwegian Historical Data Centre (UiT The Arctic University of Norway)
Representation
Resource Type Handwritten census data; Dataset
Format text/plain; text/comma-separated-values; application/zip
Size 7270; 54006; 1860373835
Version 1.0
Discipline Other
Spatial Coverage (4.090W, 57.760S, 31.760E, 71.380N)