3-digit occupation code images from the Norwegian census of 1950 - Manual review dataset

This dataset consists of images of handwritten 3-digit occupation codes from the Norwegian population census of 1950. Statistics Norway added the occupation codes to the census sheets after the census was concluded, in order to produce aggregated occupational statistics for the entire population. According to Statistics Norway’s official publications (https://www.ssb.no/historisk-statistikk/folketellinger/folketellingen-1950, booklet 4, page 81), the coding standard used in the 1950 census is very similar to the standard used in the 1920 census. Cf. the 13th booklet published for the 1920 census (https://www.ssb.no/historisk-statistikk/folketellinger/folketellingen-1920, note that this booklet is only available in Norwegian).

In short, an occupation code is a 3-digit number that corresponds to a given occupation or type of occupation. The official list of occupation codes provided by Statistics Norway contains 339 unique codes. The codes are not, in general, sequential or hierarchical, although some subgroupings are. The list can be found under Files.

The images were extracted algorithmically from the original census sheet images. This process was not flawless and led to additional images being extracted. These additional images may contain written occupation titles or be entirely blank.

The dataset consists of 90,000 unique images and 9,000 images that were randomly selected and copied from the unique images. All of these were used in a research project (link to preprint article: https://doi.org/10.48550/arXiv.2306.16126, where the author list can also be found) in which we tried to find a more efficient way of reviewing and correcting classification results from a machine learning model when those results did not pass a pre-set confidence threshold. This was a follow-up to our previous article, which describes the initial project and the creation of our model in more detail (“Lessons Learned Developing and Using a Machine Learning Model to Automatically Transcribe 2.3 Million Handwritten Occupation Codes”, https://doi.org/10.51964/hlcs11331).
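
The review step mentioned in the description can be illustrated with a minimal sketch. It is not the project's actual code. The threshold value, the function names, the tuple layout of a prediction and the example codes are all hypothetical, chosen only to show how predictions might be routed to manual review when they are not valid 3-digit codes from the official list or fall below a confidence threshold.

```python
import re

# Hypothetical threshold; the value used in the project is not stated here.
CONFIDENCE_THRESHOLD = 0.9

# An occupation code is a 3-digit number, kept as a string to preserve
# any leading zeros.
CODE_PATTERN = re.compile(r"\d{3}")


def needs_manual_review(code, confidence, valid_codes,
                        threshold=CONFIDENCE_THRESHOLD):
    """Return True when a prediction should be checked by a human reviewer."""
    if CODE_PATTERN.fullmatch(code) is None:
        return True  # not a 3-digit code (e.g. a blank or text image)
    if code not in valid_codes:
        return True  # not on the official list of codes
    return confidence < threshold


def split_for_review(predictions, valid_codes,
                     threshold=CONFIDENCE_THRESHOLD):
    """Split (image_name, code, confidence) tuples into two lists of names."""
    accepted, review = [], []
    for image_name, code, confidence in predictions:
        if needs_manual_review(code, confidence, valid_codes, threshold):
            review.append(image_name)
        else:
            accepted.append(image_name)
    return accepted, review


# Placeholder codes and file names for illustration only; the real list of
# 339 codes is the one provided under Files.
example_codes = {"123", "456"}
example_predictions = [
    ("a.jpg", "123", 0.99),
    ("b.jpg", "123", 0.50),
    ("c.jpg", "999", 0.99),
]
accepted, review = split_for_review(example_predictions, example_codes)
```

In this sketch only `a.jpg` is accepted automatically; the other two images would be passed on for manual review.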

Python, 3.5+

Statistics Norway (https://www.ssb.no/en). Information about the rules and practices for gathering the data is covered exhaustively in booklets 3 and 4 (https://www.ssb.no/historisk-statistikk/folketellinger/folketellingen-1950).

Identifier
DOI	https://doi.org/10.18710/LYXKN1
Related Identifier	https://doi.org/10.51964/hlcs11331
Related Identifier	https://doi.org/10.18710/7JWAZX
Metadata Access	https://dataverse.no/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.18710/LYXKN1
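
The Metadata Access link above is an OAI-PMH `GetRecord` request. As a small sketch, the same URL can be assembled from the DOI with Python's standard library. The helper name `build_get_record_url` is hypothetical, and the endpoint and parameter values are simply those visible in the link above. No request is sent here.

```python
from urllib.parse import urlencode

# Endpoint taken from the Metadata Access link in this record.
OAI_ENDPOINT = "https://dataverse.no/oai"


def build_get_record_url(doi, metadata_prefix="oai_datacite"):
    """Build an OAI-PMH GetRecord URL for a DOI such as '10.18710/LYXKN1'."""
    # A list of pairs keeps the parameter order stable on Python 3.5.
    params = [
        ("verb", "GetRecord"),
        ("metadataPrefix", metadata_prefix),
        ("identifier", "doi:" + doi),
    ]
    # Leave ':' and '/' unescaped so the result matches the link above.
    return OAI_ENDPOINT + "?" + urlencode(params, safe=":/")


url = build_get_record_url("10.18710/LYXKN1")
```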

Provenance
Creator	The Norwegian Historical Data Centre
Publisher	DataverseNO
Contributor	The Norwegian Historical Data Centre; UiT The Arctic University of Norway; Sommerseth, Hilde; Pedersen, Bjørn-Richard; Andersen, Trygve; Langholz, Petja; Bjørklund, Bente; Torsetnes, Elin; Foshaug, Eva; Kjosnes, Line Silja; Strand, Toril; UiT Open Research Data
Publication Year	2023
Funding Reference	The Research Council of Norway, 322231; UiT The Arctic University of Norway, interdisciplinary strategic project High North Population Studies, 970422528
Rights	CC0 1.0; info:eu-repo/semantics/openAccess; http://creativecommons.org/publicdomain/zero/1.0
OpenAccess	true
Contact	The Norwegian Historical Data Centre (UiT The Arctic University of Norway)

Representation
Resource Type	Handwritten census data; Dataset
Format	text/plain; text/comma-separated-values; application/zip
Size	7270; 54006; 1860373835
Version	1.0
Discipline	Other
Spatial Coverage	(4.090W, 57.760S, 31.760E, 71.380N)
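
The Size values are easier to read when converted to larger units. The sketch below assumes, although the record does not say so, that the sizes are byte counts listed in the same order as the Format entries. The helper names are hypothetical.

```python
# Values copied from the Format and Size fields of this record.
FORMATS = ["text/plain", "text/comma-separated-values", "application/zip"]
SIZES = [7270, 54006, 1860373835]  # assumed to be bytes


def human_readable(num_bytes):
    """Format a byte count using decimal (SI) units, up to GB."""
    size = float(num_bytes)
    for unit in ("B", "kB", "MB", "GB"):
        if size < 1000 or unit == "GB":
            return "{:.1f} {}".format(size, unit)
        size /= 1000


def describe_files(formats, sizes):
    """Pair each format with its size, assuming matching order."""
    return [(fmt, human_readable(size)) for fmt, size in zip(formats, sizes)]


summary = describe_files(FORMATS, SIZES)
```

Under these assumptions the zip archive holding the images is roughly 1.9 GB, while the two text files are only a few kilobytes each.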