Cluster expansion (CE) has gained an increasing level of popularity in recent years, and many strategies have been proposed for training and fitting the CE models to first-principles calculation results. The paper reports a new strategy for constructing a training set based on their relevance in Monte Carlo sampling for statistical analysis and reduction of the expected error. We call the new strategy a "bootstrapping uncertainty structure selection" (BUSS) scheme and compared its performance against a popular scheme where one uses a combination of random structure and ground-state search (referred to as RGS). The provided dataset contains the training sets generated using BUSS and RGS for constructing a CE model for disordered Cu2ZnSnS4 material. The files are in the format of the Atomic Simulation Environment (ASE) database (please refer to ASE documentation for more information https://wiki.fysik.dtu.dk/ase/index.html). Each .db
file contains 100 DFT calculations, which were generated using iteration cycles. Each iteration cycle is referred to as a generation (marked with gen
key in the database) and each database contains 10 generations where each generation consists of 10 training structures. See more details in the paper.