Experimental Data for the Dissertation "Leveraging Constraints for User-Centric Feature Selection"

These are the experimental data for the dissertation

Bach, Jakob. "Leveraging Constraints for User-Centric Feature Selection"

at the Department of Informatics of the Karlsruhe Institute of Technology. See the README for details.

Many input datasets (which we also provide here) either

  • originate from OpenML and are CC-BY-licensed or
  • originate from PMLB and are MIT-licensed.

Please see the LICENSE files in the corresponding datasets/ subfolders for details.

The subfolders correspond to individual chapters of the dissertation:

  • chap4-syn: Chapter 4 - "Evaluating the Impact of Constraints on Feature-Selection Results"
  • chap5-ms: Chapter 5 - "Formulating Scientific Hypotheses as Constraints - A Case Study"
  • chap6-afs: Chapter 6 - "Finding Alternative Feature Sets"
  • chap7-csd: Chapter 7 - "Discovering Sparse and Alternative Subgroup Descriptions"
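After unpacking the archive, the folder layout listed above can be checked with a few lines of Python. This is a minimal sketch, not part of the published data: the function name `find_chapter_readmes` and the argument `data_root` (the directory into which the archive was extracted) are assumptions made for illustration, while the subfolder names and the file name README.md come from this description.

```python
from pathlib import Path

# Subfolder names and chapter numbers as listed in this description.
CHAPTER_FOLDERS = {
    "chap4-syn": "Chapter 4",
    "chap5-ms": "Chapter 5",
    "chap6-afs": "Chapter 6",
    "chap7-csd": "Chapter 7",
}


def find_chapter_readmes(data_root):
    """Map each chapter subfolder to its README.md path (None if the file is missing).

    `data_root` is a hypothetical path to the extracted archive.
    """
    root = Path(data_root)
    readmes = {}
    for folder in CHAPTER_FOLDERS:
        readme = root / folder / "README.md"
        readmes[folder] = readme if readme.is_file() else None
    return readmes
```

A call such as `find_chapter_readmes("path/to/extracted/data")` then shows at a glance which chapter folders were extracted completely.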

See the corresponding README files in the subfolders for more information. We already published prior versions of the experimental data, as the dissertation is based on prior papers:

  • Chapters 4 and 5: Data for the paper "An Empirical Evaluation of Constrained Feature Selection"
  • Chapter 6: Data for the paper "Finding Optimal Diverse Feature Sets with Alternative Feature Selection" (Version 2)
  • Chapter 7: Data for the paper "Using Constraints to Discover Sparse and Alternative Subgroup Descriptions" (Version 1)

For Chapters 4, 5, and 7, we mainly consolidate the existing data. In particular, all *.csv files (datasets and results) remain unchanged compared to the data linked above. For Chapter 6, we reran the experimental pipeline to integrate a change to the feature-selection method "Greedy Wrapper". The other feature-selection methods have not changed, but their experimental data may differ slightly in runtimes and in results affected by solver timeouts.
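Readers who hold both a prior version and this version of a chapter's data can check which *.csv files differ by comparing checksums. The following is a hedged sketch under stated assumptions: the function names `csv_checksums` and `changed_csv_files` and the two folder arguments are hypothetical and not part of the published code, and SHA-256 is just one reasonable choice of hash.

```python
import hashlib
from pathlib import Path


def csv_checksums(folder):
    """Return {relative path: SHA-256 hex digest} for all *.csv files below `folder`."""
    root = Path(folder)
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*.csv"))
    }


def changed_csv_files(old_folder, new_folder):
    """List relative paths of CSV files that differ in content or exist on one side only.

    Both arguments are hypothetical paths to extracted copies of the data.
    """
    old = csv_checksums(old_folder)
    new = csv_checksums(new_folder)
    return sorted(name for name in old.keys() | new.keys() if old.get(name) != new.get(name))
```

For Chapters 4, 5, and 7, the statement above implies that such a comparison should return an empty list, provided both copies use the same relative folder layout. The sketch reads each file fully into memory, which keeps it short; very large files may call for chunked hashing instead.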

For all four chapters, the following files (in each subfolder) differ from prior versions:

  • Evaluation_console_output.txt: The dissertation's evaluation partly differs from the papers' evaluations (e.g., some analyses added, adapted, or removed).
  • README.md: We adapted these files to the context of the dissertation, added some explanations, and proofread them.

Identifier

  • DOI: https://doi.org/10.35097/4kjyeg0z2bxmr6eh
  • Related Identifier (IsIdenticalTo): https://publikationen.bibliothek.kit.edu/1000175819
  • Metadata Access: https://www.radar-service.eu/oai/OAIHandler?verb=GetRecord&metadataPrefix=datacite&identifier=10.35097/4kjyeg0z2bxmr6eh

Provenance

  • Creator: Bach, Jakob
  • Publisher: Karlsruhe Institute of Technology
  • Contributor: RADAR
  • Publication Year: 2024
  • Rights: Open Access; Creative Commons Attribution 4.0 International; info:eu-repo/semantics/openAccess; https://creativecommons.org/licenses/by/4.0/legalcode
  • OpenAccess: true

Representation

  • Resource Type: Dataset
  • Format: application/x-tar
  • Discipline: Computer Science; Computer Science, Electrical and System Engineering; Engineering Sciences