SDOclust Evaluation Tests v2
conducted for the paper: Parameterization-Free Clustering with Sparse Data Observers
Context and methodology
SDOclust is a clustering extension of the Sparse Data Observers (SDO) algorithm. SDOclust treats data observers as graph nodes and clusters them using connected components and local thresholding. The observers' labels are subsequently propagated to the data points.
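The idea described above can be illustrated with a small, self-contained sketch. It is not the SDOclust implementation. As assumptions made only for illustration, observers are picked by plain random sampling, the local threshold of an observer is taken to be the distance to its k-th nearest observer, and the names `toy_observer_clustering`, `n_observers` and `k_neighbors` are hypothetical.

```python
import math
import random


def toy_observer_clustering(points, n_observers=20, k_neighbors=3, seed=0):
    """Toy illustration of observer-based clustering (NOT SDOclust itself).

    points: list of equal-length numeric tuples.
    Returns one integer cluster label per point.
    """
    rng = random.Random(seed)
    # Assumption: observers are a plain random sample of the data.
    observers = rng.sample(points, min(n_observers, len(points)))
    m = len(observers)

    # Pairwise distances between observers (the graph nodes).
    dist = [[math.dist(a, b) for b in observers] for a in observers]

    # Assumed local threshold: distance to the k-th nearest observer.
    # Index 0 of each sorted row is the observer itself (distance 0).
    k = min(k_neighbors, m - 1)
    thresh = [sorted(row)[k] for row in dist]

    # Union-find structure to collect connected components.
    parent = list(range(m))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Link two observers only if each lies within the other's threshold.
    for i in range(m):
        for j in range(i + 1, m):
            if dist[i][j] <= min(thresh[i], thresh[j]):
                parent[find(i)] = find(j)

    # One label per connected component of observers.
    roots = sorted({find(i) for i in range(m)})
    component = {root: label for label, root in enumerate(roots)}
    observer_labels = [component[find(i)] for i in range(m)]

    # Propagate labels: each point takes the label of its nearest observer.
    labels = []
    for p in points:
        nearest = min(range(m), key=lambda i: math.dist(p, observers[i]))
        labels.append(observer_labels[nearest])
    return labels
```

In this sketch the number of clusters is not given in advance; it follows from how many connected components the observer graph has.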
In this repository, SDOclust is evaluated with 235 datasets (both synthetic and real) taken from the literature about clustering evaluation, and compared with HDBSCAN, k-means--, CLASSIX, N2D (Deep Learning Clustering), Fuzzy Clustering, and Hierarchical Clustering algorithms.
This repository relates to research in the following domains: algorithm evaluation, clustering, unsupervised learning, machine learning, data mining, data analysis. The datasets and algorithms can be used to replicate the experiments and for further clustering evaluation and comparison.
Technical details
Experiments are conducted in Python 3. The file and folder structure is as follows:
[datasets] contains datasets as CSV files (last column is the label).
[comparisons] contains boxplots and LaTeX tables with algorithm comparisons summarized from the [results] folder.
[results] contains CSV files with tables that collect the algorithms' performance results obtained by running the "run.py" script.
[algorithms] contains scripts wrapping algorithm classes used and parameter adjustment phases.
[utils] contains scripts for clustering validation and measurement of dataset properties.
"dependencies.sh" installs Python dependencies.
"run.py" runs evaluation experiments.
"comparison.py" summarizes performances in TEX tables and boxplots.
"LICENSE" file.
"README.md" for further details, link to sources and instructions for reproducibility.
License
The CC-BY license applies to all data generated with MDCgen. All distributed code is under the MIT license.