SDOstreamclust: Stream Clustering Robust to Concept Drift - Evaluation Tests

SDOstreamclust Evaluation Tests

conducted for the paper: Stream Clustering Robust to Concept Drift 

Context and methodology

SDOstreamclust is a stream clustering algorithm that can process data incrementally or in batches. It combines two earlier algorithms: SDOstream (anomaly detection in data streams) and SDOclust (static clustering). SDOstreamclust retains the characteristics of SDO algorithms: lightweight, intuitive, self-adjusting, resistant to noise, capable of identifying non-convex clusters, and built on robust parameters and interpretable models. Moreover, it shows excellent adaptation to concept drift.
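The difference between incremental and batch processing can be illustrated with a minimal sketch. It uses only the Python standard library and does not call the SDOstreamclust package; the helper name `iter_batches` and the toy data are assumptions made for illustration. A batch size of 1 corresponds to point-by-point (incremental) processing.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

Point = Tuple[float, float]


def iter_batches(stream: Iterable[Point], batch_size: int) -> Iterator[List[Point]]:
    """Yield consecutive batches of at most `batch_size` points from `stream`.

    Hypothetical helper for illustration only; it is not part of SDOstreamclust.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Toy stream of seven 2-D points (made-up values).
points = [(float(i), float(i % 3)) for i in range(7)]

# Batch processing: the model would be updated once per batch.
batches = list(iter_batches(points, batch_size=3))

# Incremental processing: the model would be updated once per point.
singles = list(iter_batches(points, batch_size=1))
```

In an actual experiment, each batch would be handed to the clustering model in arrival order, so that the model can follow changes in the data distribution over time.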

In this repository, SDOstreamclust is evaluated on 165 datasets (both synthetic and real) and compared with CluStream, DBstream, DenStream, and StreamKMeans.

This repository is framed within the research on the following domains: algorithm evaluation, stream clustering, unsupervised learning, machine learning, data mining, streaming data analysis. Datasets and algorithms can be used for experiment replication and for further evaluation and comparison.

Docker

A Docker version is also available in: https://hub.docker.com/r/fiv5/sdostreamclust

Technical details

Experiments are conducted in Python v3.8.14. The file and folder structure is as follows:

[algorithms] contains a script with functions related to algorithm configurations.

[data] contains datasets in ARFF format.

[results] contains CSV files with algorithms' performances obtained from running the "run.sh" script (as shown in the paper).

"dependencies.sh" lists and installs Python dependencies.

"pysdoclust-stream-main.zip" contains the SDOstreamclust Python package.

"README.md" shows details and instructions to use this repository.

"run.sh" runs the complete experiments.

"run_comp.py" runs the experiments specified by arguments.

"TSindex.py" implements functions for the Temporal Silhouette index.

Note: if the SDOstreamclust code is modified, the SWIG (v4.2.1) wrappers have to be rebuilt and SDOstreamclust reinstalled with pip.
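As a hedged illustration of how a dataset in ARFF format might be read, the following sketch parses a small dense ARFF text using only the Python standard library. The function name `read_dense_arff`, the toy content, and the assumption that the last attribute holds the class label are all hypothetical and not taken from this repository; the reader is deliberately simplified (no sparse rows, no quoted attribute names with spaces). A fuller reader such as `scipy.io.arff.loadarff` may be preferable in practice.

```python
import io
from typing import List, TextIO, Tuple


def read_dense_arff(handle: TextIO) -> Tuple[List[str], List[List[str]]]:
    """Return (attribute names, rows of raw string values) from dense ARFF text.

    Simplified, hypothetical reader for illustration only.
    """
    names: List[str] = []
    rows: List[List[str]] = []
    in_data = False
    for raw in handle:
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        if in_data:
            rows.append([value.strip() for value in line.split(",")])
            continue
        lowered = line.lower()
        if lowered.startswith("@attribute"):
            names.append(line.split()[1])  # second token is the attribute name
        elif lowered.startswith("@data"):
            in_data = True
    return names, rows


# Toy ARFF content (made up; not one of the files in the [data] folder).
sample = io.StringIO(
    "% toy example\n"
    "@relation toy\n"
    "@attribute x numeric\n"
    "@attribute y numeric\n"
    "@attribute class {0,1}\n"
    "@data\n"
    "0.1,0.2,0\n"
    "0.9,0.8,1\n"
)

names, rows = read_dense_arff(sample)

# Assumption: the last column is the ground-truth label.
features = [[float(value) for value in row[:-1]] for row in rows]
labels = [row[-1] for row in rows]
```

To read a file from disk instead of the in-memory sample, the same function can be given a handle obtained with `open(path, encoding="utf-8")`.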

License

The CC-BY license applies to all data generated with MDCgen. All distributed code is under the GPLv3+ license.

Identifier
DOI https://doi.org/10.48436/xh0w2-q5x18
Related Identifier IsVariantFormOf https://hub.docker.com/r/fiv5/sdostreamclust
Related Identifier IsVersionOf https://github.com/CN-TU/pysdoclust-stream/tree/evaluation/evaluation_tests
Related Identifier HasPart https://doi.org/10.17632/c43kr4t7h8.1
Related Identifier HasPart https://doi.org/10.1109/ACCESS.2023.3319213
Related Identifier HasPart https://doi.org/10.1007/s10994-023-06462-2
Related Identifier HasPart https://doi.org/10.48436/ss6a3-3r720
Related Identifier IsVersionOf https://doi.org/10.48436/sed9k-vtc72
Metadata Access https://researchdata.tuwien.ac.at/oai2d?verb=GetRecord&metadataPrefix=oai_datacite&identifier=oai:researchdata.tuwien.ac.at:xh0w2-q5x18
Provenance
Creator Iglesias Vazquez, Felix (ORCID: 0000-0001-6081-969X)
Publisher TU Wien
Publication Year 2024
Rights Creative Commons Attribution 4.0 International; GNU General Public License v3.0 or later; https://creativecommons.org/licenses/by/4.0/legalcode; https://www.gnu.org/licenses/gpl-3.0-standalone.html
OpenAccess true
Contact tudata(at)tuwien.ac.at
Representation
Resource Type Dataset
Version 1.0.0
Discipline Other