sdaas - a Python tool computing an amplitude anomaly score of seismic data and metadata using a simple machine learning algorithm

The growing number of big data applications in seismology has made quality control tools that filter, discard, or rank data extremely important. In this framework, machine learning algorithms, already established in several seismic applications, are good candidates to perform the task flexibly and efficiently. sdaas (seismic data/metadata amplitude anomaly score) is a Python library and command line tool for detecting a wide range of amplitude anomalies on any seismic waveform segment, such as recording artifacts (e.g., anomalous noise, peaks, gaps, spikes), sensor problems (e.g., digitizer noise), and metadata field errors (e.g., wrong stage gain in StationXML).

The underlying machine learning model, based on the isolation forest algorithm, has been trained and tested on a broad variety of seismic waveforms of different lengths, from local to teleseismic earthquakes to noise recordings, from both broadband sensors and accelerometers. For this reason, the software offers a high degree of flexibility and ease of use: from any given input (a waveform in miniSEED format and its metadata as StationXML, either given as file paths or FDSN URLs), the computed anomaly score is a probability-like numeric value in [0, 1] indicating the degree of belief that the analyzed waveform represents an anomaly (or outlier), where scores ≤ 0.5 indicate no distinct anomaly. sdaas can be employed to filter malformed data in a pre-processing routine, to assign robustness weights, or as a metadata checker by computing scores on randomly selected segments from a given station/channel: in this case, a persistent sequence of high scores clearly indicates problems in the metadata.
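
The two uses of the score described above (filtering segments and checking metadata) can be illustrated with a minimal Python sketch. It does not call sdaas itself: it assumes the scores have already been computed and are available as plain floats. The function names, the segment ids, and the `min_run` parameter are hypothetical and chosen for illustration only; the 0.5 threshold is the one stated in the description.

```python
from typing import List, Sequence

# From the description: scores <= 0.5 indicate no distinct anomaly.
NO_ANOMALY_MAX = 0.5


def keep_clean(segment_ids: Sequence[str], scores: Sequence[float],
               threshold: float = NO_ANOMALY_MAX) -> List[str]:
    """Return the ids of the segments whose score does not exceed the threshold."""
    return [sid for sid, score in zip(segment_ids, scores) if score <= threshold]


def longest_high_run(scores: Sequence[float],
                     threshold: float = NO_ANOMALY_MAX) -> int:
    """Return the length of the longest run of consecutive scores above the threshold."""
    longest = current = 0
    for score in scores:
        current = current + 1 if score > threshold else 0
        longest = max(longest, current)
    return longest


def metadata_suspect(scores: Sequence[float],
                     threshold: float = NO_ANOMALY_MAX,
                     min_run: int = 5) -> bool:
    """Flag a station/channel when high scores persist for at least min_run segments.

    min_run is an illustrative assumption, not a value taken from sdaas.
    """
    return longest_high_run(scores, threshold) >= min_run


# Hypothetical scores for four segments of one channel
ids = ["seg1", "seg2", "seg3", "seg4"]
scores = [0.45, 0.82, 0.48, 0.91]
clean = keep_clean(ids, scores)       # segments kept after filtering
suspect = metadata_suspect(scores)    # no persistent run of high scores here
```

Requiring a run of consecutive high scores, rather than a single one, reflects the point made above: an isolated high score may be a transient recording artifact, whereas a persistent sequence across randomly selected segments points to the metadata.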

Identifier
DOI https://doi.org/10.5880/GFZ.2.6.2023.009
Related Identifier https://github.com/rizac/sdaas
Related Identifier https://doi.org/10.1785/0220200339
Metadata Access http://doidb.wdc-terra.org/oaip/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=oai:doidb.wdc-terra.org:7798
Provenance
Creator Zaccarelli, Riccardo
Publisher GFZ Data Services
Contributor Zaccarelli, Riccardo
Publication Year 2022
Rights GNU General Public License Version 3 (29 June 2007); Copyright © 2023 Helmholtz Centre Potsdam GFZ German Research Centre for Geosciences, Potsdam, Germany (Riccardo Zaccarelli); https://www.gnu.org/licenses/gpl-3.0.html
OpenAccess true
Contact Zaccarelli, Riccardo (GFZ German Research Centre for Geosciences, Potsdam, Germany)
Representation
Resource Type Software
Discipline Geosciences