The growing number of big data applications in seismology has made quality control tools that filter, discard, or rank data extremely important. In this framework, machine learning algorithms, already established in several seismic applications, are good candidates to perform the task flexibly and efficiently. sdaas (seismic data/metadata amplitude anomaly score) is a Python library and command line tool for detecting a wide range of amplitude anomalies on any seismic waveform segment, such as recording artifacts (e.g., anomalous noise, peaks, gaps, spikes), sensor problems (e.g., digitizer noise), and metadata field errors (e.g., wrong stage gain in StationXML).
The underlying machine learning model, based on the isolation forest algorithm, has been trained and tested on a broad variety of seismic waveforms of different lengths, from local to teleseismic earthquakes to noise recordings, from both broadband sensors and accelerometers. For this reason, the software ensures a high degree of flexibility and ease of use: from any given input (a waveform in miniSEED format and its metadata as StationXML, either given as file paths or FDSN URLs), the computed anomaly score is a probability-like numeric value in [0, 1] indicating the degree of belief that the analyzed waveform represents an anomaly (or outlier), where scores ≤0.5 indicate no distinct anomaly. sdaas can be employed to filter malformed data in a pre-processing routine, to assign robustness weights, or as a metadata checker by computing scores on randomly selected segments from a given station/channel: in this case, a persistent sequence of high scores clearly indicates problems in the metadata.
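
The three uses mentioned above (filtering, weighting, metadata checking) can be illustrated with a minimal sketch that operates on anomaly scores that have already been computed. It does not call sdaas itself: all function names, the linear weighting rule, and the `min_run` parameter are hypothetical choices made for illustration. Only the 0.5 threshold is taken from the text.

```python
# Sketch only: post-processing of precomputed anomaly scores in [0, 1].
# None of these helpers belong to sdaas; they are hypothetical illustrations.

ANOMALY_THRESHOLD = 0.5  # from the text: scores <= 0.5 indicate no distinct anomaly


def keep_segments(scores, threshold=ANOMALY_THRESHOLD):
    """Return the indices of segments whose score does not exceed the threshold."""
    return [i for i, score in enumerate(scores) if score <= threshold]


def robustness_weights(scores):
    """Map each score to a weight in [0, 1].

    Assumption: a simple linear rule (weight = 1 - score); other mappings
    may be preferable depending on the application.
    """
    return [1.0 - score for score in scores]


def longest_high_run(scores, threshold=ANOMALY_THRESHOLD):
    """Return the length of the longest run of consecutive scores above the threshold."""
    longest = current = 0
    for score in scores:
        current = current + 1 if score > threshold else 0
        longest = max(longest, current)
    return longest


def metadata_suspect(scores, min_run=5, threshold=ANOMALY_THRESHOLD):
    """Flag a station/channel when high scores persist over at least `min_run`
    consecutive segments. `min_run` is an assumed tuning parameter."""
    return longest_high_run(scores, threshold) >= min_run
```

For a metadata check, the scores passed to `metadata_suspect` would be those of randomly selected segments from one station/channel, ordered in time. Isolated high scores then point to problems in individual segments, whereas a long uninterrupted run suggests an error in the metadata.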