Software: removal of bremsstrahlung background from SAXS signals with deep neural networks

Software for training and inference of neural network models to remove bremsstrahlung background from SAXS imaging data obtained at the European XFEL laboratory.

We thank Peter Steinbach for providing the codebase for the equivariant UNet, which we integrated into our repository.

Below we share a brief description of our method:

Introduction

Experimental data from cameras in ultra-high-intensity laser interaction experiments very often contain not only the desired signal, but also a large number of traces of high-energy photons created via the bremsstrahlung process during the interaction. For example, the Jungfrau camera detecting the small-angle x-ray scattering (SAXS) signal in a combined XFEL + optical laser (OL) experiment at the European XFEL laboratory still records a considerable bremsstrahlung background, even though substantial experimental effort (adding a mirror to reflect the signal and a massive lead wall to block the direct view) was made to reduce it (Šmíd et al., 2020). In the SAXS case in particular, the signal gradually becomes weaker with increasing scattering angle. Therefore, the experimentally observed signal-to-noise ratio sets the limit on the scattering angles for which the signal can be extracted, limiting the physics that can be observed.
As the noise is produced by high-energy photons, whose origin is very different from that of the signal photons, the signal and noise are additive. The currently used Jungfrau camera has a resolution of 1024 × 512 pixels, a pixel size of 75 μm, and the read-out values are calibrated to deposited keV per pixel.

Methods
The process of removing the noise from the data was split into three steps. First, the training dataset was curated and cut into patches of 128 × 128 pixels. Second, a neural network was created and trained on those data. Splitting the data into patches is what makes the whole process possible, because no 'noise-only' data are measured in the detector areas where the signal typically lies. In the third step, an image with actual data is split into patches, these are processed by the neural network, and the results are merged together to produce the final signal and noise predictions.

Data preparation
The experimental data used for training the neural network came from two sets:

• X-ray only shots: These data were collected when only the XFEL beam was used, i.e. they contain an example of the useful signal but no bremsstrahlung background at all.
• Full shots: These data come from the real physics shots, with both the XFEL and OL beams, and therefore contain a mixture of signal and noise.

In order to train the neural network in a supervised manner, we need to provide two sets of data: the signal patches and the noise patches. The signal patches are created from the x-ray-only data as follows: from each image, a set of randomly positioned and randomly oriented patches is extracted. The randomness in rotation is important, as the x-ray-only training data exhibit significant dominant directions, which are expected to change in the real full-shot data. Next, the patches are checked and only those with an integrated intensity above a given threshold are used, to prevent close-to-empty patches from entering the training set. In the last step, the amplitude of the patches is randomized, to keep the algorithm more general. Note that the dynamic range of the detector as well as of the signal is large, spanning approximately four orders of magnitude.
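For illustration, a minimal Python sketch of this signal-patch extraction could look as follows; the function and parameter names (extract_signal_patches, intensity_threshold, the amplitude range) are hypothetical and not taken from the released code:

import numpy as np
from scipy.ndimage import rotate

def extract_signal_patches(image, n_patches, patch_size=128,
                           intensity_threshold=1.0, rng=None):
    """Illustrative signal-patch extraction from an x-ray-only image.
    Names and default values are hypothetical."""
    rng = np.random.default_rng(rng)
    height, width = image.shape
    patches = []
    while len(patches) < n_patches:
        # Rotate the whole image by a random angle before cropping, so
        # that the dominant directions of the x-ray-only data are removed.
        angle = rng.uniform(0.0, 360.0)
        rotated = rotate(image, angle, reshape=False, order=1)
        y = rng.integers(0, height - patch_size)
        x = rng.integers(0, width - patch_size)
        crop = rotated[y:y + patch_size, x:x + patch_size]
        # Keep only patches with sufficient integrated intensity.
        if crop.sum() < intensity_threshold:
            continue
        # Randomize the amplitude to cover the large dynamic range.
        patches.append(crop * 10.0 ** rng.uniform(-2.0, 2.0))
    return np.stack(patches)
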
The noise patches are created from the full-shot data. To prevent regions containing signal from being used, those regions are masked out. The masking is performed automatically using the corresponding x-ray-only image. Then, patches of the given size are randomly selected from the remaining data. Note that neither rotation nor amplitude changes are applied, as both properties can carry signatures of the bremsstrahlung structure, which could simplify the task for the neural network.
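A corresponding sketch for the noise-patch extraction is given below; the masking criterion (thresholding the x-ray-only image) is an assumption, as the text only states that the masking uses the corresponding x-ray-only image:

import numpy as np

def extract_noise_patches(full_shot, xray_only, n_patches, patch_size=128,
                          signal_threshold=1.0, rng=None):
    """Illustrative noise-patch extraction from a full (XFEL + OL) shot.
    The signal regions are masked out using the corresponding x-ray-only
    image; the threshold value is an assumption."""
    rng = np.random.default_rng(rng)
    signal_mask = xray_only > signal_threshold  # True where signal lives
    height, width = full_shot.shape
    patches = []
    while len(patches) < n_patches:
        y = rng.integers(0, height - patch_size)
        x = rng.integers(0, width - patch_size)
        # Reject patches overlapping masked (signal) regions.
        if signal_mask[y:y + patch_size, x:x + patch_size].any():
            continue
        # No rotation and no amplitude scaling: the structure of the
        # bremsstrahlung background is presented to the network unchanged.
        patches.append(full_shot[y:y + patch_size, x:x + patch_size].copy())
    return np.stack(patches)
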

Neural network
In the modelling approach we followed, noise was assumed to be additive, i.e. a noisy input signal x_in can be decomposed into noise and clean signal components n and s, respectively, via the relationship x_in = n + s.
The removal of the bremsstrahlung background n was achieved with the help of a convolutional
neural network, which estimated both the noise n̂ to be subtracted from the input and the denoised
image ŝ itself. More specifically, a UNet architecture (Ronneberger et al., 2015) was adopted with
four encoder blocks using 32, 64, 128 and 256 feature maps. Each encoder block consisted of two
separate convolutional layers and ReLU nonlinearities. No batch normalization was employed. The
corresponding decoder network matched the number of filters. The decoder output produced latent
feature maps l with 16 channels.
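A minimal, non-equivariant PyTorch sketch of such a backbone is shown below. The skip connections and bilinear up-sampling are standard UNet assumptions not spelled out in the text, and the released software uses an equivariant variant, described next:

import torch
from torch import nn

def double_conv(c_in, c_out):
    # Two convolutions with ReLU nonlinearities, no batch normalization.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True),
    )

class UNetBackbone(nn.Module):
    """Sketch: four encoder blocks (32/64/128/256 feature maps), a
    mirrored decoder with skip connections, 16-channel latent output."""
    def __init__(self, in_channels=1, widths=(32, 64, 128, 256),
                 latent_channels=16):
        super().__init__()
        self.encoders = nn.ModuleList()
        channels = in_channels
        for w in widths:
            self.encoders.append(double_conv(channels, w))
            channels = w
        self.pool = nn.MaxPool2d(2)
        self.up = nn.Upsample(scale_factor=2, mode="bilinear",
                              align_corners=False)
        self.decoders = nn.ModuleList()
        for w in reversed(widths[:-1]):
            self.decoders.append(double_conv(channels + w, w))
            channels = w
        self.to_latent = nn.Conv2d(channels, latent_channels, 1)

    def forward(self, x):
        skips = []
        for i, encoder in enumerate(self.encoders):
            x = encoder(x)
            if i < len(self.encoders) - 1:
                skips.append(x)
                x = self.pool(x)
        for decoder, skip in zip(self.decoders, reversed(skips)):
            x = decoder(torch.cat([self.up(x), skip], dim=1))
        return self.to_latent(x)  # latent feature map l with 16 channels
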
In preliminary experiments, we found an equivariant version of the UNet, implemented using the 'escnn' library (https://github.com/QUVA-Lab/escnn) (Cesa et al., 2022), to show favorable performance compared to the original version. It consisted of 5.88 million trainable parameters and implemented operations to make the network equivariant to input transformations under discrete rotations with angles corresponding to multiples of 90 degrees.
The input to the neural network consisted of image patches of shape 128 × 128. The training data comprised 1754 signal patches and another set of 4711 noise patches.
During network training, we randomly sampled a new noise patch each time a clean signal patch
was accessed, as a means of data augmentation and to avoid overfitting. The pixelwise addition of
both patches resulted in a synthetic noisy patch which was used as model input. Both summands
were treated as labels during model training. Image intensity normalization of the raw pixel values was performed as follows: lower and upper bounds were computed as the 1st and 99.95th percentiles of the noisy patch. The lower bound was subtracted from the noisy patch and the result was divided by the difference between the upper and lower bound. Subsequently, the result was clipped to the unit range, i.e. values below zero were set to zero and values above one were reduced to one. The same normalization and clipping strategy, using the bounds obtained from the noisy patch, was subsequently applied to the signal patch and the noise patch, respectively.
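A small sketch of building one training example under this scheme follows; during training, a fresh noise patch would be drawn each time a signal patch is accessed. The function name is illustrative:

import numpy as np

def make_training_example(signal_patch, noise_patch):
    """Illustrative construction of one training example: pixelwise
    addition and percentile-based normalization with clipping; the same
    bounds from the noisy patch are applied to both labels."""
    noisy = signal_patch + noise_patch                 # synthetic input
    lower, upper = np.percentile(noisy, [1.0, 99.95])  # normalization bounds

    def scale(array):
        return np.clip((array - lower) / (upper - lower), 0.0, 1.0)

    return scale(noisy), scale(signal_patch), scale(noise_patch)
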
From the latent representation of the equivariant UNet, the pixelwise noise was estimated by applying a further convolutional layer to the latent feature map, using a kernel size of three with a stride and padding of one to retain the spatial dimensionality. A ReLU activation was applied, as the noise contribution was known to be non-negative. The estimated noise n̂ was then subtracted from the input. To also enforce non-negativity of the estimated signal, a ReLU nonlinearity was again applied.
In total, the procedure worked as follows:
l = eqUNet(x_in),
n̂ = ReLU(conv(l)),
ŝ = ReLU(x_in − n̂).
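The following sketch shows how the noise head and the decomposition could be wired around a backbone producing the 16-channel latent map; it is an illustration, not the released implementation:

import torch
from torch import nn

class BackgroundRemover(nn.Module):
    """Sketch of the full model: a backbone yielding a 16-channel latent
    map, a 3x3 convolutional noise head, and the decomposition x_in = n + s
    enforced via two ReLUs."""
    def __init__(self, backbone, latent_channels=16):
        super().__init__()
        self.backbone = backbone        # e.g. the (equivariant) UNet
        self.noise_head = nn.Conv2d(latent_channels, 1,
                                    kernel_size=3, stride=1, padding=1)

    def forward(self, x_in):
        latent = self.backbone(x_in)                  # l = eqUNet(x_in)
        n_hat = torch.relu(self.noise_head(latent))   # non-negative noise
        s_hat = torch.relu(x_in - n_hat)              # non-negative signal
        return n_hat, s_hat
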
The network was implemented using the 'PyTorch' library (version 1.12.1) for the Python programming language (version 3.10.4). It was trained for 400 epochs with a batch size of 16 on a single NVIDIA A100 GPU using the AdamW optimizer with a learning rate of 10^-4 and no weight decay. For both estimated components n̂ and ŝ, the mean absolute error loss was applied. The two loss components were added to obtain the loss function on which the model was trained.
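A corresponding training-loop sketch is given below; the data loader yielding triples of noisy, signal and noise patches is a hypothetical interface:

import torch
from torch import nn

def train(model, loader, epochs=400, learning_rate=1e-4, device="cuda"):
    """Illustrative training loop: AdamW without weight decay, mean
    absolute error on both estimated components, losses summed."""
    model.to(device)
    optimizer = torch.optim.AdamW(model.parameters(),
                                  lr=learning_rate, weight_decay=0.0)
    l1 = nn.L1Loss()
    for _ in range(epochs):
        for noisy, signal, noise in loader:   # batches of 128 x 128 patches
            noisy = noisy.to(device)
            signal = signal.to(device)
            noise = noise.to(device)
            n_hat, s_hat = model(noisy)
            loss = l1(n_hat, noise) + l1(s_hat, signal)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
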

Application
Once the model was trained, the removal of the bremsstrahlung background of full-sized experimental
imaging data was performed by applying the model on image patches, followed by a recombination
of the patch predictions to obtain full-sized model predictions. A simple sliding-window approach, i.e. a regular splitting of the image data into non-overlapping patches and their subsequent combination, would produce unwanted artifacts on the borders between patches; therefore a more elaborate method was developed.
Each image is split into a grid of patches four times, with the following initial pixel offsets: [0,0],
[96,32], [32,96], [64,64]. Each patch is normalized in the same way as described for the training procedure before being processed by the network. The obtained predictions for each
patch are then rescaled to the original data range by undoing the normalization (i.e. by multiplying
the output with the difference between upper and lower bound followed by an addition of the lower
bound).
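A per-patch inference sketch following this description (the model interface and names are assumptions):

import numpy as np
import torch

def predict_patch(model, patch, device="cuda"):
    """Illustrative per-patch inference: normalize with the patch's own
    percentile bounds, run the network, and undo the normalization."""
    lower, upper = np.percentile(patch, [1.0, 99.95])
    normalized = np.clip((patch - lower) / (upper - lower), 0.0, 1.0)
    x = torch.from_numpy(normalized).float()[None, None].to(device)
    with torch.no_grad():
        n_hat, s_hat = model(x)

    def rescale(tensor):
        # Undo the normalization to return to the original data range.
        return tensor[0, 0].cpu().numpy() * (upper - lower) + lower

    return rescale(s_hat), rescale(n_hat)
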
In the last step, the four predictions produced for the four offsets are combined into a final result.
Each pixel of the final image is calculated as a weighted mean of those four predictions. The weights
for the mean are calculated as
w_i = 1 / (|p_i − m| / 2 + 2),
where w_i is the weight of the i-th prediction p_i, and m is the mean of all four predictions for the given pixel.
This approach effectively eliminates the outliers, which are sometimes produced close to the edges of
the patches.
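
Assuming the four full-sized predictions are stacked into one array, the weighted combination can be sketched as:

import numpy as np

def combine_predictions(predictions):
    """Weighted per-pixel mean of the four offset predictions;
    `predictions` has shape (4, height, width)."""
    m = predictions.mean(axis=0)                           # per-pixel mean
    weights = 1.0 / (np.abs(predictions - m) / 2.0 + 2.0)  # w_i = 1 / (|p_i - m|/2 + 2)
    return (weights * predictions).sum(axis=0) / weights.sum(axis=0)

Predictions that deviate strongly from the per-pixel mean receive smaller weights, which damps the outliers near patch borders.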
 
References

[1] Cesa, G., Lang, L., & Weiler, M. (2022). A program to build E(n)-equivariant steerable CNNs. International Conference on Learning Representations. https://openreview.net/forum?id=WE4qe9xlnQw

[2] Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional networks for biomedical image segmentation. Lecture Notes in Computer Science, 9351, 234–241. https://doi.org/10.1007/978-3-319-24574-4_28

[3] Šmíd, M., Baehtz, C., Pelka, A., Laso García, A., Göde, S., Grenzer, J., Kluge, T., Konopkova, Z., Makita, M., Prencipe, I., Preston, T. R., Rödel, M., & Cowan, T. E. (2020). Mirror to measure small angle x-ray scattering signal in high energy density experiments. Review of Scientific Instruments, 91(12), 123501. https://doi.org/10.1063/5.0021691

Identifier
DOI https://doi.org/10.14278/rodare.2586
Related Identifier IsIdenticalTo https://www.hzdr.de/publications/Publ-37977
Related Identifier IsPartOf https://doi.org/10.14278/rodare.2585
Related Identifier IsPartOf https://rodare.hzdr.de/communities/rodare
Metadata Access https://rodare.hzdr.de/oai2d?verb=GetRecord&metadataPrefix=oai_datacite&identifier=oai:rodare.hzdr.de:2586
Provenance
Creator Starke, Sebastian; Smid, Michal
Publisher Rodare
Publication Year 2023
Rights Closed Access; info:eu-repo/semantics/closedAccess
OpenAccess false
Contact https://rodare.hzdr.de/support
Representation
Language English
Resource Type Software
Version 1
Discipline Life Sciences; Natural Sciences; Engineering Sciences