Prospects for computer-assisted data quality monitoring at the CMS pixel detector.
Abstract
Data quality monitoring (DQM) and data certification (DC) are of vital importance to advanced detectors such as CMS, and are key ingredients in assuring solid results of high-level physics analyses. The current approach to DQM and DC at CMS is mainly based on manual monitoring of reference histograms summarizing the status and performance of the detector. This requires a large amount of person power, while the time granularity is kept rather coarse in order to keep the number of histograms to check manageable. We investigate methods for computer-assisted DQM and DC at the CMS detector, focusing on a case study in the pixel tracker. In particular, using data taken in 2017, we show that autoencoder techniques are able to accurately spot anomalous detector behaviour, with a time granularity previously inaccessible to the human certification procedure.
Definitions and terminology
Luminosity section (LS): An elementary time unit of continuous data taking in CMS, during which the instantaneous luminosity is assumed unchanged. A LS lasts 2^18 LHC orbits, or approximately 23.3 seconds.
Run: A time unit of data taking in CMS, typically consisting of a few tens to a few hundreds of luminosity sections.
Fill: A period during which the same proton beams are circulating in the LHC, typically spanning multiple runs.
Data quality monitoring (DQM): The process of checking the quality of recorded data, with the aim of spotting potential
detector issues.
Data certification (DC): The process of checking the quality of recorded data, aiming to certify the data as good for usage in
physics analyses.
Non-negative matrix factorization (NMF): A type of factorization method for non-negative inputs, computing a set of basis
components that optimally span the space of input instances [3].
Barrel pixel (BPIX) layer and forward pixel (FPIX) disk: Components of the pixel tracker at CMS [5]. There are four barrel layers, numbered BPIX L1 to BPIX L4, and three forward disks on each side of the interaction point, numbered FPIX- D3 to FPIX+ D3.
Introduction
The histograms used in this case study represent the distributions of the collected electric charge (in elementary charge units) per cluster, for BPIX layers and FPIX disks [5]. Each histogram contains the data collected during a single LS. We use BPIX L2, L3 and L4, as well as FPIX D1, D2 and D3, for which the distributions of the collected cluster charge have a relatively stable behaviour over time. An example distribution for BPIX L2 is shown in the figure linked below. In the following, all distributions have been normalized to unity, and the last bin of each histogram contains the overflow. The data set used is the 2017 dataset reconstructed with the Legacy reprocessing [7].
The goal of the computer-assisted DQM and DC procedures presented here is not to replace human decision-making. Instead, these methods are intended to assist the people responsible for monitoring and certification by efficiently and effectively pointing them towards potentially anomalous behaviour, with a finer time granularity than is directly accessible to them. Furthermore, the methods presented here are prospects and have not yet been deployed.
We study and compare the following methods (a minimal code sketch of some of these scores is given below):
Moments method: The first and second order moments of the histograms in the training set are calculated. For a given histogram, a score is
assigned by comparing its moments to the average values and standard deviations of those moments in the training set.
Landau fit method: Each histogram is fitted with a Landau distribution and the mean-squared-error (MSE) between the original histogram and
the best fit is calculated.
Templates method: For each histogram to be classified, the MSE is calculated between this histogram and each of a set of reference histograms, and the minimum value is chosen as the MSE score for this histogram.
NMF method: A set of basis components is extracted from the training set using NMF. A given histogram is reconstructed as an optimized
linear combination of the basis components, and the MSE between the original histogram and its reconstruction is computed.
Autoencoder method: Similar to the NMF method, but where the histogram is reconstructed using an autoencoder. More details on the
autoencoder method can be found in [1].
https://twiki.cern.ch/twiki/pub/Sandbox/ML4DQMPixelMay2022/example.pdf
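As an illustration of the scoring described above, the following is a minimal Python sketch of how the moments, templates and NMF scores could be computed. All function names, array shapes and parameter values are hypothetical stand-ins for illustration, not the deployed implementation.

    import numpy as np
    from sklearn.decomposition import NMF

    def moments_score(hist, train_hists):
        # Compare the first and second order moments of a histogram to their
        # averages and standard deviations in the training set (maximum pull).
        bins = np.arange(hist.shape[0])
        def moments(h):
            return np.array([np.sum(bins * h), np.sum(bins**2 * h)])
        train_moments = np.array([moments(h) for h in train_hists])
        pulls = (moments(hist) - train_moments.mean(axis=0)) / train_moments.std(axis=0)
        return np.max(np.abs(pulls))

    def templates_score(hist, ref_hists):
        # Minimum MSE between the histogram and a set of reference histograms.
        return np.min(np.mean((ref_hists - hist)**2, axis=1))

    def nmf_score(hist, nmf_model):
        # MSE between the histogram and its reconstruction as an optimized
        # linear combination of the NMF basis components.
        coeffs = nmf_model.transform(hist.reshape(1, -1))
        reco = (coeffs @ nmf_model.components_).ravel()
        return np.mean((reco - hist)**2)

    # Example with random stand-in data: rows are per-LS histograms
    # normalized to unity.
    rng = np.random.default_rng(0)
    train_hists = rng.random((200, 102))
    train_hists /= train_hists.sum(axis=1, keepdims=True)
    nmf_model = NMF(n_components=3, max_iter=1000).fit(train_hists)
    print(moments_score(train_hists[0], train_hists),
          templates_score(train_hists[0], train_hists),
          nmf_score(train_hists[0], nmf_model))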
Overview of global approach
Training and testing on a full year of data taking
Schematic overview of the method: from the input histograms to a quantitative anomaly-flagging performance metric, with a comparison between different models.
https://twiki.cern.ch/twiki/pub/Sandbox/ML4DQMPixelMay2022/sketch_overview_global.pdf
Good and anomalous histograms
Distributions of the collected electric charge (in elementary charge
units) per cluster, for the different BPIX layers and FPIX disks.
The blue histograms are obtained as averages from the data set and represent the range of expected shapes for each distribution (which may vary slightly over time due to changing detector conditions). The black histograms correspond to an anomalous LS caused by beam-dump effects, when the proton beams in the accelerator are disposed of, and the red histograms are the autoencoder reconstructions.
The averages shown in this figure (in blue) are calculated by partitioning the LS in the dataset (in chronological order) into 50 approximately equal-sized sets, and averaging all histograms of a given type within each set into a single histogram. This method yields a set of reference histograms while preserving the spectrum of expected shapes (a minimal sketch is given below).
It can be observed that the autoencoder reconstructs the good histograms (overlapping with the blue spectrum) accurately, and the anomalous histograms (not overlapping with the blue spectrum) less accurately. This results in a larger MSE between the anomalous histograms and their reconstructions.
https://twiki.cern.ch/twiki/pub/Sandbox/ML4DQMPixelMay2022/figure_run306139_ls1112.pdf
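The partition-averaging described above could look as follows in Python; this is a minimal sketch assuming a chronologically ordered array of per-LS histograms, with hypothetical names.

    import numpy as np

    def averaged_partitions(hists, n_partitions=50):
        # Split the chronologically ordered histograms into n_partitions
        # approximately equal-sized sets and average each set into a
        # single reference histogram.
        partitions = np.array_split(hists, n_partitions)
        return np.array([p.mean(axis=0) for p in partitions])

    # hists: (n_LS, n_bins) array in chronological order (stand-in input)
    # refs = averaged_partitions(hists, n_partitions=50)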
Output score distributions and correlations
The panels on the diagonal show the output score distributions for the good test set (in blue) and the anomalous runs (in shades of red) for the different models. The y-axis scale is logarithmic, and the distributions have been normalized to unity.
The panels away from the diagonal display the
correlations between models: each dot represents one
LS with its assigned score according to one model on
the x-axis, and according to another model on the
y-axis. The scores for each of the models have been
rescaled to the range 0 to 1.
The horizontal and vertical arrays of points correspond
to lower bounds on the fitted probability density where
it cannot be numerically distinguished from zero.
https://twiki.cern.ch/twiki/pub/Sandbox/ML4DQMPixelMay2022/correlations.pdf
The training set consists of the full dataset, with filters applied to select LS during which the CMS detector was fully switched on and collected reasonable statistics. An additional filter removes histograms with a relatively large MSE with respect to a set of reference histograms, obtained as averaged partitions from the set of luminosity sections passing the earlier filters. This filter removes anomalies from the training set while retaining as much training data as possible (a minimal sketch of this filter is given below).
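The MSE-based training filter could be sketched as follows; the threshold value is an arbitrary placeholder, not the value used in the study.

    import numpy as np

    def filter_training_set(hists, ref_hists, max_mse=1e-4):
        # Keep only histograms whose minimum MSE with respect to the
        # reference histograms is below a threshold (placeholder value).
        mse = np.array([np.min(np.mean((ref_hists - h)**2, axis=1))
                        for h in hists])
        return hists[mse < max_mse]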
The anomalous test set consists of a number of anomalous runs. Resampling techniques have been applied to the histograms belonging to the anomalous runs in order to increase their statistics. The resampling adds representative variation to the histograms while keeping their essential shape characteristics (one possible scheme is sketched below).
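The exact resampling technique is not detailed here; one simple scheme that adds statistical variation while preserving the shape is to redraw the bin contents of each normalized histogram from a multinomial distribution, as in this hypothetical sketch.

    import numpy as np

    def resample_histogram(hist, n_entries=10000, n_copies=10, seed=None):
        # Draw n_copies statistically fluctuated versions of a histogram
        # normalized to unity; n_entries controls the fluctuation size.
        # (Illustrative scheme only, not necessarily the one used here.)
        rng = np.random.default_rng(seed)
        return rng.multinomial(n_entries, hist, size=n_copies) / n_entries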
The good test set is obtained in a similar way to the training set, but with slightly stricter thresholds to ensure that no anomalous luminosity sections are selected while still covering the full spectrum of good histogram shapes. Alternative approaches have been used as cross-checks, where the good test set consists of a number of predefined good runs (with or without resampling), or of averaged partitions from the training set (with or without resampling).
Overview of operational approach
Dedicated training for single application run
Schematic overview of the method, modified with respect to Fig. 1 to highlight the differences in its operational application. Typical applications during data taking include certifying single runs using a dedicated training set tailored to the specific application run.
https://twiki.cern.ch/twiki/pub/Sandbox/ML4DQMPixelMay2022/sketch_overview_local.pdf
Application of operational approach
Illustration of an operational implementation of the autoencoder model.
In Figs. 1 to 3, the model was trained and tested globally on the full 2017 dataset (with some filters applied, as discussed before). Here, an alternative training and testing approach is chosen that more closely represents the operational situation in practical applications of the model. In this so-called local training, the model is updated for each application run with a dedicated training on the runs preceding it.
The fraction of flagged LS is low for good runs and higher for known
anomalous or otherwise special runs (indicated with vertical coloured lines
in the bottom pad), showing that the model flags anomalous LS accurately
in local training as well as in global training.
The runs with a high fraction of flagged LS can be classified into a number of categories. Indicated in red are the runs with timing scans or other anomalies, and in orange the runs in which only a fraction of the LS show anomalous distributions. In purple are runs with low pileup or trigger rates, causing statistical fluctuations in the distributions. Between fills, discrete changes in accelerator or detector conditions might take place, causing a mismatch between the training runs and the application run; hence, the method is not yet applied to the first run of a fill.
The threshold score value above which a LS is considered anomalous is calculated as the 97th percentile of the scores on the local training set plus a fixed margin, which is (preliminarily) optimized to achieve a low false-alarm rate while accurately flagging known anomalous LS (a minimal sketch is given below).
https://twiki.cern.ch/twiki/pub/Sandbox/ML4DQMPixelMay2022/local_playback.pdf
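A minimal sketch of this thresholding and per-run flagging, with hypothetical score arrays and a placeholder margin value:

    import numpy as np

    def local_threshold(train_scores, percentile=97.0, margin=0.01):
        # Threshold: 97th percentile of the scores on the local training
        # set plus a fixed margin (margin value is a placeholder).
        return np.percentile(train_scores, percentile) + margin

    def flagged_fraction(run_scores, train_scores, margin=0.01):
        # Fraction of LS in the application run flagged as anomalous.
        return np.mean(run_scores > local_threshold(train_scores, margin=margin))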
References
[1] CMS Collaboration, Tracker DQM Machine Learning studies for data certification, CERN-CMS-DP-2021-034, https://cds.cern.ch/record/2799472.
[2] V. Azzolini et al., The Data Quality Monitoring Software for the CMS experiment at the LHC: past, present and future, EPJ Web of Conferences 214, 02003 (2019), https://doi.org/10.1051/epjconf/201921402003.
[3] D. D. Lee and H. S. Seung, Algorithms for non-negative matrix factorization, Advances in Neural Information Processing Systems 13 (NIPS 2000).
[4] K. He, X. Zhang, S. Ren, and J. Sun, Deep Residual Learning for Image Recognition, arXiv:1512.03385.
[5] CMS Collaboration, The CMS Phase-1 Pixel Detector Upgrade, CERN-CMS-NOTE-2020-005, http://cds.cern.ch/record/2745805.
[6] CMS Collaboration, The Phase-1 Pixel Detector Performance in 2018, CERN-CMS-DP-2021-007, https://cds.cern.ch/record/2765491.
[7] CMS Collaboration, Strategies and performance of the CMS silicon tracker alignment during LHC Run 2, arXiv:2111.08757v2.
--
LukaLambrecht - 2022-04-14