Position | Description | Duties | Skills |
DQM Online Shift Manager | Online detector and data quality monitoring (DQM) in CMS involves many details. A core part of the process is the work of the online DQM primary shifters, who inspect the observables related to detector performance in the DQM-GUI, report results in the Run Registry (the data quality bookkeeping tool) and in e-logs, and warn the operations crew and the sub-systems in case of problems. In addition, any unexpected behavior of the DQM infrastructure should be reported to the relevant DQM experts. During data taking, shifts run 24/7 for the whole period (three 8-hour shifts per day, including weekends). The primary DQM shifters are drawn from across the CMS collaboration. In addition to the primary shifters, DQM experts take DQM on-call shifts (DQM DOC), 7 consecutive days at a time, to support the primary shifters and the DQM infrastructure. The shift manager should organize shifts, training shifts and tutorials to ensure the primary shifters are well trained, and support the work of the shifters, in coordination with the DQM conveners. |
| |
Online DQM Operation Manager | The process of online detector and data quality monitoring in CMS is rather complex. The DQM primary shifters inspect the histograms related to detector performance in the DQM-GUI, report results in the Run Registry (RR) and e-log, and warn the sub-systems in case of problems. The Online DQM operation manager is responsible for running the online DQM software infrastructure (used by primary shifters and by all the CMS subsystems to monitor the relevant histograms) and for ensuring smooth and robust operation. The main software components that should run during operations are the Online DQM-GUI and the Online Run Registry. Part of the DQM GUI software (the applications responsible for producing per-subsystem histograms) runs in the CMSSW environment and is regularly subject to integration requests from subsystem experts; these changes should be integrated, through a dedicated test process, before the start of a given round of data taking. Another part of the DQM GUI software is dedicated to plot visualization and analysis, and should be kept up to date with the help of the software expert. The Run Registry is a JavaScript application running on the CERN DB, with the goal of keeping track of the data quality. It is very stable, but is sometimes affected by access or functionality problems, which you may be required to understand and refer to the relevant experts. |
|
|
Offline DQM ML infrastructure developer |
The process of offline detector and data quality monitoring in CMS is rather complex. One of the key points of the process is the work of the DOC3 shifters who, for each run, inspect by eye dozens of histograms related to detector performance in the DQM-GUI, compare the observed distributions with the reference ones, report results in the Run Registry (RR) and e-log, and warn the sub-systems in case of problems. This process is performed continuously during data taking and involves tens of shifters per week; it is prone to human error and does not currently allow the performance to be monitored at the lumi-section (LS) level.
The core DQM team has been studying the implementation of a package based on the latest machine learning (ML) techniques that would allow the whole monitoring and certification process to be performed in a semi-automated way, while also increasing the inspection granularity to the per-LS level. Some prototype ML models have been developed for single sub-systems, with successful results.
The Offline DQM ML infrastructure developer will be responsible for extending these prototypes to all the sub-systems, creating a software toolkit able to define the proper signal and background datasets for each subsystem, implement, test and validate the model, combine the results obtained on several histograms, and put the final flag in the Run Registry. She/he should collect input from the subsystems about the relevant histograms to be used as input for the ML model, define the proper test/train/validation datasets with the DQM core team, and coordinate the inclusion of the final discriminator in the GUI and of the final flag (per run or per LS) in the Run Registry.
|
|
|