DQM Framework
DQM is:
- A monitoring project originally created for the Event Filter Farm
- A set of monitoring tools that can be used either online or offline:
  - Tree-like structures with histograms, profiles, scalars, strings
  - Quality tests that produce warnings, errors, alarms
  - Visualization tools
  - Transfer of monitoring information to remote nodes
- A wrapper around ROOT objects that offers the above functionality
The primary goal of the DQM system is
to guarantee the quality of the physics data
collected by the general data acquisition,
identifying which subdetector has a problem
and understanding whether the problem lies in the data flow,
the trigger or the DCS.
All of this starts from a single graphical view of CMS,
with tools that allow easy navigation to the layers with higher granularity.
The (quasi) real-time analysis of the monitor element information
should make it possible to arrive at a first definition of a run-quality file.
DQM software is considered to be offline software:
although it runs on event filter farm data,
it uses calibration and full reconstruction,
and it can also be run in offline mode on existing data files.
The DQM framework is designed to deal with
sets of available information (Monitor Elements),
from their creation in monitoring producers (Sources)
to their final use by monitoring information consumers (Clients).
All CPU-intensive tasks
(e.g. comparison to a reference monitor element,
display, database storage, etc.)
are to be carried out on the client side.
All source-client communication is carried out
through one or more Collectors.
The collectors are responsible for advertising the monitorables
to the different clients and for serving monitoring requests.
The above design (summarized in the diagram below) aims at
- shielding the sources from connecting clients that could slow down the main application or threaten the stability of the source itself;
- facilitating the quick transfer of the monitoring information from the sources to the collectors.
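The relay role of the collector described above can be illustrated with a minimal in-process sketch. This is not the DQM API: the `Collector` class, its `advertise`, `subscribe` and `publish` methods, and the path `Tracker/Occupancy` are hypothetical names chosen for illustration, and plain Python callbacks stand in for the network transport.

```python
from collections import defaultdict


class Collector:
    """Toy collector (hypothetical): relays updates from sources to clients."""

    def __init__(self):
        self.advertised = set()               # monitorables announced by sources
        self.subscribers = defaultdict(list)  # monitorable name -> client callbacks

    def advertise(self, me_name):
        """Called by a source to announce an available monitor element."""
        self.advertised.add(me_name)

    def subscribe(self, me_name, callback):
        """Called by a client; only advertised elements can be subscribed to."""
        if me_name not in self.advertised:
            raise KeyError(f"unknown monitor element: {me_name}")
        self.subscribers[me_name].append(callback)

    def publish(self, me_name, value):
        """Called by a source; returns the number of clients notified."""
        callbacks = self.subscribers.get(me_name, [])
        for callback in callbacks:
            callback(me_name, value)
        return len(callbacks)


# A source advertises an element, a client subscribes, the source sends an update.
collector = Collector()
collector.advertise("Tracker/Occupancy")

received = []
collector.subscribe("Tracker/Occupancy",
                    lambda name, value: received.append((name, value)))

notified = collector.publish("Tracker/Occupancy", 0.42)
```

The source never sees the client: it only talks to the collector, which is the shielding property the design aims for.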
The DQM infrastructure supports
various monitorable types:
1D, 2D and 3D histograms, 1D and 2D profiles,
scalars (integer and real numbers) and string messages.
These can be booked and filled
or updated anywhere in the context of reconstruction
and analysis code.
The infrastructure takes care of publishing, tracking updates,
and transporting these updates to subscriber processes.
Access to booking, filling and modifying monitor elements is provided
via abstract interfaces in every component.
Monitor elements are organized in (UNIX-like) directory structures
with virtually unlimited depth,
from which monitor consumers can pick and choose.
In every component, it is possible at any point in time
to create ROOT-tuples with snapshots of the monitoring structure
for debugging and reference.
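As a rough illustration of booking and filling monitor elements in a UNIX-like directory structure, the sketch below keeps elements in a dictionary keyed by their full path. It is an assumption-laden toy, not the DQM interface: `MonitorStore`, `book_h1`, `book_int`, `ls`, `fill_h1` and the paths used are all hypothetical, and a histogram is reduced to a list of bin counts.

```python
class MonitorStore:
    """Toy store (hypothetical) of monitor elements addressed by UNIX-like paths."""

    def __init__(self):
        self.elements = {}  # full path -> monitor element

    def book_h1(self, path, nbins, low, high):
        """Book a 1D histogram with nbins equal-width bins on [low, high)."""
        me = {"kind": "h1", "low": low, "high": high, "bins": [0] * nbins}
        self.elements[path] = me
        return me

    def book_int(self, path, value=0):
        """Book an integer scalar."""
        me = {"kind": "int", "value": value}
        self.elements[path] = me
        return me

    def ls(self, directory):
        """Return the names found directly below a directory."""
        prefix = directory.rstrip("/") + "/"
        children = set()
        for path in self.elements:
            if path.startswith(prefix):
                children.add(path[len(prefix):].split("/")[0])
        return sorted(children)


def fill_h1(hist, x):
    """Increment the bin containing x; values outside the range are ignored."""
    low, high, bins = hist["low"], hist["high"], hist["bins"]
    if low <= x < high:
        index = int((x - low) / (high - low) * len(bins))
        bins[index] += 1


store = MonitorStore()
occupancy = store.book_h1("Tracker/Barrel/Layer1/occupancy", nbins=4, low=0.0, high=4.0)
store.book_int("Tracker/Barrel/nEvents")

for x in (0.5, 1.5, 1.7, 9.0):  # 9.0 falls outside the axis range
    fill_h1(occupancy, x)

barrel_content = store.ls("Tracker/Barrel")
```

A consumer can then browse with `ls` and pick only the subdirectories it needs.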
DQM infrastructure
The DQM infrastructure consists of:
- Source applications or Producers
  - Sources are defined as individual nodes that either have direct access to, or can process and produce, the information we are interested in. The creation and update of a set of Monitor Elements (MEs) at the source can be the result of processing input
    - event data (event consumers)
    - input monitor elements (monitor consumers).
  - For each monitoring task a separate producer will be available. Updated information from an individual source is distributed to all consumers (through the collector via TCP/IP) by an update task (MonitorDaemon) running periodically in a separate thread of the source process.
- Client applications or Consumers
  - At the other end of this architecture are the Clients. A generic DQM client application is distinct from a source in that it normally deals only with monitor data, and not with event data. Clients can
    - subscribe to and receive periodic updates of any desired subset of the MEs, in a classic implementation of the publish-subscribe service, through one or more collector instances the client is connected to;
    - update reference MEs;
    - set thresholds, compare to reference;
    - raise alarms and create error messages for use by the central error logging facility;
    - write data into the DB (e.g. reference MEs);
    - read detector configuration variables from the DB.
- Collector applications or Collectors
  - A hierarchical system of collector nodes is responsible for
    - the communication between sources and clients (e.g. subscription requests)
    - the actual transfer of monitoring information.
  - These nodes serve as
    - collectors for the sources, which can post messages to the collector advertising available monitor contents,
    - monitoring servers for the clients, which are served the entire published content available at the collector.
  - The collector receives subscription messages from clients and relays them to the appropriate sources. When a source sends an update message containing new data, it is relayed to all subscribed clients. Unlike sources and clients, collectors are completely standardized and do not need any customization.
- GUI or Graphical User Interface
  - A graphical user interface (GUI) for looking at the monitor elements has been developed to ease the duties of shifters and casual users. While a simple histogram browser cannot generally be considered a difficult task, in the case of the CMS DQM the challenge comes from the potentially large amount of information in the system at any given time, which could starve the machine of resources and render the application unresponsive. For this reason, to cope with the load due to monitor element updates and to minimize the interference of the backend DQM operations with the actual operations of the GUI, the application had to be built around an event-based, asynchronous design. This way the user always benefits from a highly responsive GUI when looking at the data, whatever the load on the machine due to the communication with the DQM server. The application is composed of two concurrent threads: one responsible for the communication with the DQM server, the other for updating the GUI.
- Web Interface
  - XDAQ is a C++ framework for writing data acquisition software. Making a DQM client an XDAQ application and running it in the context of an XDAQ executive provides it with web interactivity. More specifically, an XDAQ application can
    - listen for HTTP requests
    - react to them by executing predefined callback functions.
  - If a DQM client is to have a web interface and respond to such requests, it needs to have its regular interaction with the collectors handled by a separate thread.
  - The web interface, based on JavaScript, provides:
    - a powerful and user-friendly interface
    - histogram browsing and *TrackerMap* ("exploded view" of the tracker)
    - action buttons to
      - trigger analysis,
      - generate summary plots,
      - apply QualityTests, etc.
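The quality tests mentioned above (setting thresholds, comparing to a reference, raising warnings and errors) can be pictured with the following sketch. It is only one possible test, under stated assumptions: the function `compare_to_reference`, its threshold parameters and the status strings are hypothetical and do not reproduce the actual DQM quality test classes.

```python
def compare_to_reference(bins, reference, warn_at=0.10, error_at=0.30):
    """Compare the shape of a histogram to a reference with the same binning.

    Both histograms are normalised to unit area. The distance is half the
    summed absolute per-bin difference: 0 for identical shapes, 1 for
    completely disjoint ones. Thresholds are illustrative defaults.
    """
    if len(bins) != len(reference):
        raise ValueError("histograms must have the same number of bins")
    total, ref_total = sum(bins), sum(reference)
    if total == 0 or ref_total == 0:
        return "ERROR"  # an empty histogram cannot be compared
    distance = 0.5 * sum(abs(b / total - r / ref_total)
                         for b, r in zip(bins, reference))
    if distance >= error_at:
        return "ERROR"
    if distance >= warn_at:
        return "WARNING"
    return "OK"


reference = [10, 20, 10]
same_shape = compare_to_reference([20, 40, 20], reference)  # scaled copy of the reference
shifted = compare_to_reference([10, 20, 20], reference)     # excess in the last bin
disjoint = compare_to_reference([0, 0, 30], reference)      # all entries in one bin
```

Because the comparison works on normalised shapes, a client can run it on periodic updates regardless of how many events the source has processed so far.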
-- Main.MiaTosi - 22 Jun 2007