WIP towards a common condition infrastructure

In the Gaudi ecosystem, condition management has historically been relegated to experiment-specific code. Such code is typically based on non-reusable custom hooks and components, and sometimes even requires patching or replacing core Gaudi components to operate. This represents a large amount of duplicated effort, especially in light of the ongoing migration to multithreaded event processing and, for some experiments, to a new common condition database. It thus seems reasonable to revisit this design choice and strive to build a more standard Gaudi infrastructure for condition management.

Experience shows that there is little hope in trying to make all Gaudi-based experiments agree on a shared monolithic condition handling solution. Instead, the goal pursued here is to minimize the part of the condition handling logic that is experiment-dependent, and whenever feasible replace experiment-specific components with general-purpose equivalents that can be shared across experiments. As an interesting side-effect, it will become easier (though still nontrivial) to switch from one condition handling approach to another as experiment requirements evolve.

The proposed plan is to start by providing a common interface to the existing experiment-specific condition management code, then gradually build on top of this interface a full experiment-agnostic condition handling solution that any willing experiment can migrate to, with enough flexibility to satisfy the diverse needs of experiments with reasonably thin experiment-specific hooks.

This TWiki page is intended as a shared repository for design thoughts on such a common condition management infrastructure.

Current status

This project is currently at an early design stage, trying to enumerate and describe the required interfaces and functionality. Said description is annotated, where appropriate, with details on some specific design points that require special attention or must be clarified before implementation.

A prototype for the functionality whose design has mostly stabilized is available in https://gitlab.cern.ch/hgraslan/Gaudi/tree/condition-upgrade .

Part of this functionality already exists in Gaudi. For example, Gaudi::Time is more restricted in scope than the IExperimentTimePoint below, but could be extended or reused as part of its implementation. Same for Gaudi::IValidity versus ConditionIOV. I need help from experienced Gaudi developers in order to figure out what already exists, and discuss when existing classes should be extended instead of composed into new ones.

The architecture is not final, and the names used here should not be taken as a precise mapping of functionality to concrete Gaudi components, but only as a way to name functionality in order to ease design discussions. Complex functionality may end up broken up into multiple classes, and closely related functionality may be merged into a single class if deemed necessary to improve code clarity or efficiency.

Some condition management use cases

To guide design discussions, here are a number of condition handling strategies of varying complexity which we may want this infrastructure to support:

  • When an event associated with new conditions is encountered, finish processing the events in flight, switch conditions, and start processing the new event.
  • As a performance optimization, queue input events for a while before performing a condition switch, in case there are still more input events associated with the same conditions (trigger systems often smear out condition transitions in the input event stream).
  • Allow for the concurrent existence of multiple detector stores. When new conditions are introduced, spawn a new detector store corresponding to this detector state. Use a reference-counting mechanism to evict unused detector stores, and memory optimizations such as copy-on-write to avoid unnecessary state duplication. Fall back to a sequential solution if too many detector states are held in RAM simultaneously.
  • Make the creation of a new detector state asynchronous, using ATLAS' CondAlgs or pure TBB tasks, so that input event stream readout is not interrupted.
  • Support other condition storage layouts such as ATLAS' ConditionStore / AlignmentStore.

Proposed high-level workflow

In the proposed workflow, each element of the Gaudi processing pipeline that needs conditions accesses them through a kind of smart pointer, called a "ConditionHandle". This handle is associated with a specific detector state component through its key, and is able to fetch the right version of that component for a given event.

Under the hood, standardized Gaudi components ensure the following properties:

  • Conditions are loaded or computed as required by the input event stream, and discarded when no longer used.
  • A given detector state component is made available at the moment when a Gaudi component needs it (i.e. component execution is only scheduled when ConditionHandle access is guaranteed to be valid and nonblocking).
  • The number of concurrently held detector states is bounded, to 1 by default (for compatibility with legacy code); the bound may be increased when the code is compatible with concurrent condition management.
  • Once the maximum number of concurrent detector states is reached, events associated with a new detector state are put on a bounded waiting queue, allowing for good handling of trigger smearing. The queue can have length 0 if this process is handled in another way, such as by using a sufficiently large number of concurrent conditions.
  • Thread synchronization overhead is kept to a minimum, without compromising the usability or encapsulation of the abstractions that are being introduced.
  • Memory usage is optimized by avoiding unnecessary state duplication.

Proposed Gaudi interfaces and functionality + known existing code

Common interface to experiment timestamps ("IExperimentTimePoint")

HEP experiments measure the passage of time in various ways:

  • Real time clocks (number of elapsed seconds since some epoch)
  • Run numbers (following an experiment-specific numbering policy)
  • Event number (guaranteed to be unique within a run)
  • Lumiblock number (in experiments that have lumiblocks)

Because event timing is highly experiment-specific, we should not rely on its precise format. However, to implement condition management, we do need a way to compare experiment time points, in order to check whether a given event occurred within the interval of validity of a given detector state.

This is made more complicated by the fact that condition database entries do not usually provide the full timing data listed above, but only a subset of it. For example, the interval of validity of a given condition may be specified in terms of real time, or of run and event numbers, but not both. It will often be hard, or even impossible, to reliably relate one kind of time measurement to another, which means, in mathematical terms, that there is only a partial ordering relationship between experiment time points, not a total one.

What I would like to see here is the following:

  • A common abstract interface to experiment-specific time points, exposing the aforementioned partial ordering relationship.
  • A well-defined protocol for scenarios where time points cannot be compared (because there is no well-defined order).
  • Although exceptions are fine as a way to signal invalid comparisons, an exception-free way to check comparability would be most welcome for performance reasons.
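
The three points above could be sketched as follows. This is a hypothetical illustration, not an agreed-upon design: the interface, enum, and the two toy time point classes (run/event pairs versus wall-clock seconds) are all stand-ins chosen to demonstrate the exception-free partial ordering.

```cpp
#include <optional>

// Hypothetical sketch of the abstract time point interface: compare() returns
// an empty optional when two time points are incomparable, instead of throwing.
enum class TimeOrder { Before, Equal, After };

struct IExperimentTimePoint {
  virtual ~IExperimentTimePoint() = default;
  // Exception-free partial comparison; std::nullopt signals "not comparable"
  virtual std::optional<TimeOrder> compare(const IExperimentTimePoint& other) const = 0;
};

// Toy implementation: a run/event time point, comparable only with its own kind
struct RunEventTimePoint final : IExperimentTimePoint {
  unsigned run, event;
  RunEventTimePoint(unsigned r, unsigned e) : run(r), event(e) {}
  std::optional<TimeOrder> compare(const IExperimentTimePoint& other) const override {
    auto* o = dynamic_cast<const RunEventTimePoint*>(&other);
    if (!o) return std::nullopt;  // incomparable with non-run/event time points
    if (run != o->run) return run < o->run ? TimeOrder::Before : TimeOrder::After;
    if (event != o->event) return event < o->event ? TimeOrder::Before : TimeOrder::After;
    return TimeOrder::Equal;
  }
};

// Toy implementation: a wall-clock time point, incomparable with run/event pairs
struct WallClockTimePoint final : IExperimentTimePoint {
  long seconds;
  explicit WallClockTimePoint(long s) : seconds(s) {}
  std::optional<TimeOrder> compare(const IExperimentTimePoint& other) const override {
    auto* o = dynamic_cast<const WallClockTimePoint*>(&other);
    if (!o) return std::nullopt;
    if (seconds == o->seconds) return TimeOrder::Equal;
    return seconds < o->seconds ? TimeOrder::Before : TimeOrder::After;
  }
};
```

A client can thus check comparability without exceptions by testing whether the returned optional holds a value.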

This interface would be used to manipulate event time points, and to build a framework-level notion of a condition's interval of validity (see below).

There is some existing infrastructure in GaudiKernel/Time, but it seems restricted to real-time clock timestamps. An interesting alternative to the proposed interface is GaudiKernel/EventIDBase; however, it imposes a single base class, with a specific data format, on all experiment time points, which failed to reach unanimity across experiments. I thus propose sticking with an abstract interface, but making EventIDBase implement it as a good test bed.

Condition IoV representation ("ExperimentTimeInterval")

A single condition's interval of validity can be defined by two IExperimentTimePoints, representing the moment at which a condition starts being valid and the moment at which it stops being valid. For optimal usability, we will also need a method to tell whether a given event falls within a given condition's interval of validity. Simple enough so far.

Things become more interesting when multiple detector conditions have to be taken into account. In this case, checking the validity of all conditions for every input event would be wasteful. What we need instead is a kind of aggregate IoV that represents the intersection of the intervals of validity of all conditions in flight, and is only updated when conditions themselves are updated.

Because there is no total ordering relationship between experiment time points, this set intersection problem is nontrivial and may for example entail keeping track of multiple "start" time points and multiple "end" time points when incompatible time formats are used. One should either pay attention to this use case when designing the class in charge of representing condition IoVs, or dedicate a specific class to such IoV intersections.

To summarize, what we need as far as condition IoV handling is concerned is:

  • A way to build an IoV from two experiment time points.
  • A way to tell whether an experiment time point falls into an IoV.
  • A way to compute the intersection of the IoVs of two conditions, and manipulate it as an IoV.
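
The three operations above can be sketched in a few lines. For brevity, this hypothetical example uses plain integer timestamps; the real class would operate on IExperimentTimePoints and would have to deal with the partial ordering issues discussed above (possibly tracking multiple "start" and "end" points).

```cpp
#include <algorithm>
#include <optional>

// Hypothetical IoV sketch over totally ordered integer timestamps:
// an interval is valid for since <= t < until.
struct ExperimentTimeInterval {
  long since, until;

  // Is a given (toy) time point inside the interval of validity?
  bool contains(long t) const { return since <= t && t < until; }

  // Intersection of two IoVs; an empty optional means they do not overlap
  std::optional<ExperimentTimeInterval> intersect(const ExperimentTimeInterval& o) const {
    long s = std::max(since, o.since);
    long u = std::min(until, o.until);
    if (s >= u) return std::nullopt;
    return ExperimentTimeInterval{s, u};
  }
};
```

The intersection returns another interval, so aggregate IoVs (intersections of many conditions' IoVs) can be built up incrementally and manipulated like any other IoV.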

There is some existing infrastructure in GaudiKernel/IValidity, but it uses GaudiKernel/Time timestamps (with the limitations mentioned above) and cannot compute IoV intersections. Another existing Gaudi component that should be explored is GaudiKernel/EventIDRange, which is based on GaudiKernel/EventIDBase (see above for a discussion), though the ability to compute intersections with the aforementioned semantics would also be needed.

Detector state component identifier ("IConditionKey")

Condition handles require some kind of key in order to identify individual detector state components. As of now, I do not know whether everyone has agreed on a common key format, so for the time being I'll develop against a minimal interface, which assumes just enough features to be used as a hashmap key.

Unfortunately, due to the way hash functions are defined in C++11/14, there is no way to implement an interface which represents the "is hashable" property. As a consequence, the contract above will need to be implicit, and condition keys will need to be specified as a template parameter to corresponding classes.
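
The implicit contract can be sketched as follows: a hypothetical container class takes the key type as a template parameter, and the only requirements on that type are the usual std::unordered_map key requirements (an std::hash specialization and operator==). The class and member names below are illustrative.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical sketch: the condition key format is left open; any type usable
// as an std::unordered_map key (hashable and equality-comparable) will do.
template <typename ConditionKey>
class ConditionDirectory {
public:
  void insert(const ConditionKey& key, double value) { m_data[key] = value; }
  bool contains(const ConditionKey& key) const { return m_data.count(key) != 0; }

private:
  std::unordered_map<ConditionKey, double> m_data;
};
```

With this approach, one experiment can use string keys while another uses integer identifiers, without either format being imposed by the framework.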

Consistent set of conditions ("ConditionSlot")

Another thing that condition handles need is a condition repository. We don't want them to care about the whole set of conditions currently stored in Gaudi, but only about the subset of these conditions that match the current event. This is handled using a piece of event context called the condition slot, which essentially behaves like a concurrent hash table mapping condition keys to actual condition data for the active event.

Condition slots should be aware of their IoV, which is the intersection of all the IoVs that they contain. This allows finding the right condition slot for a given event.

To be usable in a multithreaded environment with the best concurrent performance, condition slots should provide an immutable view of individual conditions, allowing each of them to be inserted once, then read an indefinite number of times, before being discarded when no longer in use.

Automatically discarding unused condition slots will also require some usage monitoring functionality in order to tell how many events in flight are using a condition slot. This can be as simple as having the event context hold something like a shared_ptr to the appropriate condition slot.

We do not want condition data to be duplicated, so condition slots should only act as a view of actual condition data, which may be shared between multiple condition slots. The actual storage can be as simple as a shared_ptr to condition data, or use a full-blown centralized condition registry such as the Gaudi detector store or ATLAS' condition store, at the cost of reduced concurrent performance. The interface to these storage systems should be implemented at a higher level than condition slots, such as in the condition registry below.

In a normal Gaudi event pipeline, all events should access the same detector state elements, which means that all condition slots should have the same condition data model in terms of condition keys and value type. We could build on this assumption in order to add more error checking and increase storage efficiency by sharing model-related data between condition slots. Or we could leave room for this processing model to change towards a more dynamic direction in future Gaudi releases.

To allow for asynchronous condition generation, as in ATLAS, condition slots should not implement any performance optimization that assumes all condition data is available before event processing begins. However, accessing data which has not yet been inserted in the condition slot should obviously be an error. Another issue to be addressed is how to properly compute a condition slot IoV when not all conditions are available yet.
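
The insert-once and usage-monitoring properties described above could be sketched as follows. This is a hypothetical toy model: a production version would use a concurrent hash map and a generic value type, whereas a mutex-protected std::map of doubles keeps the sketch short.

```cpp
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Hypothetical condition slot sketch: conditions are inserted once, read as
// immutable data through shared_ptr<const T>, and slot usage is tracked simply
// by having each event context hold a shared_ptr to its slot.
class ConditionSlot {
public:
  // Insert once; returns false (and leaves the slot unchanged) on a second insert
  bool insert(const std::string& key, std::shared_ptr<const double> data) {
    std::lock_guard<std::mutex> lock(m_mutex);
    return m_data.emplace(key, std::move(data)).second;
  }

  // Read-only access; nullptr if the condition has not been produced yet
  // (e.g. an asynchronous CondAlg has not run), which clients must treat as an error
  std::shared_ptr<const double> find(const std::string& key) const {
    std::lock_guard<std::mutex> lock(m_mutex);
    auto it = m_data.find(key);
    return it != m_data.end() ? it->second : nullptr;
  }

private:
  mutable std::mutex m_mutex;
  std::map<std::string, std::shared_ptr<const double>> m_data;
};
```

Because the stored values are shared_ptrs to immutable data, the same condition payload can be shared by several slots without duplication, as argued above.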

Centralized conditions repository ("ConditionRegistry")

To manipulate multiple sets of conditions concurrently, we need a layer above condition slots that manages the set of all condition slots in flight, makes a distinction between free and empty condition slots, and handles global concerns such as the allocation of slots to incoming events. It could also handle interaction with the underlying centralized condition storage subsystems, if any.

The following functionality should be provided:

  • Try to find an active condition slot for a specific event; if none exists, try to create one, and if that also fails, notify the caller so that it can move on to other things.
  • When a condition slot is created, reuse existing condition data as much as possible instead of creating duplicates. This means that the condition registry should have a global view of all condition data currently available in RAM, no matter where it lives (shared_ptr, detector store, condition store, alignment store...). A plugin-based system could help here.
  • Indicate which condition data needs to be recreated in the new condition slot.
  • Provide an interface for condition slots to notify that they are no longer in use.
  • Limit the number of condition slots held concurrently to a user-configurable amount.
TODO: Clarify what interface to the in-RAM condition storage layers we need
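
The find-or-create-or-fail slot allocation logic from the list above could be sketched like this. This is a hypothetical toy model: IoVs are plain integer intervals, the IoV of a new slot is passed in by the caller (in reality it would come from condition lookup), and data reuse and slot recycling are left out.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Toy slot: just an aggregate IoV, valid for since <= t < until
struct Slot {
  long since, until;
  bool covers(long t) const { return since <= t && t < until; }
};

// Hypothetical registry sketch: bounded number of concurrent slots,
// try-to-find then try-to-create, with failure reported instead of blocking.
class ConditionRegistry {
public:
  explicit ConditionRegistry(std::size_t maxSlots) : m_maxSlots(maxSlots) {}

  // Find an active slot covering the event time, or create one; a nullptr
  // tells the caller that the slot limit is reached and the event must be
  // queued or otherwise handled by the active condition loading strategy
  std::shared_ptr<Slot> acquireSlot(long eventTime, long iovSince, long iovUntil) {
    for (auto& s : m_slots)
      if (s->covers(eventTime)) return s;
    if (m_slots.size() >= m_maxSlots) return nullptr;
    m_slots.push_back(std::make_shared<Slot>(Slot{iovSince, iovUntil}));
    return m_slots.back();
  }

private:
  std::size_t m_maxSlots;
  std::vector<std::shared_ptr<Slot>> m_slots;
};
```

Returning a shared_ptr also hints at the usage-monitoring mechanism discussed above: a slot could be considered reclaimable once only the registry holds a reference to it.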

Smart pointer to condition data ("ConditionHandle")

This component is the main interface through which event processing components access the condition management infrastructure. After being initialized with the desired condition key, this component provides the following services:

  • Notifying Gaudi that the requested condition is needed by the host component.
  • Telling if the condition data is available for a given event (for the Gaudi scheduler).
  • Fetching said data (for the client).
  • Accounting for condition usage under the hood, in order to know when to discard a condition.
Under the hood, this functionality is mostly provided by going through the ConditionSlot of the event in flight.
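
The availability check and data fetch services could look roughly as follows. This hypothetical sketch runs against a toy slot (a plain map from key to immutable data); the real handle would instead locate the event's ConditionSlot through the event context, and usage accounting would fall out of the slot's reference counting.

```cpp
#include <memory>
#include <string>
#include <unordered_map>

// Toy stand-in for the ConditionSlot of the event in flight
using ToySlot = std::unordered_map<std::string, std::shared_ptr<const double>>;

// Hypothetical condition handle sketch: bound to one condition key at
// initialization, then queried by the scheduler and the client component.
class ConditionHandle {
public:
  explicit ConditionHandle(std::string key) : m_key(std::move(key)) {}

  // For the Gaudi scheduler: is the condition available for this event's slot?
  bool isValid(const ToySlot& slot) const { return slot.count(m_key) != 0; }

  // For the client: fetch the data, guaranteed valid and nonblocking once the
  // scheduler has checked isValid() before scheduling the client component
  const double& get(const ToySlot& slot) const { return *slot.at(m_key); }

private:
  std::string m_key;  // identifies the detector state component
};
```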

Other areas to be explored

Interface to condition IO and computation ("ConditionGenerationSvc")

When the condition management infrastructure determines that some new conditions are needed, a number of operations need to be carried out:

  • Raw conditions which aren't in RAM yet must be loaded from the condition database, or a cache thereof
  • Calibrated conditions must be computed as appropriate
This process is currently experiment-dependent: some experiments perform this work synchronously using callbacks, while others try to run it asynchronously using a special flavour of Gaudi algorithms called CondAlgs. We will need to find a way to provide a common interface to these various mechanisms.

TODO: Define how we need to interact with the condition loading and computation subsystems. In particular, we may want to provide a unified interface to the new ConditionDB.

Interface to the event input subsystem ("IEventInputSvc")

The condition management infrastructure needs to interact with the event loading process, in the sense that it needs to check every incoming event before it is sent to the processing pipeline, in order to perform the appropriate condition loading and event queuing operations.

TODO: This is a part of Gaudi which I am not familiar with at all!

I need help to understand...

  • How events are loaded into Gaudi (e.g. which component decides to load a new event, and when)
  • At which points a condition management infrastructure could check incoming events.
  • What is experiment-specific and what can be assumed to work identically across all experiments
TODO: Once better knowledge of the event input subsystem is reached, review the design of the following components and define which input functionality we need to interact with.

Input event handling mechanism ("InputConditionSvc")

This functionality is responsible for ensuring that appropriate conditions are loaded in RAM before input events are processed.

It requires a new Gaudi hook, which should be invoked after the timestamp for a new input event is loaded by the IO subsystem, and before said event is sent to the scheduler for processing. The hook should allow the InputConditionSvc to delay the processing of an event if deemed necessary by the active condition handling policy.

When this hook is fired, we want the following algorithm to be invoked:

  1. Gaudi notifies the InputConditionSvc that a new input event is being loaded, and provides the IExperimentTimePoint associated with this event.
    • Note that the location where time points are stored, and the "typical" way to pass them around, may soon become experiment-dependent:
      • In ATLAS, experiment time points are usually explicitly stored into and loaded from containers (EventStore / EventContext)
      • LHCb is currently moving in a design direction where function parameter passing semantics are favored over explicit whiteboard interaction
      • One experiment-independent technique would be to pass experiment time points to the InputConditionSvc hook by (const-)reference
    • Note that the exact time at which this hook is invoked is purposely ambiguous, as there can be a performance trade-off here when the condition handling strategy chosen by an experiment can involve queuing events instead of processing them immediately:
      • If the full event is loaded in one chunk, it optimizes IO throughput, at the cost of using up more RAM if an event ends up not being processed right away
      • If only the event timestamp is loaded, it will cause extra IO latency in the common case where the event DOES end up being processed right away
      • In the event where neither of the alternatives above is good enough, typically when event data is large or event queues are deep, a better compromise may be to fetch the event timestamp immediately, launch an asynchronous fetch for the remainder of event data in the background, and provide a way to pause or cancel said fetch if the event is not needed right away.
  2. After being notified that a new event is available, the InputConditionSvc checks, using the ConditionStorageSvc, whether the conditions associated with this event are already in RAM.
    • Many experiments already have infrastructure for this hidden somewhere (IOVSvc for ATLAS, UpdateManagerSvc for LHCb). We may initially just call this code, although we will ultimately want to provide an experiment-agnostic alternative (if only to make sure that new Gaudi experiments do not need to reinvent this wheel).
  3. If the full detector state is available for the new event, then the InputConditionSvc sends it down the event processing chain and returns control to its caller.
    • Although this algorithmic step was separated from the previous one for clarity reasons, a race condition may occur if these two steps are truly independent. Consider the following two scenarios:
      • In the ATLAS condition infrastructure, the ConditionStore garbage collector is invoked right after the condition validity check has been performed, detects that the previously located conditions are not used anymore, and decides to remove them from RAM.
      • In an experiment that manages detector states using reference counting, the last client of the conditions we intend to use terminates, and the framework decides to eliminate the associated detector state.
    • To avoid this race condition, the underlying condition management infrastructure could provide an atomic operation that checks if the relevant conditions are available and, if so, reserves them before any garbage collection mechanism may have been invoked.
    • Another possibility is to design the condition garbage collection infrastructure so that it cannot be invoked asynchronously from the main input processing thread.
  4. If the required conditions are not yet available in RAM, an experiment-specific strategy is executed to decide what should be done next.
    • The reason why this step should be experiment-specific is that there are many ways to handle this situation, each suited to different input event patterns and representing a different tradeoff between implementation complexity, CPU efficiency, RAM consumption, and IO efficiency. Experience shows that there is no single right answer for all Gaudi-based experiments. But although the choice of strategy is experiment-specific, individual condition handling strategies can and should be written in an experiment-agnostic manner, so that multiple experiments can share a common strategy without code duplication.
    • A Gaudi experiment should be able to select the right condition handling strategy for its needs, either at compile time (most efficient) or at job configuration time (more flexible, but is this efficiency cost justified by a clear need here?).
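
The four steps above boil down to fairly simple control flow once the subsystem queries are abstracted away. In this hypothetical sketch, the callables stand in for the ConditionStorageSvc check and for the experiment-specific strategy; note how steps 2 and 3 are fused into one atomic check-and-reserve operation to avoid the garbage collection race condition discussed in step 3.

```cpp
#include <functional>

// Possible outcomes of the new-input-event hook
enum class EventDisposition { Scheduled, Queued, Blocked };

// Hypothetical sketch of the InputConditionSvc hook logic
EventDisposition onNewInputEvent(
    long eventTime,                                         // step 1: event time point
    const std::function<bool(long)>& tryReserveConditions,  // steps 2+3, atomic
    const std::function<EventDisposition(long)>& strategy)  // step 4
{
  // Steps 2+3: atomically check that the conditions are in RAM and reserve
  // them, so no garbage collector can evict them in between; if this succeeds,
  // the event is sent down the processing chain
  if (tryReserveConditions(eventTime)) return EventDisposition::Scheduled;
  // Step 4: conditions missing, defer to the experiment-specific strategy
  // (block and load, queue the event, launch an asynchronous load...)
  return strategy(eventTime);
}
```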

At the end of event processing, corresponding to the endEvent hook, we will also need to notify the ConditionStorageSvc that the conditions are not needed by the associated event processing thread anymore.

Most of the existing work here is located in experiment-specific components like LHCb's UpdateManagerSvc and ATLAS' IOVSvc.

Condition loading strategy ("IConditionLoadingStrategy")

As mentioned above, we should support multiple courses of action in the situation where a new event is loaded and the associated conditions are not available. Configuring the InputConditionSvc with a class which specifies this behaviour, following the classic Strategy design pattern, would be a nice way to handle this.

Here are some design points which such a condition loading strategy needs to answer:

  • If the condition storage subsystem is full, what do we do? Block until a condition storage slot frees up? Queue input events for a while to avoid interrupting event I/O? In the latter case, for how long?
  • While we are generating the conditions, what do we do? Do we block while the computation/IO is running, or do we run it asynchronously and schedule the event to be processed afterwards, meanwhile taking care of other things such as loading more input events?
In general, the more asynchronous we get, the better the event processing throughput we can hope for in situations where condition loading is slow and frequently needed, at the cost of more complex code and bigger migration hurdles for existing experiments.
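
Following the Strategy pattern mentioned above, the pluggable behaviour could be sketched like this. The interface, the decision enum, and the two example strategies are hypothetical illustrations, not a proposed final design.

```cpp
#include <cstddef>

// Possible answers to "conditions are not available, what now?"
enum class LoadingDecision { BlockAndLoad, QueueEvent, LoadAsynchronously };

// Hypothetical strategy interface: the InputConditionSvc would own one
// instance and delegate the decision to it when conditions are missing
struct IConditionLoadingStrategy {
  virtual ~IConditionLoadingStrategy() = default;
  // queueDepth lets the strategy bound how many input events may pile up
  virtual LoadingDecision onMissingConditions(std::size_t queueDepth) = 0;
};

// Simplest strategy: always block until conditions are loaded
// (the legacy-compatible default)
struct BlockingStrategy final : IConditionLoadingStrategy {
  LoadingDecision onMissingConditions(std::size_t) override {
    return LoadingDecision::BlockAndLoad;
  }
};

// Bounded-queue strategy: queue events up to a limit (to ride out trigger
// smearing), then fall back to blocking
class BoundedQueueStrategy final : public IConditionLoadingStrategy {
public:
  explicit BoundedQueueStrategy(std::size_t maxDepth) : m_maxDepth(maxDepth) {}
  LoadingDecision onMissingConditions(std::size_t queueDepth) override {
    return queueDepth < m_maxDepth ? LoadingDecision::QueueEvent
                                   : LoadingDecision::BlockAndLoad;
  }
private:
  std::size_t m_maxDepth;
};
```

Such strategies are experiment-agnostic and could be shared across experiments, with each experiment simply configuring the one that suits its input event patterns.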

TODO: Be more specific about what design decisions must be taken here, and which default implementations we could provide

Glossary

Much of the terminology related to conditions varies subtly from one experiment to another, and even sometimes from one developer to another. A first step towards unification is thus to define a common terminology to be used when discussing condition handling in Gaudi:

  • Detector state: A software representation of the physical state of a HEP detector, at a given time during a data-taking experiment. Considered separate from event data, in the sense of being defined even when no event is occurring. Typically stored within the Gaudi detector store at runtime.
    • Detector state component: A logical subset of the detector state information (e.g. "electromagnetic calorimeter state", "liquid argon temperature"). Has an internal identifier, called a "key".
  • Condition: A detector state component that is only valid for a certain time interval, called its interval of validity. Processing an event entails loading the associated conditions for all time-dependent detector state.
    • Raw condition: An authoritative measurement of detector state.
    • Calibrated condition: A quantity that is computed from raw conditions during event processing.
  • IoV: Shorthand for the interval of validity of a condition.
  • Condition database: A centralized repository hosting the time evolution of all raw conditions.
Topic revision: r17 - 2016-09-06 - HadrienGrasland