Upgrading framework-level condition management


The infrastructure currently used to manage conditions within the LHCb framework is not compatible with the concurrent event processing mechanism introduced by GaudiHive. This will need to change in the context of the ongoing computing upgrade.

Considering that every active Gaudi-based experiment faces the same problem at the same time, and that there are ongoing discussions regarding a possible joint migration of ATLAS, LHCb and FCC towards a new common condition database, the most sensible option would be to solve this problem in a cross-experiment way. This means moving as much of the condition management infrastructure as possible to Gaudi, instead of keeping most of it in experiment-specific modules like LHCb's UpdateManagerSvc and ATLAS' IOVSvc.

However, I have already ruled out the possibility of all Gaudi-based experiments moving to the exact same condition handling strategy, so part of the code will need to remain experiment-specific. Moreover, in the case of LHCb, the tight upgrade schedule means that we should strive to get something working quickly, in a fashion that makes it possible to refine the resulting design later on.



What we currently have is an infrastructure for loading the conditions associated with a specific detector state in RAM, and updating said detector state as needed when incompatible input events are received.

This infrastructure is not compatible with multithreaded event processing because a detector state switch may occur as other event processing threads are still using the "old" detector state, resulting in incorrect computations.

The simplest way to make this infrastructure thread-safe is to wait for all events in flight to be processed before performing a detector state switch. As a CPU performance optimization, it is also possible to keep accepting input events by queueing incompatible ones, see [1] for more details.
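The "wait for all events in flight" strategy above can be sketched with standard C++ synchronization primitives. This is a purely illustrative toy, not actual Gaudi or LHCb code; the `ConditionGate` name and its methods are hypothetical:

```cpp
#include <condition_variable>
#include <mutex>

// Hypothetical sketch: serialize detector-state switches against
// in-flight events. An event scheduler would call beginEvent()/
// endEvent() around each event, and switchState() when an input
// event incompatible with the current conditions is encountered.
class ConditionGate {
public:
  // An event compatible with the current detector state starts.
  void beginEvent() {
    std::lock_guard<std::mutex> lock(m_mutex);
    ++m_inFlight;
  }

  // An event finishes processing.
  void endEvent() {
    std::lock_guard<std::mutex> lock(m_mutex);
    if (--m_inFlight == 0) m_drained.notify_all();
  }

  // An incompatible event was seen: wait until every event still
  // using the "old" detector state has completed, then run the
  // caller-provided condition switch.
  template <typename SwitchFn>
  void switchState(SwitchFn&& doSwitch) {
    std::unique_lock<std::mutex> lock(m_mutex);
    m_drained.wait(lock, [this] { return m_inFlight == 0; });
    doSwitch();  // safe: no event is reading the old conditions
  }

private:
  std::mutex m_mutex;
  std::condition_variable m_drained;
  unsigned m_inFlight = 0;
};
```

The queueing optimization from [1] would sit on top of this: instead of stalling the input, incompatible events are parked in a queue and replayed after `switchState()` completes.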

The approach above is not suitable for all HEP experiments, in the sense that it can only achieve good performance if condition switches are rare. That happens to be the case for LHCb (~1 condition switch / hour online, homogeneous jobs offline).

Alternatives allowing for multiple condition sets to be stored in RAM simultaneously would be much more complex to implement, as the entire condition handling infrastructure would need to be reviewed in order to eliminate all instances of the Singleton design pattern. This migration is not currently needed, so we should refrain from going through it in the context of the current upgrade. But we should leave open the possibility of performing that migration in a later upgrade, if it becomes necessary due to changes in the LHCb condition usage patterns or the event I/O subsystem.

Keeping this possibility open is also in line with the aforementioned goal of moving more condition management work to Gaudi: if we implement a Gaudi-level condition management interface that can plug into multiple concrete implementations, it will ease the pain of migrating to another condition management strategy in the future. See [2] for ongoing work in this direction.
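To make the "pluggable implementations" idea concrete, here is a minimal sketch of what such a backend-agnostic interface could look like. All names (`IConditionAccess`, `SingleSlotStore`, `Condition`) are assumptions for illustration, not the actual interfaces being designed in [2]:

```cpp
#include <string>
#include <unordered_map>
#include <utility>

// Stand-in for a real condition payload (alignment, calibration...).
struct Condition {
  std::string payload;
};

// Algorithms would code against this interface only, so the concrete
// storage strategy can be swapped without touching client code.
class IConditionAccess {
public:
  virtual ~IConditionAccess() = default;
  // Returns the condition for a key, or nullptr if absent.
  virtual const Condition* get(const std::string& key) const = 0;
};

// Simplest backend: a single detector state resident in RAM, matching
// the current LHCb approach. A future multi-slot backend (several
// condition sets cached for concurrent interval-of-validity ranges)
// could implement the same interface transparently.
class SingleSlotStore : public IConditionAccess {
public:
  void set(const std::string& key, Condition c) {
    m_slot[key] = std::move(c);
  }
  const Condition* get(const std::string& key) const override {
    auto it = m_slot.find(key);
    return it == m_slot.end() ? nullptr : &it->second;
  }

private:
  std::unordered_map<std::string, Condition> m_slot;
};
```

The design choice here is simply dependency inversion: because clients only see `IConditionAccess`, replacing `SingleSlotStore` with a multi-slot store later would be a configuration change rather than a framework-wide rewrite.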


TODO: Study the existing LHCb infrastructure to find out where and how we can hook into it. The underlying design questions heavily overlap with those in [2].


This is the work that I have done so far. If you know of any other relevant resource regarding condition management in LHCb and Gaudi, please link it here!

  1. Study of possible condition management strategies, with a focus on LHCb and ATLAS: https://cernbox.cern.ch/index.php/s/DoBcszHkN9E4LxX
  2. Design work towards a common Gaudi condition infrastructure: https://twiki.cern.ch/twiki/bin/view/Gaudi/CommonConditionInfrastructure

-- HadrienGrasland - 2016-07-05