GaudiExcise, slice out an algorithm from Gaudi to run in its own sandbox

excise tr. vb. (ĕk′sīz′):
  • To remove by or as if by cutting.

(the result of task #49868 )

....... Excise, Wizards of the Coast

Introduction

One of the biggest challenges in running Gaudi is the complexity of how we use and abuse the framework in LHCb code. It is exceedingly difficult to envisage a unit test that runs only one algorithm extracted from the huge, intricately scheduled web of algorithms, tools and services existing within Gaudi, and it is often impossible to identify every prerequisite. This leads to very time-consuming development procedures, and to tests that cover too much and are over-sensitive. Many of our tests are integration tests, where unit tests would be better suited to quickly detecting the defects we really wish to observe.

For timing tests, hunting memory leaks, and other tasks, one would ideally run the minimal amount of software that demonstrates the problem. However, algorithms rely on tools and services which, inside LHCb, often follow no specific convention and differ wildly between subdetectors (just as an example). As a result, the learning curve for even starting a debugging process is prohibitive.

gaudiexcise.py is a wrapper around gaudirun.py which creates a sandbox around a given algorithm, so that the user can run just that one algorithm, then improve and check it, for a tighter development cycle.

How does it work

gaudiexcise.py assumes that the output of an algorithm is determined by the combination of Configuration and Transient Event Store (TES) state.

  • Configuration: The entire configuration of your original job is preserved as-is and your entire job configuration is always enacted.
  • TES: the entire content of the TES is dumped just before the running of your algorithm, into a DST-like ROOT-formatted file.

Controlling what runs and what doesn't run:

  • Enable: Each GaudiAlgorithm has a (hidden) property Enable.
  • Finding: gaudiexcise.py walks through the tree of all running Algorithms from the ApplicationMgr().TopAlg and OutputStream, to identify the first occurrence of the algorithm with a given name.
  • Disable: gaudiexcise.py needs to deactivate algorithms (Enable=False) for all algs either before or after the one you found (depending on what it is you are doing) and insert an OutputStream just before your alg.
  • PostConfig: This must be done after all other configuration and can only touch Enable; for that it needs to use postConfigAction.
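The finding and disabling steps above can be sketched in plain Python. This is only a toy model under stated assumptions, not the code of gaudiexcise.py: the class Alg and the functions walk, classify and disable_for_sandbox are hypothetical names invented for illustration, and the algorithm names other than PatForward/PatForward are made up. The groups use the labels before, contains, after and containedby that gaudiexcise.py prints at the start of each job.

```python
class Alg:
    """Hypothetical stand-in for a Gaudi configurable; not the real class."""

    def __init__(self, name, members=()):
        self.name = name              # "GaudiTypeName/InstanceName"
        self.Members = list(members)  # non-empty only for sequencers
        self.Enable = True


def walk(alg, parents=()):
    """Depth-first walk yielding (alg, tuple of enclosing sequencers)."""
    yield alg, parents
    for member in alg.Members:
        yield from walk(member, parents + (alg,))


def classify(top_algs, target_name):
    """Group the tree relative to the first occurrence of target_name."""
    flat = [item for top in top_algs for item in walk(top)]
    names = [alg.name for alg, _ in flat]
    if target_name not in names:
        raise ValueError("no algorithm named %r" % target_name)
    index = names.index(target_name)  # first occurrence only
    target, parents = flat[index]
    # In a depth-first walk the target's own members follow it directly.
    containedby = [alg for alg, _ in walk(target)][1:]
    end = index + 1 + len(containedby)
    return {
        "target": target,
        "before": [alg for alg, _ in flat[:index] if alg not in parents],
        "contains": list(parents),
        "containedby": containedby,
        "after": [alg for alg, _ in flat[end:]],
    }


def disable_for_sandbox(groups):
    """Switch off everything except the target, its containers and its members."""
    for alg in groups["before"] + groups["after"]:
        alg.Enable = False


# Made-up job layout, for illustration only.
demo_top = [
    Alg("GaudiSequencer/Init", [Alg("Hypo/Decode")]),
    Alg("GaudiSequencer/Reco", [Alg("Hypo/Seed"),
                                Alg("PatForward/PatForward"),
                                Alg("Hypo/Fit")]),
    Alg("Hypo/Output"),
]
demo_groups = classify(demo_top, "PatForward/PatForward")
disable_for_sandbox(demo_groups)
for key in ("before", "contains", "containedby", "after"):
    print(key, [alg.name for alg in demo_groups[key]])
```

In a real job the equivalent of disable_for_sandbox would have to run from a postConfigAction, since the tree is only complete once all other configuration has been applied.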

User guide

Prerequisites:

  • you will have a Gaudi job from which you want to excise an algorithm.
  • you will know the full Gaudi name of the algorithm you want to excise "GaudiTypeName/InstanceName"
  • you will have write access somewhere to store the sandbox.dst
  • you will need to be running within a sufficiently up-to-date software stack which at least contains the module GaudiConf.Manipulations, preferably LHCb > 37r0 (to see what stack you depend on, try svnProjectDeps -P SomeProject vXrYpZ)

Help?:

  • gaudiexcise.py with no options will print a load of help information and exit.

#setup some project and check your options to gaudirun result in what you expect. For example:
$ SetupProject Brunel
$ gaudirun.py $BRUNELROOT/tests/qmtest/brunel.qms/brunel2012magdown.qmt --option="from Configurables import Brunel; Brunel().EvtMax=10;"
 
#identify an algorithm that you want to extract and run, and pass it to GaudiExcise
$ gaudiexcise.py "PatForward/PatForward" $BRUNELROOT/tests/qmtest/brunel.qms/brunel2012magdown.qmt --option="from Configurables import Brunel; Brunel().EvtMax=10;"

....

#check the sandbox file was created...

$ ls
... sandbox.dst ...

#check the contents of that file
$ gaudiexcise.py -d

#run a second time and only the algorithm you asked for will be run, check the Timing Table!
$ gaudiexcise.py "PatForward/PatForward" $BRUNELROOT/tests/qmtest/brunel.qms/brunel2012magdown.qmt --option="from Configurables import Brunel; Brunel().EvtMax=10;"

In this example, the second run will fail. This can only happen if the assumptions of gaudiexcise.py do not hold, so let's examine the assumptions and see how to fix them in this case.

When do the assumptions break down

  1. When hidden data break the Gaudi model
    • The assumption that "the TES is all that matters" is part of the Gaudi framework, but clever LHCb programmers have been finding ways around that for the last 15 years.
    • So, in some cases hidden information may be transferred between algorithms by using tools or services which remember the state without using the TES.
    • The only solution is to try and excise a bigger chunk of the running algs, or to re-add the algs which set up this state into your sandbox.
  2. When configuration is smarter than it should be
    • The assumption that the configuration of Algs and tools all happens before running on any data is part of the Gaudi framework, but clever LHCb programmers have been finding ways around that for the last 15 years.
    • If an algorithm tries to be smart and reconfigures itself based on properties of other algorithms which may be set at run time, or worse, sets its own properties, or those of another algorithm, at run time, this is bad.
    • If a service other than the JobOptionsService is configuring things, i.e. if you are running from a TCK, then you're stuck for now (I could implement a tool for TCKs, but not generically for any arbitrary JobOptions replacement).
    • The only solution is to gradually excise more and more of the sequence and see if that fixes the reconfiguration problem. Debug printout may also tell you when properties are being changed, to what and why, but often there is no documentation about things that don't conform to the Gaudi framework.
  3. When TES data cannot be persisted
    • Again it is part of the Gaudi framework that all DataObjects should be writable to files, however, this is not 100% the case.
    • Some classes lack the methods and streamers needed to write them to a ROOT file.
    • So, if you need this data for your algorithm, you will need to recreate it by re-running at least the algorithm that produced it.
  4. When the memory layout is what you are testing
    • Obviously the sandbox approach does not recreate the exact memory layout, so some memory leaks may go undetected, and YMMV as to real performance improvements in the full system.
  5. When a DataObject changes its description
    • If you have a sandbox file, it will be storing some TES classes which were never intended to be stored in a file. In this case some slight code change might invalidate your sandbox file.
    • This is easy to fix: just recreate your sandbox.
  6. When the alg to test is run as DataOnDemand
    • DataOnDemand algorithms cannot be excised since Gaudi has no idea when they will be run. However, most DoD algs are independent...
    • First try making a sequence which runs this alg outside of DoD. So long as this DoD alg does not rely on other DoD algs, you should be OK.
  7. When the alg to test is in a "ProcessPhase"
    • GaudiSequencers are very powerful tools in Gaudi and with python configurables we can intelligently add to them based on logic.
    • However, before we had python options we needed to do a lot of logic in C++, and ProcessPhase is a relic from this time.
    • It configures its own members based on its own "Detectors" property, and so I can't insert before/after or disable any of these guys.
    • gaudiexcise.py will warn you when this is the case, and will suggest options. One of those options is to replace the ProcessPhase with two lines of Python which would do the same job.
  8. When the alg somehow is run twice, in different events, within different sequencers
    • If an alg appears in two different sequencers, that's perfectly allowed in Gaudi and can save on CPU and make scheduling/sequencing easier.
    • However, if the alg is executed in different events in these different sequencers, it cannot be cleanly excised both times.
    • gaudiexcise.py will warn you when this is the case, and will suggest options. Options are very limited in this case. If you don't want to test the first instance, erm, exclude it somehow I guess.
  9. When an alg is being added using its own postConfigAction.
    • Algs appended with their own postConfigAction will never be seen by GaudiExcise.
    • See if you can insert it in the right place, otherwise, give up...
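The first breakdown, hidden data passed outside the TES, can be illustrated with a small plain-Python toy. Everything here is hypothetical and invented for illustration (CachingTool, producer, consumer, replay, and the locations Raw/Hits and Rec/Tracks); a dict stands in for the TES and a deep copy of it stands in for sandbox.dst. It is a sketch of the failure mode, not of how any real LHCb tool behaves.

```python
import copy


class CachingTool:
    """Hypothetical tool that remembers state outside the TES."""

    def __init__(self):
        self.calibration = None


def producer(tes, tool):
    """Hypothetical upstream alg: fills the TES and, as a side channel, the tool."""
    tes["Raw/Hits"] = [1.0, 2.0, 3.0]
    tool.calibration = 2.0  # hidden state, never written to the TES


def consumer(tes, tool):
    """Hypothetical alg under test: needs both the TES and the hidden state."""
    if tool.calibration is None:
        raise RuntimeError("calibration was never set")
    tes["Rec/Tracks"] = [hit * tool.calibration for hit in tes["Raw/Hits"]]


# Full job: producer runs, the TES is dumped just before the consumer.
full_tes, full_tool = {}, CachingTool()
producer(full_tes, full_tool)
snapshot = copy.deepcopy(full_tes)  # stands in for sandbox.dst
consumer(full_tes, full_tool)


def replay(snapshot, extra_algs=()):
    """Run the consumer from the snapshot, optionally re-adding upstream algs."""
    tes, tool = copy.deepcopy(snapshot), CachingTool()  # the tool starts fresh
    for alg in extra_algs:
        alg(tes, tool)
    consumer(tes, tool)
    return tes


try:
    replay(snapshot)
except RuntimeError as err:
    print("sandbox replay failed:", err)

# Re-adding the alg that sets up the hidden state fixes the replay.
print(replay(snapshot, [producer])["Rec/Tracks"] == full_tes["Rec/Tracks"])
```

The replay with only the snapshot fails even though the TES content is identical, which is the situation where one has to excise a bigger chunk or re-add the algs that set up the state.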

How to fix this example?

In this example, the problem is "When TES data cannot be persisted": the STLiteClusters cannot be stored to a file, so we need to add something which recreates them from the raw event. Luckily this just means killing a couple of locations and running a couple of decoders before our algorithm.

gaudiexcise.py respects the content of two sequencers GaudiSequencer('PreExciseUserAlgs') and GaudiSequencer('PostExciseUserAlgs') which you can edit to recreate the parts you actually want of the TES, as shown below for this example.

from Gaudi.Configuration import *
from Configurables import EventNodeKiller, GaudiSequencer, StoreExplorerAlg
from DAQSys.Decoders import DecoderDB
from DAQSys.DecoderClass import decodersForBank

# Lite clusters cannot be persisted correctly :S
# so kill the cluster locations read back from the sandbox...
enk = EventNodeKiller("KillBrokenClusters")
enk.Nodes = [
    "/Event/Raw/IT/LiteClusters",
    "/Event/Raw/TT/LiteClusters",
    "/Event/Raw/Velo/LiteClusters",
    "/Event/Raw/Velo/Clusters",
    "/Event/Raw/IT/Clusters",
    "/Event/Raw/TT/Clusters",
]

# ...and re-run the IT, TT and Velo decoders to recreate them from the raw event
alwaysAdd = [enk]
for bank in ("IT", "TT", "Velo"):
    alwaysAdd += [d.setup() for d in decodersForBank(DecoderDB, bank)]

#alwaysAddNames=[g.getFullName() for g in alwaysAdd]

# run before the excised algorithm
GaudiSequencer('PreExciseUserAlgs').Members = alwaysAdd

# run after the excised algorithm, to show what is now in the TES
GaudiSequencer('PostExciseUserAlgs').Members = [StoreExplorerAlg()]

Saving this to a file called fixtes.py, one can then re-run with:

$ gaudiexcise.py "PatForward/PatForward" $BRUNELROOT/tests/qmtest/brunel.qms/brunel2012magdown.qmt --option="from Configurables import Brunel; Brunel().EvtMax=10;" fixtes.py

Examining what was done by gaudiexcise

There are two things to examine:

  1. The content of the sandbox dst
    • gaudiexcise.py -d will dump this for you
  2. Which algs were identified as being before and after
    • gaudiexcise.py prints [before] [contains] [after] [containedby] at the start of each job
    • before: algs identified as running before the one you asked for. By default these will be disabled when you run from your sandbox.
    • contains: sequencers which contain the algorithm you asked for. These cannot be deactivated.
    • after: algs identified as running after the one you asked for. By default these are always disabled, but you can run them instead with a simple option to gaudiexcise.py.
    • containedby: if the algorithm to excise is a sequencer, the algs it contains should not be disabled; they are printed in containedby.

What other options are there?

The best way to see them is to add --help, or to run gaudiexcise.py with no options or arguments.


-- RobLambert - 19 Mar 2014

Topic revision: r4 - 2014-03-25 - RobLambert
 