-- StuartPaterson - 08 Apr 2009

Introduction to the Production API

The Production API provides methods for production creation and for generating workflow (XML) templates (editable via the WF Editor). The main purpose of this API is to condense the ~300 lines of the old workflows into something more manageable for a wider group of people. Initially the API supports simulation and reconstruction type workflows only (e.g. stripping is yet to be added).

The building blocks of the API are the Gaudi Application Step (containing the GaudiApplication, LogChecker and BookkeepingReport modules) and the Finalization Step.

For reference, there are currently two files in DIRAC containing the Production API:


  • DIRAC/LHCbSystem/Client/Production.py - contains all the API methods;
  • DIRAC/LHCbSystem/Client/ProductionTemplates.py - to generate templates and provide examples.

Production.py inherits from LHCbJob.py, so those who are familiar with the DIRAC API should be quite comfortable with the methods. This allows some common settings to be applied at the production level if desired, e.g.

setSystemConfig('slc4_ia32_gcc34')
setDestination('LCG.MySiteToTest.ch')
setBannedSites('LCG.CNAF.it')
setCPUTime('300000')
setLogLevel('debug')
...

One example of where the Production API can make life easier is by wrapping around these functions. For example, it may be common to ban all Tier-1 sites for an MC simulation production, so there is a method to do just that:

banTier1s()

without having to remember or type the Tier-1 site names.
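As an illustration, a convenience method like this can be sketched in a few lines of standalone Python. The site names below are examples following the LCG naming convention (the authoritative list lives in the DIRAC configuration), and ProductionSketch is a stand-in for the real Production class:

```python
# Illustrative sketch of a banTier1s()-style convenience wrapper.
# TIER1_SITES is an example list following the LCG site naming
# convention; the authoritative list lives in the DIRAC configuration.
TIER1_SITES = ['LCG.CERN.ch', 'LCG.CNAF.it', 'LCG.GRIDKA.de',
               'LCG.IN2P3.fr', 'LCG.NIKHEF.nl', 'LCG.PIC.es', 'LCG.RAL.uk']

class ProductionSketch:
    """Stand-in for Production() with just the site-banning logic."""
    def __init__(self):
        self.bannedSites = []

    def setBannedSites(self, sites):
        # Mirrors the API call shown above, deduplicating as it goes
        for site in sites:
            if site not in self.bannedSites:
                self.bannedSites.append(site)

    def banTier1s(self):
        # One call bans every Tier-1 without typing the names by hand
        self.setBannedSites(TIER1_SITES)

prod = ProductionSketch()
prod.banTier1s()
print(prod.bannedSites)
```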

Optimizations still to be performed:

  • Stripping is yet to be fully implemented (pending something to test)
  • Storing production parameters after publishing the workflow - plenty of scope for useful things to be available here: output data directories, ....
  • Factoring out the options into a separate LHCbSystem/Utilities/ProductionOptions.py utility for convenience (this will likely evolve faster than the API)
  • Improved type checking is not in place purely due to time constraints.

A typical Gauss, Boole simulation workflow explained

This template is also available in DIRAC/LHCbSystem/Client/ProductionTemplates.py and is included in full here before adding commentary line by line:

from DIRAC.LHCbSystem.Client.Production import *
gaussBoole = Production()
gaussBoole.setProdType('MCSimulation')
gaussBoole.setWorkflowName('Production_GaussBoole')
gaussBoole.setWorkflowDescription('An example of Gauss + Boole, saving all outputs.')
gaussBoole.setBKParameters('test','2008','MC-Test-v1','Beam7TeV-VeloClosed-MagDown')
gaussBoole.setDBTags('sim-20090112','head-20090112')
gaussOpts = 'Gauss-2008.py;Beam7TeV-VeloClosed-MagDown.py;$DECFILESROOT/options/@{eventType}.opts'
gaussBoole.addGaussStep('v36r2','Pythia','2',gaussOpts,eventType='57000001',extraPackages='AppConfig.v2r2')
gaussBoole.addBooleStep('v17r2p1','digi','Boole-2008.py',extraPackages='AppConfig.v2r2')
gaussBoole.addFinalizationStep(sendBookkeeping=True,uploadData=True,uploadLogs=True,sendFailover=True)
gaussBoole.banTier1s()
gaussBoole.setWorkflowLib('v9r9')
gaussBoole.setFileMask('sim;digi')
gaussBoole.createWorkflow()

First the API must be imported:

from DIRAC.LHCbSystem.Client.Production import *

The production object encapsulates the workflow and should be instantiated:

gaussBoole = Production()
The production type should always be specified, allowed production types include:
  • MCSimulation
  • DataReconstruction
  • Merge
  • MCStripping - yet to be implemented
  • DataStripping - yet to be implemented
and protection is in place to ensure only these types can be specified.
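A standalone sketch of what such protection can look like is given below (ALLOWED_PROD_TYPES and the ValueError are illustrative; the real check lives in Production.py):

```python
# Illustrative sketch of the type protection: only the documented
# production types are accepted, anything else is rejected.
ALLOWED_PROD_TYPES = ['MCSimulation', 'DataReconstruction', 'Merge',
                      'MCStripping', 'DataStripping']

def setProdType(prodType):
    if prodType not in ALLOWED_PROD_TYPES:
        raise ValueError('Production type "%s" is not one of %s'
                         % (prodType, ALLOWED_PROD_TYPES))
    return prodType

print(setProdType('MCSimulation'))
```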
gaussBoole.setProdType('MCSimulation')

The workflow name is the string given to the workflow XML file itself; this is what appears by default in the production monitoring page.

gaussBoole.setWorkflowName('Production_GaussBoole')

The workflow description can go into more detail about the workflow and is (or at least used to be) accessible from the production monitoring page.

gaussBoole.setWorkflowDescription('An example of Gauss + Boole, saving all outputs.')
The Production() object takes care of setting the BK processing pass for the production and each parameter will be explained in more detail below:
  • configName - the BK configuration name e.g. 'MC'
  • configVersion - the BK configuration version e.g. '2009'
  • groupDescription - this corresponds to the processing pass index 'Group Description' field
  • conditions - either DataTaking or Simulation conditions; both have a meaningful name

When publishing a production workflow to the production management system, the create() method (see below) gives you the option to publish directly to the BK as well, or to generate a script locally to perform the same action.

The groupDescription field can be chosen anew, or existing names can be reused (check via dirac-bookkeeping-pass-index-list). If not already present, the new processing pass will be added automatically.

Examples for the conditions parameter include:

  • DataTaking6137
  • Beam7TeV-VeloClosed-MagDown
In both cases the specific tag must be predefined in the Bookkeeping (for Simulation Conditions the parameters are retrieved from there before publishing the production processing pass).

gaussBoole.setBKParameters('test','2008','MC-Test-v1','Beam7TeV-VeloClosed-MagDown')

All productions (or production requests) must be defined with database tags for the conditions and detector description. The setDBTags() method takes the following arguments:

  • conditions e.g. sim-20090112
  • detector e.g. head-20090112

gaussBoole.setDBTags('sim-20090112','head-20090112')

The options files to be used for a given Gaudi Application Step are specified as a semicolon-separated string or as a Python list. Standard workflow parameters can be set in the names of the options files (more on this in the examples below).

gaussOpts = 'Gauss-2008.py;Beam7TeV-VeloClosed-MagDown.py;$DECFILESROOT/options/@{eventType}.opts'
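As an illustration of how such a string could be handled, the following standalone sketch splits the semicolon-separated options and resolves @{eventType}-style parameters (the resolver itself is hypothetical, not the actual DIRAC implementation):

```python
import re

# Hypothetical resolver for a semicolon-separated options string with
# @{...} workflow parameters; illustrates the syntax, not DIRAC itself.
def resolveOptions(optionsString, parameters):
    options = [opt.strip() for opt in optionsString.split(';') if opt.strip()]
    # Replace each @{name} placeholder with its workflow parameter value
    return [re.sub(r'@\{(\w+)\}', lambda m: parameters[m.group(1)], opt)
            for opt in options]

gaussOpts = 'Gauss-2008.py;Beam7TeV-VeloClosed-MagDown.py;$DECFILESROOT/options/@{eventType}.opts'
resolved = resolveOptions(gaussOpts, {'eventType': '57000001'})
print(resolved)
```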

The Gauss Gaudi Application Step of the workflow can now be defined. The full interface is available in Production.py and there are further examples below.

gaussBoole.addGaussStep('v36r2','Pythia','2',gaussOpts,eventType='57000001',extraPackages='AppConfig.v2r2')

The Boole Gaudi Application Step can now also be defined. Note that it is unnecessary to specify the eventType again; it is taken by default from the previous step (though it is possible to override this). The ordering of the addStep() methods also dictates the input / output chain, e.g. the outputs of Gauss are automatically fed to Boole in this workflow.

gaussBoole.addBooleStep('v17r2p1','digi','Boole-2008.py',extraPackages='AppConfig.v2r2')
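The chaining behaviour described above can be sketched with a standalone toy class (StepChainSketch and its file-naming scheme are purely illustrative, not the DIRAC implementation):

```python
# Toy illustration of step chaining: each step inherits the eventType
# from the previous step unless overridden, and consumes its output.
class StepChainSketch:
    def __init__(self):
        self.steps = []

    def addStep(self, application, outputType, eventType=None):
        if eventType is None and self.steps:
            eventType = self.steps[-1]['eventType']  # inherit from previous step
        inputFile = self.steps[-1]['output'] if self.steps else None
        self.steps.append({'application': application,
                           'eventType': eventType,
                           'input': inputFile,
                           'output': '%s.%s' % (application.lower(), outputType)})

chain = StepChainSketch()
chain.addStep('Gauss', 'sim', eventType='57000001')
chain.addStep('Boole', 'digi')  # eventType inherited, input is the Gauss output
print(chain.steps[1])
```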

The Finalization step should be defined for all productions. The call below can be made without arguments, since the default is True for the standard modules, but for debugging purposes individual modules can be enabled / disabled if desired. For local testing the defaults can also be left as True, since nothing happens without the JOBID environment variable being set (see the section below on running XML workflows).

gaussBoole.addFinalizationStep(sendBookkeeping=True,uploadData=True,uploadLogs=True,sendFailover=True)
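The JOBID guard mentioned above can be illustrated with a trivial sketch (shouldFinalize is a hypothetical helper, not a real DIRAC function):

```python
import os

# Hypothetical guard illustrating why the finalization defaults are safe
# locally: the modules do nothing unless JOBID is set, i.e. unless the
# job is running inside the production system.
def shouldFinalize():
    return 'JOBID' in os.environ

os.environ.pop('JOBID', None)   # simulate a local test run
print(shouldFinalize())
```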

As explained above, this is a quick way to ban all the Tier-1 sites for simulation productions.

gaussBoole.banTier1s()

The version of the workflow library can be set using the method below; this automatically appends the input sandbox LFN for the production jobs.

gaussBoole.setWorkflowLib('v9r9')

The output data files with the correct naming convention are automatically produced for each Gaudi Application Step. The output data file mask is the final authority on which file extensions are retained for the given production, and can be specified as a semicolon-separated string or a Python list.

gaussBoole.setFileMask('sim;digi')
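As an illustration, the masking logic can be sketched as a standalone function (applyFileMask and the file names are hypothetical):

```python
# Hypothetical implementation of the file mask: only outputs whose
# extension appears in the mask are kept for upload.
def applyFileMask(outputFiles, fileMask):
    # Accept either 'sim;digi' or ['sim', 'digi']
    if isinstance(fileMask, str):
        fileMask = [ext.strip() for ext in fileMask.split(';') if ext.strip()]
    return [f for f in outputFiles
            if f.rsplit('.', 1)[-1].lower() in fileMask]

outputs = ['00001234_00000001_1.sim', '00001234_00000002_1.digi',
           '00001234_00000002_1.hist']
print(applyFileMask(outputs, 'sim;digi'))
```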

At this point, by adding one more API command, the workflow XML can be created for local testing.

Generating the workflow for a production

...

gaussBoole.createWorkflow()

Using the script below for running locally, we can now test whether the production workflow is sane.

Running any XML workflow locally

Once you have an XML workflow, this section explains how to run it locally. The only caveat is that this has only been tested in the DIRAC Env scripts environment (some configuration changes may be required for it to work in the SetupProject Dirac case).

The production create() method

How to publish to the BK etc.

Example 1 - overriding the standard options lines for a simulation production

In this case there was a request for a production with 'old' versions of the projects that do not support LHCbApp(). The standard options are printed by the Production.py API, and it was sufficient to remove traces of the LHCbApp() module (relying on the defaults from the options). Note that the create() function here is used just to write a script for BK publishing; this allows the processing pass to be checked before making it visible (if it is ever to be made visible) and will only generate the workflow XML, so it can be tested locally before entering the production system.

Example 2 - creating and running a reconstruction (FEST in this case) production

Templates for both the express stream and full stream reconstruction are available in the ProductionTemplates.py file. The example below is for the express stream reconstruction.

Example 3 - creating and running a merging production

coming soon (hopefully)

Topic revision: r2 - 2009-04-08 - StuartPaterson
 