ReDigi workflows

The first step is to subscribe the input dataset to a Tier-1 disk endpoint if necessary. Usually this will be the same site that holds the GEN-SIM on tape, but it can be a different Tier-1 if necessary, for example if the custodial Tier-1 already has too much work. If there is no custodial site, subscribe the GEN-SIM to an appropriate Tier-1 based on the current workload at each site.
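
The site-selection logic above can be sketched roughly as follows. This is a minimal illustration, not the real tooling: the `tier1_load` map and the overload threshold are hypothetical stand-ins for however current work at each site is measured.

```python
def choose_subscription_site(custodial_node, tier1_load):
    """Pick the Tier-1 whose disk endpoint should receive the GEN-SIM.

    Prefer the Tier-1 paired with the custodial tape (MSS) node; fall back
    to the least-loaded Tier-1 when there is no custodial copy or the
    custodial site is overloaded. `tier1_load` maps site -> pending work
    (illustrative units)."""
    least_loaded = min(tier1_load, key=tier1_load.get)
    if custodial_node is None:
        return least_loaded
    # Strip the tape suffix, e.g. T1_US_FNAL_MSS -> T1_US_FNAL
    tier1 = custodial_node.replace("_MSS", "")
    # Arbitrary example threshold: "too much work" = more than twice the
    # least-loaded site's backlog.
    overloaded = tier1_load.get(tier1, 0) > 2 * tier1_load[least_loaded]
    return least_loaded if overloaded else tier1
```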

I use the script to produce a list of workflows in the assignment-approved state:

The input datasets and custodial sites are also included in the output. Example:
bash-3.2$ python 
jen_a_BTV-Spring14miniaod-00071_00077_v0__141119_181505_4141 /QCD_Pt-170to300_MuEnrichedPt5_Tune4C_13TeV_pythia8/Spring14dr-PU_S14_POSTLS170_V6-v1/AODSIM T1_US_FNAL_MSS
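
The output is three whitespace-separated columns: workflow name, input dataset, custodial site. A trivial sketch of splitting one such line:

```python
def parse_workflow_line(line):
    """Split one output line into (workflow, input_dataset, custodial_site),
    assuming the three-column whitespace-separated format shown above."""
    workflow, dataset, custodial = line.split()
    return workflow, dataset, custodial
```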

I then make the subscriptions to the appropriate sites manually from the PhEDEx page.

A script that automatically subscribes GEN-SIM datasets to disk at the custodial Tier-1s is in testing (the Phys14DR campaign has mostly been handled this way).
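
The key step such a script needs is deriving the disk endpoint paired with a custodial tape node. A sketch, assuming the usual PhEDEx node-naming convention (no real subscription call is made here):

```python
def disk_node_for_custodial(mss_node):
    """Derive the disk node paired with a custodial tape node,
    e.g. T1_US_FNAL_MSS -> T1_US_FNAL_Disk."""
    if not mss_node.endswith("_MSS"):
        raise ValueError("expected a *_MSS node, got %s" % mss_node)
    return mss_node[:-len("_MSS")] + "_Disk"
```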

Assigning ReDigi workflows
The script is used to assign workflows. It is designed to be run twice: first as a "dry run" to check that everything is correct (e.g. acquisition era, ProcessingString), then again for real.

Example checking assignment of a single workflow:

bash-3.2$ python -w pdmvserv_B2G-Summer12DR53X-00799_00332_v0__141022_160538_7286 -s T1_ES_PIC
Would assign  pdmvserv_B2G-Summer12DR53X-00799_00332_v0__141022_160538_7286  with  Acquisition Era: Summer12DR53X ProcessingString: PU_S10_START53_V19 ProcessingVersion: 1 lfn: /store/mc Site(s): T1_ES_PIC Custodial Site: T1_ES_PIC team: reproc_lowprio
This script needs to be run again with the -e option in order to actually assign the workflow.

There are three ways the script can be used:

  • -w option: specify a single workflow
  • -f option: specify a file containing a list of workflows, one per line
  • neither of the above: all workflows in the assignment-approved state are obtained from the WMStats API and are considered
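
The three modes above can be sketched as a single resolution function. The argument names and the `fetch_assignment_approved` callable are illustrative; the real script queries the WMStats API.

```python
def workflows_to_consider(args, fetch_assignment_approved):
    """Resolve which workflows to act on, mirroring the three modes above.

    `args` is a dict of parsed options; `fetch_assignment_approved` stands
    in for the WMStats API query (hypothetical callable)."""
    if args.get("workflow"):                # -w: a single workflow
        return [args["workflow"]]
    if args.get("file"):                    # -f: one workflow per line
        with open(args["file"]) as f:
            return [line.strip() for line in f if line.strip()]
    return fetch_assignment_approved()      # neither: everything in assignment-approved
```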

To force a workflow to be assigned to a specific site, use the -s option. If it is not specified, the workflow is assigned to the Tier-1 that has the complete input dataset on disk. By default the output datasets are custodial at the Tier-1 the workflow was assigned to; to specify an alternative Tier-1, use the -c option. To run a workflow at a site that does not have the input data, specify the -o option to enable reading via XRootD.
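
The site and custodial defaults described above amount to the following sketch. `sites_with_full_input` is an illustrative stand-in for a PhEDEx query listing Tier-1s with the complete input dataset on disk.

```python
def resolve_assignment(site_option, custodial_option, sites_with_full_input):
    """Pick the execution site and custodial Tier-1 following the
    defaults described above (simplified sketch)."""
    # -s overrides; otherwise take a Tier-1 holding the complete input on disk.
    site = site_option or (sites_with_full_input[0] if sites_with_full_input else None)
    if site is None:
        raise RuntimeError("no site has the complete input on disk; "
                           "use -s together with -o (xrootd)")
    # -c overrides; otherwise output is custodial at the assigned Tier-1.
    custodial = custodial_option or site
    return site, custodial
```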

Run python -h for more information.

Over the past few weeks I've moved to using neither the -w nor the -f option, instead letting the script automatically assign all workflows in assignment-approved as soon as it has checked that prestaging is complete (the entire Phys14DR campaign has been done this way).
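
The prestaging check gates the automatic assignment. A minimal sketch, assuming per-block disk-replication fractions have already been obtained (e.g. from a PhEDEx blockreplicas query; the data structure here is illustrative):

```python
def prestaging_complete(block_fractions):
    """True when every block of the input dataset is fully on disk.

    `block_fractions` maps block name -> fraction replicated on disk."""
    return bool(block_fractions) and all(f >= 1.0 for f in block_fractions.values())
```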

Announcing workflows
I use the script to produce a list of workflows in the closed-out state:
python closed-out
The script can be used to both generate a list of the output datasets and check how complete they are.

For run-dependent MC, you need to check that the number of jobs created was above the appropriate threshold (500 for PU_RD1 or 2000 for PU_RD2).
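
This check can be sketched as below. Matching the pile-up scenario by substring of the processing string is an assumption; the thresholds are the ones stated above, and non-run-dependent workflows have no such requirement.

```python
# Minimum created jobs per run-dependent pile-up scenario (from the text above).
RUN_DEPENDENT_THRESHOLDS = {"PU_RD1": 500, "PU_RD2": 2000}

def enough_jobs_created(processing_string, jobs_created):
    """Apply the run-dependent MC job-count check; other workflows pass."""
    for scenario, threshold in RUN_DEPENDENT_THRESHOLDS.items():
        if scenario in processing_string:
            return jobs_created >= threshold
    return True  # not run-dependent: no job-count threshold applies
```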

If everything seems OK (generally output datasets should be >=95% and <=100% complete):

A script is in testing which gets the list of closed-out workflows and, if everything is OK, sets them to announced, sets the output datasets to VALID, and subscribes them to Tier-2s if necessary.
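
The completeness rule such a script applies can be sketched as follows (percentages as floats; dataset names are illustrative):

```python
def ok_to_announce(completeness_by_dataset):
    """Apply the 95-100% completeness rule to every output dataset
    of a closed-out workflow."""
    return bool(completeness_by_dataset) and all(
        95.0 <= pct <= 100.0 for pct in completeness_by_dataset.values()
    )
```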

Topic revision: r5 - 2014-11-24 - AndrewLahiff