How to create a new Data Management transformation

Introduction

dirac-dms-add-transformation is a script that creates many types of DMS transformations, such as replication, removal, or reduction of the number of replicas. All of them run asynchronously using the RMS (and FTS for replications).

Script syntax

[localhost, PatchFull] ~ $ dirac-dms-add-transformation --help

Create a new dataset replication or removal transformation according to plugin
Usage:
  dirac-dms-add-transformation [option|cfgfile] ... 

General options: 
  -o  --option <value>         : Option=value to add 
  -s  --section <value>        : Set base section for relative parsed options 
  -c  --cert <value>           : Use server certificate to connect to Core Services 
  -d  --debug                  : Set debug mode (-ddd is extra debug) 
  -h  --help                   : Shows this help 
 
Options: 

  -  --Productions <value>    :    Production ID to search (comma separated list) 
  -  --FileType <value>       :    File type (comma separated list, to be used with --Production) [All] 
  -   --ExceptFileType=        :    Exclude the (list of) file types when all are requested 
  -  --BKQuery <value>        :    Bookkeeping query path 
  -  --Runs <value>           :    Run or range of runs (r1:r2) 
  -   --DQFlags=               :    DQ flag used in query 
  -   --StartDate=             :    Start date for the BK query 
  -   --EndDate=               :    End date for the BK query 
  -   --Visibility=            :    Set visibility (Yes, No, All) [Yes] 
  -   --ReplicaFlag=           :    Set visibility (Yes, No, All) [Yes] 
  -   --Plugin=                :    Plugin name (mandatory) 
  -  --Type <value>           :    Transformation type [Replication] (Removal automatic) 
  -   --NumberOfReplicas=      :    Number of copies to create or to remove 
  -   --KeepSEs=               :    List of SEs for the corresponding parameter of the plugin 
  -   --Archive1SEs=           :    List of SEs for the corresponding parameter of the plugin 
  -   --Archive2SEs=           :    List of SEs for the corresponding parameter of the plugin 
  -   --MandatorySEs=          :    List of SEs for the corresponding parameter of the plugin 
  -   --SecondarySEs=          :    List of SEs for the corresponding parameter of the plugin 
  -   --DestinationSEs=        :    List of SEs for the corresponding parameter of the plugin 
  -   --FromSEs=               :    List of SEs for the corresponding parameter of the plugin 
  -  --GroupSize <value>      :    GroupSize parameter for merging (GB) or nb of files 
  -   --Parameters=            :    Additional plugin parameters ({<key>:<val>,[<key>:val>]} 
  -   --RequestID=             :    Sets the request ID (default 0) 
  -   --ProcessingPasses=      :    List of processing passes for the DeleteReplicasWhenProcessed plugin 
  -   --Period=                :    minimal period at which a plugin is executed (if instrumented) 
  -   --CacheLifeTime=         :    plugin cache life time 
  -   --CleanTransformations   :    (only for DestroyDataset) clean transformations from the files being destroyed 
  -   --Debug                  :    Sets a debug flag in the plugin 
  -   --File=                  : File containing list of LFNs 
  -  --LFNs <value>           : List of LFNs (comma separated) 
  -   --Terminal               : LFNs are entered from stdin (--File /dev/stdin) 
  -   --LastLFNs               : Use last set of LFNs 
  -   --SetInvisible           : Before creating the transformation, set the files in the BKQuery as invisible (default for DeleteDataset) 
  -  --Start                  :    If set, the transformation is set Active and Automatic [False] 
  -   --Force                  :    Force transformation to be submitted even if no files found 
  -   --Test                   :    Just print out but not submit 
  -   --NoLFCCheck             :    Suppress the check in LFC for removal transformations 
  -   --Unique                 :    Refuses to create a transformation with an existing name 
  -   --Depth=                 :    Depth in path for replacing /... in processing pass 
  -   --Chown=                 :    Give user/group for chown of the directories of files in the FC

Options

The main options are those that define a BK query and those that define the transformation to be performed. The latter is achieved through a large palette of plugins.

Input dataset

The input dataset can be defined through a Bookkeeping query in two ways:

  • --Production <prodList> --FileType <fileTypeList> : you can give a list of productions (comma separated) and a list of file types (comma separated as well). A production range can be given in the form <prod1>:<prod2>. Special values are All or All.XXX where All is a wildcard for matching any file type produced by that production

  • --BKQuery <BKPath> : the BKPath should be in the order /ConfigName/ConfigVersion/Conditions/ProcessingPass/EventType/FileType. Note that you may use "ALL" or "" as a path "directory", except for the first two. To be used with care. You can use the path given by dirac-bookkeeping-production-information as a path.

  • --ExceptFileType <fileTypeList>: list of file types not to be considered. This can be used in conjunction with a --FileType ALL.DST option.

  • Additionally some qualifiers for the bookkeeping query can be added: --DQFlags, --Runs, --StartDate, --EndDate, --Visibility. The --Runs option can be a comma separated list of runs or of run ranges in the form <StartRun>:<EndRun>.

  • Examples:
--BKQuery /MC/2010/Beam3500GeV-VeloClosed-MagDown-Nu1/Sim07/Reco06-withTruth/10012004/DST
--Production 7605 --FileType DST
--Production 7605,7610,7624 --FileType All
--Production 7605,7610,7624 --FileType All.DST --Except CALIBRATION.DST --Visibility All
--Production 259:268 --FileType SEMILEPTONIC.DST,RADIATIVE.DST,MINIBIAS.DST,LEPTONIC.MDST,CHARMCONTROL.DST,CHARM.MDST,BHADRON.DST
--BKQuery /certification/test/ALL/ALL/ALL/ALLSTREAMS.DST 
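The range syntax and the BK path layout described above can be illustrated with a small Python sketch. This is not the script's own code: the helper names expand_ranges and split_bk_path are hypothetical, and the sketch assumes that a full path has at least six components and that any extra components belong to the processing pass.

```python
def expand_ranges(spec):
    """Expand '259:268' or '7605,7610,7624' into a list of integers."""
    ids = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        if ":" in item:
            # <first>:<last> is an inclusive range
            first, last = (int(x) for x in item.split(":", 1))
            ids.extend(range(first, last + 1))
        else:
            ids.append(int(item))
    return ids


BK_FIELDS = ("ConfigName", "ConfigVersion", "Conditions",
             "ProcessingPass", "EventType", "FileType")


def split_bk_path(path):
    """Split a full BK path into its six fields (illustrative only)."""
    parts = path.split("/")
    if parts and parts[0] == "":
        parts = parts[1:]  # drop the empty string before the leading slash
    if len(parts) < 6:
        raise ValueError("expected at least 6 path components: %r" % path)
    # Assumption: the first three and last two components are fixed fields,
    # everything in between is the (possibly multi-level) processing pass.
    head, middle, tail = parts[:3], parts[3:-2], parts[-2:]
    values = head + ["/".join(middle)] + tail
    return dict(zip(BK_FIELDS, values))
```

For instance, expand_ranges("259:268") yields the ten production IDs from 259 to 268, and an empty "directory" in the path (as in //) simply becomes an empty field.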

One can also give a list of LFNs, or the name of a file that contains that list:

  • --LFNs <lfnList>: lfnList is a comma separated list of LFNs

  • --File <file>: (list of) files that contain LFNs (one LFN per line). Note that a line may contain text other than the LFN, including a file prefix, since the LFN is extracted on a best-effort basis.
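As a rough illustration of best-effort LFN extraction, the Python sketch below pulls one LFN out of each line. It is not the script's actual logic: the helper name extract_lfns is hypothetical, and it assumes that LFNs start with /lhcb/ and contain no whitespace.

```python
import re

# Assumption: an LFN starts with "/lhcb/" and runs up to the next whitespace.
LFN_PATTERN = re.compile(r"/lhcb/\S+")


def extract_lfns(lines):
    """Return the first LFN-looking token of each line, skipping other lines."""
    lfns = []
    for line in lines:
        match = LFN_PATTERN.search(line)
        if match:
            lfns.append(match.group(0))
    return lfns
```

With this pattern, a storage prefix in front of the /lhcb/ part of a path is dropped, and lines without any LFN are ignored.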

Plugin

It is mandatory to specify a plugin name using the --Plugin <plugin> option. A detailed list of plugins can be found here. Only replication and removal plugins can be used, not processing plugins!

Additional options

  • --Test: does not launch the transformation, but checks the BKQuery and prints some information about it and about the entered parameters. The existence of the SEs is checked. Example:
  $ dirac-dms-add-replication -o /DIRAC/Setup=LHCb-Certification --Production 259:268 --FileType SEMILEPTONIC.DST,RADIATIVE.DST,MINIBIAS.DST,LEPTONIC.MDST,CHARMCONTROL.DST,CHARM.MDST,BHADRON.DST --Plugin LHCbMCDSTBroadcastRandom --Test
Transformation Name: Replication-SEMILEPTONIC.DST/RADIATIVE.DST/MINIBIAS.DST/LEPTONIC.MDST/CHARMCONTROL.DST/CHARM.MDST/BHADRON.DST-259/260/261/262/263/264/265/266/267/268
Transformation group: LHCbMCDSTBroadcastRandom
Long description: LHCbMCDSTBroadcastRandom of SEMILEPTONIC.DST,RADIATIVE.DST,MINIBIAS.DST,LEPTONIC.MDST,CHARMCONTROL.DST,CHARM.MDST,BHADRON.DST for production 259,260,261,262,263,264,265,266,267,268
BK Query: {'FileType': ['SEMILEPTONIC.DST', 'RADIATIVE.DST', 'MINIBIAS.DST', 'LEPTONIC.MDST', 'CHARMCONTROL.DST', 'CHARM.MDST', 'BHADRON.DST'], 'ProductionID': ['259', '260', '261', '262', '263', '264', '265', '266', '267', '268'], 'Visibility': 'Yes'}
BKQuery obtained 28 files
/lhcb/certification/test/MINIBIAS.DST/00000266/0000 4
/lhcb/certification/test/RADIATIVE.DST/00000267/0000 4
/lhcb/certification/test/CHARM.MDST/00000261/0000 4
/lhcb/certification/test/BHADRON.DST/00000259/0000 4
/lhcb/certification/test/CHARMCONTROL.DST/00000262/0000 4
/lhcb/certification/test/LEPTONIC.MDST/00000264/0000 4
/lhcb/certification/test/SEMILEPTONIC.DST/00000268/0000 4
Plugin: LHCbMCDSTBroadcastRandom
Parameters: {}
RequestID: 0

  • --Start: sets the transformation Active and Automatic at creation time (default is to set it New).

  • --Request <requestID>: assigns the transformation to a request. The default value is that of the production in the BK query (if specified and if only one production is given).

  • --NoLFCCheck: do not check that the input files have an entry in the LFC (by default a check is performed)

  • --SetInvisible: set the input files invisible in the bookkeeping. This is the default for the DeleteDataset plugin, such that users do not get files that are only in archive SEs.

  • --Force: forces creation of the transformation even if no files match the criteria (yet)

  • --Unique: does not create the transformation if a transformation already exists with the same name (default is to add "-1/2/3/..." until available)

  • --Chown <user>/<group>: before launching the transformation, changes ownership of the directories concerned to <user> and <group>
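The naming behaviour described for --Unique can be sketched in Python as follows. The helper available_name is hypothetical, and the exact suffix format ("-1", "-2", ...) is an assumption based on the description above rather than on the script's source.

```python
def available_name(base, existing, unique=False):
    """Pick a transformation name that is not yet in `existing`.

    Returns None when `unique` is set and the base name is already taken,
    mimicking a refusal to create the transformation.
    """
    if base not in existing:
        return base
    if unique:
        return None
    suffix = 1
    # Assumed suffix scheme: try base-1, base-2, ... until one is free.
    while "%s-%d" % (base, suffix) in existing:
        suffix += 1
    return "%s-%d" % (base, suffix)
```

Here `existing` stands for the set of names already registered, which the real script would obtain from the transformation system.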

Default plugin parameters in CS

The parameters used by transformation plugins can be given when creating the transformation, but default values can be set in the CS. Section /Operations/Defaults/TransformationPlugins contains default values used by all plugins. These can be overridden by options in /Operations/Defaults/TransformationPlugins/<pluginName>, which are in turn superseded by parameters set in the transformation at creation time.
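The precedence order can be pictured as three dictionaries merged from least to most specific. The Python sketch below only illustrates that order: the function name and the sample values are hypothetical, and the real plugins read these values from the CS rather than from plain dictionaries.

```python
def resolve_plugin_parameters(cs_defaults, cs_plugin, transformation):
    """Merge parameters so that later sources override earlier ones.

    cs_defaults    : options common to all plugins (CS defaults section)
    cs_plugin      : options from the <pluginName> subsection
    transformation : parameters given when the transformation was created
    """
    params = dict(cs_defaults)     # lowest priority
    params.update(cs_plugin)       # plugin-specific CS options
    params.update(transformation)  # highest priority
    return params


# Hypothetical values, for illustration only
merged = resolve_plugin_parameters(
    {"NumberOfReplicas": 3, "Debug": False},
    {"NumberOfReplicas": 2},
    {"Debug": True},
)
```

In this example the plugin-specific value of NumberOfReplicas wins over the common default, and the Debug flag set at creation time wins over both CS levels.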

How to debug a plugin while in production

It is possible to switch verbose mode on just by changing a parameter in the CS. In the CS section mentioned above, set the option "Debug = True" and next time the plugin is instantiated, it will be in verbose mode. Don't forget to reset "Debug = False" or remove the option when debugging is over. You can also use the option --Debug when creating the transformation.

Examples of transformation creation

  • Replicate a dataset to CERN-FREEZER:
dirac-dms-add-replication --BK /LHCb/Collision12//RealData/Reco13c//FULL.DST --Run 124134 --Plugin ReplicateDataset --Destination CERN-FREEZER --Start

  • Reduce the number of replicas on disk to 2 for a dataset:
dirac-dms-add-replication --BK /MC/MC10 --Plugin DeleteReplicas --MandatorySE '' --NumberOfReplicas 2 --NoLFCCheck --Start

  • Removing all replicas at a given set of SEs:
dirac-dms-add-replication --BK /MC/MC10 --Plugin DeleteReplicas --MandatorySE '' --FromSE PIC_MC-DST,PIC_MC_M-DST --NoLFCCheck --Start

  • Create the replication transformation for all 2012 RAW data:
dirac-dms-add-replication --BK /LHCb/Collision12//RealData/90000000/RAW --Plugin RawShares --Start

  • Create the replication transformation for all DST streams of a given processing pass except CHARMTOBESWUM.DST, CALIBRATION.DST and PID.MDST:
dirac-dms-add-replication --BK /LHCb/Collision12//RealData/Reco13/Stripping19//ALL.DST,ALL.MDST --Except CHARMTOBESWUM.DST,CALIBRATION.DST,PID.MDST --Plugin LHCbDSTBroadcast --Start

  • Recover from the loss of a list of files (the list is contained in a file). Note that one can use --Term (instead of --File) and copy/paste a list of LFNs:
dirac-dms-add-replication --Start --Plugin Healing --File lxfsrf15c05.lost

  • Re-replicate a dataset according to the Computing Model, but only 2 disk replicas, no mandatory replica or archive at CERN, after some deletion of replicas for example (or failures):
dirac-dms-add-replication --Plugin LHCbMCDSTBroadcastRandom --Number 2 --Archive1SE '' --Mandatory '' --BK /MC/MC10//Sim01/Trig0x002e002aFlagged/Reco08/Stripping12Flagged//ALL --NoLFCCheck --Start

DMS transformations for 2016 data

The list of transformations required for the 2016 data flow is given here

-- PhilippeCharpentier - 26-Feb-2011

Topic revision: r10 - 2016-07-07 - PhilippeCharpentier