How to create a new Data Management transformation

Introduction

dirac-dms-add-transformation is a script that creates many types of DMS transformations, such as replication, removal, or reduction of the number of replicas. All of them run asynchronously using the RMS (and FTS for replications).

Script syntax

[localhost] ~ $ dirac-dms-add-transformation --help

Create a new dataset replication or removal transformation according to plugin
Usage:
  dirac-dms-add-transformation [option|cfgfile] ...

General options:
  -o  --option <value>         : Option=value to add
  -s  --section <value>        : Set base section for relative parsed options
  -c  --cert <value>           : Use server certificate to connect to Core Services
  -d  --debug                  : Set debug mode (-ddd is extra debug)
  -   --autoreload             : Automatically restart if there's any change in the module
  -   --license                : Show DIRAC's LICENSE
  -h  --help                   : Shows this help

Options:
  -B  --BKQuery <value>        :    Bookkeeping query path
  -f  --FileType <value>       :    File type (comma separated list, to be used with --Production) [All]
  -   --ExceptFileType=        :    Exclude the (list of) file types when all are requested
  -   --EventType=             :    Event type
  -r  --Runs <value>           :    Run or range of runs (r1:r2)
  -P  --Productions <value>    :    Production ID to search (comma separated list)
  -   --DQFlags=               :    DQ flag used in query
  -   --StartDate=             :    Start date for the BK query
  -   --EndDate=               :    End date for the BK query
  -   --Visibility=            :    Required visibility (Yes, No, All) [Yes]
  -   --ReplicaFlag=           :    Required replica flag (Yes, No, All) [Yes]
  -   --TCK=                   :    Get files with a given TCK
  -   --Type=                  :    Transformation type [Replication] (Removal automatic)
  -   --Plugin=                :    Plugin name (mandatory)
  -   --Parameters=            :    Additional plugin parameters ({<key>:<val>,[<key>:val>]}
  -   --RequestID=             :    Sets the request ID (default 0)
  -   --KeepSEs=               :    List of SEs for the corresponding parameter of the plugin
  -   --Archive1SEs=           :    List of SEs for the corresponding parameter of the plugin
  -   --Archive2SEs=           :    List of SEs for the corresponding parameter of the plugin
  -   --MandatorySEs=          :    List of SEs for the corresponding parameter of the plugin
  -   --SecondarySEs=          :    List of SEs for the corresponding parameter of the plugin
  -   --DestinationSEs=        :    List of SEs for the corresponding parameter of the plugin
  -   --FromSEs=               :    List of SEs for the corresponding parameter of the plugin
  -   --RAWStorageElements=    :    List of SEs for the corresponding parameter of the plugin
  -   --ProcessingStorageElements= :    List of SEs for the corresponding parameter of the plugin
  -   --ProcessingPasses=      :    List of processing passes for the DeleteReplicasWhenProcessed plugin
  -   --CleanTransformations   :    (only for DestroyDataset) clean transformations from the files being destroyed
  -   --NumberOfReplicas=      :    Number of copies to create or to remove
  -   --GroupSize=             :    GroupSize parameter for merging (GB) or nb of files
  -   --Debug                  :    Sets a debug flag in the plugin
  -   --Period=                :    minimal period at which a plugin is executed (if instrumented)
  -   --UseRunDestination      :    for RAWReplication plugin, use the already defined run destination as storage
  -   --File=                  : File containing list of LFNs
  -l  --LFNs <value>           : List of LFNs (comma separated)
  -   --Terminal               : LFNs are entered from stdin (--File /dev/stdin)
  -   --LastLFNs               : Use last set of LFNs
  -   --Name=                  :    Give a name to the transformation, only if files are given
  -   --SetInvisible           : Before creating the transformation, set the files in the BKQuery as invisible (default for DeleteDataset)
  -S  --Start                  :    If set, the transformation is set Active and Automatic [False]
  -   --Force                  :    Force transformation to be submitted even if no files found
  -   --Test                   :    Just print out but not submit
  -   --NoFCCheck              :    Suppress the check in FC for removal transformations
  -   --Unique                 :    Refuses to create a transformation with an existing name
  -   --Depth=                 :    Depth in path for replacing /... in processing pass
  -   --Chown=                 :    Give user/group for chown of the directories of files in the FC
  -   --MCVersion=             :    (list of) years; gets active MC processing passes (All for all years)
  -   --ListProcessingPasses   :    Only lists the processing passes

The main options are those that define a BK query and those that define the transformation to be performed. The latter is achieved through a large palette of plugins.

Input dataset

The input dataset can be defined through a Bookkeeping query, either by production or by BK path, optionally refined with further qualifiers:

  • --Production <prodList> --FileType <fileTypeList> : you can give a list of productions (comma separated) and a list of file types (comma separated as well). A production range can be given in the form <prod1>:<prod2>. Special file type values are All or All.XXX, where All is a wildcard matching any file type produced by that production.

  • --BKQuery <BKPath> : the BKPath should be in the order /ConfigName/ConfigVersion/Conditions/ProcessingPass/EventType/FileType. Note that you may use "ALL" or "" for any element of the path except the first two. To be used with care ;-). You can use the path printed by dirac-bookkeeping-production-information as the BKPath.

  • --ExceptFileType <fileTypeList>: list of file types not to be considered. This can be used in conjunction with a --FileType ALL.DST option.

  • Additionally some qualifiers for the bookkeeping query can be added: --DQFlags, --Runs, --StartDate, --EndDate, --Visibility. The --Runs option can be a comma separated list of runs or of run ranges in the form <StartRun>:<EndRun>.

  • Examples:
--BKQuery /MC/2010/Beam3500GeV-VeloClosed-MagDown-Nu1/Sim07/Reco06-withTruth/10012004/DST
--Production 7605 --FileType DST
--Production 7605,7610,7624 --FileType All
--Production 7605,7610,7624 --FileType All.DST --Except CALIBRATION.DST --Visibility All
--Production 259:268 --FileType SEMILEPTONIC.DST,RADIATIVE.DST,MINIBIAS.DST,LEPTONIC.MDST,CHARMCONTROL.DST,CHARM.MDST,BHADRON.DST
--BKQuery /certification/test/ALL/ALL/ALL/ALLSTREAMS.DST 
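
The input-dataset conventions above can be illustrated with a minimal Python sketch. It is not the script's actual parser: expand_ranges and split_bk_path are hypothetical helpers, and the assumption that the processing pass is everything between the conditions and the last two path elements is inferred from the first example above.

```python
def expand_ranges(spec):
    """Expand a comma-separated list of IDs or <first>:<last> ranges into strings."""
    ids = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        if ":" in item:
            # A range <first>:<last> includes both ends.
            first, last = (int(x) for x in item.split(":", 1))
            ids.extend(str(i) for i in range(first, last + 1))
        else:
            ids.append(item)
    return ids


def split_bk_path(path):
    """Split a BK path into named components (assumed layout, not the real parser)."""
    parts = path.strip().split("/")[1:]  # drop the empty element before the leading "/"
    if len(parts) < 6:
        raise ValueError("expected at least 6 path elements: %r" % path)
    return {
        "ConfigName": parts[0],
        "ConfigVersion": parts[1],
        "Conditions": parts[2],
        # Assumption: the processing pass may span several path elements.
        "ProcessingPass": "/" + "/".join(parts[3:-2]),
        "EventType": parts[-2],
        "FileType": parts[-1],
    }
```

With these helpers, expand_ranges("259:268") gives the ten production IDs from '259' to '268' as strings, which is the form of the ProductionID list printed by --Test on this page.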

One can also give a list of LFNs, or the name of a file that contains that list:

  • --LFNs <lfnList>: lfnList is a comma separated list of LFNs

  • --File <file>: (list of) files that contain LFNs (one LFN per line). Note that a line may contain other text besides the LFN, including a file prefix, as the LFN is extracted on a best-effort basis.
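
The best-effort extraction of LFNs from such a file might look like the following minimal sketch. The actual extraction logic of the script is not documented on this page; the assumption that every LFN starts with /lhcb/ and ends at the first whitespace is made for illustration only, and extract_lfns is a hypothetical helper.

```python
import re

# Assumption: LFNs start with "/lhcb/" and contain no whitespace.
LFN_PATTERN = re.compile(r"/lhcb/\S+")


def extract_lfns(lines):
    """Return the first LFN-like substring of each line, skipping lines without one."""
    lfns = []
    for line in lines:
        match = LFN_PATTERN.search(line)
        if match:
            lfns.append(match.group(0))
    return lfns
```

Anything before the first /lhcb/ on a line, such as a storage prefix, is dropped, and lines without an LFN are ignored.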

Plugin

It is mandatory to specify a plugin name using the --Plugin <plugin> option. A detailed list of plugins can be found here. Only replication and removal plugins can be used, not processing plugins!

Additional options

  • --Test: does not launch the transformation but checks the BKQuery and prints some information about it and about the entered parameters. The existence of the SEs is checked. Example:
  $ dirac-dms-add-replication -o /DIRAC/Setup=LHCb-Certification --Production 259:268 --FileType SEMILEPTONIC.DST,RADIATIVE.DST,MINIBIAS.DST,LEPTONIC.MDST,CHARMCONTROL.DST,CHARM.MDST,BHADRON.DST --Plugin LHCbMCDSTBroadcastRandom --Test
Transformation Name: Replication-SEMILEPTONIC.DST/RADIATIVE.DST/MINIBIAS.DST/LEPTONIC.MDST/CHARMCONTROL.DST/CHARM.MDST/BHADRON.DST-259/260/261/262/263/264/265/266/267/268
Transformation group: LHCbMCDSTBroadcastRandom
Long description: LHCbMCDSTBroadcastRandom of SEMILEPTONIC.DST,RADIATIVE.DST,MINIBIAS.DST,LEPTONIC.MDST,CHARMCONTROL.DST,CHARM.MDST,BHADRON.DST for production 259,260,261,262,263,264,265,266,267,268
BK Query: {'FileType': ['SEMILEPTONIC.DST', 'RADIATIVE.DST', 'MINIBIAS.DST', 'LEPTONIC.MDST', 'CHARMCONTROL.DST', 'CHARM.MDST', 'BHADRON.DST'], 'ProductionID': ['259', '260', '261', '262', '263', '264', '265', '266', '267', '268'], 'Visibility': 'Yes'}
BKQuery obtained 28 files
/lhcb/certification/test/MINIBIAS.DST/00000266/0000 4
/lhcb/certification/test/RADIATIVE.DST/00000267/0000 4
/lhcb/certification/test/CHARM.MDST/00000261/0000 4
/lhcb/certification/test/BHADRON.DST/00000259/0000 4
/lhcb/certification/test/CHARMCONTROL.DST/00000262/0000 4
/lhcb/certification/test/LEPTONIC.MDST/00000264/0000 4
/lhcb/certification/test/SEMILEPTONIC.DST/00000268/0000 4
Plugin: LHCbMCDSTBroadcastRandom
Parameters: {}
RequestID: 0

  • --Start: sets the transformation Active and Automatic at creation time (default is to set it New).

  • --Request <requestID>: assigns the transformation to a request. The default value is that of the production in the BK query (if specified and if only one production is given).

  • --NoFCCheck: do not check that the input files have an entry in the file catalog (by default a check is performed).

  • --SetInvisible: set the input files invisible in the bookkeeping. This is the default for the DeleteDataset plugin, such that users do not get files that are only in archive SEs.

  • --Force: creates the transformation even if no files match the criteria (yet)

  • --Unique: does not create the transformation if a transformation already exists with the same name (default is to add "-1/2/3/..." until available)

  • --Chown <user>/<group>: before launching the transformation, changes ownership of the directories concerned to <user> and <group>

  • --MCVersion [<year> | All]: interrogates the production requests DB to get all processing passes for a given (list of) years or for all years
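
The naming behaviour behind --Unique can be sketched as follows. This is only an illustration: the name layout is inferred from the "Transformation Name" line of the --Test output above, build_name and available_name are hypothetical helpers, and the set of existing names stands in for a lookup that the real script makes against the transformation system.

```python
def build_name(transformation_type, file_types, productions):
    """Build a name like Replication-<type1>/<type2>-<prod1>/<prod2> (inferred layout)."""
    return "%s-%s-%s" % (transformation_type, "/".join(file_types), "/".join(productions))


def available_name(name, existing, unique=False):
    """Return a free transformation name, or None if unique=True and the name is taken."""
    if name not in existing:
        return name
    if unique:
        # With --Unique no alternative name is tried.
        return None
    # Default behaviour: append -1, -2, -3, ... until the name is free.
    suffix = 1
    while "%s-%d" % (name, suffix) in existing:
        suffix += 1
    return "%s-%d" % (name, suffix)
```

For example, if "Replication-DST-7605" already exists, available_name returns "Replication-DST-7605-1" by default and None when unique=True.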

Default plugin parameters in CS

The parameters used by transformation plugins can be given when creating the transformation, but default values can also be set in the CS. The section /Operations/Defaults/TransformationPlugins contains default values used by all plugins. These can be overridden by options in /Operations/Defaults/TransformationPlugins/<pluginName>, which are in turn superseded by parameters set in the transformation at creation time.
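
The three-level precedence can be sketched in a few lines of Python. This is not the code used by the transformation system: resolve_plugin_parameters is a hypothetical helper, and plain dictionaries stand in for the two CS sections and for the parameters given at creation time.

```python
def resolve_plugin_parameters(common_defaults, plugin_defaults, creation_parameters):
    """Merge the three parameter levels; later levels override earlier ones."""
    resolved = dict(common_defaults)      # /Operations/Defaults/TransformationPlugins
    resolved.update(plugin_defaults)      # .../TransformationPlugins/<pluginName>
    resolved.update(creation_parameters)  # parameters set at creation time
    return resolved


# Illustrative values only, not actual CS content.
common = {"NumberOfReplicas": 3, "Debug": False}
plugin = {"NumberOfReplicas": 2}
creation = {"Debug": True}
resolved = resolve_plugin_parameters(common, plugin, creation)
```

Here resolved takes NumberOfReplicas from the plugin section and Debug from the creation-time parameters.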

How to debug a plugin while in production

It is possible to switch verbose mode on just by changing a parameter in the CS. In the CS section mentioned above, set the option "Debug = True" and next time the plugin is instantiated, it will be in verbose mode. Don't forget to reset "Debug = False" or remove the option when debugging is over. You can also use the option --Debug when creating the transformation.

Examples of transformation creation

  • Replicate a dataset to CERN-FREEZER:
dirac-dms-add-replication --BK /LHCb/Collision12//RealData/Reco13c//FULL.DST --Run 124134 --Plugin ReplicateDataset --Destination CERN-FREEZER --Start

  • Reduce the number of replicas on disk to 2 for a dataset:
dirac-dms-add-replication --BK /MC/MC10 --Plugin DeleteReplicas --MandatorySE '' --NumberOfReplicas 2 --NoFCCheck --Start

  • Removing all replicas at a given set of SEs:
dirac-dms-add-replication --BK /MC/MC10 --Plugin DeleteReplicas --MandatorySE '' --FromSE PIC_MC-DST,PIC_MC_M-DST --NoFCCheck --Start

  • Create the replication transformation for all 2011 RAW data:
dirac-dms-add-replication --BK /LHCb/Collision12//RealData/90000000/RAW --Plugin RawShares --Start

  • Create the replication transformation for all DST streams of a given processing pass except CHARMTOBESWUM.DST, CALIBRATION.DST and PID.MDST:
dirac-dms-add-replication --BK /LHCb/Collision12//RealData/Reco13/Stripping19//ALL.DST,ALL.MDST --Except CHARMTOBESWUM.DST,CALIBRATION.DST,PID.MDST --Plugin LHCbDSTBroadcast --Start

  • Repair the loss of a list of files (list contained in a file). Note that one can use --Term (instead of --File) and copy/paste a list of LFNs:
dirac-dms-add-replication --Start --Plugin Healing --File lxfsrf15c05.lost

  • Re-replicate a dataset according to the Computing Model, but only 2 disk replicas, no mandatory replica or archive at CERN, after some deletion of replicas for example (or failures):
dirac-dms-add-replication --Plugin LHCbMCDSTBroadcastRandom --Number 2 --Archive1SE '' --Mandatory '' --BK /MC/MC10//Sim01/Trig0x002e002aFlagged/Reco08/Stripping12Flagged//ALL --NoFCCheck --Start
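
When such commands are generated from another program, one cautious pattern is to build the argument list once and run it with --Test before adding --Start. The sketch below only assembles and prints the command line and does not call DIRAC. build_command is a hypothetical helper; it uses the script name and the full option names from the help text above.

```python
import shlex


def build_command(bk_query, plugin, destination_ses=(), start=False, test=False):
    """Assemble the argument list for the transformation script (not executed here)."""
    argv = ["dirac-dms-add-transformation", "--BKQuery", bk_query, "--Plugin", plugin]
    if destination_ses:
        # SE lists are passed comma separated.
        argv += ["--DestinationSEs", ",".join(destination_ses)]
    if test:
        argv.append("--Test")
    if start:
        argv.append("--Start")
    return argv


# Dry run first: same query and plugin, but with --Test instead of --Start.
dry_run = build_command(
    "/LHCb/Collision12//RealData/Reco13c//FULL.DST",
    "ReplicateDataset",
    destination_ses=["CERN-FREEZER"],
    test=True,
)
print(shlex.join(dry_run))
```

Once the --Test output looks right, the same call with start=True and test=False gives the command that actually creates and starts the transformation.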

DMS transformations for 2016 data

The list of transformations required for the 2016 data flow is given here

-- PhilippeCharpentier - 26-Feb-2011

Topic revision: r13 - 2019-03-07 - PhilippeCharpentier
 