LHCbDirac transformation plugins

Introduction

The LHCbDirac transformation system provides a large set of plugins that are used to create tasks from a list of files and their replicas. There are three types of plugins, depending on the type of tasks they create: Processing, Replication and Removal plugins.

Plugin parameters

Plugins may use parameters whose values are either defined when the transformation is created, taken from the LHCbDirac Configuration System (CS), or left to the default values present in the code (not recommended).

When the transformation is created with the dirac-dms-add-transformation script, the parameters can be passed as options on the command line. These values take precedence over the CS settings.

CS settings can be found in /Operations/Default/ or in /Operations//. The latter has precedence over the former.
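
The precedence rules above can be sketched as a simple lookup chain. This is only an illustration, not the LHCbDirac implementation: the function name resolve_plugin_param is hypothetical, and plain dictionaries stand in for the command-line options and for the two CS sections.

```python
def resolve_plugin_param(name, cli_options, cs_specific, cs_default, code_default=None):
    """Return a plugin parameter value following the documented precedence.

    Hypothetical helper: the three dictionaries stand in for the command-line
    options, the more specific CS section and the /Operations/Default/ section.
    """
    # Command line first, then the more specific CS section, then the CS default
    for source in (cli_options, cs_specific, cs_default):
        if name in source:
            return source[name]
    # Last resort: the default hard-coded in the plugin (not recommended)
    return code_default
```

With this ordering, a GroupSize given on the command line hides any value found in the CS, and the code default is only used when nothing else is set.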

For most plugins, the number of files in a task is limited to MaxFilesPerTask, which can be specified like any other generic or plugin-specific parameter of the transformation.

Processing

Tasks are created for a list of files that are all present at at least one common SE. The common set of SEs is assigned as the TargetSEs of the task.
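
As a rough sketch, the common set of SEs is the intersection of the replica locations of all files in the task. The function name common_ses and the layout of the replicas dictionary (LFN mapped to a list of SE names) are assumptions made for this example.

```python
def common_ses(replicas):
    """Return the SEs at which every file in `replicas` has a replica.

    `replicas` maps each LFN to the SEs holding it (assumed layout).
    """
    se_sets = [set(ses) for ses in replicas.values()]
    if not se_sets:
        return set()
    # Only SEs holding all the files can serve as TargetSEs of the task
    return set.intersection(*se_sets)
```

An empty result means the files cannot go into the same task.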

Plugins

  • RAWProcessing: replaces AtomicRun. This generates tasks from files that are in the SEs defined by the --FromSEs option (default Tier1-BUFFER). Files are grouped so that each task processes at least GroupSize GB. Each file larger than GroupSize is placed in its own task. Files smaller than GroupSize are grouped together, which means a task may have to process close to twice GroupSize GB.
  • ByRun[extensions]: creates tasks for a set of files that all belong to the same run. Additionally the classification can use another BK parameter such as FileType or EventType. That parameter is part of the extensions suffix in the plugin name. The criterion for grouping files in a task can either be the number of files (default) or the data size (Size in extensions), using the parameter GroupSize (expressed in GB).
    • The available extensions are ByRunSize, ByRunFileType, ByRunEventType, ByRunFileTypeSize, ByRunEventTypeSize.
    • The suffix WithFlush is used for flushing the files of a given run when the transformation has received all files corresponding to the RAW files of that run, in this case ignoring the GroupSize parameter.
    • The suffix ForceFlush will create tasks according to the other criteria, but will also create a task with any remaining files, irrespective of whether the whole run is ready.
  • BySize: just groups files by storage elements and in bunches of more than GroupSize GB.
  • LHCbStandard: just groups files in groups of GroupSize files.
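
The size-based grouping described for RAWProcessing can be sketched as follows. This is a minimal illustration under stated assumptions, not the plugin code: the function name group_by_size is hypothetical and file sizes are assumed to be given in GB in a dictionary keyed by LFN.

```python
def group_by_size(file_sizes, group_size_gb):
    """Split files into tasks of at least `group_size_gb` GB.

    `file_sizes` maps each LFN to its size in GB (assumed layout).
    Returns (tasks, leftover); leftover files wait for more data.
    """
    tasks = []
    current, current_size = [], 0.0
    for lfn, size in file_sizes.items():
        if size > group_size_gb:
            # A file larger than GroupSize gets a task of its own
            tasks.append([lfn])
            continue
        current.append(lfn)
        current_size += size
        if current_size >= group_size_gb:
            # Threshold reached: close the task and start a new one
            tasks.append(current)
            current, current_size = [], 0.0
    return tasks, current
```

A task is only closed once it has reached GroupSize, so a group just below the threshold that receives one more file just below GroupSize ends up close to twice GroupSize.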

Parameters

  • GroupSize: this is the main parameter that indicates either the number of files to be grouped in tasks or the size (in GB) of datasets to be used for creating one task.
  • FromSEs: for RAWProcessing, indicates the group of SEs to use as targets for the jobs.
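
The ByRun grouping by number of files, together with one possible reading of the flush behaviour, can be sketched as below. The function name group_by_run, the file_runs layout and the flushed_runs argument are assumptions for this example; in particular, flushing is modelled here as emitting the leftover files of a run as a smaller final task.

```python
from collections import defaultdict


def group_by_run(file_runs, group_size, flushed_runs=()):
    """Create tasks of `group_size` files that all belong to the same run.

    `file_runs` maps each LFN to its run number; `flushed_runs` lists the
    runs whose leftover files may form a smaller final task (assumed layouts).
    """
    by_run = defaultdict(list)
    for lfn, run in file_runs.items():
        by_run[run].append(lfn)

    tasks = []
    for run in sorted(by_run):
        lfns = by_run[run]
        # Number of files that fit into complete tasks of group_size files
        full = len(lfns) - len(lfns) % group_size
        for start in range(0, full, group_size):
            tasks.append((run, lfns[start:start + group_size]))
        # Leftover files only form a task when the run is flushed
        if run in flushed_runs and full < len(lfns):
            tasks.append((run, lfns[full:]))
    return tasks
```

Files of a run that is not flushed and that do not fill a complete task simply stay pending until more files arrive.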

Replication

Files are assigned to tasks for being replicated to a set of SEs. These SEs are assigned as TargetSEs of the task.

Plugins

  • RAWReplication: replicates files to two destinations defined by RAWStorageElements (default Tier1-RAW except CERN-RAW) and ProcessingStorageElements (default Tier1-BUFFER). The first one is chosen according to shares defined in the CS between all SEs (excludes CERN-RAW of course), and the second one is chosen according to reconstruction shares between CERN and the site chosen for the RAW replication.
  • ReplicateDataset: replicate a dataset to a set of SEs. The allowed parameters are DestinationSEs (or MandatorySEs) and NumberOfReplicas. If the number of replicas is smaller than the number of SEs, the SEs are chosen randomly in the list.
  • LHCbDSTBroadcast: replication of files grouped by run. All parameters are allowed.
  • LHCbMCDSTBroadcastRandom: replication of files at randomly selected SEs (no run grouping). All parameters are allowed.
  • ReplicateToLocalSE: replication to SEs selected out of a list that are at a site where the file is already present (e.g. from CERN-BUFFER to CERN-RDST). Note that the SAPath of the two SEs must be different, otherwise no replication takes place! Allowed parameters are DestinationSEs and MinFreeSpace, which is the minimum free space (in TB) required for the replication to be scheduled.
  • ReplicateWithAncestors: same as ReplicateToLocalSE, but also replicates the ancestors of the files.
  • ReplicateToRunDestination: replicate files to an SE in the DestinationSEs list located at the run destination site.
  • ArchiveDataset: replicates files to archive storage elements. Allowed parameters are Archive1SEs and Archive2SEs.
  • Healing: creates tasks for SEs at which the files are set problematic in the LFC. This allows file losses to be repaired whenever possible. Files that don't have another available replica are set Problematic in the transformation table. Files that don't have any problematic replica are marked Processed.

Parameters

Parameters are mostly lists of SEs, except NumberOfReplicas, which is a number. Tasks are only created for SEs at which the files are not already present.

  • Archive1SEs <list of SEs>: mandatory archive SE. If empty, only one archive is taken out of Archive2SEs.
  • Archive2SEs <list of SEs>: list of archive SEs out of which one is randomly selected.
  • MandatorySEs <list of SEs>: list of non-archive SEs to which replication is mandatory (can be empty).
  • DestinationSEs <list of SEs>: list of non-archive SEs out of which the complementary SEs needed to reach the requested number of replicas are randomly selected.
  • NumberOfReplicas <n>: final number of non-archive replicas requested.
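
The way MandatorySEs, DestinationSEs and NumberOfReplicas combine can be sketched as follows. This is an illustration under stated assumptions rather than the plugin code: the function name choose_replication_targets is hypothetical, and counting destination SEs that already hold the file before picking new ones at random is a choice made for this sketch.

```python
import random


def choose_replication_targets(existing_ses, mandatory_ses, destination_ses,
                               number_of_replicas, rng=random):
    """Pick the non-archive SEs a file still needs to be replicated to."""
    # Mandatory SEs always count towards the final set of replicas
    selected = list(mandatory_ses)
    candidates = [se for se in destination_ses if se not in selected]
    # Assumption of this sketch: destination SEs already holding the file
    # are counted first, the rest are picked at random
    already = [se for se in candidates if se in existing_ses]
    others = [se for se in candidates if se not in existing_ses]
    rng.shuffle(others)
    needed = max(0, number_of_replicas - len(selected))
    selected += (already + others)[:needed]
    # Tasks are only created for SEs at which the file is not present
    return [se for se in selected if se not in existing_ses]
```

An empty list means the file already has the requested replicas and no task is needed for it.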

Removal

Files are assigned to tasks for being removed from a set of SEs. These SEs are assigned as TargetSEs of the task.

Plugins

  • DestroyDataset: remove all replicas of the dataset. Must be used with great care obviously, as no replica is kept! No parameters required. It will fail if files are at SEs banned for removal (typically ARCHIVE SEs).
  • RemoveDatasetFromDisk: remove all replicas except those at the SEs specified as KeepSEs (default: all <site>-ARCHIVE SEs), which can be set with the --KeepSEs option.
  • RemoveReplicas: remove replicas except those at <KeepSEs> (default: <site>-ARCHIVE SEs) and <MandatorySEs> (default: None), keeping at least NumberOfReplicas additional replicas. The choice of SEs to remove from is random, unless a list of SEs from which to remove replicas is specified using the --FromSEs option. In this case only replicas at these SEs are removed, and NumberOfReplicas is just a minimum to be kept.
  • ReduceReplicas: reduce the number of replicas to NumberOfReplicas. If --FromSEs is specified, it preferentially removes the replicas at these SEs, and randomly chooses the other SEs to remove from if necessary.
  • ReduceReplicasKeepDestination: same as ReduceReplicas but keeps all replicas at the run destination site.
  • RemoveReplicasKeepDestination: same as RemoveReplicas but keeps all replicas at the run destination site.
  • RemoveReplicasWhenProcessed: this plugin checks whether a file has been processed by one or more processing passes. If so, it removes the replicas at the SEs in the FromSEs list. The check of whether it was processed is done every Period hours (in order not to hammer the Bookkeeping at every loop). The list of processing passes to be considered is passed with the parameter ProcessingPasses, which can be either relative to the processing pass of the dataset, or an absolute BK path (i.e. starting with a /, which is useful for validation datasets).
  • RemoveReplicasWithAncestors: remove replicas when processed (as above) but also removes from the same SEs the ancestors of the files.

Parameters

  • --FromSEs <list of SEs>: list of SEs to remove from, provided more than NumberOfReplicas are left.
  • --KeepSEs <list of SEs>: list of SEs where to keep replicas.
  • --MandatorySEs <list of SEs>: list of SEs where to keep replicas (replicas are kept on all these SEs as well as on those specified in KeepSEs).
  • --NumberOfReplicas <n>: number of replicas to keep besides <KeepSEs> and <MandatorySEs>.
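
The selection made by RemoveReplicas can be sketched as below. This is a minimal illustration, not the plugin code: the function name choose_removal_ses and its arguments are hypothetical stand-ins for the parameters listed above.

```python
import random


def choose_removal_ses(replica_ses, keep_ses, mandatory_ses,
                       number_of_replicas, from_ses=None, rng=random):
    """Pick the SEs from which replicas of one file can be removed."""
    # Replicas at KeepSEs and MandatorySEs are never removed
    protected = set(keep_ses) | set(mandatory_ses)
    removable = [se for se in replica_ses if se not in protected]
    # Replicas that may go while keeping number_of_replicas additional ones
    excess = max(0, len(removable) - number_of_replicas)
    if from_ses is not None:
        # With FromSEs, only replicas at the listed SEs may be removed
        removable = [se for se in removable if se in from_ses]
    # Without FromSEs the choice among the removable SEs is random
    rng.shuffle(removable)
    return sorted(removable[:excess])
```

When FromSEs is given, fewer replicas than the excess may be removed, so NumberOfReplicas acts only as a minimum number to keep.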

-- PhilippeCharpentier - 15-Aug-2012

Topic revision: r8 - 2019-03-07 - PhilippeCharpentier
 