LHCbDirac transformation plugins
Introduction
The LHCbDirac transformation system provides a large set of plugins that are used to create tasks from a list of files and their replicas. There are three types of plugins, depending on the type of tasks they create:
Processing, Replication and Removal plugins.
Plugin parameters
Plugin parameters can be defined when the transformation is created, taken from the LHCbDirac Configuration System (CS), or left to default values present in the code (not recommended).
When the transformation is created from the
dirac-dms-add-transformation
script, the parameters can be passed as options on the command line. These values take precedence over the CS settings.
CS settings can be found in
/Operations/Default/
or in
/Operations//
. The latter has precedence over the former.
For most plugins, the number of files in a task is limited to
MaxFilesPerTask
that can be specified as any generic or particular parameter of the transformation.
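The precedence rules above can be pictured as a lookup over an ordered list of sources. The following is a minimal sketch only, not LHCbDirac code: the function name, the dictionaries standing in for the command line and the two CS sections, and all numeric values are hypothetical.

```python
# Illustrative sketch only; none of these names belong to LHCbDirac.

def resolve_parameter(name, cli_options, cs_specific, cs_default, code_defaults):
    """Return the value of `name` from the highest-precedence source defining it.

    Order, highest first: command-line options, the more specific CS section,
    /Operations/Default/, and finally defaults hard-coded in the plugin.
    """
    for source in (cli_options, cs_specific, cs_default, code_defaults):
        if name in source:
            return source[name]
    raise KeyError("parameter %r is not defined anywhere" % name)


# Hypothetical values, chosen only to show the ordering.
cli = {"GroupSize": 5}
specific = {"GroupSize": 10, "MaxFilesPerTask": 50}
default = {"GroupSize": 20, "MaxFilesPerTask": 100, "NumberOfReplicas": 2}
code = {"NumberOfReplicas": 3, "Period": 6}

print(resolve_parameter("GroupSize", cli, specific, default, code))        # 5
print(resolve_parameter("MaxFilesPerTask", cli, specific, default, code))  # 50
```

A value given on the command line hides every CS value, and a hard-coded default is only reached when no other source defines the parameter.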
Processing
Each task is created for a list of files that are all present at one or more common SEs. The common set of SEs is assigned as
TargetSEs
of the task.
Plugins
-
RAWProcessing
: replaces AtomicRun. This generates tasks from files that are in the SEs defined by the --FromSEs
option (default Tier1-Buffer
). Files are grouped to process at least GroupSize
GB. All files larger than GroupSize
are put in their own task. Files smaller than GroupSize
are grouped, which means tasks may have to process close to twice GroupSize
GB.
-
ByRun[extensions]
: creates tasks for a set of files that all belong to the same run. Additionally the classification can use another BK parameter such as FileType
or EventType
. That parameter is part of the extensions
suffix in the plugin name. The criterion for grouping files in a task can either be the number of files (default) or the data size (Size
in extensions
), using the parameter GroupSize
(expressed in GB).
- The available extensions are
ByRunSize, ByRunFileType, ByRunEventType, ByRunFileTypeSize, ByRunEventTypeSize
- The suffix
WithFlush
is used for flushing the files of a given run when the transformation has received all files corresponding to the RAW files of that run, in this case ignoring the GroupSize
parameter.
- The suffix
ForceFlush
will create tasks according to the other criteria, but will also create a task with any remaining files, irrespective of whether the whole run is ready.
-
BySize
: simply groups files by storage element, in bunches of more than GroupSize
GB.
-
LHCbStandard
: just groups files in groups of GroupSize
files.
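The size-based grouping described for RAWProcessing can be sketched as follows. This is an illustration under stated assumptions, not the plugin's code: the function name and the input dictionary are hypothetical, and grouping per storage element is left out for brevity.

```python
# Illustrative sketch only; not the LHCbDirac implementation.

def group_by_size(file_sizes_gb, group_size_gb):
    """Group files so that each task holds at least `group_size_gb` GB.

    file_sizes_gb maps a (hypothetical) LFN to its size in GB.
    Returns (tasks, leftover): the completed tasks, and the files that do
    not yet add up to group_size_gb and wait for more input.
    """
    tasks = []
    current, current_size = [], 0.0
    for lfn, size in file_sizes_gb.items():
        if size > group_size_gb:
            # A file larger than GroupSize goes in its own task.
            tasks.append([lfn])
            continue
        current.append(lfn)
        current_size += size
        if current_size >= group_size_gb:
            tasks.append(current)
            current, current_size = [], 0.0
    return tasks, current


sizes = {"a": 6.0, "b": 2.0, "c": 2.5, "d": 1.0, "e": 3.0}
tasks, leftover = group_by_size(sizes, group_size_gb=5.0)
print(tasks)     # [['a'], ['b', 'c', 'd']]
print(leftover)  # ['e']
```

Because a group is closed only once it reaches GroupSize, adding a file just under GroupSize to a group that is itself just under GroupSize gives a task of nearly twice GroupSize, as noted above.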
Parameters
-
GroupSize
: this is the main parameter that indicates either the number of files to be grouped in tasks or the size (in GB) of datasets to be used for creating one task.
-
FromSEs
: for RAWProcessing
indicates the group of SEs to use as targets for the jobs.
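As an illustration of GroupSize used as a file count, the sketch below groups files by run and optionally flushes the remainder of selected runs, in the spirit of the ByRun plugins with the WithFlush suffix. All names are hypothetical, and how a run is recognised as complete is not modelled: the caller simply passes the set of runs to flush.

```python
# Illustrative sketch only; not the LHCbDirac implementation.
from collections import defaultdict


def group_by_run(files, group_size, flushed_runs=()):
    """Create (run, [lfns]) tasks of exactly `group_size` files per run.

    files is an iterable of (lfn, run_number) pairs. For runs listed in
    `flushed_runs`, the remaining files form one last, smaller task.
    """
    by_run = defaultdict(list)
    for lfn, run in files:
        by_run[run].append(lfn)

    tasks = []
    for run in sorted(by_run):
        lfns = by_run[run]
        full = len(lfns) - len(lfns) % group_size
        for start in range(0, full, group_size):
            tasks.append((run, lfns[start:start + group_size]))
        remainder = lfns[full:]
        if remainder and run in flushed_runs:
            tasks.append((run, remainder))
    return tasks


files = [("f1", 1), ("f2", 1), ("f3", 1),
         ("f4", 2), ("f5", 2), ("f6", 2), ("f7", 2), ("f8", 2)]
print(group_by_run(files, group_size=2, flushed_runs={1}))
# [(1, ['f1', 'f2']), (1, ['f3']), (2, ['f4', 'f5']), (2, ['f6', 'f7'])]
```

File f8 of run 2 stays unassigned until either another file of that run arrives or the run is flushed.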
Replication
Files are assigned to tasks to be replicated to a set of SEs. These SEs are assigned as
TargetSEs
of the task.
Plugins
-
RAWReplication
: replicates files to two destinations defined by RAWStorageElements
(default Tier1-RAW
except CERN-RAW
) and ProcessingStorageElements
(default Tier1-Buffer
). The first one is chosen according to shares defined in the CS between all SEs (excluding CERN-RAW), and the second one is chosen according to reconstruction shares between CERN and the site chosen for the RAW replication.
-
ReplicateDataset
: replicate a dataset to a set of SEs. The allowed parameters are DestinationSEs
(or MandatorySEs
), DestinationSEs
, NumberOfReplicas
. If the number of replicas is smaller than the number of SEs, the SEs are chosen randomly in the list.
-
LHCbDSTBroadcast
: replication of files grouped by run. All parameters are allowed.
-
LHCbMCDSTBroadcastRandom
: replication of files at randomly selected SEs (no run grouping). All parameters are allowed. Defaults in the LHCb CS are: 1 archive and 2 disk SEs at Tier1s + Tier2Ds.
-
LHCbWGBroadcastRandom
: replication of files at randomly selected SEs (no run grouping), i.e. similar to above. All parameters are allowed. Defaults in the LHCb CS are different from LHCbMCDSTBroadcastRandom
, namely no archive and 2 disk replicas.
-
ReplicateToLocalSE
: replication to SEs selected out of a list that are at a site where the file is already present (e.g. from CERN-BUFFER to CERN-RDST). Note that the SAPath of the two SEs must be different, otherwise no replication takes place! Allowed parameters are DestinationSEs
and MinFreeSpace
which is a minimum free space (in TB) required for the replication to be scheduled.
-
ReplicateWithAncestors
: same as ReplicateToLocalSE, but also replicates the ancestors of the files.
-
ReplicateToRunDestination
: replicate files to an SE in the DestinationSEs
list located at the run destination site.
-
ArchiveDataset
: replicates files to archive storage elements. Allowed parameter is ArchiveSEs
-
Healing
: creates tasks for SEs at which the files are set problematic in the LFC. This allows file losses to be repaired whenever possible. Files that have no other available replica are set Problematic in the transformation table. Files that have no problematic replica are marked Processed.
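The three outcomes described for the Healing plugin can be sketched as a simple classification. This is an illustration only: the function name and the input layout (a dictionary of boolean problematic flags per SE) are hypothetical, and the catalogue queries are not modelled.

```python
# Illustrative sketch only; not the LHCbDirac implementation.

def classify_for_healing(replicas):
    """Sort files into healing tasks, Problematic files and Processed files.

    replicas maps each (hypothetical) LFN to a dictionary
    {SE name: True if the replica at that SE is flagged problematic}.
    """
    tasks = {}            # lfn -> SEs whose replica can be repaired
    set_problematic = []  # no other available replica: cannot be repaired
    set_processed = []    # no problematic replica: nothing to do
    for lfn, ses in replicas.items():
        bad = sorted(se for se, is_bad in ses.items() if is_bad)
        good = [se for se, is_bad in ses.items() if not is_bad]
        if not bad:
            set_processed.append(lfn)
        elif not good:
            set_problematic.append(lfn)
        else:
            tasks[lfn] = bad
    return tasks, set_problematic, set_processed


example = {
    "lfn1": {"SiteA-DST": True, "SiteB-DST": False},
    "lfn2": {"SiteA-DST": True},
    "lfn3": {"SiteB-DST": False, "SiteC-DST": False},
}
print(classify_for_healing(example))
# ({'lfn1': ['SiteA-DST']}, ['lfn2'], ['lfn3'])
```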
Parameters
Parameters are mostly lists of SEs, except
NumberOfReplicas
which is a number. Tasks are only created for SEs at which the files are not yet present.
-
ArchiveSEs <list of SEs>
: list of archive SEs out of which one is randomly selected.
-
MandatorySEs <list of SEs>
: list of non-archive SEs to which replication is mandatory (can be empty).
-
DestinationSEs <list of SEs>
: list of non-archive SEs from which the additional SEs needed to reach NumberOfReplicas are randomly selected.
-
NumberOfReplicas <n>
: final number of non-archive (i.e. disk) replicas requested.
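One way to read the interplay of MandatorySEs, DestinationSEs and NumberOfReplicas for disk replicas is sketched below. It is a hedged illustration, not the plugin's code: the function name and SE names are hypothetical, and it assumes that replicas already present at a DestinationSE count towards NumberOfReplicas. Archive replicas are not covered.

```python
# Illustrative sketch only; not the LHCbDirac implementation.
import random


def choose_disk_targets(existing_ses, mandatory_ses, destination_ses,
                        number_of_replicas, rng=random):
    """Return the SEs to which one file still needs to be replicated."""
    # Mandatory SEs are always part of the final set.
    selected = list(mandatory_ses)
    # Assumption: replicas already at a destination SE count towards the total.
    selected += [se for se in destination_ses
                 if se in existing_ses and se not in selected]
    missing = number_of_replicas - len(selected)
    if missing > 0:
        candidates = [se for se in destination_ses if se not in selected]
        selected += rng.sample(candidates, min(missing, len(candidates)))
    # Tasks are only created for SEs at which the file is not yet present.
    return [se for se in selected if se not in existing_ses]


targets = choose_disk_targets(
    existing_ses={"SiteA-DST"},
    mandatory_ses=["SiteA-DST"],
    destination_ses=["SiteB-DST", "SiteC-DST", "SiteD-DST"],
    number_of_replicas=3,
    rng=random.Random(0),  # seeded only to make the example repeatable
)
print(len(targets))  # 2
```

Here the mandatory replica already exists, so two of the three destination SEs are drawn at random to reach three disk replicas.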
Removal
Files are assigned to tasks to be removed from a set of SEs. These SEs are assigned as
TargetSEs
of the task.
Plugins
-
DestroyDataset
: removes all replicas of the dataset. Must be used with great care, as no replica is kept! No parameters required. It will fail if files are at SEs banned for removal (typically ARCHIVE
SEs).
-
RemoveDatasetFromDisk
: remove all replicas but those specified as KeepSEs
(default: all <site>-ARCHIVE
SEs). Can be specified with the --KeepSEs
option.
-
RemoveReplicas
: removes replicas except those at <KeepSEs>
(default: <site>-ARCHIVE
SEs) and <MandatorySEs>
(default: None), keeping at least NumberOfReplicas
additional replicas. The choice of SEs to remove from is random, unless a list of SEs from which to remove replicas is specified using the --FromSEs
option. In this case only replicas at these SEs are removed, and NumberOfReplicas
is just a minimum to be kept.
-
ReduceReplicas
: reduce the number of replicas to NumberOfReplicas
. If --FromSEs
is specified, it preferentially removes the replicas at these SEs, and randomly chooses the other SEs to remove from if necessary.
-
ReduceReplicasKeepDestination
: same as ReduceReplicas
but keeps all replicas at the run destination site.
-
RemoveReplicasKeepDestination
: same as RemoveReplicas
but keeps all replicas at the run destination site.
-
RemoveReplicasWhenProcessed
: this plugin checks whether a file has been processed by one or more processing passes. If so, it removes the replicas at the SEs in the FromSEs
list. The check of whether it was processed is done every Period
hours (in order not to hammer the Bookkeeping at every loop). The list of processing passes to be considered is passed with the parameter ProcessingPasses
that can be either relative to the processing pass of the dataset, or an absolute BK path (i.e. starting with a /, which is useful for validation datasets).
-
RemoveReplicasWithAncestors
: removes replicas when processed (as above), and also removes the ancestors of the files from the same SEs.
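Two mechanisms mentioned for RemoveReplicasWhenProcessed lend themselves to a short sketch: throttling the Bookkeeping check to once every Period hours, and telling absolute processing passes from relative ones. Everything below is hypothetical and for illustration only; in particular, joining a relative pass onto the dataset's pass with a "/" is an assumption, and the pass names are made up.

```python
# Illustrative sketch only; not the LHCbDirac implementation.
import time


class PeriodicCheck:
    """Allow an expensive check at most once every `period_hours` hours."""

    def __init__(self, period_hours, clock=time.time):
        self.period_seconds = period_hours * 3600
        self.clock = clock
        self.last_check = None

    def due(self):
        """Return True, and record the time, if the check should run now."""
        now = self.clock()
        if self.last_check is None or now - self.last_check >= self.period_seconds:
            self.last_check = now
            return True
        return False


def resolve_processing_pass(dataset_pass, requested):
    """Return `requested` as is when absolute, else relative to `dataset_pass`."""
    if requested.startswith("/"):
        return requested
    # Assumption: a relative pass is appended to the dataset's own pass.
    return dataset_pass.rstrip("/") + "/" + requested


fake_now = [0.0]  # a controllable clock, so the example does not wait
check = PeriodicCheck(period_hours=6, clock=lambda: fake_now[0])
print(check.due())   # True
fake_now[0] = 3600.0
print(check.due())   # False
fake_now[0] = 6 * 3600.0
print(check.due())   # True

print(resolve_processing_pass("/PassA/PassB", "PassC"))      # /PassA/PassB/PassC
print(resolve_processing_pass("/PassA/PassB", "/Validation"))  # /Validation
```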
Parameters
-
--FromSEs <list of SEs>
: list of SEs to remove from, provided more than NumberOfReplicas
are left.
-
--KeepSEs <list of SEs>
: list of SEs at which replicas are kept.
-
--MandatorySEs <list of SEs>
: list of SEs at which replicas are kept (replicas are kept at all of these SEs as well as at those specified in KeepSEs
).
-
--NumberOfReplicas <n>
: number of replicas to keep besides <KeepSEs>
and <MandatorySEs>
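The way these four parameters combine in RemoveReplicas can be sketched as follows. This is a minimal illustration under stated assumptions, not the plugin's code: the function name and SE names are hypothetical, and when FromSEs lists more candidates than need removing, the sketch picks among them at random.

```python
# Illustrative sketch only; not the LHCbDirac implementation.
import random


def choose_removals(replica_ses, keep_ses, mandatory_ses,
                    number_of_replicas, from_ses=None, rng=random):
    """Return the SEs from which one file's replica would be removed."""
    # Replicas at KeepSEs and MandatorySEs are never removed.
    protected = set(keep_ses) | set(mandatory_ses)
    removable = [se for se in replica_ses if se not in protected]
    # NumberOfReplicas counts replicas kept besides the protected ones.
    excess = len(removable) - number_of_replicas
    if excess <= 0:
        return []
    if from_ses:
        # Only replicas at FromSEs may go; NumberOfReplicas is then a minimum.
        candidates = [se for se in removable if se in from_ses]
        return rng.sample(candidates, min(excess, len(candidates)))
    return rng.sample(removable, excess)


replicas = ["SiteA-ARCHIVE", "SiteB-DST", "SiteC-DST", "SiteD-DST"]
print(choose_removals(replicas, ["SiteA-ARCHIVE"], [], 1,
                      from_ses=["SiteB-DST"]))
# ['SiteB-DST']
```

With FromSEs restricted to one SE, only that replica is removed and two disk replicas remain, which is more than the requested minimum of one.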
--
PhilippeCharpentier - 15-Aug-2012