LHCbDirac Data Management commands

This page describes the usage of the LHCbDirac data management (DMS) commands. As the interface may vary with time and with the introduction of new facilities, note that the --help option can always be used to get the current syntax.

The LHCbDirac DMS relies on Logical File Names (LFNs), which are always of the form /lhcb/....., and on LHCbDirac Storage Elements (SEs), whose names always start with the site where the SE is located. The instance of a file at an SE is called a replica. Note that some replicas may be inactive, i.e. they cannot be used for accessing files. This is always the case for archive replicas (SE name of the form <Site>-ARCHIVE). In addition, some SEs may be temporarily unavailable, or some files may be inaccessible (e.g. when a disk server is down at a site).

A list of active SEs can be obtained with dirac-dms-show-se-status.

A replica has a unique SRM URL (SURL), which is used to query the Storage Resource Manager (SRM) for information on the replica or to obtain a Transport URL (tURL), a.k.a. Physical File Name (PFN), to be used by Gaudi.

Common command options

The DM scripts are based on a shared command line options parsing tool (DMScript). These options are split into sets for different purposes:

List of files

The file name parser tries to extract LFNs from any string that contains an LFN (e.g. SURL, tURL, output of other commands, Gaudi options). It works in most cases, as long as there is a separator after the LFN. When the LFNs are read from a file or from the terminal, lines not containing an LFN are ignored.

Note that if the command receives the output of another command through a pipe, the list of LFNs is assumed to come from the pipe unless otherwise specified. This makes it easy to chain DMS or other commands through pipes.

  -    --File=           : File containing list of LFNs 
  -l:  --LFNs=           : List of LFNs (comma separated) 
  -    --Terminal        : LFNs are entered from stdin (--File /dev/stdin)

Here is an example of piping commands:

[localhost, Patch] ~ $ dirac-transformation-debug 24847 --Status Processed --Info files | dirac-bookkeeping-get-file-descendants --Prod 24847 | grep full.dst | dirac-transformation-debug 24182 --Info tasks
 Transformation 24182 (Active) : Request_11636_DataStripping_Stripping20r0p1_EventType_90000000_FULL_1.xml of type DataStripping (plugin ByRunWithFlush, GroupSize: 2) in Stripping20r0p1
BKQuery: {'StartRun': 111181L, 'ConfigName': 'LHCb', 'EndRun': 133785L, 'EventType': 90000000L, 'FileType': 'FULL.DST', 'ProcessingPass': 'Real Data/Reco14', 'Visible': 'Yes', 'DataQualityFlag': 'OK', 'ConfigVersion': 'Collision12', 'DataTakingConditions': 'Beam4000GeV-VeloClosed-MagDown'}

5 files found

/lhcb/LHCb/Collision12/FULL.DST/00024847/0000/00024847_00000001_1.full.dst - Run: 115327 - Status: Processed - UsedSE: CERN-BUFFER - ErrorCount: 0
/lhcb/LHCb/Collision12/FULL.DST/00024847/0000/00024847_00000094_1.full.dst - Run: 132338 - Status: Processed - UsedSE: CNAF-BUFFER - ErrorCount: 0
/lhcb/LHCb/Collision12/FULL.DST/00024847/0000/00024847_00000020_1.full.dst - Run: 124129 - Status: Processed - UsedSE: SARA-BUFFER - ErrorCount: 0
/lhcb/LHCb/Collision12/FULL.DST/00024847/0000/00024847_00000069_1.full.dst - Run: 124915 - Status: Processed - UsedSE: SARA-BUFFER - ErrorCount: 0
/lhcb/LHCb/Collision12/FULL.DST/00024847/0000/00024847_00000031_1.full.dst - Run: 124214 - Status: Processed - UsedSE: SARA-BUFFER - ErrorCount: 0
…….
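
As a rough illustration of the extraction described in this section, the following Python sketch pulls LFNs out of arbitrary lines of text. It is not the DMScript implementation: the regular expression, the set of separators and the function name extract_lfns are assumptions made for this example only.

```python
import re

# Assumption: an LFN starts at "/lhcb/" and ends at the first whitespace,
# quote, comma or semicolon. The real DMScript parser may use other rules.
LFN_PATTERN = re.compile(r"/lhcb/[^\s'\",;]+")


def extract_lfns(lines):
    """Return the LFNs found in an iterable of strings, in order of
    first appearance and without duplicates. Lines without an LFN are ignored."""
    found = []
    for line in lines:
        for lfn in LFN_PATTERN.findall(line):
            if lfn not in found:
                found.append(lfn)
    return found


# Sample input: a line of command output, a line without any LFN,
# and a Gaudi-style option string (the latter is illustrative).
sample = [
    "/lhcb/LHCb/Collision12/FULL.DST/00024847/0000/00024847_00000001_1.full.dst"
    " - Run: 115327 - Status: Processed - UsedSE: CERN-BUFFER - ErrorCount: 0",
    "5 files found",
    "DATAFILE='LFN:/lhcb/LHCb/Collision12/FULL.DST/00024847/0000/"
    "00024847_00000094_1.full.dst' TYP='POOL_ROOTTREE'",
]

for lfn in extract_lfns(sample):
    print(lfn)
```

The space after the LFN in the first line and the closing quote in the last line play the role of the separator mentioned above.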

SE and sites

  -g:  --Sites=          :   Sites to consider [ALL] (comma separated list) 
  -S:  --SEs=            :   SEs to consider [ALL] (comma separated list) 

For Tier1s, the name of the site can be a short name (CERN, CNAF, GRIDKA, IN2P3, NIKHEF, PIC, RAL, SARA). When the command requires a list of SEs and a Site is specified, all SEs at that site are considered.

Storage Elements can also be given as a StorageElementGroup if one is defined in the Configuration of LHCbDirac, for example Tier1-DST, Tier1-BUFFER, etc.
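
The SE naming convention described at the top of this page (site name first, <Site>-ARCHIVE for archives) can be illustrated with a small Python sketch. The helper names se_site and is_archive are hypothetical, and the sketch assumes that the site is everything before the first "-". It only looks at the name, so it cannot tell a StorageElementGroup such as Tier1-DST from a real SE.

```python
def se_site(se_name):
    """Return the part of an SE name before the first '-'.
    Assumption: this prefix is the site, as stated for LHCbDirac SE names."""
    return se_name.split("-", 1)[0]


def is_archive(se_name):
    """True for SE names of the form <Site>-ARCHIVE (always inactive replicas)."""
    return se_name.endswith("-ARCHIVE")


def group_by_site(se_names):
    """Group SE names by their site prefix, keeping the input order."""
    groups = {}
    for name in se_names:
        groups.setdefault(se_site(name), []).append(name)
    return groups


# SE names taken from the example output above, plus one archive SE.
print(group_by_site(["CERN-BUFFER", "CNAF-BUFFER", "SARA-BUFFER", "CERN-ARCHIVE"]))
```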

Bookkeeping query switches

  -P:  --Productions=    :    Production ID to search (comma separated list) 
  -f:  --FileType=       :    File type (comma separated list, to be used with --Production) [All] 
  -    --ExceptFileType= :    Exclude the (list of) file types when all are requested 
  -B:  --BKQuery=        :    Bookkeeping query path 
  -r:  --Runs=           :    Run or range of runs (r1:r2) 
  -    --DQFlags=        :    DQ flag used in query 
  -    --StartDate=      :    Start date for the BK query 
  -    --EndDate=        :    End date for the BK query 
  -    --Invisible       :    See also invisible files 

The --BKQuery option consists of a path constructed as follows:

/DataType/Activity/Conditions/ProcessingPass/EventType/FileType

  • DataType : this is either the Online partition for real data (in particular LHCb) or MC for simulated data (mandatory)
  • Activity : this is the generic activity, for example Collision12 or MC2012 (mandatory)
  • Conditions : data taking or simulation conditions, for example Beam4000GeV-VeloClosed-MagDown (if omitted: any)
  • ProcessingPass : this is the level of processing, for example RealData/Reco14/Stripping20 (if omitted: any)
  • EventType : numeric event type id (or comma-separated list of ids), for example 90000000 (if omitted: any for MC, 90000000 for real data)
  • FileType : file type or comma-separated list of file types
    • Omitted : all file types
    • ALL.DST or ALL.MDST : all DST or MDST files except those specified in --ExceptFileType
      • By default, --ExceptFileType excludes all xxx.ETC files and all xxxHIST files
    • ALL : all file types except those specified in --ExceptFileType
    • ALL.HIST : all histogram file types

Some examples:

  • /LHCb/Collision12//RealData/Reco14 : all files from Reco14 of 2012 real data
  • /LHCb/Collision12//RealData/Reco14/Stripping20 : all files for Reco14/Stripping20 of 2012 data
  • /LHCb/Collision12//RealData/Reco14/Stripping20//ALL.MDST : all µDST files
  • /LHCb/Collision12//RealData/Reco14/Stripping20//ALL.MDST with --Except PID.MDST: all µDST files except PID.MDST
  • /LHCb/Collision12////BHADRON.MDST : all existing BHADRON.MDST files from 2012 data (i.e. all processings and strippings)
  • /LHCb/Collision12//RealData/91000000/RAW : all Express stream RAW data of 2012
  • /MC/2012/Beam4000GeV-2012-MagDown-Nu2.5-Pythia6/Sim08/Digi13/Trig0x409f0045/Reco14/Stripping20NoPrescalingFlagged/30000000,13164417,15112001,33102100,11166104,13164061,13104241,12143401,13164213,13164215,13164217,10000000,23960000,11144001,12163011,12163021 : a long list of event types of MC
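
The way omitted components leave empty slots in the path can be sketched in Python as follows. The function build_bk_path is a hypothetical helper written for this page, not part of LHCbDirac. It only assembles the string and does not check that the query is meaningful.

```python
def build_bk_path(data_type, activity, conditions="", processing_pass="",
                  event_type="", file_type=""):
    """Assemble /DataType/Activity/Conditions/ProcessingPass/EventType/FileType.

    Omitted components are left empty, which produces consecutive slashes.
    Trailing empty components are dropped entirely.
    """
    if not data_type or not activity:
        raise ValueError("DataType and Activity are mandatory")
    parts = [data_type, activity, conditions,
             processing_pass.strip("/"), str(event_type), file_type]
    # Drop empty components at the end of the path
    while parts and not parts[-1]:
        parts.pop()
    return "/" + "/".join(parts)


# Reproduces some of the examples listed above
print(build_bk_path("LHCb", "Collision12", processing_pass="RealData/Reco14"))
print(build_bk_path("LHCb", "Collision12",
                    processing_pass="RealData/Reco14/Stripping20",
                    file_type="ALL.MDST"))
print(build_bk_path("LHCb", "Collision12", file_type="BHADRON.MDST"))
```

The resulting string is what would be passed to the --BKQuery option.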

Information on files

dirac-dms-lfn-replicas

Shows where files have replicas and reports the SURL registered in the file catalog.

$ dirac-dms-lfn-replicas --help

    Show replicas for a (set of) LFNs

Usage:
  dirac-dms-lfn-replicas [option|cfgfile] [<LFN>] [<LFN>...] 
General options: 
  -o:  --option=         : Option=value to add 
  -s:  --section=        : Set base section for relative parsed options 
  -c:  --cert=           : Use server certificate to connect to Core Services 
  -d   --debug           : Set debug mode (-dd is extra debug) 
  -h   --help            : Shows this help 
 
Options: 
  -    --File=           : File containing list of LFNs 
  -l:  --LFNs=           : List of LFNs (comma separated) 
  -    --Terminal        : LFNs are entered from stdin (--File /dev/stdin)
  -a   --All             :   Also show inactive replicas 

dirac-dms-lfn-metadata

Shows metadata of a logical file, i.e. not related to any physical replica.

[localhost, Patch] ~ $ dirac-dms-lfn-metadata --help
  Get the metadata of a (list of) LFNs from the FC
Usage:
  dirac-dms-lfn-metadata [option|cfgfile] ... [LFN[,LFN2[,LFN3...]]]
Arguments:
  LFN:      Logical File Name or file containing LFNs 
General options: 
  -o:  --option=         : Option=value to add 
  -s:  --section=        : Set base section for relative parsed options 
  -c:  --cert=           : Use server certificate to connect to Core Services 
  -d   --debug           : Set debug mode (-dd is extra debug) 
  -h   --help            : Shows this help 
 
Options: 
  -    --File=           : File containing list of LFNs 
  -l:  --LFNs=           : List of LFNs (comma separated) 
  -    --Terminal        : LFNs are entered from stdin (--File /dev/stdin) 
  -g:  --Sites=          :   Sites to consider [ALL] (comma separated list) 
  -S:  --SEs=            :   SEs to consider [ALL] (comma separated list) 

dirac-dms-pfn-metadata

Shows the metadata of physical replica(s), possibly at selected SEs/Sites.

[localhost, Patch] ~ $ dirac-dms-pfn-metadata  --help

  Gets the metadata of a (list of) LHCb LFNs/PFNs given a valid DIRAC SE.
  Only the LFN contained in the PFN is considered, unlike the DIRAC similar script

Usage:
  dirac-dms-pfn-metadata [option|cfgfile] ... [URL[,URL2[,URL3...]]] SE[ SE2...]
Arguments:
  URL:      Logical/Physical File Name or file containing URLs
  SE:       Valid DIRAC SE 
General options: 
  -o:  --option=         : Option=value to add 
  -s:  --section=        : Set base section for relative parsed options 
  -c:  --cert=           : Use server certificate to connect to Core Services 
  -d   --debug           : Set debug mode (-dd is extra debug) 
  -h   --help            : Shows this help 
 
Options: 
  -    --File=           : File containing list of LFNs 
  -l:  --LFNs=           : List of LFNs (comma separated) 
  -    --Terminal        : LFNs are entered from stdin (--File /dev/stdin) 
  -g:  --Sites=          :   Sites to consider [ALL] (comma separated list) 
  -S:  --SEs=            :   SEs to consider [ALL] (comma separated list) 
  -    --Check           :    Checks the PFN metadata vs LFN metadata 
  -    --Exists          :    Only reports if the file exists 

For backward compatibility, the list of URLs and SEs can be provided as positional arguments, but it is recommended to use option switches. If a site is selected, the metadata of all replicas at that site is reported. If no SE and no site are specified, the metadata of all replicas is reported.

dirac-dms-user-lfns

This command allows users to get a list of all files they own on the Grid, optionally restricted to given directories or selected by date/time criteria.

$ dirac-dms-user-lfns --help
Get the list of all the user files.
Usage:
  dirac-dms-user-lfns [option|cfgfile] ... 
General options: 
  -o:  --option=         : Option=value to add 
  -s:  --section=        : Set base section for relative parsed options 
  -c:  --cert=           : Use server certificate to connect to Core Services 
  -d   --debug           : Set debug mode (-dd is extra debug) 
  -h   --help            : Shows this help 
 
Options: 
  -D:  --Days=           : Match files older than number of days [0] 
  -M:  --Months=         : Match files older than number of months [0] 
  -Y:  --Years=          : Match files older than number of years [0] 
  -w:  --Wildcard=       : Wildcard for matching filenames [*] 
  -b:  --BaseDir=        : Base directory to begin search (default /[vo]/user/[initial]/[username]) 
  -e   --EmptyDirs       : Create a list of empty directories 

Accessing files

dirac-dms-lfn-accessURL

This command returns tURLs for files at a given SE or site. This tURL is optimised for fast and efficient access to the data. Note that any other tURL may be inefficient, or its usage may be discontinued at some point. If the file is on tape storage, the tURL can currently only be obtained once the file has been staged, so the command may time out in this case.

[localhost, Patch] ~ $ dirac-dms-lfn-accessURL --help
  Retrieve an access URL for an LFN replica given a valid DIRAC SE.
Usage:
  dirac-dms-lfn-accessURL [option|cfgfile] ... [LFN[,LFN2[,LFN3...]]] [SE[,SE2...]]
Arguments:
  LFN:      Logical File Name or file containing LFNs
  SE:       Valid DIRAC SE 
General options: 
  -o:  --option=         : Option=value to add 
  -s:  --section=        : Set base section for relative parsed options 
  -c:  --cert=           : Use server certificate to connect to Core Services 
  -d   --debug           : Set debug mode (-dd is extra debug) 
  -h   --help            : Shows this help 
 
Options: 
  -    --File=           : File containing list of LFNs 
  -l:  --LFNs=           : List of LFNs (comma separated) 
  -    --Terminal        : LFNs are entered from stdin (--File /dev/stdin) 
  -g:  --Sites=          :   Sites to consider [ALL] (comma separated list) 
  -S:  --SEs=            :   SEs to consider [ALL] (comma separated list) 

Positional arguments are supported for backward compatibility, but option switches are highly recommended. If no SE is specified, all allowed SEs are used.

dirac-dms-get-file

This command copies a file or a set of files to the local directory. The options of this command are expected to evolve and to offer additional possibilities.

[localhost, Certif] ~ $ dirac-dms-get-file --help

  Retrieve a single file or list of files from Grid storage to the current directory.

Usage:
  dirac-dms-get-file [option|cfgfile] [<LFN>] [<LFN>...] 
General options: 
  -o:  --option=         : Option=value to add 
  -s:  --section=        : Set base section for relative parsed options 
  -c:  --cert=           : Use server certificate to connect to Core Services 
  -d   --debug           : Set debug mode (-dd is extra debug) 
  -h   --help            : Shows this help 
 
Options: 
  -    --File=           : File containing list of LFNs 
  -l:  --LFNs=           : List of LFNs (comma separated) 
  -    --Terminal        : LFNs are entered from stdin (--File /dev/stdin) 
  -D:  --Directory=      :    Directory to download to (default = /home/phicharp) 

Replicating files

dirac-dms-replicate-lfn

Replicates an LFN to a (list of) SE(s). It requires write access to the FC directory.

[localhost, Certif] ~ $ dirac-dms-replicate-lfn --help

  Replicate a (list of) existing LFN(s) to (set of) Storage Element(s)

Usage:
  dirac-dms-replicate-lfn [option|cfgfile] ...  [LFN1[,LFN2,[...]]] Dest[,Dest2[,...]] [Source [Cache]]
Arguments:
  Dest:     Valid DIRAC SE(s)
  Source:   Valid DIRAC SE
  Cache:    Local directory to be used as cache 
General options: 
  -o:  --option=         : Option=value to add 
  -s:  --section=        : Set base section for relative parsed options 
  -c:  --cert=           : Use server certificate to connect to Core Services 
  -d   --debug           : Set debug mode (-dd is extra debug) 
  -h   --help            : Shows this help 
 
Options: 
  -    --File=           : File containing list of LFNs 
  -l:  --LFNs=           : List of LFNs (comma separated) 
  -    --Terminal        : LFNs are entered from stdin (--File /dev/stdin) 

dirac-dms-add-file

Uploads a local file to a Grid storage element and registers it in the file catalog. In order to register the GUID of the file as the one contained in the file itself, it is mandatory to set up the ROOT environment as well. The easiest way is:

$ lb-run LHCbDirac/latest [bash --norc|tcsh|...]

If you are on lxplus or on a machine that has access to EOS or Castor at CERN, the "local file" can be a Castor file (of the form /castor/cern.ch/...) or an EOS file (of the form /eos/lhcb/...), provided it is not already a Grid file.

The LFN for a user file must be in the form /lhcb/user/<initial>/<username>/whatever-you-like-as-a-path.
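
A minimal Python sketch of this naming rule is given below. The helpers user_lfn and is_user_lfn are hypothetical, as is the username jdoe. The sketch assumes that <initial> is the first character of the username.

```python
import posixpath


def user_lfn(username, relative_path):
    """Build /lhcb/user/<initial>/<username>/<relative_path>.
    Assumption: <initial> is the first character of the username."""
    if not username:
        raise ValueError("username must not be empty")
    base = "/lhcb/user/%s/%s" % (username[0], username)
    return posixpath.join(base, relative_path.lstrip("/"))


def is_user_lfn(lfn, username):
    """Check that an LFN lies below the user's own directory."""
    return bool(username) and lfn.startswith(
        "/lhcb/user/%s/%s/" % (username[0], username))


print(user_lfn("jdoe", "ntuples/2012/tuple.root"))
```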

Command usage:

dirac-dms-add-file --help

Upload a file to the grid storage and register it in the File Catalog
Usage:
  dirac-dms-add-file [option|cfgfile] ... LFN Path SE [GUID]
Arguments:
  LFN:      Logical File Name
  Path:     Local path of the file
  SE:       DIRAC Storage Element
  GUID:     GUID to use in the registration (optional)

 ++ OR ++

Usage:
  dirac-dms-add-file [option|cfgfile] ... LocalFile
Arguments:
  LocalFile: Path to local file containing all the above, i.e.:
  lfn1 localfile1 SE [GUID1]
  lfn2 localfile2 SE [GUID2] 

General options: 
  -o  --option <value>         : Option=value to add 
  -s  --section <value>        : Set base section for relative parsed options 
  -c  --cert <value>           : Use server certificate to connect to Core Services 
  -d  --debug                  : Set debug mode (-dd is extra debug) 
  -h  --help                   : Shows this help 
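
The second form of the command takes a list file with one upload per line. As a sketch of that layout, the following Python snippet checks such a file before it is passed to dirac-dms-add-file. The function parse_upload_list is hypothetical, and the LFNs, paths, SE name and GUID in the sample are made up for illustration. The sketch assumes whitespace-separated fields, so paths containing spaces are not handled.

```python
def parse_upload_list(text):
    """Parse lines of the form 'LFN Path SE [GUID]'.

    Returns a list of dicts. Blank lines are skipped, and any line
    that does not have 3 or 4 fields raises ValueError.
    """
    entries = []
    for number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue
        if len(fields) not in (3, 4):
            raise ValueError("line %d: expected 'LFN Path SE [GUID]'" % number)
        lfn, path, se = fields[:3]
        guid = fields[3] if len(fields) == 4 else None
        entries.append({"lfn": lfn, "path": path, "se": se, "guid": guid})
    return entries


# Illustrative content only: none of these files or the GUID are real.
sample = """\
/lhcb/user/j/jdoe/a.root /tmp/a.root CERN-USER
/lhcb/user/j/jdoe/b.root /tmp/b.root CERN-USER 5FE9437E-D958-11DF-9E4E-001E0B5A1B2E
"""

for entry in parse_upload_list(sample):
    print(entry["lfn"], "->", entry["se"], entry["guid"])
```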

GUID-related issues

When reading a file through a catalog, Gaudi requires that the GUID in the file catalog matches the GUID (a.k.a. FID) stored in the file. If the file was uploaded to the Grid and registered without care, this may not be the case. This can be checked and, if necessary, fixed using the following commands:

dirac-lhcb-get-root-guid

This script will read the file GUID from the actual replica.

[localhost, PatchFull] ~ $ dirac-lhcb-get-root-guid --help

Get the GUID of a (set of) ROOT file
The file can be either local, an LFN or an xrootd URL (root:...)

Usage:
  dirac-lhcb-get-root-guid [option|cfgfile] file1 [file2 ...] 

General options: 
  -o  --option <value>         : Option=value to add 
  -s  --section <value>        : Set base section for relative parsed options 
  -c  --cert <value>           : Use server certificate to connect to Core Services 
  -d  --debug                  : Set debug mode (-dd is extra debug) 
  -h  --help                   : Shows this help 

dirac-lhcb-fix-file-guid

This script will read the file GUID from the actual file and update the file catalog if needed with the correct GUID.

[localhost, PatchFull] ~ $ dirac-lhcb-fix-file-guid --help

Fix incorrect file GUIDs

Usage:
  dirac-lhcb-fix-file-guid [option|cfgfile] [OldLFN] 

General options: 
  -o  --option <value>         : Option=value to add 
  -s  --section <value>        : Set base section for relative parsed options 
  -c  --cert <value>           : Use server certificate to connect to Core Services 
  -d  --debug                  : Set debug mode (-dd is extra debug) 
  -h  --help                   : Shows this help 
 
Options: 

  -f  --OldLFN <value>         : LFN of existing file to be fixed. 

  -n  --NewLFN <value>         : Optional: specify a new LFN for the file (retaining the existing file with incorrect GUID). 

  -D  --Directory <value>      : Optional: directory to download file (defaults to TMPDIR then PWD). 

  -k  --Keep                   : Optional: specify this switch to retain the local copy of the downloaded file 

  -m  --SafeMode               : Optional: specify this switch to run the script in safe mode (will check the GUIDs only) 

Removing replicas or files

dirac-dms-remove-replicas

Removes a replica at a given SE. This is of course only allowed for files you are authorised to remove. Note also that it is not possible to remove the last replica of a file; use dirac-dms-remove-files instead.

[localhost, Certif] ~ $ dirac-dms-remove-replicas --help

  Remove the given file replica or a list of file replicas from the File Catalog
  and from the storage.

  Usage:
     dirac-dms-remove-replicas <LFN | fileContainingLFNs> SE [SE]
   
General options: 
  -o:  --option=         : Option=value to add 
  -s:  --section=        : Set base section for relative parsed options 
  -c:  --cert=           : Use server certificate to connect to Core Services 
  -d   --debug           : Set debug mode (-dd is extra debug) 
  -h   --help            : Shows this help 
 
Options: 
  -    --File=           : File containing list of LFNs 
  -l:  --LFNs=           : List of LFNs (comma separated) 
  -    --Terminal        : LFNs are entered from stdin (--File /dev/stdin) 
  -g:  --Sites=          :   Sites to consider [ALL] (comma separated list) 
  -S:  --SEs=            :   SEs to consider [ALL] (comma separated list) 
  -v   --Verbose         :  use this option for verbose output [False] 
  -n   --NoLFC           :  use this option to force the removal from storage of replicas not in FC 

The --NoLFC option allows replicas to be removed even though they are not present in the LFC (dark data). In this case, an attempt is made to remove the file from physical storage only.

dirac-dms-remove-files

Removes all replicas of a (list of) file(s) as well as the entry in the file catalog.

 [localhost, Certif] ~ $ dirac-dms-remove-files --help

  Remove the given file or a list of files from the File Catalog and from the storage

Usage:
  dirac-dms-remove-files [option|cfgfile] [<LFN>] [<LFN>...] 
General options: 
  -o:  --option=         : Option=value to add 
  -s:  --section=        : Set base section for relative parsed options 
  -c:  --cert=           : Use server certificate to connect to Core Services 
  -d   --debug           : Set debug mode (-dd is extra debug) 
  -h   --help            : Shows this help 
 
Options: 
  -P:  --Productions=    :    Production ID to search (comma separated list) 
  -f:  --FileType=       :    File type (comma separated list, to be used with --Production) [All] 
  -    --ExceptFileType= :    Exclude the (list of) file types when all are requested 
  -B:  --BKQuery=        :    Bookkeeping query path 
  -r:  --Runs=           :    Run or range of runs (r1:r2) 
  -    --DQFlags=        :    DQ flag used in query 
  -    --StartDate=      :    Start date for the BK query 
  -    --EndDate=        :    End date for the BK query 
  -    --Invisible       :    See also invisible files 
  -    --File=           : File containing list of LFNs 
  -l:  --LFNs=           : List of LFNs (comma separated) 
  -    --Terminal        : LFNs are entered from stdin (--File /dev/stdin) 
  -    --SetProcessed    :   Forced to set Removed the files in status Processed (default:not reset) 

The list of files may be the result of a Bookkeeping query. This is only allowed for data managers and should be used with great care.

If files are used by some transformations, they are set to "Removed" status unless their current status is "Processed". "Processed" files can be forced to "Removed" using the --SetProcessed switch.

Bookkeeping commands

They can be found here.


Topic revision: r10 - 2016-08-27 - PhilippeCharpentier