First released in
PhEDEx 2.5.3.3. This document covers 2.5.3.3 - 2.5.3.4.
About
This script compares a list of files on a storage system with TMDB, looking for orphans, i.e. files known to the storage system but not to TMDB.
Configuration
List of configuration options.
-
--help
Gives detailed help, which is probably more up to date than this wiki, and should therefore be taken as authoritative in case of discrepancy.
-
--db DBCONFIG
The familiar PhEDEx DB contact file/specification, e.g. DBParam:Prod/Reader
-
--lfnlist LFNLIST
A list of LFNs to check against TMDB, one LFN per line. Anything before /store is stripped, and a space on the line is taken to denote the end of the LFN. LFN lists can be gzipped or bzipped to conserve space; the script reads these compressed files directly, so there is no need to decompress them first. The LFN list can also be given as '-', in which case the script reads from STDIN instead of a file.
-
--verbose
Useful for seeing what is going on. Repeat the option to increase verbosity; three levels of verbosity currently exist.
-
--se_name NAME
WARNING: DEPRECATED. Use --node instead. This restricts the comparison to files known to be on a given SE, according to TMDB.
-
--node NAME
This restricts the comparison to files known to be on a given PhEDEx node, according to TMDB.
-
--injected
This considers all files known to TMDB, regardless of the SE they are stored on.
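The line handling described for --lfnlist can be sketched in Python as follows. This is an illustration only, not the script's own code: the function names are hypothetical, and choosing the decompressor from the filename extension is an assumption (the real script may detect compression differently).

```python
import bz2
import gzip
import sys


def normalise_lfn(line):
    """Return the LFN found on one line of the list, or None if there is none."""
    start = line.find("/store")
    if start < 0:
        return None  # no /store on this line, so there is no LFN to check
    # Everything before /store is dropped; whitespace ends the LFN.
    return line[start:].split()[0]


def open_lfn_list(path):
    """Open a plain, gzipped or bzipped LFN list; '-' means standard input."""
    if path == "-":
        return sys.stdin
    if path.endswith(".gz"):   # assumption: compression is inferred from the suffix
        return gzip.open(path, "rt")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt")
    return open(path)
```

With these helpers, a line such as a storage dump entry with a site-specific prefix and a trailing size column reduces to the bare LFN starting at /store.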
The script will take the list of files it is given and will extract the list of directories they are in. It will then look up in TMDB the full set of LFNs that match those directories, and compare the two sets of LFNs. So your LFN list should contain the complete contents of directories. The assumption is that looking for orphans is more likely to proceed by directories than by datasets or blocks, so complete directories are a reasonable starting point.
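The comparison described above amounts to a set difference taken per directory. A minimal Python sketch follows, assuming a hypothetical `tmdb_lookup` callable that stands in for the real TMDB query; the function names are illustrative and not taken from the script.

```python
import posixpath


def directories_of(lfns):
    """Return the set of directories that contain the given LFNs."""
    return {posixpath.dirname(lfn) for lfn in lfns}


def find_orphans(se_lfns, tmdb_lookup):
    """Return the LFNs in the list that TMDB does not know about.

    tmdb_lookup is a hypothetical callable taking a set of directories
    and returning the set of LFNs that TMDB holds in those directories.
    """
    se_lfns = set(se_lfns)
    known = tmdb_lookup(directories_of(se_lfns))
    return sorted(se_lfns - known)
```

Because TMDB is queried by directory rather than by individual file, the number of lookups depends on how many directories the list covers, not on how many files it contains.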
The output is a list of files known to the SE but not to TMDB. The same list is then summarised by directory, where purely numerical end-components of the directory name are grouped together (for example, /path/001 and /path/002 are summarised under /path). This hopefully makes it easier to understand which files are missing from a block/dataset viewpoint.
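The grouping used in the directory summary can be sketched as below. The names are hypothetical, and stripping every trailing numeric component (rather than only the last one) is an assumption about how the script behaves.

```python
import posixpath
from collections import Counter


def summary_dir(lfn):
    """Return the directory of an LFN with numeric end-components removed."""
    directory = posixpath.dirname(lfn)
    # Assumption: all purely numerical trailing components are stripped.
    while posixpath.basename(directory).isdigit():
        directory = posixpath.dirname(directory)
    return directory


def summarise(se_lfns, orphans):
    """Map each summary directory to (files in the list, orphaned files)."""
    se_counts = Counter(summary_dir(lfn) for lfn in se_lfns)
    orphan_counts = Counter(summary_dir(lfn) for lfn in orphans)
    return {d: (n, orphan_counts[d]) for d, n in se_counts.items()}
```

The sketch reports only the list count and the orphan count per directory; it does not try to reproduce the TMDB column of the real summary.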
This script does not perform any lookup in the SE; it only interacts with TMDB. It should therefore be rather fast, even for large amounts of data.
The --injected, --node and --se_name options are mutually exclusive, and at least one of them is obligatory.
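The rule that these three options exclude each other and that one is required can be expressed with a mutually exclusive option group. The Python sketch below only illustrates that rule; it is not the script's own option handling, and the parser name is hypothetical.

```python
import argparse


def build_parser():
    """Build a parser that enforces the one-of-three rule for the scope options."""
    parser = argparse.ArgumentParser(prog="consistency-check-sketch")
    # required=True rejects a command line with none of the three options;
    # the group itself rejects any combination of two or more.
    scope = parser.add_mutually_exclusive_group(required=True)
    scope.add_argument("--injected", action="store_true")
    scope.add_argument("--node", metavar="NAME")
    scope.add_argument("--se_name", metavar="NAME")  # deprecated in favour of --node
    return parser
```

Passing both --node and --injected, or none of the three, makes this parser print a usage error and exit.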
Examples
This example works for me at CERN. Adjust the path to your DBParam file if you want to try it. N.B. This LFN list file deliberately has entries missing, plus one or two fake ones, just for good measure.
StorageConsistencyCheck --db ~/private/DBParam:Prod/Reader --lfnlist /afs/cern.ch/user/w/wildish/public/Spring07-ZToMuMu-1532.txt
which produces the following output:
2007-07-18 12:39:58: StorageConsistencyCheck[32045]: (re)connecting to database
Got 751 LFNs in 1 directories
Orphaned file summary:
/store/mc/2007/5/9/Spring07-ZToMuMu-1532/0001/0C0858A4-B2FF-DB11-A6D5-00304885AA50.root
/store/mc/2007/5/9/Spring07-ZToMuMu-1532/0001/8A685768-89FF-DB11-9BFE-000E0C3F1A2A.root
/store/mc/2007/5/9/Spring07-ZToMuMu-1532/0001/D8FD949A-8BFF-DB11-845D-0030485627C4.root
/store/mc/2007/5/9/Spring07-ZToMuMu-1532/0001/F6814D84-88FF-DB11-AD5F-003048562902.root
/store/mc/2007/5/9/Spring07-ZToMuMu-1532/0002/1445E72C-CA03-DC11-8F70-000E0C4DE5B3.root
/store/mc/2007/5/9/Spring07-ZToMuMu-1532/0002/QRSTUVWX-D90A-DC11-A86F-00304885B4A8.root
Final score: 6 files left...
Directory summary:
SE/TMDB diff : Directory
751/745 6 : /store/mc/2007/5/9/Spring07-ZToMuMu-1532
Finished...