First released in PhEDEx This document covers -


This script compares a list of files on a storage system with the contents of TMDB, looking for orphans, i.e. files that are known to the SE but not to TMDB.


List of configuration options.
  • --help
    Gives detailed help, which is probably more up to date than this wiki, and should therefore be taken as authoritative in case of discrepancy.
  • --db DBCONFIG
    The familiar PhEDEx DB contact file/specification, e.g. DBParam:Prod/Reader
  • --lfnlist LFNLIST
    This is a list of LFNs to check against TMDB. There should be one LFN per line. Anything before the /store is discarded, and a space on the line is taken to denote the end of the LFN. LFN lists can be gzipped or bzipped to conserve space; the script reads these compressed files directly, so there is no need to decompress them first. The LFN list can also be given as '-', in which case the script reads from STDIN instead of a file.
  • --verbose
    This can be useful for seeing what's going on. Can be repeated many times to increase verbosity. Currently up to three levels of verbosity exist.
  • --se_name NAME
    WARNING: DEPRECATED. Use --node instead. This restricts the comparison to files known to be on a given SE, according to TMDB.
  • --node NAME
    This restricts the comparison to files known to be on a given PhEDEx node, according to TMDB.
  • --injected
    This considers all files known to TMDB, regardless of the SE they are stored on.
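
The --lfnlist input rules described above can be sketched in a few lines of Python. This is only an illustration of the documented behaviour, not the script's own code: the function names are hypothetical, and detecting compression from the file extension (.gz, .bz2) is an assumption.

```python
import bz2
import gzip
import sys


def normalize_lfn(line):
    """Return the LFN found on one input line, or None if there is no /store."""
    start = line.find("/store")
    if start < 0:
        return None
    # Discard everything before /store, then cut at the first whitespace.
    return line[start:].split()[0]


def open_lfn_list(path):
    """Open a plain, gzipped or bzipped LFN list; '-' means STDIN.

    Assumption: the compression type is guessed from the file extension.
    """
    if path == "-":
        return sys.stdin
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt")
    return open(path, "r")


def read_lfns(path):
    """Read an LFN list and return the normalized LFNs, one per input line."""
    handle = open_lfn_list(path)
    try:
        return [lfn for lfn in map(normalize_lfn, handle) if lfn]
    finally:
        # Leave STDIN open for the caller; close real files.
        if handle is not sys.stdin:
            handle.close()
```

Lines that contain no /store at all are skipped in this sketch; how the real script treats them is not stated here.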

The script will take the list of files it is given and will extract the list of directories they are in. It will then look up in TMDB the full set of LFNs that match those directories, and compare the two sets of LFNs. So your LFN list should contain the complete contents of directories. The assumption is that looking for orphans is more likely to proceed by directories than by datasets or blocks, so complete directories are a reasonable starting point.
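
The directory-based comparison can be pictured with the following Python sketch. It is an illustration under assumptions, not the script itself: lookup_tmdb is a hypothetical stand-in for the real TMDB query, and the LFNs are made up.

```python
import posixpath


def directories_of(lfns):
    """Return the set of directories that contain the given LFNs."""
    return {posixpath.dirname(lfn) for lfn in lfns}


def find_orphans(se_lfns, lookup_tmdb):
    """Return LFNs present in the storage listing but unknown to TMDB.

    lookup_tmdb is a hypothetical callable: it takes a set of directories
    and returns every LFN that TMDB knows under them.
    """
    se_set = set(se_lfns)
    tmdb_set = set(lookup_tmdb(directories_of(se_set)))
    return sorted(se_set - tmdb_set)


# In-memory stand-in for the TMDB query, using made-up LFNs.
FAKE_TMDB = ["/store/mc/demo/a.root", "/store/mc/demo/b.root"]


def fake_lookup(directories):
    return [lfn for lfn in FAKE_TMDB if posixpath.dirname(lfn) in directories]


orphans = find_orphans(
    ["/store/mc/demo/a.root", "/store/mc/demo/fake.root"], fake_lookup
)
print(orphans)
```

Here fake.root is reported because it is in the storage listing but not in the stand-in TMDB data.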

The output is a list of files known to the SE but not to TMDB. The same list is then summarised by directory, where purely numerical end-components of the directory name are grouped together (e.g. /path/001 and /path/002 are summarised under /path). This should make it easier to understand which files are missing from a block/dataset viewpoint.
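
As a rough Python sketch of the per-directory summary: the function names are hypothetical, and stripping only a single trailing numeric component is an assumption, since the text does not say whether several numeric levels are collapsed.

```python
import posixpath
import re
from collections import Counter


def summary_directory(lfn):
    """Return the directory an LFN is summarised under.

    Assumption: only one purely numerical trailing component is dropped,
    so /path/001/file.root is counted under /path.
    """
    directory = posixpath.dirname(lfn)
    if re.fullmatch(r"[0-9]+", posixpath.basename(directory)):
        directory = posixpath.dirname(directory)
    return directory


def summarise(orphans):
    """Count orphaned files per summary directory."""
    return Counter(summary_directory(lfn) for lfn in orphans)


counts = summarise(
    ["/path/001/a.root", "/path/002/b.root", "/other/data/c.root"]
)
for directory, count in sorted(counts.items()):
    print(f"{count:8d} : {directory}")
```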

This script does not perform any lookup in the SE, it only interacts with TMDB. It should therefore be rather fast, even for large amounts of data.

The --injected, --node and --se_name options are mutually exclusive, and exactly one of them must be given.
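
For readers who want to wrap the script, this kind of constraint can be expressed in Python with argparse. This is a sketch only: the real script has its own option parsing, and the node name used here is made up.

```python
import argparse


def build_parser():
    """Build a parser in which one of the three selection options is required."""
    parser = argparse.ArgumentParser(prog="StorageConsistencyCheck-sketch")
    # required=True plus mutual exclusion means exactly one option is accepted.
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--injected", action="store_true",
                       help="consider all files known to TMDB")
    group.add_argument("--node", metavar="NAME",
                       help="restrict to files on a given PhEDEx node")
    group.add_argument("--se_name", metavar="NAME",
                       help="deprecated, use --node instead")
    return parser


# Hypothetical node name, for illustration only.
args = build_parser().parse_args(["--node", "T2_Example_Node"])
print(args.node, args.injected)
```

Passing none of the three options, or more than one, makes argparse print an error and exit.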


This example works for me at CERN. Adjust the path to your DBParam file if you want to try it. N.B. This LFN list file deliberately has entries missing, plus one or two fake ones, just for good measure.
StorageConsistencyCheck --db ~/private/DBParam:Prod/Reader --lfnlist /afs/

which produces the following output:

2007-07-18 12:39:58: StorageConsistencyCheck[32045]: (re)connecting to database
Got      751 LFNs in     1 directories
Orphaned file summary:
Final score: 6 files left...

Directory summary:
       SE/TMDB  diff : Directory
      751/745        6 : /store/mc/2007/5/9/Spring07-ZToMuMu-1532

Topic revision: r9 - 2016-01-27 - NicoloMagini