DBS replica tools

Aim

A nice feature of CRAB is the possibility to easily register (EDM-compatible) datasets in a local DBS. However, migrating these data, e.g. to a T3, is not always simple, nor is registering the new location in DBS.

Here two new tools are provided for those purposes: one transfers and registers data, the other invalidates and/or deletes data.

If you need to transfer data not registered in DBS, you can use LSDataReplica.

Current release

You can retrieve the updated version from CVS with:

cvs co -rV00-01 UserCode/leo/Utilities/dbs_utils.py
cvs co -rV00-03 UserCode/leo/Utilities/dbs_InvalidateAndDeleteDataset.py
cvs co -rV00-04 UserCode/leo/Utilities/dbs_transferRegister.py
cvs co -rV01-01-06  UserCode/leo/Utilities/data_replica.py

dbs_transferRegister.py

This tool relies on data_replica for file transfers and on DBSAPI calls for block registration. LSDataReplica can be used to transfer files not registered in DBS.

The usage is straightforward:

python dbs_transferRegister.py --to-site T3_XX_YYYY dataset_name

The --help option gives all the needed information:

Usage: /swshare/psit3/bin/dbs_transferRegister.py [--dbs=ph01|ph02] --to-site=TX_YY_SITE dataset

Options:
  -h, --help            show this help message and exit
  --dbs=DBS             DBS instance, can be: ph01, ph02 (default)
  --to-site=TO_SITE     Destination site.
  --whitelist=WHITELIST
                        Sets up a comma-separated White-list (preferred
                        sites). Transfers will start from these sites. Sites
                        not included in the whitelist will not be excluded.
  --blacklist=BLACKLIST
                        Sets up a comma-separated Black-list (excluded sites).
  --retransfer          Do not skip already transferred blocks.
  --copy-tool=TOOL      Selects the copy tool to be used (lcg-cp or srmcp). By
                        default lcg-cp is used
  --debug               Verbose mode
  --delete              If file exists at destination and its size is
                        _smaller_ than the source one, delete it. WARNING:
                        destination files are checked only for SRM endpoints.

The tool finds the blocks composing the dataset and transfers/registers them separately. If the transfer of a block fails, that block is not registered. The dataset source is retrieved from DBS.
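The transfer-then-register flow can be sketched as follows. This is a minimal illustration of the behavior described above, not the actual dbs_transferRegister.py code; `transfer_block` and `register_block` are hypothetical stand-ins for data_replica and the DBSAPI registration calls.

```python
def sync_dataset(blocks, transfer_block, register_block):
    """Transfer each block; register it in DBS only if the transfer succeeds.

    blocks:          iterable of block names
    transfer_block:  callable returning True on a successful transfer
    register_block:  callable that registers a block at the destination
    Returns the list of blocks whose transfer failed (and were not registered).
    """
    failed = []
    for block in blocks:
        if transfer_block(block):
            register_block(block)
        else:
            # On transfer failure the block is NOT registered in DBS.
            failed.append(block)
    return failed
```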

The use of data_replica also provides checks on files already existing at the destination:

  • if the sizes are equal, no error is reported, just a warning (exit code 0)
  • if the destination file is bigger than the source file, an error is raised (user should check and perform actions)
  • if the destination file is smaller than the source file, the former is deleted and the transfer begins
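The three cases above amount to a simple size comparison. The sketch below (an assumption about the decision logic, not code taken from data_replica) returns the action implied by each case:

```python
def check_destination(src_size, dst_size):
    """Decide what to do with a file already present at the destination.

    Returns one of:
      'skip'       - sizes equal: warning only, exit code 0
      'error'      - destination bigger than source: user must intervene
      'retransfer' - destination smaller: delete it and transfer again
    """
    if dst_size == src_size:
        return "skip"
    if dst_size > src_size:
        return "error"
    return "retransfer"
```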

dbs_InvalidateAndDeleteDataset.py

Caveat: this tool still needs some user feedback.

This tool is intended to easily manage the various private dataset copies at SEs. For example: you staged out and published at a T2, moved the output to a T3 (and published it), and now want to remove the copy located at the T2.

dbs_InvalidateAndDeleteDataset.py covers these use cases:

  • Remove data from a SE and unregister it from DBS
  • Remove all data, unregister it and invalidate the dataset
  • Invalidate the dataset

Usage:

Usage: dbs_InvalidateAndDeleteDataset.py [--dbs=ph01|ph02] [--all] [--site=TX_YY_SITE] dataset

If --all is used, the dataset will be unregistered and deleted from ALL the sites. Otherwise,
it will only be deleted from the site specified in --site and its replica information removed from DBS.

The invalidation is actually done through the DBSInvalidateDataset.py script provided by CRAB [*]. If you
just want to invalidate a dataset, but not delete the data from the SE, use that tool directly instead.

[*]
https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideCrabForPublication?redirectedfrom=CMS.SWGuideCrabForPublication#Invalidate_a_dataset_in_DBS


Options:
  -h, --help   show this help message and exit
  --dbs=DBS    DBS instance, can be: ph01, ph02
  --all        Delete the sample from all the sites, and invalidate the
               dataset
  --site=SITE  Delete the sample from this site, both physically and from DBS
               (just the replica information. If the dataset is
               available at other sites, it is still VALID in DBS).
  --yes        Answer YES to all the questions. USE IT WITH CARE!
  --debug      Verbose mode

Usecase 1: remove a replica

Let's suppose that you want to remove a replica from site T3_AA_FOO. In this case, the proper command is:

python dbs_InvalidateAndDeleteDataset.py --dbs=ph02 --site=T3_AA_FOO  <DATASETNAME>

This will physically delete all files at the given SE and remove the corresponding blocks from DBS. If some deletions fail, the blocks are removed from DBS anyway and a list of the files that failed deletion is printed. Note that "No such file" is not considered an error.
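The error handling described above can be sketched as follows: a "No such file" deletion error is treated as success (the file is already gone), while any other error marks the file as failed. This is an illustration of the behavior, not the tool's actual code.

```python
def classify_deletions(results):
    """Separate real deletion failures from harmless ones.

    results: dict mapping file name -> error message ('' on success).
    Returns the list of files whose deletion genuinely failed;
    a 'No such file' error counts as success.
    """
    failed = []
    for name, err in results.items():
        if err and "No such file" not in err:
            failed.append(name)
    return failed
```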

Usecase 2: remove a dataset

If you want to completely remove a dataset, then:

python dbs_InvalidateAndDeleteDataset.py --dbs=ph02 --all  <DATASETNAME>

This will delete all the replicas at all SEs, remove the SE information from DBS, and invalidate the dataset using the DBSInvalidateDataset.py tool shipped with CRAB.
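The --all flow thus runs per-site cleanup first and invalidates the dataset only once at the end. A minimal sketch, with hypothetical stand-in callables for the physical deletion, the DBS unregistration, and the invalidation step:

```python
def remove_dataset_everywhere(sites, delete_replica, unregister_site, invalidate):
    """Sketch of the --all flow: for each site, delete the replica and
    drop its SE information from DBS; then invalidate the dataset itself."""
    for site in sites:
        delete_replica(site)       # physical deletion at the SE
        unregister_site(site)      # remove replica info from DBS
    invalidate()                   # dataset-level invalidation (once)
```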

-- LeonardoSala - 22-Jun-2010

Topic revision: r13 - 2012-08-21 - unknown
 