The CERN Castor default pool and how to avoid using it

This page gives an overview of the Castor disk pools at CERN, with particular emphasis on the proper usage of the "CERN Castor default pool".

What are Castor Pools? Which ones exist?

The storage area for LHCb data at CERN is divided into four different "disk pools", described in the table below:

| # | Common name | Grid storage | Tape backend | Replicated at other sites | Dirac name | Dirac storage element(s) | Castor service class | Castor path | Size (Feb '12) |
| 1 | CERN local storage aka "Castor default pool" | N | Y | N | N/A | N/A | lhcb???? | /castor/cern.ch/user | 45 TB |
| 2 | Grid user storage | Y | N | N | CERN-USER | CERN-USER | lhcbuser | /castor/cern.ch/grid/lhcb/user | 260 TB |
| 3 | LHCb data on permanent disk | Y | N | Y | CERN-DISK | CERN_M-DST, CERN-DST, CERN-FAILOVER, CERN_MC_M-DST, CERN_MC-DST, CERN-HIST, CERN-FREEZER | lhcbdisk | e.g. /castor/cern.ch/grid/lhcb/LHCb/Collision11/BHADRON.DST | 1.3 PB |
| 4 | Disk pool for LHCb data in front of tape | Y | Y | Y | CERN-TAPE | CERN-RAW, CERN-RDST, CERN-ARCHIVE | lhcbtape | e.g. /castor/cern.ch/grid/lhcb/data/2011/RAW/FULL | 260 TB |

All the disk pools for "grid storage" also exist in a similar shape at the Tier-1 sites, e.g. CNAF-DST, GRIDKA-USER, PIC-RAW, etc.

The Castor default pool

As can be seen from the table above, the "Castor default pool" (Pool #1) is the smallest of all the disk pools. This pool should not be used for any analysis work, but only for the storage of "private data", e.g.

  • backup of my local files (e.g. a PC backup)
  • ....

In all other cases one of the other pools listed above shall be used.

Castor service classes

A "castor service class" is a hint to the castor system which disk pool to use for accessing the files. If no service class is given for the access to the file the "default pool" ill be used. Therefore it is always essential to provide the proper service class for file access, the castor service class is described by the option "svcClass=" in the access string, e.g. the file below is in service class "lhcbdisk"

root://castorlhcb.cern.ch//castor/cern.ch/grid/lhcb/LHCb/Collision11/CHARM.MDST/00012718/0000/00012718_00000012_1.charm.mdst?svcClass=lhcbdisk
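If many files are accessed in a session, the service class can also be set once via the STAGE_SVCCLASS environment variable (assuming a standard CASTOR client environment), which the Castor tools consult when no explicit svcClass option is given:

export STAGE_SVCCLASS=lhcbdisk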

Use cases

Access to analysis files

Files that were produced by "official productions" and are located at "CERN-DISK" (Pool #3) shall ALWAYS be accessed via the proper Castor service class "lhcbdisk". If no service class is given, the access triggers an unnecessary copy of the analysis file to the default pool, from where the file is then read.

See "Get the proper access..." below for details on how to access grid analysis files correctly.

Access to user files

Access to "grid user files"

Grid user files (Pool #2) live in the namespace "/castor/cern.ch/grid/lhcb/user". These files can be accessed via the same tools as the analysis files above; see "Get the proper access..." for more details and the example below.
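For instance, reusing the hypothetical user LFN registered in the "Gridify" section below, the access URL at CERN-USER could be obtained with:

dirac-dms-lfn-accessURL /lhcb/user/r/roiser/my-grid-analysis-file CERN-USER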

Access to "local user files"

Local user files (Pool #1) live in the namespace "/castor/cern.ch/user". Those files cannot be accessed via grid tools, as they are not known to the grid environment (e.g. they have no entry in the LFC). For this kind of file there are two possibilities:

  • Gridify the file <- RECOMMENDED; for details on how to do this see "Gridify my user files" below
  • If gridification is not possible, the files have to be accessed via the default pool (see the sketch after this list). Before doing this, please think twice whether it would not make sense to put the files into grid storage, given all the limitations of this pool described above.
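A sketch of such a direct access, with a hypothetical file name: since no svcClass option appears in the URL, the default pool is used.

root://castorlhcb.cern.ch//castor/cern.ch/user/r/roiser/my_local_analysis_file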

Possible solutions

Write my file output to grid storage instead of Castor default

Using Ganga

Instead of submitting jobs locally to the CERN batch system (bsub), set the Ganga backend to Dirac(). So when setting up your Ganga job you may use

j = Job()
j.backend = Dirac()  # use the Dirac (grid) backend instead of the local batch backend

which will automatically write your output files into the grid storage area.

"Gridify" my user files

Files on the local disk

Files that are not located in the grid storage area need to be made known to the grid tools (i.e. registered in the File Catalog (LFC)). This can be achieved with the Dirac tool "dirac-dms-add-file", e.g.

[pcites03] /afs/cern.ch/user/r/roiser > dirac-dms-add-file /lhcb/user/r/roiser/my-grid-analysis-file my_local_analysis_file CERN-USER
{'Failed': {},
 'Successful': {'/lhcb/user/r/roiser/my-grid-analysis-file': {'put': 9.388253927230835,
                                                              'register': 0.24701905250549316}}}
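The registration can then be verified with dirac-dms-lfn-replicas (described in "Get the proper access..." below), which should list the new replica at CERN-USER:

dirac-dms-lfn-replicas /lhcb/user/r/roiser/my-grid-analysis-file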

Files in the Castor default pool

For the time being, files in the Castor default pool (Pool #1) that need to be put into grid storage have to be copied to local disk first, after which the recipe above can be applied (see the sketch below). In the future a Dirac tool "dirac-dms-gridify-castor-file" will be made available.
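A minimal sketch of the two steps, assuming rfcp from the CASTOR client tools is available and using hypothetical file names:

rfcp /castor/cern.ch/user/r/roiser/myfile.root ./myfile.root
dirac-dms-add-file /lhcb/user/r/roiser/myfile.root ./myfile.root CERN-USER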

Get the proper access to files in grid storage

In case you are accessing files on "grid storage" (see the table above), you may use the following two commands to obtain the proper access URL, starting from a generic LFN.

[pcites03] /afs/cern.ch/user/r/roiser > dirac-dms-lfn-replicas /lhcb/LHCb/Collision11/CHARM.MDST/00012718/0000/00012718_00000012_1.charm.mdst
{'Failed': {},
 'Successful': {'/lhcb/LHCb/Collision11/CHARM.MDST/00012718/0000/00012718_00000012_1.charm.mdst': {'CERN-DST': 'srm://srm-lhcb.cern.ch/castor/cern.ch/grid/lhcb/LHCb/Collision11/CHARM.MDST/00012718/0000/00012718_00000012_1.charm.mdst',
                                                                                                   'IN2P3_M-DST': 'srm://ccsrm.in2p3.fr/pnfs/in2p3.fr/data/lhcb/LHCb/Collision11/CHARM.MDST/00012718/0000/00012718_00000012_1.charm.mdst',
                                                                                                   'PIC-DST': 'srm://srmlhcb.pic.es/pnfs/pic.es/data/lhcb/LHCb/Collision11/CHARM.MDST/00012718/0000/00012718_00000012_1.charm.mdst',
                                                                                                   'SARA-DST': 'srm://srm.grid.sara.nl/pnfs/grid.sara.nl/data/lhcb/LHCb/Collision11/CHARM.MDST/00012718/0000/00012718_00000012_1.charm.mdst'}}}

From the list above you need to take the name of the Dirac storage element, here CERN-DST, and feed it to

[pcites03] /afs/cern.ch/user/r/roiser > dirac-dms-lfn-accessURL /lhcb/LHCb/Collision11/CHARM.MDST/00012718/0000/00012718_00000012_1.charm.mdst CERN-DST
{'Failed': {},
 'Successful': {'/lhcb/LHCb/Collision11/CHARM.MDST/00012718/0000/00012718_00000012_1.charm.mdst': 'root://castorlhcb.cern.ch//castor/cern.ch/grid/lhcb/LHCb/Collision11/CHARM.MDST/00012718/0000/00012718_00000012_1.charm.mdst?svcClass=lhcbdisk'}}

The output of this command is a URL that can be used in your application to access the file through the "grid pool", e.g.

[pcites03] /afs/cern.ch/user/r/roiser > root -b
  *******************************************
  *                                         *
  *        W E L C O M E  to  R O O T       *
  *                                         *
  *   Version   5.32/00   2 December 2011   *
  *                                         *
  *  You are welcome to visit our Web site  *
  *          http://root.cern.ch            *
  *                                         *
  *******************************************

ROOT 5.32/00 (tags/v5-32-00@42375, Dec 02 2011, 12:42:25 on linuxx8664gcc)

CINT/ROOT C/C++ Interpreter version 5.18.00, July 2, 2010
Type ? for help. Commands must be C++ statements.
Enclose multiple statements between { }.
root [0] f = TFile::Open("root://castorlhcb.cern.ch//castor/cern.ch/grid/lhcb/LHCb/Collision11/CHARM.MDST/00012718/0000/00012718_00000012_1.charm.mdst?svcClass=lhcbdisk")
Warning in <TClass::TClass>: no dictionary for class DataObject is available
[...]
(class TFile*)0xda676e0
root [1] f->GetSize()
(const Long64_t)5449176776
root [2] 

-- StefanRoiser - 20-Feb-2012
