The CERN Castor default pool and how to avoid using it

This page gives an overview of the Castor disk pools at CERN, with particular emphasis on the proper usage of the "CERN Castor default pool".

Introduction

What are Castor Pools? Which ones exist?

The storage area for LHCb data at CERN is divided into four different "disk pools", which are described in the table below:

| # | Common name | Grid storage | Tape backend | Permanent disk | Replicated on other sites | SRM space | Dirac storage element | Castor service class | Castor path | Size (Feb '12) |
| 1 | CERN local storage, aka "Castor default pool" | N | Y | N | N | N/A | N/A | default | /castor/cern.ch/user | 60 TB |
| 2 | Grid user storage | Y | N | Y | Y | LHCb-USER | CERN-USER | lhcbuser | /castor/cern.ch/grid/lhcb/user | 240 TB |
| 3 | LHCb data on permanent disk | Y | N | Y | Y | LHCb-DISK | CERN_M-DST, CERN-DST, CERN-FAILOVER, CERN_MC_M-DST, CERN_MC-DST, CERN-HIST, CERN-FREEZER | lhcbdisk | e.g. /castor/cern.ch/grid/lhcb/LHCb/Collision11/BHADRON.DST | 1.3 PB |
| 4 | Disk pool for LHCb data in front of tape | Y | Y | N | Y | LHCb-TAPE | CERN-RAW, CERN-RDST, CERN-ARCHIVE | lhcbtape | e.g. /castor/cern.ch/grid/lhcb/data/2011/RAW/FULL | 240 TB |

All the disk pools for "grid storage" also exist in a similar shape at the Tier-1 sites, e.g. CNAF-DST, GRIDKA-USER, PIC-RAW, etc. Note that pools 3 and 4 are reserved for production output data; only pools 1 and 2 are writable by users.

The Castor default pool

As can be seen from the table above, the "Castor default pool" (Pool #1) is the smallest of all disk pools. This pool should not be used for any analysis work, but it might be useful for storing "private data" or for small-scale tests. In all other cases one of the other pools above should be used. The namespace for this pool corresponds to the $CASTOR_HOME environment variable. Please note that confidential data (e.g. PC backups, which would contain at least password files) does not belong in CASTOR, since access controls are rather weak and the data is not encrypted.

Most important: the default pool does not keep files permanently on disk, so there may be a huge penalty for accessing files that first need to be recalled from tape, which makes its use in jobs very inefficient.
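Whether a given default-pool file is currently on disk or would first have to be recalled from tape can be checked with the Castor stager query. A minimal sketch, assuming the standard Castor client tools are available (the file name is hypothetical):

export STAGE_SVCCLASS=default
stager_qry -M /castor/cern.ch/user/a/another/mycastorfile.root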

Castor service classes

A "castor service class" is a hint to the castor system which disk pool to use for accessing the files - in principle a given file can be stored on several pools, and in particular there is no mapping between the namespace and the disk pool. If no service class is given for the access to the file the "default pool" will be used, even if the file is already available on a different pool. Therefore it is always essential to provide the proper service class for file access. For ROOT access, the castor service class is described by the option "svcClass=" in the access string, e.g. the file below is in service class "lhcbdisk". Note that one should never use the rfio: protocol any longer, neither to not specify a protocol (which in ROOT defaults to rfio:), but the following URL (a.k.a. PFN) should be used (as generated by the Dirac tools):

root://castorlhcb.cern.ch//castor/cern.ch/grid/lhcb/LHCb/Collision11/CHARM.MDST/00012718/0000/00012718_00000012_1.charm.mdst?svcClass=lhcbdisk

The following two access modes are highly discouraged, and the Dirac tools should be used instead for transferring files (a minimal sketch of both modes follows the list):

  • RFIO access (rfcp): set the environment variable STAGE_SVCCLASS to the proper service class.
  • Access via SRM (srmcp, lcg-cp): specify the corresponding space token (options --sst and --dst for the source and destination space tokens respectively). SRM is special in that it will look through a list of pools if no space token has been specified, but please do not rely on this.
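For completeness, a minimal sketch of both discouraged modes (the source file is the CHARM.MDST example used later on this page; the local destination /tmp/myfile.mdst is hypothetical):

export STAGE_SVCCLASS=lhcbdisk
rfcp /castor/cern.ch/grid/lhcb/LHCb/Collision11/CHARM.MDST/00012718/0000/00012718_00000012_1.charm.mdst /tmp/myfile.mdst

lcg-cp --sst LHCb-DISK srm://srm-lhcb.cern.ch/castor/cern.ch/grid/lhcb/LHCb/Collision11/CHARM.MDST/00012718/0000/00012718_00000012_1.charm.mdst file:///tmp/myfile.mdst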

Use cases

The information below is only useful when not using ganga, i.e. it is only relevant for interactive work or direct LXBATCH job submission; ganga and Dirac internally take care of setting the proper URL for file access.

Access to analysis files

Files that were produced by "official productions" and are located on "LHCb-DISK" (Pool #3) shall ALWAYS be accessed via the proper Castor service class "lhcbdisk". Accessing them without a service class triggers an unnecessary copy of the analysis file to the default pool, from which it is then served.

See below "Get the proper access..." for details on how to access grid analysis files correctly.

Access to user files

Access to "grid user files"

Grid user files (Pool #2) live in the namespace "/castor/cern.ch/grid/lhcb/user"; these files can be accessed via the same tools as the analysis files above. See "Get the proper access..." for more details, and the sketch below for an example.
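For example, reusing the (hypothetical) user LFN from the "Gridify my user files" section below, the proper access URL on CERN-USER can be obtained with:

dirac-dms-lfn-accessURL /lhcb/user/r/roiser/my-grid-analysis-file CERN-USER

which should return a root: PFN carrying the service class of this pool (lhcbuser, see the table above).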

Access to "local user files"

Local user files (Pool #1) live in the namespace "/castor/cern.ch/user". These files cannot be accessed via grid tools, as they are not known to the grid environment (e.g. they have no entry in the LFC). For this kind of file there are two possibilities:

  • Gridify the file <- RECOMMENDED; for details on how to do this see "Gridify my user files" below. This will replicate the file on CERN-USER and register it in the LFC, such that it can be used like any file created on the Grid.
  • If gridification is not possible, the files need to be accessed via the default pool. Before doing this, please think twice whether it would not make more sense to put the files into grid storage, given all the limitations of this pool described above. You should in any case use the root: protocol as described above. The PFN has to be constructed "by hand", without a service class specification, e.g. root://castorlhcb.cern.ch//castor/cern.ch/user/a/another/mycastorfile.root (see the sketch below).
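A minimal sketch of such a hand-built PFN in use (the user directory and file name are hypothetical); since the URL carries no svcClass option, the default pool serves the file:

root -b root://castorlhcb.cern.ch//castor/cern.ch/user/a/another/mycastorfile.root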

Possible solutions

Write my file output to grid storage instead of Castor default

Using Ganga

Rather than submitting jobs locally or to the CERN batch system (bsub), the Ganga backend should be set to Dirac(). So when setting up your Ganga job you may use

j = Job()
j.backend = Dirac()

which will automatically write your output files into the grid storage area. The default is to upload two replicas among the 7 available storage areas (CERN-USER, CNAF-USER, GRIDKA-USER, IN2P3-USER, PIC-USER, RAL-USER and SARA-USER).
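Putting it together, a minimal complete sketch (the application is left at its Ganga default here; a real analysis job would configure j.application and its options as usual):

j = Job()
j.backend = Dirac()
j.submit()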

"Gridify" my user files

Files on the local disk

Files that are not located in the grid storage area need to be made known to the grid tools (e.g. the LCG File Catalog, LFC). This can be achieved with the Dirac tool "dirac-dms-add-file", e.g.

[pcites03] /afs/cern.ch/user/r/roiser > dirac-dms-add-file /lhcb/user/r/roiser/my-grid-analysis-file my_local_analysis_file CERN-USER
{'Failed': {},
 'Successful': {'/lhcb/user/r/roiser/my-grid-analysis-file': {'put': 9.388253927230835,
                                                              'register': 0.24701905250549316}}}
[pcites03] /afs/cern.ch/user/r/roiser > 

Files in the Castor default pool

For the time being, files in the Castor default pool (Pool #1) that need to be put into grid storage first have to be copied to local disk, after which the recipe above can be applied (see the sketch below). In the future a Dirac tool "dirac-dms-gridify-castor-file" will be made available for this.
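A sketch of this two-step recipe for a hypothetical default-pool file, using xrdcp for the copy (any client that understands the root: protocol will do) and then dirac-dms-add-file as above:

xrdcp root://castorlhcb.cern.ch//castor/cern.ch/user/a/another/mycastorfile.root /tmp/mycastorfile.root
dirac-dms-add-file /lhcb/user/a/another/mycastorfile.root /tmp/mycastorfile.root CERN-USER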

Get the proper access to files in grid storage

If you are accessing files on "grid storage" (see the table above), you can use the following two commands to obtain the proper access URL for the files you need, starting from a generic LFN.

[pcites03] /afs/cern.ch/user/r/roiser > dirac-dms-lfn-replicas /lhcb/LHCb/Collision11/CHARM.MDST/00012718/0000/00012718_00000012_1.charm.mdst
{'Failed': {},
 'Successful': {'/lhcb/LHCb/Collision11/CHARM.MDST/00012718/0000/00012718_00000012_1.charm.mdst': {'CERN-DST': 'srm://srm-lhcb.cern.ch/castor/cern.ch/grid/lhcb/LHCb/Collision11/CHARM.MDST/00012718/0000/00012718_00000012_1.charm.mdst',
                                                                                                   'IN2P3_M-DST': 'srm://ccsrm.in2p3.fr/pnfs/in2p3.fr/data/lhcb/LHCb/Collision11/CHARM.MDST/00012718/0000/00012718_00000012_1.charm.mdst',
                                                                                                   'PIC-DST': 'srm://srmlhcb.pic.es/pnfs/pic.es/data/lhcb/LHCb/Collision11/CHARM.MDST/00012718/0000/00012718_00000012_1.charm.mdst',
                                                                                                   'SARA-DST': 'srm://srm.grid.sara.nl/pnfs/grid.sara.nl/data/lhcb/LHCb/Collision11/CHARM.MDST/00012718/0000/00012718_00000012_1.charm.mdst'}}}

From the list above take the name of a "Dirac storage element", here CERN-DST, and feed it to

[pcites03] /afs/cern.ch/user/r/roiser > dirac-dms-lfn-accessURL /lhcb/LHCb/Collision11/CHARM.MDST/00012718/0000/00012718_00000012_1.charm.mdst CERN-DST
{'Failed': {},
 'Successful': {'/lhcb/LHCb/Collision11/CHARM.MDST/00012718/0000/00012718_00000012_1.charm.mdst': 'root://castorlhcb.cern.ch//castor/cern.ch/grid/lhcb/LHCb/Collision11/CHARM.MDST/00012718/0000/00012718_00000012_1.charm.mdst?svcClass=lhcbdisk'}}

The output of this command provides you with a file name that can be used in your application to access the file through the proper "grid pool", e.g.

[pcites03] /afs/cern.ch/user/r/roiser > root -b
  *******************************************
  *                                         *
  *        W E L C O M E  to  R O O T       *
  *                                         *
  *   Version   5.32/00   2 December 2011   *
  *                                         *
  *  You are welcome to visit our Web site  *
  *          http://root.cern.ch            *
  *                                         *
  *******************************************

ROOT 5.32/00 (tags/v5-32-00@42375, Dec 02 2011, 12:42:25 on linuxx8664gcc)

CINT/ROOT C/C++ Interpreter version 5.18.00, July 2, 2010
Type ? for help. Commands must be C++ statements.
Enclose multiple statements between { }.
root [0] f = TFile().Open("root://castorlhcb.cern.ch//castor/cern.ch/grid/lhcb/LHCb/Collision11/CHARM.MDST/00012718/0000/00012718_00000012_1.charm.mdst?svcClass=lhcbdisk")
Warning in <TClass::TClass>: no dictionary for class DataObject is available
[...]
(class TFile*)0xda676e0
root [1] f->GetSize()
(const Long64_t)5449176776
root [2] 

-- StefanRoiser - 20-Feb-2012
