5.9 The Role of the T2 Resources


Goals of this page:

This page is intended to familiarize you with performing a large-scale CMS analysis on the Grid. In particular, you will learn
  • the role of the Tier-2s for user analysis,
  • the organization of data at the Tier-2 sites,
  • how to find and request datasets,
  • where to store your job output,
  • and how to elevate, delete and deregister a private dataset.
It is important that you also become familiar with running a Grid analysis with Crab.


Introduction

The Tier-2 centers in CMS are the only locations, besides the specialized analysis facility at CERN, where users can obtain guaranteed access to CMS data samples. The Tier-1 centers are used primarily for organized processing and storage. The Tier-2 sites are provisioned with enough data-export and network capacity to refresh the data in disk storage regularly for analysis. A nominal Tier-2 will deploy 810 TB of storage for CMS in 2012; the CMS expectation for the global 2012 Tier-2 capacity is 27 PB of usable disk space. In order to manage such a large and highly distributed resource, CMS has tried to introduce policy and structure to the Tier-2 storage and processing.

Storage Organisation at a Tier-2

[Figure: T2_storage_2012.tiff — storage categories of a nominal CMS Tier-2 in 2012]

Apart from about 30 TB of storage for central services (such as MC production) and buffers, the main storage areas of interest for a user are:

  • 200 TB central space
    Here datasets of major interest for the whole collaboration, like primary skims or the main Monte Carlo samples, are stored. This space is controlled by AnalysisOperations.
  • 250 TB (125 TB * 2 groups) space for the detector and physics groups.
    Datasets of particular interest to the groups associated with a Tier-2 site, such as sub-skims or special MC samples, are stored here.
  • On the order of 160 TB (e.g. 40 users * 4 TB) of "Grid home space" for local/national users.
    This quota can be extended with additional local/national resources. The output files of Crab user analysis jobs are mainly stored in this area.
  • 170 TB local space.
    Data samples of interest for the local or national community are stored here. The movement and deletion of these data are fully under the responsibility and control of the site.

Sites larger than nominal will provide resources for more central space, three groups, and additional regional space. Sites smaller than nominal may provide resources for only one physics group, or only central space, or if sufficiently small, only for simulated event production.
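
As a quick cross-check, the storage categories listed above add up to the 810 TB quoted for a nominal 2012 Tier-2 in the introduction; a minimal sketch in Python, with the numbers taken directly from the list:

    # Nominal 2012 Tier-2 storage categories (TB), from the list above.
    categories = {
        "central services and buffers": 30,
        "central space (AnalysisOperations)": 200,
        "physics/detector group space (2 x 125)": 250,
        "Grid home space (e.g. 40 users x 4 TB)": 160,
        "local/national space": 170,
    }

    total = sum(categories.values())
    print("total: %d TB" % total)  # -> total: 810 TB, the nominal figure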

How to find a dataset?

Once you have identified the physics processes which contribute to your signal and to the backgrounds of your analysis, you want to know which datasets your analysis has to run over. This is usually not obvious from the dataset names alone. As a general tip, you should subscribe to the Hypernews mailing lists of your preferred detector and physics (PAG/POG/DPG) groups and to hn-cms-physics-announcements. Sometimes your group provides this information on the group's information page and documentation systems, like TWikis or webpages. Ask your colleagues! Once you have identified the names of the relevant datasets, you should check whether they are available for analysis by utilizing DAS (the Data Aggregation System) or alternatively PhEDEx (the Physics Experiment Data Export).
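
As an illustration, here is a minimal sketch of a DAS lookup from the command line, wrapped in Python. The query syntax follows the DAS documentation, but the dataset pattern is only a hypothetical example, and it is assumed that das_client.py is available in your CMSSW environment:

    import subprocess

    # Ask DAS which datasets match a (hypothetical) pattern;
    # "--limit 0" requests all matches.
    query = "dataset dataset=/Mu/Run2012*/AOD"
    out = subprocess.run(
        ["das_client.py", "--query", query, "--limit", "0"],
        capture_output=True, text=True, check=True,
    ).stdout
    print(out)  # one matching dataset name per line

A second useful query type asks where a given dataset is hosted, e.g. "site dataset=/Mu/Run2012A-PromptReco-v1/AOD" (again a hypothetical dataset name).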

How to request a replication of a dataset?

The datasets you want to analyse have to be fully present at a Tier-2 (or at your local Tier-3) site. If a dataset is shown to be present only at a Tier-1 center, you can request a PhEDEx transfer to copy it to Tier-2 (and Tier-3) sites. Please consult the responsible contacts of a Tier-2/3 site operated by your national community on whether the datasets will be accounted towards local Tier-2/3 space, or the data managers of the physics group you are associated with on whether they agree to store the datasets in their Tier-2 group space. After their agreement, please give a reasonable explanation in the PhEDEx request comment field and choose the appropriate group from the corresponding pull-down menu, or use local in case of transfers for the local/national community. Note that you cannot copy datasets into your personal Grid home space with PhEDEx transfers.
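
Before filing a transfer request it can be useful to check where a dataset is currently replicated, and how completely. Below is a minimal sketch using the read-only blockreplicas call of the PhEDEx data service; the dataset name is again only a hypothetical example, and the response layout should be verified against the PhEDEx data service documentation:

    import json
    import urllib.parse
    import urllib.request

    dataset = "/Mu/Run2012A-PromptReco-v1/AOD"  # hypothetical example
    url = ("https://cmsweb.cern.ch/phedex/datasvc/json/prod/blockreplicas?"
           + urllib.parse.urlencode({"dataset": dataset}))
    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)

    # Print, per block, the sites holding a complete replica.
    for block in data["phedex"]["block"]:
        sites = [r["node"] for r in block["replica"] if r["complete"] == "y"]
        print(block["name"], "->", ", ".join(sites))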

Where to store your data output?

Usually your Crab analysis job produces an amount of output which is too large to be transferred by the Grid sandbox mechanism. Therefore you should direct your job output to your associated Grid user home storage space using the stage-out option in Crab (a configuration sketch is given at the end of this section). Using CERN resources like Castor pools will probably be restricted in the near future, so for the majority of CMS users a Tier-2 site will provide the output capacity. Usually your Grid home space will be at a Tier-2 site which your country operates for CMS; if more than one site is present, ask your country's IT contact persons how they distribute their users internally. If your institute or lab operates a Tier-3 site with sufficient capacity to receive CMS analysis output data over the Grid, such a site can also be used; note, however, that CMS supports Tier-3 sites only on a best-effort basis. Countries without their own CMS Tier-2 centers and without a functional Tier-3 should contact their country representatives, who have to negotiate with other sites to provide storage space for guest users.
Your associated Tier-2 provides you with on the order of 4 TB of space (the exact amount is to be negotiated with your Tier-2), usually protected only at the hardware level (e.g. RAID disks) and without a backup mechanism. If additional local or national resources are available, it may be more; for details consult your Tier-2 contact persons.
Presently the Grid storage systems do not provide a quota mechanism, so the local Tier-2 support will review the user space utilization regularly. Please be careful not to overfill your home area.
If you register the output of your Crab job in DBS, all CMS users can have access to your data.
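
As an illustration, here is a CRAB2-era crab.cfg fragment for staging output out to Grid home space and optionally publishing it in DBS. The parameter names follow the CRAB2 documentation of that time; the storage element, directory, and publication name below are hypothetical and must be adapted to your situation:

    [USER]
    # Do not return large output through the Grid sandbox ...
    return_data = 0
    # ... stage it out to your Grid home space instead.
    copy_data = 1
    storage_element = T2_XY_Site       # hypothetical: your associated Tier-2
    user_remote_dir = my_analysis_v1   # hypothetical output subdirectory

    # Optional: register the output in DBS so other CMS users can run over it.
    publish_data = 1
    publish_data_name = MyAnalysis_Zmumu_v1   # hypothetical dataset name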

How to move a private dataset into official space and how to delete and deregister a dataset?

CMS differentiates between official datasets and user datasets. Whereas official datasets are produced centrally, users are allowed to produce and store their own datasets, containing any kind of data, at a Tier-2 center. There are no requirements concerning data quality, usefulness, or a size appropriate for tape storage. The data is located in the private user space at the user's home Tier-2 and can be registered in a local-scope bookkeeping so that the provided Grid tools can be used to perform a distributed analysis. In principle such a dataset can be analysed by any user of the collaboration, however only at the Tier-2 center hosting the dataset, which naturally has a limited number of job slots. It may later happen that a dataset created by a user becomes important for many other users or even a whole analysis group. To provide better availability it is then reasonable to distribute the dataset to further Tier-2 centers, or even to a Tier-1 center for custodial storage on tape. However, the CMS data transfer system can only handle official data registered in the central bookkeeping. Therefore the user dataset has to become an official dataset fulfilling all the requirements of CMS. The StoreResults service provides a mechanism to elevate user datasets to the central bookkeeping by performing the following tasks:

  • Validate, through authentication and roles, that the data is generally useful.
  • Merge the files into a size suitable for tape storage.
  • Inject the data into the central bookkeeping and data transfer system.

The current system is ad hoc, based on a Savannah request/problem tracker for approvals and on the legacy CMS ProdAgent production framework. For the long-term future, a complete rewrite based on forthcoming new common CMS tools is presently under discussion. Further information can be found in URL1.

To delete data from the user's home space, the use of Grid commands and knowledge of the physical file names is necessary. Please contact your local Tier-2 data manager and ask for advice and help. Invalidating private dataset registrations in a local-scope database, in order to synchronise the bookkeeping with deleted data samples, is so far not a trivial action; a user-friendly tool may become available in the future. Until then, please consult the DBS removal instruction pages.
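
As an illustration only, here is a sketch of deleting files with era-typical Grid tools, wrapped in Python. lcg-del is a real lcg_utils command, but the SURL below is hypothetical; obtain the actual physical file names from your site's data manager rather than guessing them:

    import subprocess

    # Hypothetical SURLs of the files to delete, as provided by the site admins.
    surls = [
        "srm://srm.example-t2.org:8443/srm/managerv2?SFN=/pnfs/example-t2.org"
        "/data/cms/store/user/jdoe/my_analysis_v1/output_1.root",
    ]

    for surl in surls:
        # "-l" treats the argument as a plain SURL without a file-catalogue
        # lookup; check your site's recommendations before bulk deletions.
        subprocess.run(["lcg-del", "-l", surl], check=True)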

Information sources

CMS computing Technical Design Report
Presentation (for 2009 storage resources)

