USCMS Tier 3 Documentation

Introduction

The point of a CMS Tier 3 is to use local computing resources for any of the following tasks: developing analysis code, accessing CMS data, submitting jobs to remote CMS resources, and/or storing analysis outputs. There are numerous configuration possibilities depending on available hardware and local expertise. The USCMS Tier 3 support staff would like to support all such configurations, but given limited time we have compiled a list of suggested configurations with a description of their capabilities. We hope this guide serves as a useful starting place. However, it is impossible to document every variation or issue a site might encounter, so please do not hesitate to reach out for help. (See the "Getting Help" section below.)

Capabilities

| Task | Configuration A | Configuration B | Configuration C |
| Develop user analysis code (CMSSW) | Always | Always | Always |
| Submit grid analysis jobs (CRAB) | Always | Always | Always |
| Local ROOT analysis | Always | Always | Always |
| Use XRootD to read data from T2/T3 | Always | Always | Always |
| Support multiple users | Sometimes | Always | Always |
| Full local datasets w/ CPUs to run over them | Never | Never | Always |
| Accept remote grid jobs | Never | Never | Always |
| Stage out to local storage | Never | Never | Always |
| PhEDEx endpoint | Never | Never | Always |
| Run production jobs from the collaboration | Never | Never | Sometimes |
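For readers choosing among the configurations, the capability table above can be encoded as a small lookup. This is an illustrative sketch only; the task names and the helper function are ours, not part of any CMS tool:

```python
# Capability matrix from the table above; values are "always",
# "sometimes", or "never", per the table key.
CAPABILITIES = {
    "develop analysis code (CMSSW)":      {"A": "always", "B": "always", "C": "always"},
    "submit grid analysis jobs (CRAB)":   {"A": "always", "B": "always", "C": "always"},
    "local ROOT analysis":                {"A": "always", "B": "always", "C": "always"},
    "use XRootD to read data from T2/T3": {"A": "always", "B": "always", "C": "always"},
    "support multiple users":             {"A": "sometimes", "B": "always", "C": "always"},
    "full local datasets w/ CPUs":        {"A": "never", "B": "never", "C": "always"},
    "accept remote grid jobs":            {"A": "never", "B": "never", "C": "always"},
    "stageout to local storage":          {"A": "never", "B": "never", "C": "always"},
    "PhEDEx endpoint":                    {"A": "never", "B": "never", "C": "always"},
    "run production jobs":                {"A": "never", "B": "never", "C": "sometimes"},
}

def minimal_configuration(tasks):
    """Return the first configuration (A, then B, then C) that supports
    every requested task at least 'sometimes'; None if no match."""
    for config in ("A", "B", "C"):
        if all(CAPABILITIES[t][config] != "never" for t in tasks):
            return config
    return None
```

A site that only needs interactive analysis and CRAB submission maps to Configuration A; adding grid stageout pushes the answer to Configuration C.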

Configuration A

Configuration A consists of a single server with 4 to 16 CPU slots, a few terabytes of local disk space, a persistent internet connection, and little else.

Hardware:

  • Single Server with >=4 cores
  • >=2 GB RAM/core
  • Local hard disk storage >= 3 TB
Network:
  • Outbound Internet connectivity: Required
  • Inbound Internet access: Not required
  • Static IP address: Not required

Site Registration:

  • OSG Registration: Not required
  • CMS Registration: Not required
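A quick way to vet a candidate machine against the Configuration A minimums is a few lines of Python. This is an illustrative sketch; the function name is ours, and the thresholds simply restate the hardware list above:

```python
def meets_configuration_a(cores, ram_gb_per_core, disk_tb):
    """Check a single server against the Configuration A minimums:
    >= 4 cores, >= 2 GB RAM per core, >= 3 TB local disk."""
    return cores >= 4 and ram_gb_per_core >= 2 and disk_tb >= 3
```

For example, an 8-core box with 2 GB/core and 4 TB of disk qualifies, while a 2-core desktop does not, regardless of its RAM or disk.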

Configuration B

Configuration B consists of a small cluster of computers dedicated to CMS data analysis. A cluster of this type may have a handful of worker nodes, an NFS server for common user space, a job scheduler, and an interactive login node similar to the server in Configuration A.

Hardware:

  • Capable headnode server (see note below)
  • Multiple servers with >=4 cores
  • >=2 GB RAM/core for workers
  • >=10 GB/core scratch space on worker hard drives
  • NFS Storage >= 2 TB per user
  • Network switch for workernode LAN

Note: Consult with T2/T3 admins for current options for suitable headnode hardware.

Network:

  • Outbound Internet connectivity: Required
  • Inbound Internet access: Required only for the headnode
  • Static IP address: Required for the headnode; local IPs for the workers
  • A private network space for the worker nodes, with NAT to the outside internet

Site Registration:

  • OSG Registration: Not required but possible
  • CMS Registration: Not required
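The per-core and per-user minimums above scale with cluster size. As an illustrative sketch (the function and field names are ours), a small sizing helper:

```python
def configuration_b_sizing(n_workers, cores_per_worker, n_users):
    """Rough sizing for Configuration B from the minimums above:
    >= 2 GB RAM/core, >= 10 GB/core worker scratch, >= 2 TB NFS per user."""
    return {
        "total_cores": n_workers * cores_per_worker,
        "min_ram_gb_per_worker": 2 * cores_per_worker,
        "min_scratch_gb_per_worker": 10 * cores_per_worker,
        "min_nfs_tb": 2 * n_users,
    }
```

For instance, four 8-core workers serving five users imply at least 16 GB RAM and 80 GB scratch per worker, and at least 10 TB of NFS space.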

Configuration C

A larger cluster with added grid services and hardware. This configuration differs from Configuration B by the addition of a "large" file system with one or more GridFTP servers. This file system will typically be a distributed system such as HDFS (Hadoop). The GridFTP server(s) are high-I/O nodes generally requiring dedicated hardware. This system may be limited to staging data in and out for local users, but may also serve as a PhEDEx endpoint to facilitate the movement of CMS data via centralized data services. Configuration C approaches the capabilities of a Tier 2. Current deployments for the USCMS Tier 2s can be found in their deployment table at CMS Deployment Table.

Personnel Requirements

It is difficult to quantify the personnel requirements (number of FTEs) needed to install, configure, and maintain a Tier 3 site. The requirements depend on who is doing the system administration work (campus IT staff, post-doc, graduate student, faculty, dedicated CMS computing staff, etc.), their prior system administration experience, and their familiarity with OSG and CMS software. For any of these groups, the time needed to learn the software systems, properly monitor the site, and keep it running will vary greatly.

Two qualities are important for the success of a Tier 3 site: continuity of system/site information, and consistent attention to the functioning of the site. The first quality is most relevant for sites whose personnel change on a relatively short time frame, for example every few years. For these sites, it is important that the current site administrator(s) pass their knowledge to the next site administrator(s); this can greatly reduce the transition time for new personnel and greatly increase the probability that the site will continue to function during the transition. The latter quality refers to the amount of attention paid to the site: being aware of problems as they occur and addressing them in a timely manner. Continuous attention both keeps the site functioning and keeps the administrator current with the technology.

Configuration A, the simplest defined Tier 3, should be doable and maintainable by anyone in HEP.

Hardware Requirements

| Requirement | Configuration A | Configuration B | Configuration C |
| Server-class machine for interactive login (see note below) | Always | Always | Always |
| Shared file system such as NFS | Sometimes | Always | Always |
| Job scheduler | Sometimes | Always | Always |
| Separate server-class machine(s) for grid services | Never | Sometimes | Always |
| Distributed file system | Never | Never | Always |

Note: A "server-class machine" is a loose term for a modern computer with a persistent internet connection. Its definition is also highly time dependent, as technology presents a moving target for what is considered "modern". Interested parties are encouraged to contact uscms-tier3-team@cern.ch and ask for recommendations on specific hardware to meet their needs.

Host Certificates - Required

In 2017, the OSG stopped supplying host and personal certificates. Each institution is now responsible for obtaining its own host certificates. Most institutions are members of InCommon and can get IGTF-compatible certificates through their host institution's site contract. For those not in InCommon, it is recommended that you get certificates from DigiCert. As of Fall 2019, the cost per host is $144. Please see Getting DigiCert IGTF Compatible Certificate for help buying host certificates from DigiCert.

Software Requirements and Working Knowledge

Software Dependency Diagram:

Below is a directed graph showing the dependencies for the software installation and/or system configuration. Across the top of the diagram is the basic (minimal) setup for a Tier 3. In this view, a USCMS Tier 3 is defined as an interactive login computer running a variant of RHEL (CentOS, SL, SLC) with CVMFS installed. This is the basic building block for any site. From this configuration, additional functionality can be added such as a Compute Element and/or a Distributed File System with GRID access.

  • Compute Element: the grid site head node, also known as the gateway. It allows grid users to run jobs on your site. To address possible confusion: a Compute Element does not itself run grid jobs; grid jobs run on Worker Nodes. The Compute Element translates grid tasks into jobs submitted to a local batch queue.
  • GridFTP server: the data transfer gateway to a grid site. When data move into or out of a site, they are transferred using one or more GridFTP servers. The most common data storage system for CMS is HDFS, the Hadoop Distributed File System.
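The XRootD redirector mentioned throughout this page lets jobs read a file by its CMS logical file name (LFN) from anywhere. As an illustrative sketch, constructing the corresponding root:// URL (the default redirector hostname here is one common example, not a site requirement):

```python
def xrootd_url(lfn, redirector="cmsxrootd.fnal.gov"):
    """Build a root:// URL for a CMS logical file name (LFN).
    CMS LFNs start with /store/, so the result contains a double
    slash after the redirector host."""
    if not lfn.startswith("/store/"):
        raise ValueError("CMS LFNs are expected to start with /store/")
    return f"root://{redirector}/{lfn}"
```

A job on any of the three configurations can then open the returned URL with ROOT to read the file over the WAN.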

Also shown in the diagram are two possible registrations: OSG and CRIC (formerly SiteDB). When a site implements additional functionality beyond the basic setup, it is recommended that the site register with both entities. Some functionality, for example running CMS/CRAB jobs or transferring data via PhEDEx, is only possible for registered sites. Please see OSG Registration Instructions for instructions on registering with the OSG. Please see Site Support Team - Documentation / Adding a New CMS Site, paragraph 3 ("SiteDB"), for instructions on registering your site.

Guide to using the graph: In the graph there are oval boxes and rectangular boxes. The 3 rectangular boxes represent system configurations: Minimal System (Configuration A), Distributed File System (Configuration B) and Compute Element (Configuration C). All configurations start with the Minimal System. To find the software components of the Minimal System follow the connected oval boxes.

```dot
digraph G {
    decorate = true;
    labelfloat = "true";
    graph [bgcolor="#eeeeff", size="1000x800"];

    t3 -> CRIC [label="Recommended"];
    t3 -> osg [label="Recommended"];

    ce -> bh [label="Required"];
    ce -> CRIC [label="Highly\nRecommended"];
    ce -> certs [label="Required"];
    ce -> lcmaps [label="Required"];
    ce -> rsv [label="Recommended"];
    ce -> osg [label="Highly\nRecommended"];
    ce -> xrootd [label="Highly\nRecommended"];
    xrootd -> squid [label="Required", weight="0"];

    se -> osg [label="Highly\nRecommended"];
    se -> gridftp [label="GRID Transfer\nTechnology", weight="0"];
    se -> hdfs [label="Recommended", weight="0"];
    se -> phedex [label="Almost Included", weight="0"];
    se -> CRIC [label="Highly\nRecommended"];
    se -> certs [label="Required"];
    se -> lcmaps [label="Required"];
    se -> rsv [label="Recommended"];
    se -> xrootd [label="Highly\nRecommended"];

    cvmfs -> squid [label="Required"];

    {rank=same; t3; os; cvmfs; bh;}
    t3 -> CRIC [style=invis, weight=10000];
    os -> ce [style=invis, weight=10000];
    cvmfs -> squid [style=invis, weight=10000];
    bh -> se [style=invis, weight=10000];

    subgraph cluster_tier3 {
        t3 -> os [label="Required"];
        os -> cvmfs [label="Required"];
        cvmfs -> bh [label="Recommended"];
        t3 [shape=box, label="USCMS\nTier 3", style="filled", fillcolor="#eeffee"];
        os [shape=oval, label="RHEL", style="filled", fillcolor="green"];
        cvmfs [shape=oval, label="CVMFS", style="filled", fillcolor="green"];
        bh [shape=oval, label="Local Batch Scheduler", style="filled", fillcolor="yellow"];
        label = "USCMS Tier 3";
        style = "filled";
        fillcolor = "#111111";
    }

    subgraph cluster_register {
        CRIC -> osg [style=invis];
        CRIC [label="CRIC", style="filled", fillcolor="yellow"];
        osg [label="OSG", style="filled", fillcolor="yellow"];
        label = "Site\nRegistration";
        style = "filled";
        fillcolor = "#ffffdd";
    }

    subgraph cluster_ce {
        ce -> htcondor [style="bold", arrowhead="none", label="Current\nCompute Element\nTechnology"];
        ce [shape=box, style="filled", fillcolor="#eeffee", label="Compute Element"];
        htcondor [label="HTCondorCE", style="filled", fillcolor=green];
        label = "Compute\nElement";
        style = "filled";
        fillcolor = "#ccccdd";
    }

    xrootd [shape=oval, label="xrootd\nredirector", style=filled, fillcolor="yellow"];

    subgraph cluster_extras {
        squid -> certs [style=invis];
        certs -> lcmaps [style=invis];
        lcmaps -> rsv [style=invis];
        squid [label="Squid", style="filled", fillcolor="yellow"];
        certs [label="Host Certificates", style="filled", fillcolor="green"];
        lcmaps [label="LCMAPS", style="filled", fillcolor="green"];
        rsv [label="RSV Probes", style="filled", fillcolor="yellow"];
        label = "Software\nsub-systems";
        style = "filled";
        fillcolor = "#ffffdd";
    }

    subgraph cluster_storage_element {
        se -> gridftp [style=invis];
        gridftp -> hdfs [style=invis];
        hdfs -> phedex [style=invis];
        se [shape=box, style="filled", fillcolor="#eeffee", label="Grid Services"];
        gridftp [label="One or More\ngridftp servers", style="filled", fillcolor=yellow];
        hdfs [label="HDFS", style="filled", fillcolor=yellow];
        phedex [label="PhEDEx", style="filled", fillcolor="green"];
        label = "Distributed\nFile System";
        style = "filled";
        fillcolor = "#ccccdd";
    }
}
```

Minimal system:

  • RHEL variant operating system - Required
  • CVMFS - Required
  • Local Batch Scheduler - Recommended
  • Squid proxy server - Recommended
  • CRIC registration - Recommended
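CVMFS repositories appear under /cvmfs/ once mounted. As a rough, hedged sketch (the function is ours; the only assumptions are the standard /proc/mounts format and the repository names), a check that the expected repositories are present:

```python
def cvmfs_repos_mounted(proc_mounts_text, repos=("cms.cern.ch",)):
    """Given the text of /proc/mounts, report which of the expected
    CVMFS repositories appear as mounted filesystems under /cvmfs/."""
    mounted = set()
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        # Field 1 of each /proc/mounts line is the mount point.
        if len(fields) >= 2 and fields[1].startswith("/cvmfs/"):
            mounted.add(fields[1].split("/cvmfs/", 1)[1])
    return {repo: repo in mounted for repo in repos}
```

On a live node one would pass `open("/proc/mounts").read()`; the function is written against a text argument so it can be exercised without a CVMFS installation.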

Distributed File System:

  • Minimal System - Required
  • Host Certificates - Required
  • LCMAPS - Required. Please see footnote 3 below.
  • Squid proxy server - Recommended
  • RSV monitor probes - Highly Recommended
  • One or more Gridftp Server(s) - Optional
  • Hadoop file system (HDFS) - Recommended
  • PhEDEx - Almost included (via the CMS central PhEDEx service; see footnote 4 below)
  • xrootd redirector - Recommended
  • CRIC registration - Highly Recommended
  • OSG Registration - Highly Recommended

Compute Element:

  • Minimal System - Required
  • Host Certificates - Required
  • LCMAPS - Required. Please see footnote 3 below.
  • Squid proxy server - Required
  • Singularity - Required
  • RSV monitor probes - Highly Recommended
  • xrootd redirector - Highly Recommended
  • CRIC registration - Highly Recommended
  • OSG Registration - Highly Recommended


System Configuration Table:

| Software System | Why | A | B | C | Documentation to Get Started |
| *Minimal Configuration* | | | | | |
| Linux OS, CVMFS | Base computing platform | Always | Always | Always | CVMFS Overview, CVMFS Installation |
| NFS service | Local user space for files and code | Sometimes | Always | Always | NFS (external link) |
| Condor, Slurm, PBS, LSF or SGE [6] | Manage jobs and worker computers | Never | Always | Always | HTCondor Homepage (see note below) |
| Squid cache server [5] | Cache CVMFS and calibration constants to limit WAN usage | Never | Sometimes | Always | Frontier Squid |
| *GridFTP Transfer Services* | | | | | |
| GridFTP server(s) [1] | Receive data from jobs and PhEDEx | Never | Sometimes | Always | OSG GRIDFTP |
| Host certificate(s) | Allow servers to run services requiring authentication | Never | Sometimes | Always | OSG Certificate Information |
| Hadoop File System (HDFS) | Store large amounts of data | Never | Sometimes | Always | OSG HDFS Overview, Hadoop Installation |
| LCMAPS [1,3] | Grid services authorization | Never | Sometimes | Always | OSG LCMAPS |
| RSV probes [2] | Monitor services | Never | Sometimes | Always | RSV Overview |
| PhEDEx [4] | Move data into the site automatically | Never | Sometimes | Always | Contact uscms-tier3-team@cern.ch about central PhEDEx |
| *Compute Element Configuration* | | | | | |
| OSG Compute Element [1] | Receive jobs from CRAB/grid | Never | Never | Always | Installing HTCondor-CE |
| Singularity | Job isolation / OS independence | Never | Never | Always | Installing Singularity |
| XRootD redirector service | Read data from the local site "anywhere" | Never | Never | Always | OSG XrootD Overview |
| *Site Registration* | | | | | |
| CMS CRIC registration | Run jobs from production factories | Sometimes | Sometimes | Sometimes | Adding a New CMS Site |
| OSG registration | Registered grid site | Sometimes | Sometimes | Sometimes | OSG Registration Instructions |

Recommendations:

Note: HTCondor is strongly recommended for sites without a preference for their batch system. HTCondor has a strong installation base at the USCMS Tier 2s and thus a broad base of available expertise. Sites are welcome to install or use supported batch systems that align with their local institutional expertise and policies. Please see Configuring the Batch System for details on how to interface HTCondorCE to other batch systems.
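As a hedged illustration of what users on such a batch system write, here is a minimal HTCondor submit description; the executable name, file names, and resource requests are placeholders, not prescribed values:

```
# analysis.sub -- minimal HTCondor submit description (placeholder names)
executable     = run_analysis.sh
arguments      = input.root
output         = job.$(Cluster).$(Process).out
error          = job.$(Cluster).$(Process).err
log            = job.$(Cluster).log
request_cpus   = 1
request_memory = 2 GB
queue 1
```

Submitting with `condor_submit analysis.sub` queues the job on the local scheduler; an HTCondor-CE performs an analogous translation for incoming grid jobs.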

Note: More complex site installations are encouraged to invest the time, effort, and hardware to install a configuration management system such as Puppet or Chef.

Software Notes:

  1. GRID servers (Compute Element, GridFTP, xrootd redirector, etc.) require host and/or service certificates, obtained from the appropriate certificate authority. See Host Certificates for information for sites in the US. Sites outside the US may be able to follow the OSG directions or might have to obtain certificates from a regional or national certificate authority instead.
  2. RSV probes can be run using either a personal certificate or a service certificate.
  3. As of late 2017, LCMAPS is the only supported authorization method. GUMS and grid-mapfile are no longer supported and ceased to work in Spring 2018.
  4. It is highly recommended that T3 sites use the CMS central PhEDEx facility at FNAL rather than installing and running PhEDEx locally.
  5. A Squid cache server is needed for sites with more than 10 worker nodes; it improves CVMFS and conditions data access.
  6. Please see Configuring the Batch System for details on how to interface HTCondorCE to other batch systems.

Additional Resources:

Hadoop Best Practices

Hadoop Best Practices documents methods for monitoring and maintaining various aspects of a Hadoop installation. Following best practices will increase the reliability of the HDFS file system and reduce the possibility of data loss or corruption.

Getting Help

There are several ways that you can get help if you have questions or run into problems:

OSG/Tier 3 Hypernews

This mailing list provides a way to interact with both the USCMS T3 support staff and the broader community of Tier 3 admins and enthusiasts. It does require access to CMS computing accounts to join.

Bi-Weekly Meeting

There is a bi-weekly meeting via Vidyo for those interested in T3 issues. The meeting has two parts: announcements relevant to T3 sites, and community support. The main emphasis is the community support part, during which anyone can ask for help on any subject. The problems and solutions are generally posted to the OSG/Tier 3 Hypernews.

US CMS Tier-3 Support Team

You can contact the US CMS Tier-3 support team directly at the following e-mail address: uscms-tier3-team@cern.ch. Anyone can send an email to the US CMS Tier 3 support team.

GGUS Ticketing System

CMS uses the GGUS [1] ticket tracking system. The GGUS system is the central helpdesk for the European Grid Infrastructure (EGI) and the Worldwide LHC Computing Grid (WLCG) communities [2]. Tickets submitted to GGUS can be routed to the OSG ticketing system. Filing a GGUS ticket requires a valid grid certificate.

The USCMS Tier 3 support team recommends using the above facilities before filing a GGUS ticket. We feel that many of the issues sites experience can be solved more expediently by posting to the Tier 3 Hypernews or emailing the Tier 3 support team. If the situation warrants a GGUS ticket, we would be happy to file the ticket and help track its progress. If you want to file a ticket yourself, please use this link: Submit CMS Ticket, select your site from the pulldown menu "Notify CMS SITE", and CC uscms-tier3-team@cern.ch. We also recommend including your site name in the body of the problem description.

  1. GGUS Main Page
  2. FAQ GGUS-Short-Guide
Topic revision: r48 - 2019-09-30 - DouglasJohnson
 