New Computing Site

CMS welcomes universities and institutes contributing to the computing needs of the experiment! We currently have about 125 institutes providing significant computing resources for production and analysis. CMS has had a computing pledge deficit in recent years. If you have compute and/or storage resources available at your institute, consider allowing CMS to access them, even if only opportunistically, i.e. when they are not utilized otherwise.

Institutes contributing significant resources to the CMS computing infrastructure receive EPR credits.

Tiered Computing Structure

Computing sites are organized in a tiered structure:
  • a tier-0 computing center at CERN for RAW data archival and prompt reconstruction;
  • seven tier-1 sites with large computing resources, providing 24-hour support and persistent mass storage, i.e. tape;
  • about 50 tier-2 sites with varying computing capacities, constituting the bulk of CMS computing;
  • and about 80 tier-3 sites, many of which are grid-enabled and registered.
A tier-0 site can also operate outside of CERN but needs an excellent network connection to CERN/P5. The Swiss Supercomputing Center in Lugano, for instance, provides special tier-0 service during data taking. Tier-1 and tier-2 sites need to be registered with WLCG (and the local grid infrastructure) and "pledge" to make compute and storage resources available to CMS in quarter-year granularity. Pledges are expressed in HEPSpec06 for compute and GBytes of disk/tape space for storage, with one Intel/AMD core corresponding to about 10 HEPSpec06. Tier-3 sites are the most flexible. CMS is most interested in grid-enabled tier-3 sites. (EGI/OSG/WLCG registration is a plus.)

Hardware to Set Up a Computing Site

CMS computing is currently bound to the x86 architecture, uses 64-bit addressing, and runs on Linux. Any x86_64 CPU (Intel, AMD, etc.) with a Linux version that supports CVMFS, the CERN Virtual Machine FileSystem, and Singularity, a lightweight container virtualization, should be able to run CMS software. CMS can use batches of compute servers; no high-performance computing setup is needed. However, if your institute is equipped with a "High Performance Computer" (HPC), CMS can use it. CVMFS is a network filesystem used to distribute CMS software to computers, so compute servers need a network connection.

To use multiple compute servers at a site, a batch system is required. To integrate the compute resources of a site into the CMS computing infrastructure, a CE service is needed. Both batch and CE services can run on a small server for most sites.

Most CMS computing does not require high-performance storage; any existing storage solution at a site should work fine. Various storage technologies are utilized by CMS sites to combine simple Linux disk servers into a consistent storage system; DPM, dCache, and HDFS are the most popular ones. Several protocols for local data access are supported, from a local, mounted filesystem (POSIX) to xrootd. XRootD supports both LAN and WAN access and is currently the most popular. To distribute CMS data to a storage system, it needs to be grid-enabled, i.e. provide an endpoint with a supported protocol. The protocol currently used by CMS is gsiftp (or higher-level GSI-based protocols like gridFTP and SRM). Storage needs to be IPv6-accessible on the network.
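A quick way to check whether a machine can serve as a CMS worker node is to verify the CVMFS mount and the container runtime. The sketch below assumes CVMFS and Singularity are already installed; the container image path is only an example of the CMS images distributed via /cvmfs/unpacked.cern.ch.

    # verify that the CMS CVMFS repository is mounted and healthy
    cvmfs_config probe cms.cern.ch

    # the software area should list CMSSW releases and the SITECONF directory
    ls /cvmfs/cms.cern.ch/

    # check that Singularity is available
    singularity --version

    # run a trivial command inside a CMS-provided container image
    # (example image path; adjust to what is available on your nodes)
    singularity exec --bind /cvmfs \
        /cvmfs/unpacked.cern.ch/registry.hub.docker.com/cmssw/cms:rhel7 \
        cat /etc/redhat-release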

Minimal Tier-2 Setup

  • 500 CPU cores (e.g. 8 machines with dual-socket, 32-core CPUs); this would allow a pledge of about 4,000 to 6,000 HEPSpec06 (see the estimate sketched after this list)
    • compute servers, i.e. worker nodes, should have 2 GB of memory per core. If HyperThreading is enabled and used (about a 30% gain), this becomes 2 GB per CPU thread. Worker nodes also need scratch space for running jobs, about 20 GB per core. For more information, see the CMS VO-card.
    • 64-bit Linux (preferably CentOS or Scientific Linux) with CVMFS and Singularity running on the worker nodes
  • 330 TBytes of disk space (e.g. one 24-disk server with 14 TB hard drives)
  • 2 Gbps Internet connectivity
  • a batch system with 8-core/48-hour queues and either an ARC-CE or HTCondor-CE service
  • a dual-stack, IPv4 and v6 accessible, gsiftp and xrootd storage endpoint
  • a small server for squid service (an HTTP caching service) and PhEDEx service
  • support during work days/hours
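As a sanity check of the numbers above, the pledge estimate follows from the approximate conversion of one core to about 10 HEPSpec06 (the actual HEPSpec06 value must be benchmarked on your hardware); a rough sketch:

    # rough pledge estimate for the example hardware above (illustrative only)
    MACHINES=8; SOCKETS=2; CORES_PER_CPU=32
    CORES=$(( MACHINES * SOCKETS * CORES_PER_CPU ))   # 512 cores
    HS06=$(( CORES * 10 ))                            # ~5,120 HEPSpec06 at ~10 HS06/core
    DISKS=24; TB_PER_DISK=14
    RAW_TB=$(( DISKS * TB_PER_DISK ))                 # 336 TB raw, matching the ~330 TB above
    echo "cores=${CORES} HS06~${HS06} disk~${RAW_TB}TB"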

Getting Started

  • Check with your funding agency, if necessary, whether you/your institute/your country can pledge computing resources to CMS. If yes, consider a tier-2 site; if not, your site will be a tier-3.
  • Contact CMS Level-1, cms-offcomp-coordinator@cern.ch, and the Facilities & Services coordinators, Giuseppe.Bagliesi@cern.ch and lammel@fnal.gov, and let them know about your plans and if/where you could use assistance.
  • Acquire the hardware and start the site setup:
  • Register your site with EGI/OSG and WLCG:
    • EGI registration (EGI calls sites "Resource Centres") web page
    • OSG registration (OSG calls sites "Resource Groups") web page
    • WLCG registration (WLCG groups sites into "Federations"; if your country already has a WLCG-registered site, your site can probably join the same federation, otherwise you need a new federation) web page
  • Acquire certificates for the CE, gsiftp, and xrootd services:
    • grid authentication/authorization is currently based on certificates; grid services need a service certificate to identify themselves. Depending on the grid infrastructure, country, and institute, you may have a choice of certificate authority, CA, from which to obtain the service certificates for your site. The CA needs to be supported/trusted by WLCG, i.e. be part of the Interoperable Global Trust Federation, IGTF. A list of national IGTF CAs can be found here. (A quick way to check an installed certificate is sketched below.)
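      Once a service certificate has been obtained and installed (conventionally under /etc/grid-security/), its subject, issuing CA, and validity can be checked with openssl; the paths below are the conventional locations and may differ in your setup:

        # inspect the installed host/service certificate
        openssl x509 -in /etc/grid-security/hostcert.pem -noout -subject -issuer -dates

        # the private key must only be readable by the service account
        ls -l /etc/grid-security/hostkey.pem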
  • CMS specific setup:
    • Please make sure you and all system administrators are registered with the CMS secretariat and have CMS VO membership. System administrators don't need to be CMS members but can be registered as non-CMS members.
    • Select a name for your site. CMS site names are of the form Tn_CC_Name with "n" being the tier number, "CC" being the ISO-3166 2-letter country code of your institute, and "Name" a short name/abbreviation/identifier of your institute. Take a look at the list of existing CMS sites. (No "-" or " " is allowed and no ending in "_Disk", "_Buffer", "_MSS", or "_Export".)
    • Request creation of the site in the information system and configuration repository via GGUS ticket. (If you cannot create a GGUS ticket, please complete/check the registration page.)
      • Type of issue: CMS_Register New CMS Site, CMS Support Unit: CMS Site Support, subject: Tn_CC_Name registration
      • Include the following information:
        Title:
        Tier:
        CMS Name:
        Site Executive:
        Location:
      • Optionally, please also provide:
        Data Manager: name, e-mail; PhEDEx Contact: name, e-mail; Site Admin: name, e-mail
        The site executive can set/update this information her/himself, but this is a bit complex in CRIC, the new information system we use, so providing it here means you don't need to interact with CRIC.
    • Configure SITECONF for your site:
      • set up a GitLab account if you don't have one yet, how-to
      • clone the project of your site, how-to
      • create site-local-config.xml, storage.json, and storage.xml under the project of your site
      • commit the new files to GitLab, how-to
      • See the twiki page on local site configuration and look at the files of other sites as examples. The information in storage.xml is commonly referred to as the trivial file catalogue, TFC, in CMS. (A sketch of the GitLab workflow is shown below.)
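        A sketch of the GitLab workflow for the steps above, assuming the hypothetical site name T2_XX_Example (the repository URL is an example; use the one given in the how-to):

          # clone the SITECONF project of your site
          git clone https://gitlab.cern.ch/SITECONF/T2_XX_Example.git
          cd T2_XX_Example

          # create site-local-config.xml, storage.json, and storage.xml,
          # copying the structure from the twiki page and from other sites' projects

          # commit and push the new files back to GitLab
          git add site-local-config.xml storage.json storage.xml
          git commit -m "Initial site configuration for T2_XX_Example"
          git push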
    • Configure CVMFS for your site, i.e. the "SITECONF/local" link. Instructions are on the CMS CVMFS twiki page. (You can also use a local software installation and keep it updated but we recommend using CVMFS.)
      • a few hours after committing the site configuration to GitLab it should be visible on CVMFS under /cvmfs/cms.cern.ch/SITECONF/<site name>
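        The propagation can be verified from any machine with CVMFS mounted (hypothetical site name T2_XX_Example):

          # check that the site configuration has propagated to CVMFS
          ls /cvmfs/cms.cern.ch/SITECONF/T2_XX_Example/
          # on a correctly configured node of your site, the "local" link should point to it
          ls -l /cvmfs/cms.cern.ch/SITECONF/local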
    • Set up a squid server; see the CMS squid twiki page. (A quick proxy check is sketched below.)
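      Once the squid is running, a simple check is to fetch a small file through it from a client node; the squid host/port below are placeholders, and the CVMFS stratum-one URL is just a convenient target:

        # fetch a small file through the site squid (replace host/port with your squid)
        curl -sI -x http://squid.example.edu:3128 \
            http://cvmfs-stratum-one.cern.ch/cvmfs/cms.cern.ch/.cvmfspublished | head -n 1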
    • Make your CE known to the CMS computing infrastructure, i.e. set up a glide-in WMS factory entry. The glide-in factory will send pilot jobs via the CE to your batch system; the pilots then fetch the CMS production and/or analysis jobs that the worker nodes execute.
      • double-check firewall settings so the CE service can be reached from the glide-in factories located at CERN, Fermilab, and UCSD (a basic reachability check is sketched after this list)
      • for the glide-in factory entry, create a GGUS ticket, CMS Support Unit: Glidein Factory and include CE/batch information, i.e.
        • name of the CE
        • name of the batch queue if not the CE default
        • type of CE (ARC-CE or HTCondor-CE)
        • number of cores the pilot should use (eight is the default right now but if the cores of your worker nodes don't divide evenly by eight, you should propose a different number, like 10 or 12)
        • the wall-time limit of the queue (should be >=48 hours)
        • the memory per core
        • any other special parameters
      • factory operations will add the entry first to the integration factory and verify/debug things with you (or your admins)
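        For an HTCondor-CE, a basic end-to-end check before requesting the factory entry can be done with condor_ce_trace from a machine with the HTCondor-CE client tools and CMS VO membership (the CE hostname is a placeholder; an ARC-CE can be checked similarly with arcinfo/arctest):

          # obtain a CMS VOMS proxy
          voms-proxy-init -voms cms

          # submit a trace job through the CE into the local batch system
          condor_ce_trace ce.example.edu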
    • Set up WebDAV and XRootD endpoints; see the storage discussion above. For some storage technologies, gridftp and xrootd endpoints are native and you only need to follow the configuration instructions of the storage technology. (A basic endpoint check is sketched after this list.)
      • configure data access such that CMS collaborators (and only CMS collaborators), i.e. certificates with a CMS VOMS extension, can access CMS data; for write access to the various areas please take a look at the CMS namespace twiki page;
      • configure the IPv4 and IPv6 firewalls of your institute to allow access to the gsiftp, xrootd, and storage servers (from all CMS sites); so far all sites have opted to allow access to the CE, gsiftp, and xrootd ports from anywhere rather than maintain a list of CMS site subnets;
      • inform the site support team, cms-comp-ops-site-support-team@cern.ch, about the xrootd endpoint (hostname and port) of your site.
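        A basic check of the grid-enabled storage endpoints can be done with a CMS VOMS proxy from a machine outside your site; hostnames, ports, and paths below are placeholders and must match your storage namespace (see the CMS namespace twiki for writable areas):

          # obtain a CMS VOMS proxy
          voms-proxy-init -voms cms

          # list the top of the CMS namespace via the xrootd endpoint
          xrdfs xrootd.example.edu:1094 ls /store

          # copy a small test file in and out via the gsiftp endpoint
          gfal-copy file:///etc/hostname gsiftp://gridftp.example.edu/cms/store/temp/gfal-test
          gfal-copy gsiftp://gridftp.example.edu/cms/store/temp/gfal-test /tmp/gfal-test-copy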
    • Rucio configuration (a few commands to verify the setup are sketched after this list):
      • Contact the transfer/data management team via GGUS, CMS Support Unit = "CMS Datatransfers", and ask them to set up a Rucio Storage Element, RSE, for your site.
      • please provide the site name, the storage type and technology, and the amount of disk space dedicated to central experiment use
      • ask the transfer/data management team to subscribe/make a rule for the active SAM and HammerCloud datasets for your site
      • if you would like to contribute disk space for local dataset subscription/storage, please let them know the amount of space and ask them to set up a local Rucio account.
      • ask the transfer/data management team to include your RSE in the LoadTest setup. (The /store/test/loadtest LFN area is used for the Rucio based transfer tests.)
      • inform the site support team about the SRM/gridftp/gsiftp endpoint to update the SAM tests
      • answers to Rucio questions
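        Once the RSE exists, it can be checked with a few Rucio client commands (this assumes a machine with the CMS Rucio client configured and a valid CMS proxy; the RSE name T2_XX_Example is a placeholder):

          # check which Rucio account you are mapped to
          rucio whoami

          # the new RSE should appear with its attributes and reported usage
          rucio list-rses
          rucio list-rse-attributes T2_XX_Example
          rucio list-rse-usage T2_XX_Example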
    • Check and debug SAM test of your site:
      • with the glide-in factory entry in place, a Rucio Storage Element configured, and the xrootd endpoint reported, your site should have an entry in the VO-feed with all resources listed. (The SAM storage tests use the /store/mc/SAM (reading) and /store/unmerged/SAM (writing) LFN areas.)
      • go to the SAM3 dashboard and check SAM test results of your site
        • click on "Historical View" in the top menu bar
        • select "Test History" in the left-most pull down
        • select "CMS_CRITICAL_FULL" in the right-most pull down
        • click "Show Results"
        • the page shows the result of each test grouped by resource as a function of time. The green, yellow, and red sections are clickable and bring you to the log file of a test
    • Ask for HammerCloud, HC, to be enabled by sending an email to cms-comp-ops-site-support-team@cern.ch
    • Inform CMS about the amount of disk space that can be used at your site. This should be between 70% and 80% of the pledge, as you need space for temporary file areas and local datasets/users. Please take a look at the [[CMS.DMWMPG_Namespace][CMS Name space documentation]]. The amount of disk space that can be used by CMS is called the "DDM quota"; ask cms-comp-ops-transfer-team@cern.ch to set it for your site.
    • For temporary file areas, you need to set up a cleanup daemon/crontab job. This is described on the CMS Name space twiki. (An illustrative cron entry is sketched below.)
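      Purely as an illustration of the idea (the twiki describes the recommended procedure, areas, and lifetimes), a cron job could periodically purge old files from a temporary area; the mount point and retention below are placeholders:

        # /etc/cron.d/cms-tmp-cleanup  --  illustrative only, see the CMS Name space twiki
        # purge files older than 14 days from the temporary area every night at 03:00
        0 3 * * * root find /storage/cms/store/temp -type f -mtime +14 -delete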

Sites that are not grid-registered will have no GGUS site entry, i.e. tickets cannot be assigned to the site. If at all possible, please register a grid-enabled site!

Sites that are not grid-enabled are effectively local resources. They can have a PhEDEx node and a gsiftp (or xrootd) endpoint, so data can be sent to and served by the site. The above CE, worker node, and glide-in WMS factory steps vanish. If no PhEDEx node, gsiftp, or xrootd endpoint is set up, there is no reason to register the site with CMS.

Using a Supercomputer or Existing Community Cluster

The setup to use a supercomputer or existing community cluster to provide compute resources is very similar to the above if the nodes of the cluster/supercomputer have Internet access. In that case, CVMFS and Singularity need to be installed/configured on the cluster/supercomputer and a CE set up to provide a grid interface to the resource.
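A minimal CVMFS client configuration for the cluster nodes, assuming a local squid proxy, could look like the sketch below (hostnames and cache size are placeholders; see the CMS CVMFS twiki for the full setup):

    # /etc/cvmfs/default.local  --  minimal example for cluster worker nodes
    CVMFS_REPOSITORIES=cms.cern.ch,unpacked.cern.ch
    CVMFS_HTTP_PROXY="http://squid.example.edu:3128"
    CVMFS_QUOTA_LIMIT=20000    # local cache size in MB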

If the cluster/supercomputer has outgoing Internet access but is not reachable from the Internet, while the node with the CE is reachable from the Internet, there is also no problem. For setups without outgoing Internet access things are significantly more difficult; please contact cms-comp-ops-site-support-team@cern.ch for possible solutions.

Using Cloud Resources

Several sites use private and commercial cloud resources to complement a local worker node farm. Please contact the Dynamic Resource Provisioning group at cms-clouds@cern.ch.

Useful Links:
