Local Resource Provisioning

Use cases for local resource provisioning

  • Resource-based fair-share and prioritization (see HTCondor ticket). The current glideinWMS workaround, which uses separate frontend groups, does not scale and splits the pool into different types of pilots.
    • 95% share for production at Tier-1 sites
    • Higher priority on certain resources for national VOMS groups, local groups, physics groups etc.
  • CRAB analysis on sites with no grid access, e.g. Fermilab LPC
  • Analysis on opportunistic sites

Proposed Solutions

The proposed solutions depend on whether a CE is needed. For sites without a CE, there are only two possibilities, and the first one listed below is very heavy. For sites with a CE, there are more options, with different costs. Solutions 1-4 come from this presentation.

  1. Site replicates CRAB3 Server, submission infrastructure (no CE)
    • Large overhead for the site
  2. Site runs local schedd which points to production CRAB3 Server (CE)
  3. Site-customized glideins (CE)
    • Glidein-based solution that only matches to preferred users
  4. Non-centrally-launched glideins (no CE)
    • Brian's favored solution.
    • Not specific to CRAB3.
    • Resources provisioned manually or by a script.
    • Jobs submitted by users (which can only run their own jobs) or by the site (with Role=pilot, but a whitelist of allowed users).
  5. HTCondor resource-based fair-share (CE)
    • Moves the problem from glideinWMS to HTCondor
    • Will it scale?
  6. Get glideinWMS frontend groups to scale better? (CE)
  7. Sites change their policies and allow grid submissions or deploy a grid CE (no CE)

Non-centrally-launched glideins in depth

There are a few required pieces here:

  • Script to launch a pilot available in CVMFS. Provide a script that can launch a CMS pilot given only a valid proxy. The rest of the information is taken from CVMFS, site-local-config.xml, a site-specific override in SITECONF, and (optionally) a user-specified config file.
    • A prototype of this already exists. However, in order to bootstrap the glidein, we need to know the name of the latest gWMS config file and its SHA1 signatures, and update these as necessary in CVMFS. These can be taken from a job in the queue, i.e., I can write a script that we could execute as a cron job on the central CVMFS repo host that would work 80% of the time. We'd want to ask the glideinWMS team to make these centrally available so we can reliably update CVMFS nightly.
  • Script to indicate "load".
  • "APF-lite" - script that will run in the local HTCondor schedd and submit pilots. This is for sites that have Role=pilot (LPC use case). Note: still some interesting security / scheduling challenges to overcome before this is reality.
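
The launch script described above draws its settings from several sources. A minimal sketch of how those sources could be layered is shown here, assuming (this is not stated above) that later sources override earlier ones in the order CVMFS defaults, site-local-config.xml, SITECONF override, user config. The function name and all keys and values are hypothetical and only illustrate the merge.

```python
def merge_pilot_config(*layers):
    """Merge configuration layers; later layers override earlier ones.

    Each layer is a dict, or None when that source is absent
    (for example, no user-specified config file was given).
    The precedence order is an assumption, not taken from the page.
    """
    merged = {}
    for layer in layers:
        if layer:
            merged.update(layer)
    return merged


# Hypothetical settings, for illustration only.
cvmfs_defaults = {"site": None, "pilot_lifetime_hours": 48, "cpus": 1}
site_local_config = {"site": "T2_US_Nebraska"}
siteconf_override = {"cpus": 8}
user_config = None  # the user-specified file is optional

config = merge_pilot_config(
    cvmfs_defaults, site_local_config, siteconf_override, user_config
)
print(config)
```

Passing the layers explicitly keeps the precedence visible at the call site, so a site can reorder or drop a source without touching the merge logic.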

Site-customized glideins in depth

This option allows a site to specify a list of local users; CMS will send an identifiable pilot (special DN and/or VOMS role) and guarantee only the specified users will be able to run jobs on that pilot.

The implementation will reuse the existing factory entries and will require only one new group in the frontend (regardless of the number of participating sites).

Technical details [deprecated]

The "site customized glideins" solution has been in production for CRAB3 since September 2015; the actual details are here: https://twiki.cern.ch/twiki/bin/view/CMSPublic/CompOpsCustomizedGlideins

The real implementation is currently a simplified version of what is explained below, which is kept for reference.

  1. Set up a new group, “group_local”. Associate this with a specific DN (this could perhaps be Role=uscms, or we could simply have a special DN).
  2. Maintain a new file, /etc/gwms-frontend/group_mapping. This is line-oriented, two columns. The first column is a group name; the second column is the corresponding site name.
    • The idea is a line like “Nebraska T2_US_Nebraska” means jobs in group “Nebraska” are allowed to use the local DN at site T2_US_Nebraska.
    • This file will get shipped with the glidein, meaning we need to remember to do a reconfig post-update.
    • This is an N:M mapping. If we define a "uscms-higgs" group, we may want to map it to both UCSD and Nebraska.
  3. We map users' jobs to the appropriate groups.
    • Once available in 8.3.4 / 8.3.5, we would use the submit features to enforce this. Until then, we can use the job router.
    • The user->group mappings could be periodically populated from sub-files in /cvmfs/cms.cern.ch/SITECONF/$SITENAME/JobConfig/local-users.txt. This allows sites to fully control the list of their local users.
    • We will provide further mechanisms (perhaps e-groups, allowing us to put "Brian" into the "higgs" group) in the future.
  4. In the match_expr for group_local, we invoke a Python function with two arguments: job["GroupName"] and glidein["attrs"]["GLIDEIN_CMSSite"] ("GroupName" will need to be added to the job match_attrs). This Python function will have the mappings in group_mapping cached in memory and return true if the group is mapped to the site.
    • We drop this function into a separate module and reference it from the match_expr like __import__("modulefoo").match_group(job["GroupName"], glidein["attrs"]["GLIDEIN_CMSSite"]).
  5. We add a new validation script which adds a config variable of the form: START = ( $(START) ) && (GroupName=?="group1" || GroupName=?="group2" ...) for the local GLIDEIN_CMSSite.
  6. We ask sites to prioritize the specific DN appropriately and to assume it only contains local users.
    • However, we say that we control the fairshare and group quotas. Caltech doesn't manage the fairshare of CaltechUserA versus CaltechUserB.
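
Steps 2, 4 and 5 above can be sketched as follows. This is a minimal illustration, not the production implementation: the "#" comment handling, the choice to skip malformed lines, the explicit mapping argument (the text describes a two-argument function with the mapping cached in memory), and the start_expression helper are all assumptions. The site names in the sample data are examples only.

```python
from collections import defaultdict


def parse_group_mapping(text):
    """Parse two-column "group site" lines into {group: set_of_sites}."""
    mapping = defaultdict(set)
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # "#" comments are an assumption
        if not line:
            continue
        fields = line.split()
        if len(fields) != 2:
            continue  # assumption: silently skip malformed lines
        group, site = fields
        mapping[group].add(site)  # N:M, so a group may list several sites
    return dict(mapping)


def match_group(group_name, site, mapping):
    """Return True if the job's group is mapped to the glidein's site."""
    return site in mapping.get(group_name, ())


def start_expression(site, mapping):
    """Build the START line from step 5 for one GLIDEIN_CMSSite."""
    groups = sorted(g for g, sites in mapping.items() if site in sites)
    if not groups:
        # Assumption: with no mapped group, the pilot accepts no jobs.
        return "START = ( $(START) ) && False"
    clause = " || ".join('GroupName=?="%s"' % g for g in groups)
    return "START = ( $(START) ) && (%s)" % clause


sample = """\
Nebraska T2_US_Nebraska
uscms-higgs T2_US_UCSD
uscms-higgs T2_US_Nebraska
"""
mapping = parse_group_mapping(sample)
print(match_group("Nebraska", "T2_US_Nebraska", mapping))
print(start_expression("T2_US_Nebraska", mapping))
```

In a frontend module, the mapping would presumably be loaded once at import time so that match_group can keep the two-argument form used in the match_expr.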


An update on this is available in this thread: https://hypernews.cern.ch/HyperNews/CMS/get/comp-ops/2031/1/1.html

-- JamesLetts - 2014-11-14

Topic revision: r7 - 2015-09-24 - MarcoMascheroni