GlideinWMS Deployment

glideinWMS components

A glideinWMS system is composed of 4 components:

  • A central manager/"Collector" (running the user pool collector and negotiator)
  • A submit node/"Schedd" (running the schedd)
  • A glidein factory, and
  • A VO frontend
There can be any number of each component in the system (but at least one of each), and several of them could be co-located on the same physical host. For the purpose of this document, however, we will assume we have one of each, and that each is located on a dedicated node.

For a high-level description, see the glideinWMS training at CERN and/or the glideinWMS Web area.

Requirements for the machines

Action Item/Service: Comments
Central manager/"Collector" [ vocms97, vocms097 (global prod), vocms099 (global analysis), vocms159 (itb), vocms165 (hlt) ]
Open the port range 9620/tcp to 10319/tcp: Collector/Negotiator machine; the Collector(s) must be accessible from the WAN, assuming 1+number_of_collectors collectors, i.e. 700 in production as of Feb 2014
Open ports 51450/tcp and 41450/tcp: used by the HA (high availability, 51450) and Replication (41450) daemons
Local OS: ulimit -n 16384
  Local firewall must be shut down
  ulimit -a: "virtual memory" should be set to "unlimited" (it was 768000, which prevented us from running more than 8k)
Software: HTCondor
Configuration files: available up-to-date from:
Local users/accounts: _gfrontend:x:100002:100000:GlideIn Front-End:/home/gfrontend:/bin/bash
  _condor:x:100003:100000:Condor Pool:/home/condor:/bin/bash
  _gfactory:x:100001:100000:GlideIn Factory:/home/gfactory:/bin/bash
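The two numeric requirements for the Collector host can be checked with a short Python sketch: the size of the inclusive port range 9620-10319 (700 ports), and the soft limits for open files and virtual memory. It assumes one port per collector and uses only the standard `resource` module; the helper names are hypothetical, and it only inspects the limits of the process that runs it, so run it from the same environment that starts HTCondor.

```python
import resource

# Values copied from the Central manager/"Collector" rows.
COLLECTOR_LOW_PORT = 9620
COLLECTOR_HIGH_PORT = 10319
HA_PORT = 51450           # high-availability daemon
REPLICATION_PORT = 41450  # replication daemon
REQUIRED_NOFILE = 16384   # ulimit -n


def port_range_size(low: int, high: int) -> int:
    """Number of ports in the inclusive range low..high."""
    if high < low:
        raise ValueError("high port must not be below low port")
    return high - low + 1


def high_port_for(low: int, n_collectors: int) -> int:
    """Last port needed if each collector gets one consecutive port starting at low (assumption)."""
    return low + n_collectors - 1


def limit_problems(nofile: int, vmem: int) -> list:
    """Compare soft limits with the requirements; return a list of problems (empty if OK).

    nofile is RLIMIT_NOFILE (ulimit -n); vmem is RLIMIT_AS in bytes (ulimit -v reports KiB).
    """
    problems = []
    # Test RLIM_INFINITY first: on Linux it is -1, which would otherwise compare as "too small".
    if nofile != resource.RLIM_INFINITY and nofile < REQUIRED_NOFILE:
        problems.append(f"open files limit is {nofile}, need at least {REQUIRED_NOFILE}")
    if vmem != resource.RLIM_INFINITY:
        problems.append(f"virtual memory limit is {vmem} bytes, need unlimited")
    return problems


if __name__ == "__main__":
    size = port_range_size(COLLECTOR_LOW_PORT, COLLECTOR_HIGH_PORT)
    print(f"open tcp {COLLECTOR_LOW_PORT}-{COLLECTOR_HIGH_PORT} ({size} ports), "
          f"plus {HA_PORT} and {REPLICATION_PORT}")
    soft_nofile, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    soft_vmem, _hard = resource.getrlimit(resource.RLIMIT_AS)
    for problem in limit_problems(soft_nofile, soft_vmem):
        print("WARNING:", problem)
```

Run as a script, it prints the ports to request from the firewall and one WARNING line per limit that does not meet the requirements.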
Submit node/"Schedd" [ vocms201, vocms202, vocms216, vocms234, vocms235, vocms237, etc.; see List of Machines ]
Open port 4080: the HTCondor schedd's shared port daemon must be accessible from the WAN; see the additional condor_config settings on the user schedd side
gridFTP for WMAgent: WMAgent runs gridFTP as root on vocms144; we need gridFTP on the WMAgent+schedd machine. (Old: use a custom quattor template for this machine.)
Local OS: ulimit -n 16384
Software: HTCondor, WMAgent
Configuration files: available up-to-date from:
Local users: cmst1 login

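Whether port 4080 is really open from the WAN is best verified from a machine outside CERN. A minimal sketch using only the Python standard library; the helper name is hypothetical, and a successful result only shows that something accepts TCP connections on the port, not that the HTCondor shared port daemon is the process answering.

```python
import socket


def port_is_reachable(host: str, port: int = 4080, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts and DNS failures are all OSError subclasses.
        return False
```

For example, `port_is_reachable("vocms201.cern.ch")` run from outside CERN should return True once both the CERN and the local firewall allow port 4080.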
VO Frontend [ vocms157 (main prod), vocms0157 (global prod), vocms0167 (global analysis), vocms143 (itb) ]
Web server: needs a port open and accessible from the WAN (both in the CERN firewall and in the local firewall)
  sudo cat /selinux/enforce # should be 0
  echo 0 | sudo tee /selinux/enforce # or: sudo setenforce 0
Generate service certificate: service cert for the glideins, with
  subject= /DC=ch/DC=cern/OU=computers/CN=cmspilotjob/vocms{157,0157,0167,143}.cern.ch
Register service cert in VOMS with /cms/Role=pilot: the current VOMS role is "production"; new global pool machines will use role "pilot"
  needs a VOMS proxy extension: voms-proxy-init --voms cms:/cms/Role=pilot
Software: GlideinWMS Frontend, Apache web server
Local users/accounts: _gfrontend:x:100002:100000:GlideIn Front-End:/home/gfrontend:/bin/bash
  _condor:x:100003:100000:Condor Pool:/home/condor:/bin/bash
  _gfactory:x:100001:100000:GlideIn Factory:/home/gfactory:/bin/bash
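The `vocms{157,0157,0167,143}` part of the subject is shell-style brace notation, presumably meaning one certificate per frontend host. A minimal Python sketch that expands it and compares a certificate's subject line with the expected value; the function names are hypothetical. Older OpenSSL versions print this slash-separated form with `openssl x509 -noout -subject -in <cert>`; newer versions need `-nameopt compat` to produce it.

```python
# Subject layout copied from the "Generate service certificate" row; the brace
# pattern vocms{157,0157,0167,143} is written out as an explicit host list.
SUBJECT_PREFIX = "/DC=ch/DC=cern/OU=computers/CN=cmspilotjob/"
FRONTEND_HOSTS = ("vocms157", "vocms0157", "vocms0167", "vocms143")


def pilot_subject(host: str, domain: str = "cern.ch") -> str:
    """Expected subject of the glidein service certificate for one frontend host."""
    return f"{SUBJECT_PREFIX}{host}.{domain}"


def expected_subjects(hosts=FRONTEND_HOSTS) -> list:
    """Expand the host list into one subject per frontend machine."""
    return [pilot_subject(h) for h in hosts]


def subject_matches(subject_line: str, host: str) -> bool:
    """Compare a line such as 'subject= /DC=ch/...' with the expected subject for host.

    Assumes the slash-separated format shown in the table; the 'subject=' prefix is optional.
    """
    value = subject_line.strip()
    if value.startswith("subject="):
        value = value[len("subject="):].strip()
    return value == pilot_subject(host)


if __name__ == "__main__":
    for subject in expected_subjects():
        print(subject)
```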
Glidein Factory [ vocms32 and vocms158 (itb) ]
Web server: needs the HTTP port open, as well as a port range open and accessible from the WAN (both in the CERN firewall and in the local firewall). Burt recommends a range of at least 1k ports for safety, e.g. LOWPORT=20000, HIGHPORT=24999 in condor_config.local
  sudo cat /selinux/enforce # should be 0
  echo 0 | sudo tee /selinux/enforce # or: sudo setenforce 0
Open ports for Globus: needs a range of ports opened for Globus callbacks, TCP/UDP 20000-25000 (both in the CERN firewall and in the local firewall)
Open port for external frontends: open port 9618 in the central and local firewalls
Software: GlideinWMS Factory, HTCondor, javascriptrrd, Apache web server
Reference page here.
Local users/accounts: condor and factory logins, plus one account for each frontend that this factory will be communicating with (_fecms{cern,fnal,ucsd}).
  In the future, there will only be one frontend, possibly _fecmsglobal.

_gfrontend:x:XX:100000:GlideIn Front-End:/home/gfrontend:/bin/bash
_fecmsucsd:x:XX:100000:Front-End CMS UCSD:/data/empty:/sbin/nologin
_fecmsfnal:x:XX:100000:Front-End CMS FNAL:/data/empty:/sbin/nologin
_fecmscern:x:XX:100000:Front-End CMS CERN:/data/empty:/sbin/nologin
_fecmsglobal:x:XX:100000:Front-End CMS GLOBAL:/data/empty:/sbin/nologin
_condor:x:100003:100000:Condor Pool:/home/condor:/bin/bash
_gfactory:x:XX:100000:GlideIn Factory:/home/gfactory:/bin/bash
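LOWPORT and HIGHPORT are the HTCondor configuration macros that restrict the port range HTCondor uses, so the range they define is the one that must be opened in the firewalls. A small Python sketch that renders the two lines for condor_config.local and rejects a range smaller than the recommended 1k ports; the helper name and the hard failure are illustrative choices, not part of the factory installer.

```python
MIN_PORTS = 1000  # "at least 1k for safety", from the Glidein Factory row


def port_range_config(low: int, high: int, min_ports: int = MIN_PORTS) -> str:
    """Render LOWPORT/HIGHPORT lines for condor_config.local.

    Raises ValueError if the inclusive range low..high holds fewer than min_ports ports.
    """
    size = high - low + 1
    if size < min_ports:
        raise ValueError(f"range {low}-{high} has {size} ports, fewer than {min_ports}")
    return f"LOWPORT = {low}\nHIGHPORT = {high}\n"


if __name__ == "__main__":
    # Values from the table: 20000-24999 is 5000 ports, well above the 1k minimum.
    print(port_range_config(20000, 24999), end="")
```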

Common for all hosts
Clean installation of the machines: Parrot configuration
Sudo on all hosts: for amccrea, klarson1, sfiligoi, bbockelm, jletts, zvada, belforte, dmason
Generate host certificate and key: available at /etc/grid-security/
UI software with grid commands: source /afs/cern.ch/cms/LCG/LCG-2/UI/cms_ui_env.sh
  or install it locally (a recent version of two packages: voms-clients and globus-proxy-utils)
Grid CA certificates: should come from quattor, in /etc/grid-security/certificates/
Local service accounts**: _condor:_gwms (preferred entry in /etc/passwd: _condor:_gwms:XX:Condor Pool:/home/condor:/bin/bash)
  _gfrontend:_gwms (preferred entry in /etc/passwd: _gfrontend:_gwms:XX:GlideIn Front-End:/home/gfrontend:/bin/bash)
  _gfactory:_gwms (preferred entry in /etc/passwd: _gfactory:_gwms:XX:GlideIn Factory:/home/gfactory:/bin/bash)
  _wmagent:_gwms (preferred entry in /etc/passwd: _wmagent:_gwms:XX:WMAgent:/home/wmagent:/bin/bash)
System-wide file descriptors: apply everywhere condor runs, so preferably on all machines; see the doc

*We must make sure that this list is in the quattor template; otherwise the quattor tools will delete all manually installed software.

**Remark: people operating the Factory on the respective machines must have their ssh keys added to these local accounts (ask the person responsible for the service). If needed, these people can also be given sudo privileges.
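The local service accounts can be verified with Python's standard `pwd` and `grp` modules. This sketch assumes that `_condor:_gwms` means user `_condor` with primary group `_gwms`; the helper name is hypothetical, and since not every host needs every account (e.g. `_wmagent` only where WMAgent runs), pass only the accounts a given host needs.

```python
import grp
import pwd

# Accounts from the "Local service accounts" row, read as user:primary-group.
SERVICE_ACCOUNTS = ("_condor", "_gfrontend", "_gfactory", "_wmagent")
SERVICE_GROUP = "_gwms"


def account_problems(users=SERVICE_ACCOUNTS, group=SERVICE_GROUP) -> list:
    """Return a list of problems with the local service accounts (empty if all is well)."""
    try:
        gid = grp.getgrnam(group).gr_gid
    except KeyError:
        return [f"group {group} does not exist"]
    problems = []
    for user in users:
        try:
            entry = pwd.getpwnam(user)
        except KeyError:
            problems.append(f"user {user} does not exist")
            continue
        if entry.pw_gid != gid:
            problems.append(f"user {user} has primary gid {entry.pw_gid}, expected {gid} ({group})")
    return problems


if __name__ == "__main__":
    for problem in account_problems():
        print("WARNING:", problem)
```

On a factory host, for instance, `account_problems(users=("_condor", "_gfactory"))` would restrict the check to the accounts that machine uses.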

Directory Structure

  • /data
    • /data/certs - local copy of the host certificates goes here (chmod 750); auth files needed for WMAgent (which will not be used in the long term) can also go here.
    • /data/admin - directory to keep all the scripts and things needed for deployment, monitoring and operating
    • /data/admin/condor - condor tarballs
    • /data/admin/wmagent - for the WMAgent part (deployment scripts and WMAgent.secret go here)
    • /data/srv - all the software should be deployed here
    • /data/srv/condor - condor installation
    • /data/srv/wmagent - wmagent installation
    • /data/srv/factory - factory install
    • /data/srv/frontend - frontend install
    • /data/srv/vdt
    • /data/srv/sw - additional software or libraries needed, such as javascriptrrd for the factory and frontend
  • /etc
    • /etc/condor - condor config files (in config.d mode)
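The /data part of this layout can be created with a short, idempotent Python sketch. It assumes it is run by root (or another user allowed to write under the chosen root) and sets mode 750 only on /data/certs, as the list specifies; ownership by the individual service accounts is left out because the list does not say which account owns which directory.

```python
import os
from pathlib import Path

# Subdirectories of /data from the "Directory Structure" list. /etc/condor is
# not created here; it belongs to the HTCondor configuration step.
DATA_DIRS = (
    "certs", "admin", "admin/condor", "admin/wmagent",
    "srv", "srv/condor", "srv/wmagent", "srv/factory",
    "srv/frontend", "srv/vdt", "srv/sw",
)
DIR_MODES = {"certs": 0o750}  # host certificates: chmod 750


def create_layout(root: str = "/data") -> list:
    """Create the directory tree under root (safe to re-run) and return the paths."""
    paths = []
    for rel in DATA_DIRS:
        path = Path(root) / rel
        path.mkdir(parents=True, exist_ok=True)
        if rel in DIR_MODES:
            # Set the mode explicitly: the mode given to mkdir is reduced by the umask.
            os.chmod(path, DIR_MODES[rel])
        paths.append(str(path))
    return paths
```

Calling `create_layout()` with no argument targets /data; passing another root is handy for a dry run.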

Software:

Users and local accounts used on the machines to deploy, run and operate the service

Local accounts on the machines; logging in uses ssh public keys.

  • VO Frontend machines (vocms143 and vocms157): _gfrontend:_gwms
  • GlideinWMS Factory machines (vocms107 and vocms158): Condor runs as root, the factory as _gfactory:_gwms
  • User Pool machines (vocms105, vocms159, vocms120, vocms164): _condor:_gwms
The cmst1 account is used on the Schedd machines. Log in as yourself and then sudo to the cmst1 account.

Directory structure (old)

We follow the current directory structure to deploy and run the services.

Deployment instructions

Deploy in this order:

  • [1] glideinWMS Schedds and Collector (root@factory_machine) Instructions
  • [2] Glidein Factory (_gfactory@factory_machine) Instructions
  • [4] User Pool Collector (_condor@central_collector_machine) Instructions
  • [5] User Schedd (_condor@user_submitter_machine) Instructions
  • [6] Condor for VO Frontend (_gfrontend@frontend_machine) Instructions
  • [7] VO Frontend (_gfrontend@frontend_machine) Instructions
  • [8] WMAgent (_condor@submitter_machine) Instructions

Current list of all GlideinWMS Machines

-- Main.AlisonMcCrea - 12-Sep-2012

Topic revision: r10 - 2014-03-21 - IvanGlushkov
 