Deployment of gLExec on the Worker Node
Applicability
The information on this page is relevant for WLCG sites with CREAM or OSG-CE services (ARC-CE sites are currently not concerned). Installation and configuration advice is given for EMI middleware. USATLAS and USCMS sites are asked to coordinate with their respective T1 sites instead. It is understood that some sites may be unable to comply with the guidelines below for various reasons; such sites may need to be excluded from use by Multi User Pilot Jobs in the future.
Background
Each of the experiments has a framework that supports the submission of so-called pilot jobs:
- ALICE - AliEn
- ATLAS - PanDA
- CMS - GlideinWMS
- LHCb - DIRAC
On the Worker Node (WN) a pilot job deploys a pilot agent that contacts the task queue managed by the VO's framework, to obtain the highest-priority task (a.k.a. payload) that is compatible with the WN environment.
A pilot job can either be "single-user" (a.k.a. "private"), i.e. run only payloads submitted with the same credentials as its own, or "multi-user", in which case it may run payloads submitted by any user authorized by the VO.
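The selection rule in the two paragraphs above can be sketched as follows. This is a minimal illustration, not code from AliEn, PanDA, GlideinWMS or DIRAC: the Task fields, the platform string used as the "compatible with the WN" test and the set of VO-authorized DNs are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class Task:
    owner_dn: str   # certificate DN of the user who submitted the payload
    priority: int   # higher value = more urgent (hypothetical convention)
    platform: str   # hypothetical stand-in for "compatible with the WN"


def select_task(queue: Iterable[Task], pilot_dn: str, wn_platform: str,
                multi_user: bool, vo_users: Set[str]) -> Optional[Task]:
    """Return the highest-priority task this pilot may run, or None."""
    def allowed(task: Task) -> bool:
        if multi_user:
            # multi-user pilot: any user authorized by the VO
            return task.owner_dn in vo_users
        # single-user ("private") pilot: only payloads with its own credentials
        return task.owner_dn == pilot_dn

    candidates = [t for t in queue if t.platform == wn_platform and allowed(t)]
    return max(candidates, key=lambda t: t.priority, default=None)
```

With a queue holding a priority-5 task from Alice and a priority-9 task from Bob, a private pilot running with Alice's credentials picks Alice's task, whereas a multi-user pilot picks Bob's. Only in the second case does the payload owner differ from the credentials the site saw when the pilot arrived.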
Multi-user pilot jobs (MUPJs) can only be submitted by a small group of people in each experiment, essentially production managers, using a specific VOMS role (Role=pilot).
When a payload has finished and the pilot job slot still has sufficient CPU and wallclock time left, the pilot may download another task, possibly submitted by a different user, and so on.
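A simplified model of this slot-filling behaviour is sketched below. The Payload type, the use of estimates as actual usage and the rule "stop at the first task that does not fit" are assumptions of this sketch; a real framework may instead ask its task queue for a task that fits the remaining time.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass
class Payload:
    owner: str
    cpu_seconds: float   # estimated CPU time the payload needs
    wall_seconds: float  # estimated wallclock time the payload needs


def run_pilot(queue: Deque[Payload], cpu_left: float, wall_left: float) -> List[str]:
    """Run payloads back to back while the slot has enough CPU and wallclock time.

    Returns the owners of the payloads that were run, in order.
    """
    owners: List[str] = []
    while queue:
        nxt = queue[0]
        if nxt.cpu_seconds > cpu_left or nxt.wall_seconds > wall_left:
            break  # not enough time left in this slot; the pilot exits
        queue.popleft()
        # A real multi-user pilot would hand the payload to glexec here;
        # this sketch simply assumes the estimates equal the actual usage.
        cpu_left -= nxt.cpu_seconds
        wall_left -= nxt.wall_seconds
        owners.append(nxt.owner)
    return owners
```

For a slot with 400 s of CPU and 500 s of wallclock time and payloads from alice (100/150 s), bob (200/250 s) and carol (500/600 s), the pilot runs alice's and bob's payloads and then stops: two different users' work ran inside a single pilot slot.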
Various sites were uncomfortable with the idea that the users who submitted such tasks would not be identifiable by the site if forensic investigations were needed after an incident.
This led to the idea of introducing a mechanism allowing sites to exercise authorization and obtain traceability within MUPJs. The proposed implementation depends on a command that allows the pilot job to imitate the CE to a certain extent.
That command is "glexec", which is similar to Apache's "suexec". gLExec has two running modes. In the "log-only" mode it logs the authorization decision and runs a given payload under the identity of the invoking pilot agent. In the "identity-changing" mode it logs the authorization decision, maps the payload proxy to a local account and runs the payload under that account. In the latter case glexec needs to be setuid root.
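Since the identity-changing mode only works when the binary is setuid root, a site admin could check an installation with a few lines of Python. This is a hypothetical helper, not part of gLExec: it inspects the file's owner and permission bits only and says nothing about which mode the configuration actually selects.

```python
import os
import stat
from typing import List


def supported_glexec_modes(path: str) -> List[str]:
    """Infer which gLExec modes an installed binary can support from its permissions.

    Log-only mode runs the payload under the pilot's own identity, so it needs
    no special privileges. Identity-changing mode needs the binary to be owned
    by root and to carry the setuid bit.
    """
    st = os.stat(path)
    modes = ["log-only"]
    if st.st_uid == 0 and st.st_mode & stat.S_ISUID:
        modes.append("identity-changing")
    return modes
```

For example, `supported_glexec_modes("/usr/sbin/glexec")` on an EMI WN would be expected to return both modes when the package has been installed setuid root.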
More details at:
https://wlcg-tf.hep.ac.uk/wiki/Multi_User_Pilot_Jobs
Time lines
During the WLCG Management Board meeting of Feb 8 2011 it was decided that CERN and the T1 sites should have gLExec on the WN working by the end of March 2011, while for T2 sites the aim was the end of June 2011.
Subsequent discussions led to a suspension of the deployment until the matter of Multi User Pilot Jobs had been revisited in the WLCG Technical Evolution Groups on Security and Workload Management. The outcome was that gLExec is currently the only viable method for user separation in Multi User Pilot Jobs and that its deployment should therefore continue. This was confirmed in the Management Board meeting of May 14, 2013, as summarized in the minutes. As of March 2016 the deployment efforts have been concluded, see here.
The deployment status is tracked on a separate page.
The status of the experiment frameworks:
- ALICE - adaptations of AliEn under study.
- ATLAS - implementing use of gLExec by the PanDA pilot.
- CMS - transparent usage available through GlideinWMS.
- LHCb - usage configurable in DIRAC.
Further details at:
How to implement gLExec on the WN
This section pertains to EMI / UMD middleware. USATLAS and USCMS sites should consult with their respective T1 sites.
By default sites are advised to take EMI products from the EGI UMD, where products appear after passing a Staged Rollout phase (to which any EGI site can contribute):
For urgent bug fixes or features one can use the EMI repositories directly:
Please consult the current documentation for the various components; some of the links mentioned below may be out of date.
Check list
On the CE the pilot role needs to be configured for each supported experiment and for the "ops" VO
The roles should be mapped to separate sets of accounts that will be put into the gLExec "whitelist" (accounts that are allowed to run the "glexec" command). Examples are shown here:
https://twiki.cern.ch/twiki/bin/view/LCG/Site-info_configuration_variables#Notes_on_using_SCAS_and_ARGUS
An Argus server should be set up
On the EMI-WN the "emi-glexec_wn" meta package should be installed.
Configure it (preferably in "setuid" mode) according to the YAIM documentation:
-
- http://wiki.nikhef.nl/grid/GLExec
- https://twiki.cern.ch/twiki/bin/view/LCG/Site-info_configuration_variables#GLEXEC_wn
- Note: the "glexec" executable is expected to be located in the directory $GLEXEC_LOCATION/sbin if that variable is defined, else in $GLITE_LOCATION/sbin, or simply /usr/sbin in EMI. A future version of YAIM ought to ensure that GLEXEC_LOCATION is always defined (see next item).
- Note: there is no relocatable version of the "gLExec" meta package yet. To deploy a relocatable version the admin would have to build "glexec" from its source, because the path to the "glexec" configuration file needs to be hardcoded in the executable for security reasons. The default path is /etc/glexec.conf, which reinforces the idea that "glexec" is more a system package than an ordinary middleware component; this may reduce the need for a relocatable version. For the time being, a recipe for building "glexec" and its dependencies from CVS sources is given here:
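The lookup order for the "glexec" executable given in the first note of the check list can be expressed as a small function, e.g. for a pilot or test script that needs to find the binary. The function name is hypothetical, and treating an empty variable as undefined is an assumption of this sketch.

```python
import os
from typing import Mapping


def glexec_executable(env: Mapping[str, str]) -> str:
    """Resolve the expected glexec path.

    $GLEXEC_LOCATION/sbin wins if set, then $GLITE_LOCATION/sbin, then the
    EMI default /usr/sbin. Empty values are treated as undefined (assumption).
    """
    for var in ("GLEXEC_LOCATION", "GLITE_LOCATION"):
        base = env.get(var)
        if base:
            return os.path.join(base, "sbin", "glexec")
    return "/usr/sbin/glexec"
```

Typical use is `glexec_executable(os.environ)`; passing the mapping explicitly keeps the function easy to test.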
Option to install gLExec without Argus
-
- In this method, central banning can be accomplished as described here, with a ban file that may be kept in one location, e.g. on NFS.
- If log-only mode is used, a gridmapdir is not needed, and the glexec executable does not need to have the setuid bit set.
- If setuid mode is used (recommended for traceability), the gridmapdir ought to be shared between the WN and the CE(s), to ensure consistent mappings and thereby prevent users from accessing each other's files (including proxies).
- An example glexec.conf file for the log-only configuration without Argus:
```
[glexec]
log_level = 3
user_white_list = .somegroup
linger = yes
user_identity_switch_by = glexec
use_lcas = no
lcmaps_debug_level = 3
lcmaps_get_account_policy = glexec_get_account
log_destination = syslog
```
- An example lcmaps-glexec.db file for the log-only configuration without Argus:
```
path = /usr/lib64/lcmaps

verify_proxy = "lcmaps_verify_proxy.mod"
               " -certdir /etc/grid-security/certificates/"
               " --allow-limited-proxy"
good = "lcmaps_dummy_good.mod"
       " --dummy-username nobody"
       " --dummy-group nobody"
       " --dummy-sec-group nobody"
ban_dn = "lcmaps_ban_dn.mod"
         "-banmapfile /somewhere/ban_users.db"
ban_fqan = "lcmaps_ban_fqan.mod"
           "-banmapfile /somewhere/ban_users.db"

glexec_get_account:
verify_proxy -> ban_dn
ban_dn -> ban_fqan
ban_fqan -> good
```
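The decision made with the two example files above can be modelled roughly as follows: the calling account must be in the whitelist, then the glexec_get_account policy runs its plugins in order, each step moving on only if the previous one succeeded. This is a hypothetical model, not LCMAPS code. The plugins are reduced to boolean checks, the ban file is modelled as a plain set of DNs and FQANs (its real format is not reproduced), and the reading of a leading dot in user_white_list (as in ".somegroup") as a pool-account prefix is our assumption; please check it against the glexec.conf documentation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional, Set, Tuple


@dataclass(frozen=True)
class Request:
    caller: str              # local account invoking glexec (the pilot account)
    proxy_ok: bool           # stand-in for the checks done by lcmaps_verify_proxy
    dn: str                  # payload owner's certificate DN
    fqans: Tuple[str, ...]   # VOMS FQANs in the payload proxy


def whitelisted(caller: str, white_list: Iterable[str]) -> bool:
    """ASSUMPTION: '.prefix' matches pool accounts such as prefix001."""
    for entry in white_list:
        if entry.startswith("."):
            prefix = entry[1:]
            if caller.startswith(prefix) and caller[len(prefix):].isdigit():
                return True
        elif caller == entry:
            return True
    return False


Plugin = Callable[[Request, Set[str]], bool]

# Each plugin returns True on success, mirroring the modules in lcmaps-glexec.db.
PLUGINS: Dict[str, Plugin] = {
    "verify_proxy": lambda req, banned: req.proxy_ok,
    "ban_dn": lambda req, banned: req.dn not in banned,
    "ban_fqan": lambda req, banned: not any(f in banned for f in req.fqans),
    "good": lambda req, banned: True,  # lcmaps_dummy_good: map to "nobody"
}

# Policy glexec_get_account: "a -> b" means "if a succeeds, evaluate b".
# No alternative is given, so a failing plugin ends the policy with a failure.
GET_ACCOUNT: Dict[str, Optional[str]] = {
    "verify_proxy": "ban_dn",
    "ban_dn": "ban_fqan",
    "ban_fqan": "good",
    "good": None,
}


def authorize(req: Request, white_list: Iterable[str], banned: Set[str]) -> Optional[str]:
    """Return the mapped account (only logged in log-only mode), or None if refused."""
    if not whitelisted(req.caller, white_list):
        return None
    state: Optional[str] = "verify_proxy"
    while state is not None:
        if not PLUGINS[state](req, banned):
            return None
        state = GET_ACCOUNT[state]
    return "nobody"
```

Because the same ban file is given to both ban plugins, banning either a user's DN or one of the FQANs in the payload proxy is enough to refuse the request.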
Known issues
- glexec < 0.9.9 aborts when the environment contains variables whose names start with MALLOC_.
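A pilot that has to cope with such older installations could strip those variables from the environment before calling glexec; upgrading to 0.9.9 or later avoids the issue altogether. The sketch below is a hypothetical workaround: it assumes the payload does not rely on MALLOC_ variables, that the version is a plain dotted number optionally followed by a "-release" suffix, and that the pilot already knows the installed version (how to obtain it is not covered here).

```python
from typing import Dict, Mapping, Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    """'0.9.8-1' -> (0, 9, 8); assumes a plain dotted numeric version."""
    return tuple(int(part) for part in version.split("-")[0].split("."))


def env_for_glexec(env: Mapping[str, str], glexec_version: str) -> Dict[str, str]:
    """Copy of env without MALLOC_* variables when glexec is older than 0.9.9."""
    if version_tuple(glexec_version) >= (0, 9, 9):
        return dict(env)
    return {k: v for k, v in env.items() if not k.startswith("MALLOC_")}
```

The result can be passed as the `env` argument of `subprocess.run` when invoking glexec. Comparing tuples of integers rather than strings keeps "0.10.0" correctly above "0.9.9".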
Monitoring of gLExec tests
For CREAM services registered in the GOCDB (EGI and direct partners), the "ops" VO can submit hourly test jobs that carry "/ops/Role=pilot" as the primary attribute in the proxy and execute a simple test of the "glexec" command on the WN, using their own proxy as the "payload" proxy: no identity change should occur, but all other aspects of the setup are tested. The tests used to be submitted centrally, but that functionality was decommissioned in early April 2012; see the notes below.
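The logic of such a test can be sketched as a Nagios-style probe. This is not the actual SAM/Nagios probe: the function names are hypothetical, the payload is simply `id -u`, and the environment variable GLEXEC_CLIENT_CERT is assumed, based on our understanding of the gLExec documentation, to tell glexec where the payload proxy is; please verify it for the installed version. Only the exit codes 0 to 3 follow the standard Nagios plugin convention.

```python
import os
import subprocess

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(pilot_uid: int, glexec_rc: int, payload_output: str) -> int:
    """Judge one test run in which the payload proxy equals the pilot proxy."""
    if glexec_rc != 0:
        return CRITICAL            # authorization or execution failed
    try:
        payload_uid = int(payload_output.strip())
    except ValueError:
        return UNKNOWN             # payload ran but printed something unexpected
    # Same proxy on both sides, so no identity change should have happened.
    return OK if payload_uid == pilot_uid else CRITICAL


def probe(glexec: str, proxy: str) -> int:
    """Run 'id -u' through glexec with the pilot's own proxy as payload proxy.

    ASSUMPTION: glexec reads the payload proxy location from GLEXEC_CLIENT_CERT.
    """
    env = dict(os.environ, X509_USER_PROXY=proxy, GLEXEC_CLIENT_CERT=proxy)
    try:
        proc = subprocess.run([glexec, "/usr/bin/id", "-u"], env=env,
                              capture_output=True, text=True, timeout=300)
    except (OSError, subprocess.TimeoutExpired):
        return CRITICAL
    return evaluate(os.getuid(), proc.returncode, proc.stdout)
```

Keeping the verdict in a pure `evaluate` function separates the part that needs a real WN (running glexec) from the part that can be checked anywhere.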
Since Update 10 of the SAM-Nagios software, these tests can be enabled in the ROC/NGI profile, so that sites can monitor the results along with those of other tests:
Note: such automatic glexec tests only cover CE services that have also been declared with the "gLExec" service type in the GOCDB.
Note: please declare your CE services that should receive the glexec tests!
To see the OPS gLExec test results for all NGI/ROC instances:
LHCb also run such tests:
CMS have developed a more elaborate test for which the results are published here:
ATLAS test results are published here:
ALICE test results are published here:
More information
- The mailing list "wlcg-glexec-deployment" (with cern.ch as the domain) is available for site admins to subscribe to and post questions.
-- MaartenLitmaath - 13-Apr-2011