LCG Computing Element (lcg CE)
Functional description
The LCG CE is a native computing resource access service based on the Globus Gatekeeper. LCG has modified some of its components, such as the job manager, to improve performance.
Daemons running
- globus-gatekeeper - must be started
- globus-gridftp - must be started
- globus-job-manager-marshal - must be started
- globus-gass-cache-marshal - should be started, although the client can work in fall-back mode when the daemon is stopped
- globus-gma - must be started if GLOBUS_GMA is enabled in the site's configuration
Init scripts and options (start|stop|restart|reload|...)
- The globus-job-manager-marshal, globus-gass-cache-marshal and globus-gma scripts support a 'reload' action, which sends SIGHUP to the daemon.
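As a rough illustration of what a 'reload' action amounts to, the following Python sketch reads a PID from a pidfile and sends SIGHUP to that process. It is not the init scripts' actual implementation. The function name `send_sighup` is hypothetical, and the pidfile path must be supplied by the caller because the pidfile names are not listed on this page.

```python
import os
import signal


def send_sighup(pidfile_path):
    """Read a PID from pidfile_path and send SIGHUP to that process.

    pidfile_path is supplied by the caller; the real pidfile names are
    not documented here, so any concrete path is an assumption.
    """
    with open(pidfile_path) as f:
        pid = int(f.read().strip())
    # Raises ProcessLookupError if no process with that PID exists.
    os.kill(pid, signal.SIGHUP)
    return pid
```

The function returns the PID it signalled, which makes it easy to log which process was asked to reload.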
Configuration files location with example or template
- /opt/globus/etc/globus-gass-cache-marshal.conf, /opt/globus/etc/globus-job-manager-marshal.conf
- logf (string) - location of the log file (default is relative to GLOBUS_LOCATION)
- maxproc (numeric) - maximum number of parallel requests; this is the most useful variable for tuning (5 by default)
- rrobin (numeric) - enables round-robin queue mode for users (1) or groups (2); disabled (0) by default
- tick (numeric) - interval in seconds at which hung child processes are killed, if no other events are happening (300 by default)
- reqtout (numeric) - time in seconds, counted from connection, within which the client must send a complete request (10 by default)
- proctout (numeric) - maximum time in seconds that each request (child process) is allowed to run (600 by default)
- reqlimit (numeric) - maximum size of a request in bytes (16384 by default). Increase this limit if the environment is very large.
- window (numeric) - size in bytes of the data block used for recv/send; it should probably never be changed (4096 by default, the x86 page size)
- debug (numeric) - debug level, one of three: 0 - only warnings (default); 1 - all messages; 2 - stderr is redirected to the log file (bad for log parsers, but good for catching problems in perl jobmanagers)
- All parameters (except switching debug from 1 to 2) can be changed online: modify the configuration file and send SIGHUP to the daemon. Both daemons create pidfiles in /var/run/; this location is not configurable.
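To keep tuning changes honest, the documented numeric defaults can be recorded and checked before a configuration file is edited. The Python sketch below does only that: it does not read or write the .conf files, since their syntax is not shown on this page. The names `DEFAULTS`, `ALLOWED` and `effective_settings` are hypothetical, the non-negative integer check is an assumed sanity rule, and `logf` is left out because no concrete default is given for it.

```python
# Numeric defaults as documented in the parameter list above.
DEFAULTS = {
    "maxproc": 5,
    "rrobin": 0,
    "tick": 300,
    "reqtout": 10,
    "proctout": 600,
    "reqlimit": 16384,
    "window": 4096,
    "debug": 0,
}

# Parameters documented as taking one of a fixed set of values.
ALLOWED = {"rrobin": {0, 1, 2}, "debug": {0, 1, 2}}


def effective_settings(overrides):
    """Merge overrides into the documented defaults and sanity-check them."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError("unknown parameter(s): %s" % ", ".join(sorted(unknown)))
    settings = dict(DEFAULTS, **overrides)
    for name, value in settings.items():
        # Assumed rule: every numeric parameter is a non-negative integer.
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            raise ValueError("%s must be a non-negative integer" % name)
        if name in ALLOWED and value not in ALLOWED[name]:
            raise ValueError("%s must be one of %s" % (name, sorted(ALLOWED[name])))
    return settings
```

For example, `effective_settings({"maxproc": 20})` returns the full set of values with only `maxproc` changed, while an out-of-range `rrobin` or `debug` value raises `ValueError`.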
Logfile locations (and management) and other useful audit information
- /opt/globus/var/log/*.log - configurable with the logf option above.
- /var/log/globus-gridftp.log
- /var/log/globus-gatekeeper.log
- /var/log/messages
- /opt/edg/var/gatekeeper/
Open ports
Possible unit test of the service
Submit jobs to the CE through both the WMS and globus-job-run.
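The globus-job-run half of this test could be scripted roughly as follows. This is a minimal Python sketch under several assumptions: a valid grid proxy exists, the Globus client tools are on PATH, and the CE contact string is supplied by the caller. The function names, the default test executable /bin/hostname and the timeout are illustrative choices. WMS submission is not covered.

```python
import subprocess


def build_job_run_command(ce_contact, executable="/bin/hostname"):
    """Return the argv for a trivial globus-job-run test job."""
    return ["globus-job-run", ce_contact, executable]


def smoke_test(ce_contact, timeout=120, run=subprocess.run):
    """Run a trivial job on the CE; True if the client exits with status 0.

    ce_contact is the caller-supplied contact string of the CE.
    The run parameter exists so the call can be replaced in tests.
    """
    result = run(
        build_job_run_command(ce_contact),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,
    )
    return result.returncode == 0
```

A zero exit status only shows that the job was accepted and ran; checking the output in `result.stdout` would be a natural extension.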
Where is service state held (and can it be rebuilt)
Under the home directory of each pool account.
Cron jobs
The cron jobs can be found in:
and are:
- bdii-proxy
- edg-mkgridmap
- lcg-expiregridmapdir
- cleanup-grid-accounts
- edg-pbs-knownhosts
- cleanup-job-records
- edg-pbs-shostsequiv
- edg-apel-pbs-parser
- fetch-crl
Security information
- This section contains security-related information about the lcg-CE.
Access control Mechanism description (authentication & authorization)
How to block/ban a user
- To ban a user on a CE, follow these steps:
- Add the DN of each user to the "ban_users.db" file, which by default is located in /opt/edg/etc/lcas/ (or in /opt/glite/etc/lcas/ on a gLite CE), as follows:
- "User1's DN"
- "User2's DN"
- ... ... ...
- "UserN's DN"
- If there are multiple DNs to be banned, each DN must be on a separate line and enclosed in double quotes (""), otherwise LCAS will not be able to block the user. At the moment, LCAS does not support wildcards, so you cannot use "/C=UK/O=eScience/OU=CLRC/L=RAL/*" to ban a group of users. To verify that the user has indeed been banned, check the log: once a job from the banned user has landed on the CE, it should contain a message such as "LCAS failed authorization".
- Nothing needs to be restarted
- To ban a VO, reconfigure the service without that VO
- This also updates the information system accordingly
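The DN-banning step described above (one DN per line, each in double quotes) and the log check can be sketched in Python as follows. The function names `ban_dns` and `lcas_denials` are hypothetical. Both file paths are passed in by the caller, since the ban file location depends on the CE flavour and this page does not say which log file carries the LCAS message.

```python
def ban_dns(ban_file, dns):
    """Append each DN to ban_file, one per line, enclosed in double quotes.

    DNs already present in the file are skipped. Returns the DNs added.
    """
    try:
        with open(ban_file) as f:
            content = f.read()
    except FileNotFoundError:
        content = ""
    existing = {line.strip() for line in content.splitlines()}
    added = []
    with open(ban_file, "a") as f:
        # Avoid gluing the first new entry onto an unterminated last line.
        if content and not content.endswith("\n"):
            f.write("\n")
        for dn in dns:
            entry = '"%s"' % dn.strip().strip('"')
            if entry not in existing:
                f.write(entry + "\n")
                existing.add(entry)
                added.append(dn)
    return added


def lcas_denials(log_file, marker="LCAS failed authorization"):
    """Return the lines of log_file that contain the marker text."""
    with open(log_file, errors="replace") as f:
        return [line.rstrip("\n") for line in f if marker in line]
```

Skipping DNs that are already listed keeps the file free of duplicates when the function is run repeatedly, and no service restart is involved.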
Network Usage
Firewall configuration
Security recommendations
Security incompatibilities
List of externals (packages are NOT maintained by Red Hat or by gLite)
Other security relevant comments
- If you need to handle suspicious jobs, these are the steps to follow:
- Pause or stop the batch system queues
- Suspend all active jobs, if the batch system supports it
- Stop the gatekeeper and the gridftp server for as long as the suspected DNs have not been identified
- Ban the suspected DNs or VO
- Keep the active jobs submitted by the suspected accounts suspended if possible, to facilitate forensic investigations. Otherwise kill the jobs.
- Follow the EGEE Incident Response Procedure: IncidentReporting
Utility scripts
Location of reference documentation for users
Location of reference documentation for administrators