LCG Computing Element (lcg CE)
The LCG CE is a native computing resource access service built on the Globus Gatekeeper. LCG has modified some of its components, such as the job manager, to improve performance.
- Functional description
- Daemons running
- Init scripts and options (start|stop|restart|...)
- Configuration files location with example or template
- /opt/globus/etc/globus-gass-cache-marshal.conf, /opt/globus/etc/globus-job-manager-marshal.conf
- logf (string) - location of the log file (default is relative to GLOBUS_LOCATION)
- maxproc (numeric) - maximum number of parallel requests [this is the most useful variable for tuning] (5 by default)
- tick (numeric) - interval in seconds at which hung child processes are killed when no other events are happening (300 by default)
- reqtout (numeric) - time in seconds, counted from connection, within which the client must send a complete request (10 by default)
- proctout (numeric) - maximum time in seconds that each request (child process) is allowed to run (600 by default)
- reqlimit (numeric) - maximum size of a request in bytes (16384 by default). Increase this limit if the environment is very large.
- window (numeric) - size in bytes of the data block used for recv/send (4096 by default, the x86 page size). It should probably never be changed.
- debug (numeric) - debug level, one of three values: 0 - warnings only (default); 1 - all messages; 2 - stderr is redirected to the log file (bad for log parsers, but good for catching problems in Perl jobmanagers)
- All parameters (except changing debug from 1 to 2) can be changed while the daemons are running: modify the config file and send a SIGHUP. Both daemons create pidfiles in /var/run/, but the pidfile location is not configurable.
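The reload step (edit the config file, then send a SIGHUP to the daemon) could be scripted roughly as follows. This is a minimal sketch, not part of the LCG CE distribution: the helper names `read_pid` and `reload_daemon` are hypothetical, and because this page does not give the pidfile names under /var/run/, the caller has to supply the actual path.

```python
import os
import signal
from pathlib import Path


def read_pid(pidfile):
    """Return the process ID stored in a pidfile as a positive integer."""
    pid = int(Path(pidfile).read_text().strip())
    if pid <= 0:
        raise ValueError(f"invalid PID {pid} in {pidfile}")
    return pid


def reload_daemon(pidfile, sig=signal.SIGHUP):
    """Send sig (SIGHUP by default) to the process named in pidfile.

    Assumption: the daemon re-reads its config file on SIGHUP, as
    described above. The pidfile path is supplied by the caller.
    """
    pid = read_pid(pidfile)
    os.kill(pid, sig)  # raises ProcessLookupError if no such process exists
    return pid


# Example call (hypothetical pidfile name, replace with the real one):
#   reload_daemon("/var/run/example-marshal.pid")
```

Passing `sig=0` only checks that the process exists and delivers no signal, which is a cheap way to validate a pidfile before an actual reload. Sending the signal normally requires the privileges of the daemon's owner or root.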
- Logfile locations (and management) and other useful audit information
- Possible unit test of the service
- Where is service state held (and can it be rebuilt)
- Cron jobs
- Utility scripts
- Location of reference documentation for users
- Location of reference documentation for administrators
-- DiQing - 28 Apr 2008