LCG Computing Element (lcg CE)

Functional description

The LCG CE is a native computing resource access service built on the Globus Gatekeeper. LCG has modified some of its components, such as the job manager, to improve performance.

Daemons running

  • globus-gatekeeper - must be started
  • globus-gridftp - must be started
  • globus-job-manager-marshal - must be started
  • globus-gass-cache-marshal - should be started, although the client can work in fall-back mode when the daemon is stopped
  • globus-gma - must be started if GLOBUS_GMA is enabled in the site's configuration

Init scripts and options (start|stop|restart|reload|...)

  • The globus-job-manager-marshal, globus-gass-cache-marshal and globus-gma scripts support a 'reload' action, which sends a SIGHUP to the daemon.
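
As a rough illustration of what a 'reload' amounts to, the sketch below reads a PID from a pidfile and sends SIGHUP to that process. The pidfile name in the usage note is an assumption: this page only states that the marshal daemons keep their pidfiles in /var/run/, not what the files are called.

```python
import os
import signal
from pathlib import Path


def reload_daemon(pidfile: str) -> int:
    """Send SIGHUP to the process whose PID is stored in `pidfile`.

    Returns the PID that was signalled. Raises an exception if the
    pidfile is missing, malformed, or the process does not exist.
    """
    pid = int(Path(pidfile).read_text().strip())
    os.kill(pid, signal.SIGHUP)
    return pid
```

A call such as reload_daemon("/var/run/globus-job-manager-marshal.pid") would then be roughly equivalent to the 'reload' action, provided that hypothetical file name matches the one on your CE.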

Configuration files location with example or template

  • /opt/globus/etc/globus-gass-cache-marshal.conf, /opt/globus/etc/globus-job-manager-marshal.conf. The options are:
    • logf (string) - location of the log file (the default is relative to GLOBUS_LOCATION)
    • maxproc (numeric) - maximum number of parallel requests (5 by default). This is the most useful variable for tuning.
    • rrobin (numeric) - enables round-robin queue mode for users (1) or groups (2); disabled (0) by default
    • tick (numeric) - interval in seconds at which hung child processes are killed, if no other events are happening (300 by default)
    • reqtout (numeric) - time in seconds within which a client must send a complete request after connecting (10 by default)
    • proctout (numeric) - maximum time in seconds that each request (child process) is allowed to run (600 by default)
    • reqlimit (numeric) - maximum size of a request in bytes (16384 by default). Increase this limit if the environment is very large.
    • window (numeric) - data block size for recv/send in bytes (4096 by default, the x86 page size). This should probably never be changed.
    • debug (numeric) - debug level: 0 - only warnings (default); 1 - all messages; 2 - stderr is redirected to the log file (bad for log parsers, but good for catching problems in perl jobmanagers)
  • All parameters (except a change of debug from 1 to 2) can be changed online: modify the config file and send a SIGHUP. Both daemons create pidfiles in /var/run/; this location is not configurable.
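
The defaults and value ranges listed above can be collected into a small sanity check to run before a reload. This is only a sketch: the on-disk syntax of the .conf files is not described here, so it works on an already-parsed dictionary. The names DEFAULTS, ALLOWED and check_settings are made up for illustration, and the rule that every numeric option must be a non-negative integer is an assumption.

```python
# Numeric defaults as listed on this page. logf is left out because its
# default is only described as "relative to GLOBUS_LOCATION".
DEFAULTS = {
    "maxproc": 5,
    "rrobin": 0,
    "tick": 300,
    "reqtout": 10,
    "proctout": 600,
    "reqlimit": 16384,
    "window": 4096,
    "debug": 0,
}

# Options for which the page enumerates the valid values.
ALLOWED = {"rrobin": {0, 1, 2}, "debug": {0, 1, 2}}


def check_settings(overrides: dict) -> dict:
    """Merge overrides into the defaults and reject implausible values."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown option(s): {sorted(unknown)}")
    merged = {**DEFAULTS, **overrides}
    for name, value in merged.items():
        # Assumption: numeric options are non-negative integers.
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            raise ValueError(f"{name} must be a non-negative integer, got {value!r}")
        if name in ALLOWED and value not in ALLOWED[name]:
            raise ValueError(f"{name} must be one of {sorted(ALLOWED[name])}, got {value}")
    return merged
```

For example, check_settings({"maxproc": 20}) returns the full set of options with maxproc raised to 20, while check_settings({"debug": 3}) raises ValueError.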

Logfile locations (and management) and other useful audit information

  • /opt/globus/var/log/*.log - configurable with the logf option above
  • /var/log/globus-gridftp.log
  • /var/log/globus-gatekeeper.log
  • /var/log/messages
  • /opt/edg/var/gatekeeper/

Open ports

Possible unit test of the service

Submit jobs to the CE through both the WMS and globus-job-run.
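
A minimal sketch of the globus-job-run half of such a test is given below. It assumes that globus-job-run is on the PATH, that a valid grid proxy already exists, and that the contact string (here the hypothetical host ce.example.org) is whatever your site uses. The WMS path is not covered.

```python
import subprocess


def build_smoke_test(contact: str, executable: str = "/bin/hostname") -> list:
    """Build the globus-job-run command line for a trivial test job."""
    return ["globus-job-run", contact, executable]


def run_smoke_test(contact: str, timeout: int = 120) -> bool:
    """Run the test job; succeed if it exits with 0 and prints something."""
    result = subprocess.run(
        build_smoke_test(contact),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode == 0 and bool(result.stdout.strip())
```

Calling run_smoke_test("ce.example.org") from a user interface machine should return True if the gatekeeper accepts the job and the job manager runs it.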

Where is service state held (and can it be rebuilt)

Under home directory of pool account

Cron jobs

The cron jobs can be found in:

  • /etc/cron.d/

and are:

  • bdii-proxy
  • edg-mkgridmap
  • lcg-expiregridmapdir
  • cleanup-grid-accounts
  • edg-pbs-knownhosts
  • cleanup-job-records
  • edg-pbs-shostsequiv
  • edg-apel-pbs-parser
  • fetch-crl
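
To check quickly that none of these cron jobs has gone missing, something like the following could be used. It is a sketch that assumes each job is installed as a file in /etc/cron.d/ named exactly as in the list above, which may not hold on every installation.

```python
from pathlib import Path

# Names taken from the list above.
EXPECTED_CRON_JOBS = [
    "bdii-proxy",
    "edg-mkgridmap",
    "lcg-expiregridmapdir",
    "cleanup-grid-accounts",
    "edg-pbs-knownhosts",
    "cleanup-job-records",
    "edg-pbs-shostsequiv",
    "edg-apel-pbs-parser",
    "fetch-crl",
]


def missing_cron_jobs(cron_dir: str = "/etc/cron.d") -> list:
    """Return the expected cron job names that have no file in cron_dir."""
    present = {entry.name for entry in Path(cron_dir).iterdir()}
    return [name for name in EXPECTED_CRON_JOBS if name not in present]
```

An empty list from missing_cron_jobs() means every expected file is present. It says nothing about whether the jobs actually run successfully.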

Security information

  • This section contains security information about the lcg-CE.

Access control Mechanism description (authentication & authorization)

How to block/ban a user

  • If it is necessary to ban a user on a CE, follow this step:

  • Add the DN of each user to the "ban_users.db" file, which by default is found in /opt/edg/etc/lcas/ (or /opt/glite/etc/lcas/ on a gLite CE), as follows:
    • "User1's DN"
    • "User2's DN"
    • ... ... ...
    • "UserN's DN"

  • If there are multiple DNs to be banned, each DN must be on a separate line and enclosed in double quotes (""); otherwise LCAS will not be able to block the user. At the moment, LCAS does not support wildcards, so you cannot use "/C=UK/O=eScience/OU=CLRC/L=RAL/*" to ban a group of users. To verify that the user has indeed been banned, check the log: it should contain something like "LCAS failed authorization" once a job from the banned user has landed on the CE.

  • Nothing needs to be restarted

  • If it is necessary to ban a VO, reconfigure the service without that VO
    • This will also update the information system
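
The ban_users.db rules described above (one DN per line, double-quoted, no wildcards) can be sketched as a small helper, together with a filter for the log message that confirms a ban. The function names are made up for illustration, and the DNs in the usage note are hypothetical. Which log file carries the message is not stated here, so the filter takes the lines as input.

```python
def format_ban_entries(dns) -> str:
    """Return ban_users.db content: one double-quoted DN per line."""
    lines = []
    for dn in dns:
        # Tolerate DNs that were passed in already quoted.
        dn = dn.strip().strip('"')
        if not dn:
            raise ValueError("empty DN")
        if "*" in dn:
            raise ValueError(f"wildcards are not supported by LCAS: {dn}")
        lines.append(f'"{dn}"')
    return "".join(line + "\n" for line in lines)


def lcas_rejections(log_lines) -> list:
    """Return the log lines that report a failed LCAS authorization."""
    return [line for line in log_lines if "LCAS failed authorization" in line]
```

For example, format_ban_entries(["/C=UK/O=eScience/CN=some user"]) yields a single quoted line ready to be appended to the file, while a DN ending in "/*" is rejected with ValueError.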

Network Usage

Firewall configuration

Security recommendations

Security incompatibilities

List of externals (packages are NOT maintained by Red Hat or by gLite)

Other security relevant comments

  • If you need to handle suspicious jobs, these are the steps to follow:
    • Pause or stop the batch system queues
    • Suspend all active jobs, if the batch system supports it
    • Stop the gatekeeper and gridftp server while the suspected DNs have not yet been identified
    • Ban suspected DNs or VO
    • Keep the active jobs submitted by the suspected accounts suspended if possible, to facilitate forensic investigations. Otherwise kill the jobs.
    • Follow the EGEE Incident Response Procedure: IncidentReporting

Utility scripts

Location of reference documentation for users

Location of reference documentation for administrators

Topic revision: r9 - 2009-01-16 - LorenzoSbolgi