Notes on the gLite 3.0 RC2 release to pre-production.

The gLite 3.0 PPS release is now available. It is based on LCG-2_7_0 with the addition of

gLite WMS/LB
gLite CE
Combined gLite/LCG WN
Combined gLite/LCG UI
FTS server
FTA

There is an apt-get repository for PPS;

rpm http://lxb2042.cern.ch/gLite/APT/R3.0-pps rhel30 externals Release3.0 updates

The CAs have been decoupled from the release - further info on how to install them can be found here

http://grid-deployment.web.cern.ch/grid-deployment/lcg2CAlist.html

Result of Certification

gLite 3.0 RC2 has been evaluated on the Certification Testbed.

The gLite WMS failed stress testing because the network server failed due to bug #15761. The cron job that restarts the network server also failed (see note). gLite bulk submission was not tested because of this failure. The FTS also failed as its configuration is still incomplete.

Note: The new cron does not run cron jobs in cron.d if the file has executable permission, so the cron jobs listed below will fail if it is set. After an install, please make sure they will run by executing "chmod a-x /etc/cron.d/*"

UI
-rwxr-xr-x    1 root     root          267 Mar 24 15:09 glite-fetch-crl.cron

WMS
-rwxr-xr-x    1 root     root          268 Apr  6 15:03 glite-fetch-crl.cron
-rwxr-xr-x    1 root     root          160 Apr  6 15:04 glite-wms-check-daemons.cron
-rwxr-xr-x    1 root     root          158 Apr  6 15:02 glite-wms-ns-proxy.cron
-rwxr-xr-x    1 root     root          680 Apr  6 15:02 glite-wms-purger.cron
-rwxr-xr-x    1 root     root          241 Apr  6 15:02 glite-wms-wmproxy-purge-proxycache.cron


MON
-rwxr-xr-x    1 root     root          211 Mar 27 19:45 glite-iperf-check
-rwxr-xr-x    1 root     root          207 Mar 27 19:21 glite-udpmon-check


gLite CE
-rwxr-xr-x    1 root     root          267 Apr  6 12:26 glite-fetch-crl.cron
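
As a minimal sketch of the permission fix described in the note above, the function below clears the executable bit only on files that have it set and reports each change. The function name fix_cron_d_perms is hypothetical; the directory defaults to /etc/cron.d but can be pointed at a scratch copy for a trial run.

```shell
# Sketch: clear the executable bit on regular files in a cron.d directory.
# fix_cron_d_perms is an illustrative name, not part of yaim or gLite.
fix_cron_d_perms() {
    dir="${1:-/etc/cron.d}"
    for f in "$dir"/*; do
        # Skip non-regular files and the unexpanded glob of an empty directory.
        [ -f "$f" ] || continue
        if [ -x "$f" ]; then
            echo "removing executable bit: $f"
            chmod a-x "$f"
        fi
    done
}
```

Calling fix_cron_d_perms with no argument, as root, acts on /etc/cron.d itself.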

List of targets;

Please use yaim's install_node script for fresh installs. For upgrades from RC1, use apt-get dist-upgrade.

The repository and yaim now support yum. If you use yaim for installation, set REPOSITORY_TYPE="yum" in site-info.def before running install_node. This will configure yum for you.
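
The following sketch reports which repository type a given site-info.def would select, assuming the file is plain shell that can be sourced and that an unset REPOSITORY_TYPE means apt (the default stated under the new yaim parameters). The function name repository_type is hypothetical.

```shell
# Sketch: print the REPOSITORY_TYPE a site-info.def would select.
# repository_type is an illustrative helper, not a yaim command.
repository_type() {
    conf="$1"
    [ -r "$conf" ] || { echo "cannot read $conf" >&2; return 1; }
    # A bare file name would be looked up on PATH by ".", so anchor it.
    case "$conf" in
        */*) ;;
        *) conf="./$conf" ;;
    esac
    (
        # Subshell: sourcing cannot leak variables into the caller.
        unset REPOSITORY_TYPE
        . "$conf" >/dev/null 2>&1
        echo "${REPOSITORY_TYPE:-apt}"
    )
}
```

For example, repository_type site-info.def prints yum once REPOSITORY_TYPE="yum" has been added to the file.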

Many meta-rpm names have been changed to rationalise the naming (lcg-* -> glite-*). To upgrade a node whose meta-rpm name has changed, do the following (shown here for the WN):

rpm -e lcg-WN
apt-get install glite-WN
apt-get dist-upgrade
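
For other renamed node types the same three steps apply with different package names. The sketch below only prints the commands for a given old/new pair so they can be reviewed before being run as root; print_rename_upgrade is a hypothetical helper, and the caller is assumed to know the correct new name from the metapackage list.

```shell
# Sketch: print (do not run) the upgrade steps for a renamed meta-rpm.
# print_rename_upgrade is an illustrative name.
print_rename_upgrade() {
    old="$1"
    new="$2"
    [ -n "$old" ] && [ -n "$new" ] || {
        echo "usage: print_rename_upgrade OLD_META_RPM NEW_META_RPM" >&2
        return 1
    }
    echo "rpm -e $old"
    echo "apt-get install $new"
    echo "apt-get dist-upgrade"
}
```

Running print_rename_upgrade lcg-WN glite-WN reproduces the example above.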

The metapackages available are;

glite-UI (a combined LCG/gLite UI)
glite-WN (a combined LCG/gLite WN)
glite-FTS (FTS server plus related services)
glite-CE (the gLite CE)
glite-WMSLB (WMS and LB, recommended deployment of the WMS)
glite-BDII
glite-LFC_mysql
glite-LFC_oracle
glite-MON
glite-PX
glite-SE_classic
glite-SE_dpm_mysql
glite-SE_dpm_oracle
glite-SE_dpm_disk
glite-SE_dcache
glite-SE_dcache_gdbm
glite-VOBOX
glite-VOMS_mysql
glite-VOMS_oracle
lcg-RB
lcg-CE
lcg-CE_torque
glite-FTA

Many of these node types are described in the LCG Manual Install Guide

http://grid-deployment.web.cern.ch/grid-deployment/documentation/LCG2-Manual-Install/

Configuration

Configuration for all above components is now supported via yaim (FTS still requires a manual step). Note that the configuration targets have not yet been fully synchronised with the installation targets and some names are different.
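
Because the install and configure target names differ, a small lookup can help when scripting. The sketch below covers only the pairs given in the node-type notes of this document; the function name config_target_for is hypothetical and any target not listed there is reported as unknown rather than guessed.

```shell
# Sketch: map an install_node target to its configure_node target.
# Only pairs stated in these release notes are included.
config_target_for() {
    case "$1" in
        glite-WMSLB) echo "WMSLB" ;;
        glite-UI)    echo "UI_combined" ;;
        glite-WN)    echo "WN_combined" ;;
        glite-FTS)   echo "FTS" ;;
        glite-CE)    echo "gliteCE" ;;
        *)
            echo "unknown install target: $1" >&2
            return 1
            ;;
    esac
}
```

For example, config_target_for glite-UI prints UI_combined.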

Yaim has been renamed glite-yaim and has been relocated to /opt/glite/yaim. Please

  • Ensure any customised files are moved from /opt/lcg/yaim
  • Ensure your site-info.def references the new location for FUNCTIONS_DIR and perhaps others (eg USERS_CONF)
  • Put configuration files in /opt/glite/yaim/etc
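
One way to check the second point is to search site-info.def for references to the old location. This is a minimal sketch; find_old_yaim_paths is a hypothetical name, and it only finds literal occurrences of /opt/lcg/yaim.

```shell
# Sketch: list lines of a site-info.def that still point at the old yaim tree.
# Prints "LINE_NUMBER:LINE" for each hit; returns non-zero if there are none.
find_old_yaim_paths() {
    conf="$1"
    grep -n '/opt/lcg/yaim' "$conf"
}
```

An empty result from find_old_yaim_paths site-info.def means no literal reference to the old path remains.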

Configuration for all 'gLite' components is also supported via the native (XML) system.

Where yaim is configuring a gLite node type, it populates the XML files and runs the gLite config scripts. Please note that any modifications you make to the XML files, to parameters not managed by yaim, should be preserved. Parameters managed by yaim will be clearly marked in the XML after it has been run. The intention is that yaim offers a simple interface if preferred, but the ability to use the more powerful native mechanism is retained.

Please use yaim to configure pool accounts. Yaim allows non contiguous ranges of uids which some sites require and is therefore the default user configuration mechanism.

Yaim is in the apt-get repository.

New Yaim parameters;

WMS_HOST - gLite WMS + LB
FTS_HOST - for building an FTS server
REPOSITORY_TYPE - defaults to apt, but yum can be used. 
BATCH_BIN_DIR - The path of the lrms commands, eg /usr/pbs/bin
BATCH_VERSION - The version of the Local Resource Management System, eg OpenPBS_2.3
LFC_DB_HOST - Set this to use a separate db server for LFC
LFC_DB - Set this to define the name of LFC's db

Some parameters have changed for the DPM

DPM_FILESYSTEMS - The filesystems/partitions parts of the pool
DPM_DB_USER - The database user (was DPMMGR)
DPM_DB_PASSWORD - The database user password (was DPMUSER_PWD)

so the following are no longer used

DPMMGR
DPMUSER_PWD
DPMPOOL_NODES
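
To spot leftovers when carrying a site-info.def over from LCG-2_7_0, the sketch below lists assignments to the three retired variables. The function name find_retired_dpm_vars is hypothetical, and the check assumes one NAME=value assignment per line.

```shell
# Sketch: list assignments to DPM variables that are no longer used.
# Prints "LINE_NUMBER:LINE" for each hit; returns non-zero if there are none.
find_retired_dpm_vars() {
    conf="$1"
    grep -nE '^[[:space:]]*(DPMMGR|DPMUSER_PWD|DPMPOOL_NODES)=' "$conf"
}
```

Any line reported by find_retired_dpm_vars site-info.def should be replaced by the corresponding new parameter listed above.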

There is more information in the example site-info.def file

Notes on particular node types;

lcg-RB

Condor is upgraded to 6.7.10. There is a new condor-lcg package which provides LCG modifications to the gahp_server and grid_monitor. Configuration of these is handled by yaim.

glite WMS + LB

To install the glite WMS + glite LB (recommended deployment scenario)
install_node site-info.def glite-WMSLB
configure_node site-info.def WMSLB

Combined UI

The gLite 3.0 UI is a 'combined' UI, incorporating LCG and gLite components.

On the combined node, please watch out for glite commands which are symlinked to edg commands and may appear earlier in the PATH than their edg counterparts. The extent to which the glite symlinks can provide the functionality of the edg commands they replace is untested. These symlinks will be removed in future releases.

The RPM based userland installation finished without conflicts but there are lots of warnings and errors due to install scripts which require root privilege.

install_node site-info.def glite-UI
configure_node site-info.def UI_combined

WN

The gLite WN has combined gLite and LCG components

install_node site-info.def glite-WN
configure_node site-info.def WN_combined

glite-WN + Torque client

install_node site-info.def glite-WN glite-torque-client-config
configure_node site-info.def WN_combined_torque 

FTS

In the case of the FTS, yaim will configure all related services such as CRL downloads, info provider, etc., but the FTS server itself must be configured using the usual gLite system. A yaim component will follow.

install_node site-info.def glite-FTS
configure_node site-info.def FTS

gLite CE

The gLite CE is configured to support only VOMS proxies.

install_node site-info.def glite-CE
configure_node site-info.def gliteCE

If you want your gliteCE to run the site BDII;

configure_node site-info.def gliteCE BDII_site

The glite-CE configuration also configures the software and scheduler GIP plugins. Due to a bug in the /opt/lcg/libexec/lcg-info-dynamic-scheduler file, the following command must be run to get correct functionality:

# sed -i '{s/jobmanager/blah/}' /opt/lcg/libexec/lcg-info-dynamic-scheduler
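
A more cautious variant of the same substitution keeps a copy of the original file and shows what changed. This is only a sketch: patch_dynamic_scheduler and the .orig suffix are hypothetical choices, and the substitution is the one from the command above written without in-place editing.

```shell
# Sketch: apply the jobmanager -> blah substitution, keeping a backup.
# The backup is written next to the file as FILE.orig.
patch_dynamic_scheduler() {
    file="${1:-/opt/lcg/libexec/lcg-info-dynamic-scheduler}"
    cp -p "$file" "$file.orig" || return 1
    # Rewrite from the backup; redirecting into the existing file keeps its mode.
    sed 's/jobmanager/blah/' "$file.orig" > "$file" || return 1
    # Show the change; diff exits non-zero when files differ, which is expected.
    diff "$file.orig" "$file" || true
}
```

Run as root with no argument it acts on the installed plugin; passing a path lets it be tried on a copy first.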

Batch systems and the gLite CE

If you are installing your batch system server on the same node as the CE, and you want to use yaim or gLite to configure it, please choose one or the other and stick to it. If you use yaim and then make modifications via the gLite system, any rerun of yaim will reset the configuration. The same advice applies to management of WNs. If yaim fulfils your needs, this is the recommended route.

glite-CE + Torque server

install_node site-info.def glite-CE glite-torque-server-config
configure_node site-info.def gliteCE TORQUE_server

Note that the log-parser daemon must be started on whichever node is running the batch system. If your CE node is also the batch system head node, you have to run the log-parser here.

If you are running two CEs (typically LCG and gLite versions) please take care to ensure no collisions of pool account mapping. This is typically achieved either by allocating separate pool account ranges to each CE or by allowing them to share a gridmapdir.

DPM

A VOMS enabled DPM (1.5.5) is now available. Upgrade from LCG-2_7_0 is supported.

install_node site-info.def glite-SE_dpm_mysql
configure_node site-info.def [SE_dpm_mysql|SE_dpm_disk]

dCache

The yaim script for configuring dCache has received many updates from GridPP. It offers extended functionality but is backward compatible.

Note that dcache may show errors if you have more than around 56 CAs. If this is the case, currently the only fix is to identify CAs you do not need to support and remove them.
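
To see whether a node is near that limit, the installed CAs can be counted. The sketch below assumes the usual layout in which each CA certificate is installed as a file ending in .0 under /etc/grid-security/certificates; the function names and the hard threshold of 56 are illustrative, since the note only says "around 56".

```shell
# Sketch: count installed CA certificates and warn above a threshold.
# Assumes one <hash>.0 file per CA in the certificates directory.
CA_LIMIT=56

count_cas() {
    dir="${1:-/etc/grid-security/certificates}"
    n=0
    for f in "$dir"/*.0; do
        [ -e "$f" ] || continue   # empty directory: glob did not expand
        n=$((n + 1))
    done
    echo "$n"
}

check_ca_count() {
    n=$(count_cas "$1")
    if [ "$n" -gt "$CA_LIMIT" ]; then
        echo "warning: $n CAs installed (more than $CA_LIMIT)"
        return 1
    fi
    echo "ok: $n CAs installed"
}
```

check_ca_count with no argument inspects the default directory.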

Yaim does not yet support d-Cache with a postgresql based pnfs. To accommodate sites who have already upgraded to this version of pnfs, we now have two types of d-Cache SE.

glite-SE_dcache

This has no dependency on pnfs at all, so upgrades of either type (postgresql or gdbm) should work at the rpm level.

glite-SE_dcache_gdbm

This has a dependency on pnfs (ie the gdbm version) and is necessary for a new install. Please note however that pnfs_postgresql is the preferred implementation and migration is non trivial.

FTA

New yaim configuration for FTA. Please take the fta-info.def file from yaim's examples directory and append it to your site-info file before configuring.

install_node site-info.def glite-file-transfer-agents-config
configure_node site-info.def FTA

Fixes with respect to RC1

The following most recent critical bug fixes are contained in the new release candidate 2:

Bug 15330: glite-wms-ui-cli-python masks commands from LCG UI https://savannah.cern.ch/bugs/?func=detailitem&item_id=15330

Bug 15642: When mapping all the VOs to one queue on a glite CE with LSF the ... https://savannah.cern.ch/bugs/?func=detailitem&item_id=15642 TO BE CONFIRMED BY DEVELOPER - INCONSISTENT STATE IN SAVANNAH

Bug 15674: Blah submission from a glite 3.0 CE (glite flavour) to an LSF queue does not work https://savannah.cern.ch/bugs/?func=detailitem&item_id=15674

Bug 15710: gLite 3.0 job wrapper has bad kill usage https://savannah.cern.ch/bugs/?func=detailitem&item_id=15710

Bug 15769: large job collection submission and cancel through WMproxy didn't work https://savannah.cern.ch/bugs/?func=detailitem&item_id=15769

Bug 15806: matchmaking slow for bulk submission https://savannah.cern.ch/bugs/?func=detailitem&item_id=15806

Bug 15874: FTS - Can't configure the http timeout in the ChannelAgent https://savannah.cern.ch/bugs/?func=detailitem&item_id=15874

Bug 15934: Blah submission from a glite 3.0... https://savannah.cern.ch/bugs/?func=detailitem&item_id=15934

In addition, the following bug fixes in yaim have been included

Bug 15101: LFC : central LFC configured for all the VOs supported by a site https://savannah.cern.ch/bugs/?func=detailitem&item_id=15101

Bug 15131: Wrong permissions in LFC catalog when VO name = local group name https://savannah.cern.ch/bugs/?func=detailitem&item_id=15131

Bug 15484: DPM and LFC config does not allow for alternative database name and server https://savannah.cern.ch/bugs/?func=detailitem&item_id=15484

Bug 15622: Request for optional LFC_DB_HOST variable in yaim. https://savannah.cern.ch/bugs/?func=detailitem&item_id=15622

Bug 15764: GLITE_TMP is set but directory is not created https://savannah.cern.ch/bugs/?func=detailitem&item_id=15764

Middleware components

The gLite 3.0 issue tracking page has information on what has been fixed in RC2

https://uimon.cern.ch/twiki/bin/view/LCG/Glite30IssueTracking

Yaim and configuration

  • Yaim support for new gLite services (combined UI, combined WN, TORQUE_server)
  • Support for VOs without VOMS (for gLite services)
  • A missing WMS_HOST switches off the configuration of the gLite UI part of the combined UI
  • Return value of gLite configuration scripts is checked bug #15543
  • GIP configuration fixed on glite CE bug #15434
  • ACL publication fixed on gLite CE bug #15424
  • rationalisation of DPM configuration
  • LFC now supports a remote DB
  • FTA now yaim configurable
  • BDII - allow site BDII on gliteCE
  • Condor config for lcg-RB
  • No longer mandate home dir under /home for edginfo and edguser
  • Bogus 'requires' removed from config_gip
  • ERT plugin and software plugin for gliteCE (still requires manual step as plugin expects 'jobmanager')
  • config_mkgridmap - support new VOMS capability syntax
  • RGMA - set dir perms on /etc/tomcat5 and new CATALINA_OPTS
  • dcache - new native info provider

Outstanding bugs

During the integration and testing process a list of outstanding issues was maintained. Here is a summary of the issues which have not yet been addressed and were considered important;

savannah issue 15050 - this has NOT been fixed. The impact is of the order of a few jobs (<5) per thousand.

savannah issue 15189 - status not updated for nodes of a large collection - now fixed but missed the cut for RC2, now fine for 400 jobs, but doesn't work for 1000 jobs in a collection.

savannah issue 15894 - dynamic scheduler plugin on glite-CE doesn't provide correct information. Temporary fix:

# sed -i '{s/jobmanager/blah/}' /opt/lcg/libexec/lcg-info-dynamic-scheduler

savannah issue 15643 - proxy renewal works, job aborts after renewal. Voms credentials are dropped.

savannah issue 15688 - Jobs stay in ready state. The situation is still not entirely clear; it may be just a configuration problem.

Publishing software tags by user. Not solved yet, we will add a gridFTP server later.

In configuring a UI you may see complaints about the absence of files in vomsdir. Please ignore this as the script is making an invalid assumption about the naming convention of files in there.

Notes

Other issues to remain aware of;

Between LCG-2_7_0 and gLite 3.0, MySQL has been upgraded from 4.0 to 4.1. The password encryption has changed; please keep this in mind.
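
As background, MySQL 4.1 stores password hashes as 41 characters beginning with "*", whereas earlier versions stored 16 characters. The sketch below classifies a stored hash string by that shape only; mysql_hash_format is a hypothetical helper and does not contact a database.

```shell
# Sketch: classify a stored MySQL password hash by its length and prefix.
# Prints "4.1", "pre-4.1" or "unknown".
mysql_hash_format() {
    h="$1"
    case "${#h}:$h" in
        41:\**) echo "4.1" ;;       # 41 characters, leading '*'
        16:*)   echo "pre-4.1" ;;   # 16 characters
        *)      echo "unknown" ;;
    esac
}
```

Accounts whose stored hash is still in the old format may need their password set again, depending on the client libraries in use.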

Pointers to documentation on the components of this release are being compiled here

http://www.grid.kfki.hu/afs/gdebrecz/web/LCG/the-LCG-directory.html

Topic revision: r17 - 2006-11-28 - LaurenceField
 