Release notes for the gLite 3.0.2 WMS Checkpoint release - patch 1167
Installation
Please use the following repository to install
http://lxb2042.cern.ch/gLite/APT/R3.1-RB-pretest/rhel30/
The following meta-packages are available
- glite-WMS
- glite-LB
- glite-WMSLB
To use apt-get, create glite.list in /etc/apt/sources.list.d with the following contents:
rpm http://lxb2042.cern.ch/gLite/APT/R3.1-RB-pretest rhel30 externals Release3.1 updates
For CAs, you may need the following apt-get repository (for example in lcg-ca.list):
rpm http://linuxsoft.cern.ch/ LCG-CAs/current production
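The two repository files described above could be created with a small helper such as the following sketch. The function name write_glite_sources and its optional target-directory argument are illustrative and not part of the release; the repository lines themselves are copied from these notes.

```shell
# Sketch: write the two APT source files described in these notes.
# write_glite_sources is a hypothetical helper name; the optional argument
# (target directory) exists only so the function can be tried out safely.
write_glite_sources() {
    dir="${1:-/etc/apt/sources.list.d}"
    mkdir -p "$dir" || return 1

    # gLite WMS/LB pre-test repository (line taken from these notes)
    echo 'rpm http://lxb2042.cern.ch/gLite/APT/R3.1-RB-pretest rhel30 externals Release3.1 updates' \
        > "$dir/glite.list" || return 1

    # CA repository (line taken from these notes)
    echo 'rpm http://linuxsoft.cern.ch/ LCG-CAs/current production' \
        > "$dir/lcg-ca.list" || return 1
}
```

Run as root without arguments to write into /etc/apt/sources.list.d, then follow with apt-get update and apt-get install of the wanted meta-package (glite-WMS, glite-LB or glite-WMSLB).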
Configuration
The YAIM configuration for this WMS checkpoint release does not differ from that of 3.0.1, but all Python scripts have been replaced with bash scripts. All configuration files used with YAIM 3.0.1 should be compatible with YAIM 3.1.0.
yaim 3.1 is currently being finalized, and there is still a list of known problems and imperfections that we are fixing. There are several changes between yaim 3.0.1 and 3.1:
- The configure_node, install_node and run_function commands are obsolete. Although they are still located in the "/opt/glite/yaim/scripts" directory, please do not use them: the configuration will very probably fail. All these commands will be removed in the next release. Their functionality has been replaced by the new yaim command, which was introduced in yaim 3.0.1 and in yaim 3.1 became the only way to configure gLite middleware using yaim.
- Changes in the yaim packaging: yaim 3.1 has a modular structure (glite-yaim-core, glite-yaim-clients), in contrast to the monolithic distribution of yaim 3.0.1 (glite-yaim package).
- Added service-based and node-based configuration.
The detailed documentation for yaim 3.1 is currently being prepared and will soon be accessible from the yaim 3.1 page.
Regarding WMS and LB configuration: when LB and WMS are installed on separate nodes, you should add the variable LB_HOST to your site-info.def, for example
LB_HOST='"<LB_HOSTNAME>:9000"'
or
LB_HOST='"<LB1_HOSTNAME>:9000","<LB2_HOSTNAME>:9000","...","..."'
if you have multiple LBs. Please configure your WMS and LB as follows
/opt/glite/yaim/bin/yaim -c -s <site-info.def> -n glite-WMS
/opt/glite/yaim/bin/yaim -c -s <site-info.def> -n glite-LB
Or combine them together with
/opt/glite/yaim/bin/yaim -c -s <site-info.def> -n glite-WMS -n glite-LB
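The quoting of the LB_HOST value is easy to get wrong by hand, so the line could be generated from a list of hostnames. This is only a sketch: format_lb_host is a hypothetical helper, not a yaim command, and it assumes port 9000 as in the examples above.

```shell
# Sketch: build the LB_HOST line for site-info.def from a list of LB hostnames.
# format_lb_host is a hypothetical helper; port 9000 is the one used in the
# LB_HOST examples of these notes.
format_lb_host() {
    list=""
    for host in "$@"; do
        # Each entry is "<hostname>:9000" in double quotes, comma-separated
        list="${list:+$list,}\"${host}:9000\""
    done
    # The whole list is wrapped in single quotes, as in the examples
    printf "LB_HOST='%s'\n" "$list"
}
```

The output can be appended to site-info.def, for instance with format_lb_host followed by the LB hostnames and a redirection to the file.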
Known issues
- LB node: A bug in bkserverd causes a memory leak when the LB server database is configured with transactional database support. The workaround is to edit /opt/glite/etc/init.d/glite-lb-bkserverd and add "-b 0" to the start procedure.
Ex:
su - $GLITE_USER -c "$GLITE_LOCATION/bin/glite-lb-bkserverd -b 0 \
instead of
su - $GLITE_USER -c "$GLITE_LOCATION/bin/glite-lb-bkserverd \
and then restart bkserverd. This disables transactional database support. See bug #27555.
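The edit of the init script could also be scripted. The following is a minimal sketch, assuming the daemon is started by a line containing "bin/glite-lb-bkserverd " followed by its options, as in the excerpt above. The name patch_bkserverd_init is hypothetical, and the real init script may contain other matching lines that should be reviewed by hand.

```shell
# Sketch: add "-b 0" to the bkserverd start line of the init script.
# patch_bkserverd_init is a hypothetical helper; it assumes the daemon is
# started with ".../bin/glite-lb-bkserverd " followed by its options.
patch_bkserverd_init() {
    script="${1:-/opt/glite/etc/init.d/glite-lb-bkserverd}"

    # Do nothing if the option is already present
    if grep -q 'bin/glite-lb-bkserverd -b 0' "$script"; then
        return 0
    fi

    # Keep a backup copy, then rewrite the script in place
    cp -p "$script" "$script.orig" || return 1
    sed 's|\(bin/glite-lb-bkserverd\) |\1 -b 0 |' "$script.orig" > "$script"
}
```

The original file is kept next to the script with the suffix .orig, so the change can be compared with diff before bkserverd is restarted.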
- Bug 25932 is still not fixed in this version of the LB server. When you install LB and WMS on the same node, you also need to add the host DN to /opt/glite/etc/LB-super-users, just as for separate LB and WMS nodes.
Ex: (for 2 separate machines, lxb7283 - WMS, lxb7026 - LB)
cat /opt/glite/etc/LB-super-users # On LB
/C=CH/O=CERN/OU=GRID/CN=host/lxb7283.cern.ch
/C=CH/O=CERN/OU=GRID/CN=host/lxb7026.cern.ch
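Adding a host DN to the super-users file can be made repeatable with a small helper. This is a sketch only: add_lb_super_user is a hypothetical name, and the DN must be supplied by the administrator (it is the subject of the host certificate).

```shell
# Sketch: append a host DN to the LB super-users file unless it is already listed.
# add_lb_super_user is a hypothetical helper name; the second argument
# (file path) defaults to the location given in these notes.
add_lb_super_user() {
    dn="$1"
    file="${2:-/opt/glite/etc/LB-super-users}"
    touch "$file" || return 1
    # -x: whole-line match, -F: treat the DN as a fixed string, not a regex
    grep -qxF -- "$dn" "$file" || printf '%s\n' "$dn" >> "$file"
}
```

Calling it twice with the same DN leaves a single entry in the file, so it is safe to run again after a reconfiguration.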
- The permissions of the directory /var/lib/mysql need to be set manually: chmod og+rx /var/lib/mysql/ (this may only happen on the machines at CERN, but it would be good if YAIM could check it). See bug #27653.
- /etc/glite/profile.d/glite_setenv.sh contains gridpath_append and gridenv_set commands before the line that sources /opt/glite/etc/profile.d/grid-env-funcs.sh. This is a bug in YAIM core. On the production machines at CERN, LANG is set to C by default, which triggers the bug. It does not happen if LANG is set to en_US or some other locales (example: export LANG=en_US.UTF-8). See bug #27577.
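Whether a given node is affected by this ordering problem can be checked with a quick diagnostic. The sketch below only compares line numbers in the file. check_setenv_order is a hypothetical helper, and it does not distinguish comments from real commands.

```shell
# Sketch: succeed if grid-env-funcs.sh is referenced before the first use of
# gridpath_append or gridenv_set in glite_setenv.sh.
# check_setenv_order is a hypothetical helper name.
check_setenv_order() {
    setenv="${1:-/etc/glite/profile.d/glite_setenv.sh}"

    # Line number of the first reference to the functions file
    src_line=$(grep -n 'grid-env-funcs.sh' "$setenv" | head -n 1 | cut -d: -f1)
    # Line number of the first use of one of the helper functions
    use_line=$(grep -n -E 'gridpath_append|gridenv_set' "$setenv" | head -n 1 | cut -d: -f1)

    if [ -z "$use_line" ]; then
        return 0    # helper functions are never used
    fi
    [ -n "$src_line" ] && [ "$src_line" -lt "$use_line" ]
}
```

A non-zero exit status means the helper functions are used before the file that defines them is referenced, which is the situation described in this item.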
- The packages bdii-3.8.8-1 and glue-schema-1.2.2-1_sl3, installed with the glite-WMS meta-package, are out of date with respect to the yaim configuration scripts. The only possible workaround is to upgrade manually to versions bdii-3.9.0-1 and glue-schema-1.3.0-2. See bug #27655. Fixed.
- LB node: The glite-LB metapackage does not install the rpms required by function config_bdii. This will produce the following error in the configuration of a stand-alone LB:
INFO: Executing function: config_bdii
error reading information on service bdii: No such file or directory
bdii: unrecognized service
bdii: unrecognized service
The workaround is to manually install all the necessary rpms: bdii, glue-schema, lcg-info-templates and lcg-schema. See bug #27656.
- The directory /opt/glite/var/log/ does not exist on the WMS (cannot create file /opt/glite/var/log/xferlog). See bug #18306.
- Don't use the c-ares package from the OS; instead, install the one from the WMS+LB repository. If the OS version is installed, please remove it. It still needs to be checked whether this is CERN-specific.
- lb101 is trying to connect to wms101 and the connection is blocked by the firewall on rb101:
Jun 29 19:21:28 wms101 kernel: [DENIED] IN=eth0 OUT= MAC=00:30:48:68:ed:f8:0a:00:30:81:ad:81:08:00 SRC=137.138.4.182 DST=128.142.173.153 LEN=40 TOS=0x00 PREC=0x00 TTL=60 ID=0 DF PROTO=TCP SPT=9001 DPT=40334 WINDOW=0 RES=0x00 RST URGP=0
Is this expected? A mail has been sent to the developers.
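When going through many such firewall entries, it can help to pull individual KEY=value fields out of the log line. The helper below is a sketch; log_field is a hypothetical name, and it assumes the space-separated KEY=value layout of the kernel log entry shown above.

```shell
# Sketch: print the value of one KEY=value field from a firewall log line
# read on standard input. log_field is a hypothetical helper name.
log_field() {
    # Put each space-separated token on its own line, then keep the first
    # token that starts with "<key>=" and strip that prefix
    tr ' ' '\n' | sed -n "s/^$1=//p" | head -n 1
}
```

For example, piping a log line into log_field SPT or log_field DPT prints the source or destination port; for the entry above these are 9001 and 40334.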
- Some parameters of the MySQL database need to be modified in order to remove the 4GB limitation:
ALTER TABLE short_fields MAX_ROWS=1000000000;
ALTER TABLE long_fields MAX_ROWS=55000000;
ALTER TABLE states MAX_ROWS=9500000;
ALTER TABLE events MAX_ROWS=175000000;
See bug #27658.
- If, for some reason, any of the condorc scripts (condorc-launcher/condorc-advertiser/condorc-authorizer) is held or removed, restarting the gLite service does not bring it back to life. Two workarounds are possible: running the configuration again, or executing 'su $GLITE_USER -c /opt/condor-c/libexec/glite/condorc-initialize'.
- The load threshold above which no new job submission is authorized needs to be set to a bigger value (e.g. 15); see the glite_wms.conf file.
- 2007-07-24: Since yesterday afternoon, I have noticed a high CPU utilization on lb101 (~100%), which had not been the case until now (see the lemon monitoring web page). After some investigation with Di, we found that the bkserverd processes are crashing with a SIGSEGV signal. So, as far as I understand, each time a bkserverd process crashes, a new one is quickly created, which causes the high CPU utilization. Zdenek Salvet has been contacted; he thinks the crashes are caused by the wrong LB server RPM being installed: the right match for glite-lb-common-5.0.3-1 is glite-lb-server-1.5.5-1. At my request, he solved the problem without installing the new rpm.