gLite 3.0.2 PPS-Update 31 to PPS (Wed, 30 May 2007) - Full Report
glite-CE
Site Name: PPS-IFIC
Site Admin: Alvaro Fernandez
- Reconfigured gliteCE node.
- Reconfigured and restarted:
- Job submission is not working. This needs further investigation (it may be a parameter configuration issue rather than a release problem).
LFC-mysql
Site Name: PPS-LIP
Site Admin: Mario David
The LFC was almost unaffected by this release. Patch 1165 of glite-yaim-3.0.1-17 says:
No pool accounts are created on LFC
However, node-info.def contains:
BASE2_FUNCTIONS="
config_host_certs
config_users
.......
and
LFC_mysql_FUNCTIONS="${BASE1_FUNCTIONS} ${BASE2_FUNCTIONS}
LFC_oracle_FUNCTIONS="${BASE1_FUNCTIONS} ${BASE2_FUNCTIONS}
So, in principle, the function config_users still runs during the configuration of both LFC node types (mysql and oracle). I will investigate further to check whether this is true.
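The reasoning above can be checked mechanically. The following is a minimal sketch, assuming abbreviated stand-ins for the node-info.def lists: only config_host_certs and config_users come from the report, the BASE1 entry is a placeholder, and has_function is a hypothetical helper, not part of yaim.

```shell
# Abbreviated stand-ins for the lists in node-info.def. Only the two
# BASE2 entries come from the report; the BASE1 entry is a placeholder.
BASE1_FUNCTIONS="placeholder_base1_function"
BASE2_FUNCTIONS="
config_host_certs
config_users
"
LFC_mysql_FUNCTIONS="${BASE1_FUNCTIONS} ${BASE2_FUNCTIONS}"

# has_function NAME LIST: succeed if NAME is a whole word in LIST.
has_function() {
    for candidate in $2; do
        if [ "$candidate" = "$1" ]; then
            return 0
        fi
    done
    return 1
}

if has_function config_users "$LFC_mysql_FUNCTIONS"; then
    echo "config_users is listed for LFC_mysql"
else
    echo "config_users is not listed for LFC_mysql"
fi
```

Being listed only shows that the function name is in the expanded variable. Whether yaim then creates pool accounts on the LFC still has to be confirmed on a real node.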
MON and IC
Site Name: pps-mon.egee.cesga.es
Site Admin: asimon,esfreire
Everything seems to be working fine.
SE-dpm-mysql
Site Name: RU-Moscow-KIAM-PPS
Site Admin: Eugenia Kovalenko
and
Site Name: prague_cesnet_pps
Site Admin: Jan Svec
There were no errors during the upgrade, but two new variables are required in site-info.def:
DPM_INFO_USER
and
DPM_INFO_PASS
Without these variables, the value of "Total Available SE" is zero.
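As an illustration, a site-info.def fragment and a pre-flight check might look like the sketch below. The two values are placeholders invented for this example, and missing_vars is a hypothetical helper, not a yaim function.

```shell
# site-info.def fragment. Both values are placeholders made up for this
# sketch; replace them with your site's real values.
DPM_INFO_USER="example_info_user"
DPM_INFO_PASS="example_password"

# missing_vars NAME...: print each named variable that is unset or empty.
missing_vars() {
    for name in "$@"; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name"
        fi
    done
}

# Prints nothing when both variables are set.
missing_vars DPM_INFO_USER DPM_INFO_PASS
```

Running such a check before reconfiguring would flag the missing variables before the information system publishes a zero value.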
KIAM had some initial problems, which PRAGUE helped to solve. After the upgrade and reconfiguration, they took an additional step: running the /etc/cron.monthly/create-default-dirs-DPM.sh script.
glite-UI
Site Name: RU-Moscow-KIAM-PPS
Site Admin: Eugenia Kovalenko
We upgraded the UI successfully. General tests passed without any errors.
glite-WMSLB
Site Name: PPS-IFIC
Site Admin: Alvaro Fernandez
Restarting the WMSLB service reports missing configuration (missing in yaim?):
[Thu May 31 10:18:03 2007] [warn] PassEnv variable GLITE_WMS_WMPROXY_WEIGHTS_UPPER_LIMIT was undefined
[Thu May 31 10:18:03 2007] [warn] PassEnv variable GLITE_SD_VO was undefined
[Thu May 31 10:18:03 2007] [warn] PassEnv variable LCMAPS_LOG_LEVEL was undefined
[Thu May 31 10:18:03 2007] [warn] PassEnv variable LCMAPS_DEBUG_LEVEL was undefined
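To collect the affected variable names from a longer log, a small filter like the one below may help. It is a sketch that assumes the warning format shown above; undefined_passenv is a hypothetical helper name.

```shell
# undefined_passenv: read log text on stdin and print each variable named
# in a "PassEnv variable X was undefined" warning, one per line, sorted
# and without duplicates.
undefined_passenv() {
    sed -n 's/.*PassEnv variable \([A-Za-z_][A-Za-z0-9_]*\) was undefined.*/\1/p' | sort -u
}

# Usage (the log path is site specific and not given in this report):
#   undefined_passenv < /path/to/error_log
```

The resulting list can then be compared against the variables that yaim is expected to define for the node.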
First restart after upgrade:
Could not stop the LB Server service [FAILED]
An unrecoverable error occurred while stopping the gLite Logging and Bookkeeping [FAILED]
The gLite LB Server service is already running. Restarting...
Stopping glite-lb-interlogd ...glite-lb-notif-interlogd: no process killed
done
/var/glite/glite-lb-bkserverd.pid does not exist - glite-lb-bkserverd not running?
The stop step reports errors, but starting the services seems fine.
glite-WN
Site Name: UKI-SOUTHGRID-BHAM-PPS
Site Admin: Yves Coppens
SL4 compatibility mode
- Old style and new style yaim reconfiguration worked.
- On my production site, the yaim function config_sw_dir runs on every worker and tries to recursively change the permissions of the software area. The problem is that every worker does this, when it should only be done once. The workers got permission denied, and I had to overwrite the config_sw_dir function. I reported this in ggus ticket 22090.
Because my PPS site has no VO software, this is difficult to catch there, so I compared the old and new 3.0.1-17 functions. They are identical, so this is still a problem.
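One way to express "do this once, and only where it can succeed" is sketched below. This is not the actual config_sw_dir code: the function name, the marker file name and the chmod mode are all assumptions made for illustration.

```shell
# Hypothetical guard, not the real yaim config_sw_dir. The marker file
# name and the chmod mode are assumptions made for this sketch.
fix_sw_dir_once() {
    dir="$1"
    marker="$dir/.perms_fixed"
    if [ ! -d "$dir" ]; then
        echo "skip: $dir is not a directory"
    elif [ ! -O "$dir" ]; then
        # Workers that do not own the area would get permission denied.
        echo "skip: $dir is not owned by this user"
    elif [ -e "$marker" ]; then
        echo "skip: already done"
    else
        chmod -R u+rwX,go+rX "$dir" && touch "$marker" && echo "done"
    fi
}

# Usage (the path is an example, not taken from this report):
#   fix_sw_dir_once /path/to/software/area
```

With a guard of this kind, workers that cannot change the shared area skip the step quietly instead of failing, and the owner does the recursive change only on the first run.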
Site Name: PPS-LIP
Site Admin: Mario David
The WNs (native glite 3.1 SLC4) are not affected, but dcache-client-1.7.0-35 should be installed/upgraded (see the notes on patch 1151). This version is not in the SLC4 glite 3.1 repository. The RPM is in the dCache repository:
http://cvs.dcache.org/repository/apt/sl4.4.0/i386/RPMS.testing/dcache-client-1.7.0-35.i586.rpm
so it is just a matter of fetching it.
lcg-CE_torque
Site Name: UKI-SOUTHGRID-BHAM-PPS
Site Admin: Yves Coppens
Updated fine.
Configuration of the CE using the new yaim does not return:
/opt/glite/yaim/bin/yaim -c -s /etc/yaim/site-info.def -n CE...
Starting edg-gatekeeper: [ OK ]
Configuration Complete
-> requires Control-C. I had observed this in the past and submitted a ticket (ggus 22090). Apparently, the yaim developers were aware of this problem.
[ Note this worked with RB/WN ]
I successfully tested the job submission chain UI/RB/CE/WN.
I haven't had time yet to look carefully at the accounting.
(Received after report)
Hi Mario,
I've just noticed a potential problem with the info provider, possibly linked to the introduction of the DENY attribute in the VOVIEW config (function config_gip in yaim). Below is some sample output which at first sight does not seem correct. I'll keep looking into it.
Yves
dn: GlueVOViewLocalID=alice,GlueCEUniqueID=epbf005.ph.bham.ac.uk:2119/jobmanager-lcgpbs-alice,mds-vo-name=local,o=grid
objectClass: GlueCETop
objectClass: GlueVOView
objectClass: GlueCEInfo
objectClass: GlueCEState
objectClass: GlueCEAccessControlBase
objectClass: GlueCEPolicy
objectClass: GlueKey
objectClass: GlueSchemaVersion
GlueVOViewLocalID: alice
GlueVOViewLocalID: atlas
GlueVOViewLocalID: biomed
GlueVOViewLocalID: calice
GlueVOViewLocalID: cms
GlueVOViewLocalID: dteam
GlueVOViewLocalID: lhcb
GlueVOViewLocalID: ops
GlueCEAccessControlBaseRule: VO:alice
GlueCEAccessControlBaseRule: VO:atlas
GlueCEAccessControlBaseRule: VO:biomed
GlueCEAccessControlBaseRule: VO:calice
GlueCEAccessControlBaseRule: VO:cms
GlueCEAccessControlBaseRule: VO:dteam
GlueCEAccessControlBaseRule: VO:lhcb
GlueCEAccessControlBaseRule: VO:ops
GlueCEStateRunningJobs: 0
GlueCEStateRunningJobs: 0
GlueCEStateRunningJobs: 0
GlueCEStateRunningJobs: 0
GlueCEStateRunningJobs: 0
GlueCEStateRunningJobs: 0
GlueCEStateRunningJobs: 0
GlueCEStateRunningJobs: 0
GlueCEStateWaitingJobs: 0
GlueCEStateWaitingJobs: 0
GlueCEStateWaitingJobs: 0
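One visible oddity in the output above is that several attributes occur many times within a single VOView entry. The sketch below counts such repeats in an entry read from standard input. It assumes that the attributes passed as arguments are expected to appear once per entry, which is an interpretation rather than something stated in the message; repeated_attrs is a hypothetical helper name.

```shell
# repeated_attrs NAME...: read one LDIF entry on stdin and print
# "NAME: count" for every named attribute that occurs more than once.
repeated_attrs() {
    awk -F': ' -v names="$*" '
        BEGIN {
            n = split(names, want, " ")
            for (i = 1; i <= n; i++) watched[want[i]] = 1
        }
        ($1 in watched) { count[$1]++ }
        END {
            for (i = 1; i <= n; i++)
                if (count[want[i]] > 1) print want[i] ": " count[want[i]]
        }
    '
}

# Usage (entry.ldif is a hypothetical file holding the entry above):
#   repeated_attrs GlueVOViewLocalID GlueCEStateRunningJobs < entry.ldif
```

An empty result would mean that none of the named attributes is repeated in the entry.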
Site Name: PPS-LIP
Site Admin: Mario David
Updated fine.
Annoying:
Executing config_ldconf
/sbin/ldconfig: /opt/glite/externals/lib/libswigpy.so.0 is not a symbolic link
/sbin/ldconfig: /opt/glite/externals/lib/libswigpl.so.0 is not a symbolic link
/sbin/ldconfig: /opt/glite/externals/lib/libswigtcl8.so.0 is not a symbolic link
.....
Also, when upgrading the RPMs:
>>>> Errors:
Executing config_gip_scheduler_plugin
Usage: grep [OPTION]... PATTERN [FILE]...
Try `grep --help' for more information.
Starting LocalLogger: edg-wl-interlogd and edg-wl-logd. [ OK ]
Failed to initialize input queue: input_queue_attach: error binding socket: Permission denied
This is LocalLogger, part of Workload Management System in EU DataGrid.Copyright (c) 2002 CERN, INFN and CESNET on behalf of the EU DataGrid.
Starting MAUI Scheduler: [ OK ]
Configuration Complete
It does not return to the prompt. This problem has been present for some time; I had to press ^C to get the prompt back.
[root@pprod03 ~]# /etc/init.d/edg-wl-locallogger status
edg-wl-logd (pid 24627) is running...
edg-wl-interlogd is stopped
A restart does not work, even after removing /var/tmp/dg20*.
The status is still the same even after a reboot. The CE is running on SLC4, which might be the cause of some of the problems found during configuration.
Site Name: prague_cesnet_pps
Site Admin: Jan Svec
Upgrade: Everything went fine.
lcg-RB
Site Name: UKI-SOUTHGRID-BHAM-PPS
Site Admin: Yves Coppens
Old and new style configuration with:
/opt/glite/yaim/bin/yaim -c -s /etc/yaim/site-info.def -n RB
worked
-- Main.thackray - 01 Jun 2007