Outstanding issues for 3.0

gLite 3.0 released

26/05/2006:

Problems found in RC4


  • OPEN - not fixed.
  • FIXED - understood and with a fix available.
  • CLOSED - fix included in release.

Number Component Description Savannah Status State Action
105 Yaim DPM_FILESYSTEMS in site-info.def example of YAIM #16492   FIXED In Yaim head
106 glite-CE gliteCE configuration needs to be run twice after a fresh install #16496   FIXED glite-CE version 2.4.7
107 Yaim Confusion in meta pkg name #16509   FIXED Yaim head
108 LCMAPS bulk submission fails when users' DN mapped to single account #16295   OPEN  
110 FTS BDII plugin sometimes crashes or aborts when listing associated services #16510   FIXED  
111 FTS BDII plugin returns no entries when querying by host and type #16505   FIXED  
112 FTS shouldn't have to specify default logging params in YAIM #16521   CLOSED Not Critical
113 Dcache gsidcap fails when > 56 CAs on server #16538   OPEN  
* problem 102 was not fixed

Packages required for RC4

Problems found in RC3

Number Component Description Savannah Status State Action
97 Meta Package glite/lcg CE targets confusing for configuration and installation #16385 (16356, 16416 are dupes of this)
Apr/25 fixed on yaim HEAD
Nasty Yaim Gotcha. CLOSED glite-yaim-3.0.0-9, checked by Di
100 FTS BDII publication does not work properly #16453   FIXED Yaim head
101 DPM YAIM starts rfio before creating... #16465   CLOSED glite-yaim-3.0.0-9, checked by Di
102 dCache meta package glite-SE_dcache lacks... #16463 pnfs-3.1.10-15 needs to be added to the dCache meta package CLOSED RC4, checked by Di
103 Yaim Rationalizing Yaim targets #16466   FIXED Yaim head
104 Tank MySQL configuration #15656   FIXED Yaim head
109 VOMS Upgrade fails for VOMS Oracle #16475   FIXED RC4 Maria to check

Packages required in RC3

  • glite-security-config version 1.2.4
  • glite-WMS version 2.3.3
  • glite-iperf-1.3.2-1.i386.rpm
  • glite-udpmon-1.3.10-i386.rpm
  • glite-CE-2.4.5-2
  • patch #700: condor-lcg-1.1.0-1
  • glite-yaim-3.0.0-6.noarch.rpm
  • patch #710: LFC / DPM 1.5.6

RC2 Problems

Found in Certification

Number Component Description Savannah Status State Action
80 Cron glite cron jobs have execute permissions #15638 CLOSED   glite-security-config version 1.2.4 glite-WMS version 2.3.3 glite-iperf-1.3.2-1.i386.rpm glite-udpmon-1.3.10-i386.rpm, checked by Di
81 Yaim FTS needs configuration scripts #16149 CLOSED   glite-yaim-3.0.0-6.noarch.rpm
82 NW Server NW Server "Authentication Problem" #15761 CLOSED   Due to #15638, checked by Di
83 Yaim dcap should be gsidcap in yaim.
New function submit to test.
#16161 CLOSED   glite-yaim-3.0.0-6.noarch.rpm, checked by Di
84 Yaim DPM/LFC misconfiguration of the /etc/shift.conf #16162 CLOSED   RC4, checked by Di

Issue 67 * may be back.

Found in Pre-production

Number Component Description Savannah Status State Action
76 UI/WN A relocatable UI/WN is needed 15998 Requires fixes to glite configuration REOPENED Robert
85 gLite RB Logmonitor crashed #16191 Occurs when people do nasty things while debugging CLOSED None
87 Yaim Bug in path setup of grid pool accounts #16319   CLOSED glite-config-1.6.29-0 package, checked by Di
88 WMS Configuration for bulk submission needed #16294   CLOSED glite-yaim-3.0.0-9, checked by Di
89 Yaim glite-yaim-3.0.0-5: 'gliteCE' parameter missing in the example file #16304 Wiki error, parameter removed CLOSED None
90 Yaim YAIM install_node script fails to recognise lcg-CE_torque node type #16347 Patch available FIXED Fixed in Yaim head
91 Yaim "Connection refused" during YAIM configuration of Torque #16339   OPEN Di
92 glite CE Software TAG publish #10872 Fixed in RC3 CLOSED checked by Di
93 BDII site does not publish GlueSiteUniqueID #16248 Won't fix CLOSED NONE
94 WMS WMProxyEndpoints attribute empty in glite_wms.conf #16330 Duplicate of bug #16294. CLOSED checked by Di
95 Yaim glite-yaim does set queue.acl.groups #16220 Fixed in YAIM head FIXED  
99 gliteCE APEL #16425 Cannot be fixed before the release CLOSED None
101 config_gip GlueHostOperatingSystemVersion hardcoded #16425   FIXED Fixed in Yaim head

Issue 67 is being encountered again in PPS.

Found by ROC Testers

Number Component Description Savannah Status State Action
86 dCache Reset the databases #16388 Patch available for YAIM, needs adding CLOSED glite-yaim-3.0.0-9, checked by Di
96 Install Wrong versions for lcg-info-dynamic-scheduler #16355 To be done for RC4 CLOSED RC4, checked by Di

Carried forward

Number Component Description Savannah Status State Action
50 L&B status not updated for nodes of a large collection
Apr/6 - bugs identified causing high CPU usage of glite-proxy-renewd, for this new bug was raised 16050, we'll track both here
Apr/10 - file descriptor limit can be increased, but Francesco has a proper fix. Will produce the tag to be tested on CTB.
15189 16050 Working with ~400 jobs, still problems with 1000. Could this be related to slow matchmaking or a limited number of available CEs? More work FIXED Included in integration build for 3.0
57.1 WMS Job aborts after proxy renewal
Apr/4 - new bug 15905
Apr/5 - no news
Apr/10 - waiting for Di's testing results (on his private setup, not CTB). In any case Francesco reports the problem has been seen in dev.
Apr/12 - the tag with fix for #15905 arrived in integration. To be included in RC3.
Fixed in RC3 CLOSED checked by Di
69 gliteCE How should we manage publishing of SoftwareRunTimeEnvironment
Apr/6 - add gridftp server to gLite CE, configure exactly as LCG CE, assigned to Louis
Apr/10 - change being prepared for RC3
Apr/12 - Robert is working on it.
16159 On the gLite 3.0 timescale we will have to put gridFTP on the gliteCE. We must make sure, if possible, that we configure to restrict access only to sgms, and to allow them only to edit a single file. CLOSED glite-CE-2.4.5-2, checked by Di
72 NS NS produces authentication errors after 3-4 days
Apr/6 - nothing heard from developer, Francesco will ping him
Apr/10 - A similar problem has been observed on RC2, not clear if identical, to be analysed on PPS.
15761 Developer investigating node in this state. Is this time or activity related? Due to #15638 CLOSED checked by Di
73 Doc The port table has to be updated for all services in the release
Apr/6 - nothing obtained, Zdenek will ping John White
Apr/10 - John is contacting developers to supply their parts, doc in preparation.
Apr/12 - list for gLite delivered. Needs to be merged with LCG list. John working on it.
  Ian N contacted OPEN Romain
Ian N

Issues fixed and CLOSED

Number Component Description Savannah Status State Action
24 FTS Provide FTS server YAIM component
Apr/6 - done by Gavin, not in CTB repository yet
Apr/10 - moved to CLOSED
  FTS node still requires some manual configuration but is mostly yaim now. FTA has a full yaim component, to be tested CLOSED Gavin.
24.1 FTA Provide FTA YAIM component   Ready for yaim 3.0.0-3. FTA can be installed on the FTS server or on a separate machine CLOSED Gavin
34.1 DPM/LFC Race condition in srm v2 methods   Fixed in 1.5.5 CLOSED In cert
34.2 DPM/LFC Python LFC interface broken in 1.5.3, should regress to 1.5.2 python interface.   Fixed in 1.5.5 (without reverting to old interface) CLOSED 1.5.5 in cert
34.3 DPM/LFC bug fix requiring db update   Fix committed. update integrated into upgrade script CLOSED 1.5.5 1.5.5 in cert
34.4 DPM/LFC Fresh install problems with virtual id mappings   Fixed in yaim config CLOSED Jean-Philippe/Louis
34.5 DPM/LFC rfio problem preventing correct functioning with more than one pool node.   Fixed in 1.5.5 CLOSED 1.5.5 in cert
34.6 DPM/LFC Oracle error due to locks for virtual ids   Fixed in 1.5.5 CLOSED 1.5.5 in cert
34.7 DPM/LFC DPM upgrade script doesn't work with more than one VO     CLOSED Sophie, fixed
36.1 LB With >5000 jobs the interlogger is in a state where events are not forwarded. 15217 A possible contributor to 15050 CLOSED Fix in the repository. Daniele confirms fix
39 DPM/LFC To be statically linked against MySQL 4.1 15421 1.5.5 linked against MySQL 4.1 CLOSED 1.5.5 in cert
40 lcg-RB RB needs modifications to run with new Condor, both a config and updates to some binaries eg condor_gridmanager 15417 Fixed in yaim CLOSED Ready for cert
41   The ./glite-lfc-client-config-config.py script cannot be found 15425   CLOSED Issue in glite-wn.cfg.xml template, ready for test
42 WMS/gridFTP WMS gridFTP crashing with > ~56 CAs 15383 New tag produced 8th, to build 10th CLOSED fix confirmed
43 gliteCE gliteCE pointing to separate torque server 15424   CLOSED Robert - fixed in yaim, ready for cert
44 gliteCE In a glite 3.0 CE(glite flavour) the LSF gip script points to a non existing file 15434   CLOSED Fixed in CE config script
45 gliteCE It should be documented that the logparser daemon MUST be started on a glite CE 15432   CLOSED  
46 WMS Job submission from a glite 3.0 WMS (glite flavour) to a glite 3.0 CE (glite flavour) fails 15426   CLOSED problem with a missing rpm, now it's a dependency
47 VOBOX unable to install glite-UI on top of glite-VOBOX 15411   CLOSED glite-VOBOX package to incorporate all of glite-UI
48 FTS FTS - YAIM configuration: "error reading information on service rgma-gin","... bdii" 15410   CLOSED rpm missing, fix ready for cert
49 * UI remove links to edg-* commands 15330 Update 28/03/2006; links to be removed CLOSED Fabrizio
51 WMS glite-wms-wm crashes under some unknown bdii or ism condition 15098   CLOSED  
52 WMS submission of collection always fails 15095   CLOSED  
53 WMS WMProxy not correctly configured 14861   CLOSED  
54 WMS ISM keeps CEs removed from a BDII 14930   CLOSED  
55 yaim/CE It seems it is not possible with YAIM to map the VOs to a unique queue when using LSF 15520   CLOSED invalid; site-info.def misconfiguration
56 combined UI No dpm client in the combined UI 15514   CLOSED Joachim - fix ready
57 * WMS VOMS proxy renewal failing on WMS 15643
Problem identified in handling of vomses file in voms library. Fix on CVS, to tag CLOSED Di, Mario, checked by Di
57.1 VOMS VOMS proxies with '=NULL' format for attributes must be supported -   CLOSED Maarten, checked by Di
57.2 WMS Apr/5 - New critical bug will be posted by Francesco for VOMS authentication, not observed yet because DPM hasn't been used with ACLs so far. Problem discovered as a by-product of the analysis of pt.57.1 above.
Apr/6 - still in discussion, no bug raised yet, no problem for GRAM based jobs
Markus: contacted JPB for a test of setacl from the WN, Gilbert doesn't have one, current DPM tests cannot be adapted, write a new one (who? Gilbert? JPB? Markus to solve)
Apr/10 - nonexistent problem! After much deeper analysis the original worries proved unfounded. Closing.
?? Francesco to open the bug CLOSED Francesco
58 gliteCE lcmaps/lcmaps.db updated for VOMS roles - fix in CVS CLOSED Alberto - in integration repository
59 installation renamed glite meta rpms   In integration repo CLOSED Joachim/Oliver
60 WMS AccessControlbaseRule & LSF 15642 CLOSED    
62 gliteCE Blah submission from a glite 3.0 CE (glite flavour) to an LSF queue does not work
Apr/6 - in the RC2 repository
15674 Two solutions i) run log parser on head node ii) copy log file locally onto a non-AFS partition. Alessandro to investigate (ii). (i) difficult because parser is not available as a standalone rpm (for 3.1). Job submission still failing. CLOSED Alessandro
63 gliteCE Jobs remaining in ready state despite finishing
Apr/5 - We may need to upgrade Condor to
6.7.16 - no code change
6.7.18 - requires one-line change, would be submitted by Francesco, better Condor support expected if higher version used.
-> Francesco to check existing bad-state-gliteCE's for a new hint
Apr/6 - bad state gliteCE's revealed that cron has been auto-updated and now requires, for security, that crontab files do not have exec permission, but all(?) our crontabs are created with that perm bit on(!!). CTB now has to verify all nodes and remove the exec perm by hand for RC2. The YAIM script needs to be changed, but some crontabs are hardcoded; to be cleaned up after RC2.
After-RC2 testing of condor 6.7.16 may be the best, 6.7.18 has new problems, better wait for 6.7.19 in a few weeks, Francesco thinks.
This may be a long problem before we get to the right solution
Apr/10 - the bug will be raised against YAIM install to fix the crontab exec bit problem, which seems to have caused all seen problems (after increased timeout has fixed some, the original assessment that it didn't is considered incorrect now)
Problem is being closed now waiting for new observations from the PPS, will be reopened if appropriate after PPS assessment.
Francesco will continue testing condor 6.7.19 just in case.
15688 Adjustment of timeouts suggested by Francesco has not worked. This could be related to problems in the interaction of LCG LB clients and the glite LB, which will always happen in the gliteWMS->LCG-2_7_0 submission chain. CLOSED Alessandro Francesco
64 gliteCE after restart of a gliteCE jobs fail for a brief period
Apr/5 - moved to CLOSED
15685 Issue to be referred to the Condor group. Anticipated effect in production is minimal CLOSED condor
65 * WMS gLite 3.0 job wrapper has bad kill usage 15710 in integration repo CLOSED  
66 savannah Migration of existing issues to new Savannah project
Apr/6 - all bugs needing tracking for gLite 3.0 were moved
  categories created, users and issues not yet transferred CLOSED Nick
67 * WMS/UI glite-job-list-match fails, undefined symbol 15724 Retag unchanged code, rebuild and see if problem persists; tag sent 29/03/2006 CLOSED  
68 inst It is sometimes necessary to remove boost-g3 before installing glite3.0
Apr/5 - must be fixed, who can fix it?
-> must be put in Savannah as a bug, Joachim will submit the bug
Apr/6 - bug submitted, temporary solution by Louis with new rpm
Apr/10 - keeping only one boost-g3 would be the correct solution, but since that needs code changes we are forced to stay with Louis' temporary solution. Closed.
15986 We need to find out if we can remove the 'provides boost' from boost-g3. The rpm has Build Host: pc-cel3-build
Problem must be tracked until we have a proper solution or full confidence in the current one
CLOSED Joachim
70 WMS large job collection submission and cancel through WMproxy didn't work
Apr/5 - moved to CLOSED
15769 17MB MySQL limit implemented, Mario confirms issue fixed CLOSED  
71 WMS fix for speeding up match-making for bulk submissions by a factor of 5 (from ~10 to ~2 s per job)
Apr/5 - moved to CLOSED
15806 This comprises two fixes, first of which relates to brokerinfo files, which will be ready to tag by 31/03/2006. The other component to the fix will not be ready on this timescale, but is not tied to the first. CLOSED  
74 FTS MySQL FTS has wrong db schema version
Apr/6 - update of operating procedures was needed
15875 Gav fixing CLOSED Gav
74 * UI bad distinction between file/directory of $GLITE_LOCATION/etc/vomses 15734 This halts the UI config CLOSED    
75 FTS Every so often a "Failed to get proxy certificate" error is obtained when submitting a transfer
Apr/5 - moved to CLOSED
15866 Fixed in yaim due for RC2 release CLOSED Gav
77   Apr/5 - gLite CE won't work with LSF queues
lsf_submit.sh must not source ~/.login
Apr/5 - The tag with the QF for this bug is: glite-ce_R_1_5_12
Apr/6 - tag is in the RC2 repository, ready to test
15985   CLOSED  
78 FTS Job status is Failed after four re-tries, but file was copied to the destination.
Apr/6 - closed by Gavin as not reproducible.
15987 The wrong RPMs were tagged for the original gLite 3.0 build. Need to check this in RC2 CLOSED Gavin
79 VOMS Apr/5 - ACs list contains wrong group/role values
Apr/6 - needs to be tested by Di
15692   CLOSED Maria
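
Issues 63 and 80 above both trace back to cron files carrying execute permission, which had to be removed by hand on the CTB nodes for RC2. A minimal sketch of how such a check could be automated is given below. It is not the YAIM fix itself: the function names are hypothetical, and the directory to scan (for example /etc/cron.d) is an assumption that each site would need to confirm for its own nodes.

```python
import os
import stat

# Any of the three execute bits (user, group, other).
EXEC_BITS = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def find_executable_cron_files(cron_dir):
    """Return the sorted paths of regular files in cron_dir that have
    at least one execute bit set. Subdirectories are not descended into."""
    flagged = []
    for name in sorted(os.listdir(cron_dir)):
        path = os.path.join(cron_dir, name)
        mode = os.lstat(path).st_mode
        if stat.S_ISREG(mode) and mode & EXEC_BITS:
            flagged.append(path)
    return flagged


def clear_exec_bits(paths):
    """Remove all execute bits from the given files, leaving the
    read and write bits unchanged."""
    for path in paths:
        mode = stat.S_IMODE(os.stat(path).st_mode)
        os.chmod(path, mode & ~EXEC_BITS)


if __name__ == "__main__":
    # Assumed location of the cron job files; adjust per node type.
    cron_dir = "/etc/cron.d"
    if os.path.isdir(cron_dir):
        for path in find_executable_cron_files(cron_dir):
            print("execute bit set:", path)
```

Run as shown, the script only reports offending files. Calling clear_exec_bits on the reported list would strip the bits, which needs write access to the files.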
