LCG Grid Deployment

Test results for gLite 3.0.2 RC

The current release candidate is RC6

Test SFT Information System edg-tests on lcg WMS FTS VOMS R-GMA DPM LFC
Responsible Gergo Di Di, Gilbert Di, Mario Gergo Maria Laurence Di (edg-tests) Di (edg-tests)
3.0.2 RC2 OK - Gergo #17856 - Gergo     OK   OK - rgma-client-check    
3.0.2 RC5     FAIL closer investigation shows that most of the tests ran fine FAIL closer investigation shows that basic WMS functionality is OK OK OK rgma-client-check OK see edg-tests see edg-tests
3.0.2 RC6   gLite WMS still publishes empty information, GIP not configured OK Some failures came from the firewall settings on one of WNs OK       OK OK

WMS stress tests

07/10/06 bulk submission 1000 jobs with RetryCount=0 on lxb2032 (upgraded WMS)
  1. command for submission: glite-wms-job-submit -a -o idbulk3 --collection bulk3
  2. JDL test.jdl:
       JobType = "Normal";
       Executable = "test.sh";
       InputSandbox = {"test.sh"};
       OutputSandbox = {"stdout","stderror"};
       StdOutput = "stdout";
       StdError = "stderror";
       Requirements = other.GlueCEUniqueID == "lxb1905.cern.ch:2119/blah-pbs-atlas";
       RetryCount = 0;
  3. execution script test.sh:
       #!/bin/bash
       /bin/hostname
       echo "done"
  4. submission time: 480 seconds
  5. results
    • only 19 jobs are in "Done (Success)"
    • 198 jobs are "Cancelled" with the reason "Aborted by user" a few minutes after submission, although we did not cancel any jobs
    • 10 jobs are in "Done (Failed)"; the logging info shows "Got a job held event, reason: Repeated submit attempts (GAHP reports:)"
    • 242 jobs stay in "Waiting" status forever
    • 13 jobs stay in "Running" status forever
    • 518 jobs stay in "Submitted" status forever
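
The submission command above passes a directory (bulk3) to --collection, which, as far as we understand the option, submits every JDL file found in that directory as one collection. A minimal sketch of how such a directory could be filled with identical copies of test.jdl follows. The helper name make_collection and the file naming scheme job_NNNN.jdl are assumptions for illustration and not necessarily the procedure used in these tests.

```python
import tempfile
from pathlib import Path

# JDL text as given in step 2 of the test description.
JDL = """JobType = "Normal";
Executable = "test.sh";
InputSandbox = {"test.sh"};
OutputSandbox = {"stdout","stderror"};
StdOutput = "stdout";
StdError = "stderror";
Requirements = other.GlueCEUniqueID == "lxb1905.cern.ch:2119/blah-pbs-atlas";
RetryCount = 0;
"""


def make_collection(directory, n_jobs):
    """Write n_jobs identical JDL files into directory and return their paths.

    Hypothetical helper: the file naming scheme is an assumption.
    """
    target = Path(directory)
    target.mkdir(parents=True, exist_ok=True)
    paths = []
    for i in range(n_jobs):
        path = target / f"job_{i:04d}.jdl"
        path.write_text(JDL)
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Demonstrate in a throwaway directory; the tests used a directory named bulk3.
    with tempfile.TemporaryDirectory() as tmp:
        files = make_collection(Path(tmp) / "bulk3", 1000)
        print(len(files), "JDL files written")
```

The script test.sh named in InputSandbox would still have to be present where the submission is run.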

07/10/06 bulk submission 2000 jobs with RetryCount=0 on lxb2032 (upgraded WMS)
  1.-4. see above
  5. results: submission to the WMS failed with the following errors
    • Error - Operation failed Unable to register the job to the service: https://lxb2032.cern.ch:7443/glite_wms_wmproxy_server The Operation is not allowed: edg_wll_RegisterJob (null) Resource temporarily unavailable (edg_wll_RegisterJobProxy(): unable to register with bkserver;; Resource temporarily unavailable;; Logging library ERROR: ;; Resource temporarily unavailable;; edg_wll_DoLogEventDirect(): Error code mapped to EAGAIN;; Lbserver (proxy) store protocol error;; edg_wll_log_proto_client_direct(): error reading answer from L&B direct server) Method: jobRegister Error code: 1220
    • Error - Operation failed Unable to register the job to the service: https://lxb2032.cern.ch:7443/glite_wms_wmproxy_server The Operation is not allowed: edg_wll_RegisterJob (null) Resource temporarily unavailable (Logging library ERROR: ;; Resource temporarily unavailable;; edg_wll_DoLogEventProxy(): Error code mapped to EAGAIN;; Lbserver (proxy) store protocol error;; edg_wll_log_proto_client_proxy(): error reading answer from L&B Proxy server) Method: jobRegister Error code: 1220

07/11/06 bulk submission 1000 jobs with RetryCount=0 on lxb0744 (freshly installed WMS)
  1.-4. see above
  5. results
    • 331 jobs are in "Done (Success)"
    • 647 jobs are "Cancelled" with the reason "Aborted by user"
    • 1 job is in "Aborted" (Got a job held event, reason: Repeated submit attempts (GAHP reports:))
    • 17 jobs are in "Done (Failed)" (Got a job held event, reason: Repeated submit attempts (GAHP reports:))

07/11/06 bulk submission 400 jobs with RetryCount=0 on lxb0744
Used a new configuration parameter communicated by the developers.
  1. results
    • 379 jobs "Done (Success)"

07/11/06 bulk submission 1000 jobs with RetryCount=0 on lxb0744
Used a new configuration parameter communicated by the developers.
  1. results
    • 860 jobs "Done (Success)"
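
To compare the runs above, the fraction of jobs reaching "Done (Success)" can be computed directly from the reported counts. The sketch below uses only the numbers quoted in this section (the 2000-job run is left out because submission itself failed). The run labels and the helper success_rate are illustrative names.

```python
# (label, jobs submitted, jobs in "Done (Success)") as reported above.
RUNS = [
    ("07/10/06 lxb2032, 1000 jobs", 1000, 19),
    ("07/11/06 lxb0744, 1000 jobs", 1000, 331),
    ("07/11/06 lxb0744, 400 jobs, new parameter", 400, 379),
    ("07/11/06 lxb0744, 1000 jobs, new parameter", 1000, 860),
]


def success_rate(submitted, succeeded):
    """Return the percentage of submitted jobs that ended in Done (Success)."""
    return 100.0 * succeeded / submitted


if __name__ == "__main__":
    for label, submitted, succeeded in RUNS:
        print(f"{label}: {success_rate(submitted, succeeded):.1f}%")
```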

TAR distribution tests
Built the relocatable out of the certification repository on 07/17/06. For configuration with YAIM, config_add_glite_env has to be removed from TAR_WN_FUNCTIONS.
TAR_UI
Built the relocatable out of the certification repository on 07/17/06. Installed and configured as both a bash and a tcsh user. Commands tested: lcg-infosites, lcg-cr, lcg-cp, lcg-lr, lcg-del, rgma-client-check, glite-job-list-match, glite-job-submit, glite-job-status, glite-job-logging-info, glite-job-output, edg-job-list-match, edg-job-submit, edg-job-status, edg-job-get-logging-info
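
A quick way to confirm that a relocatable UI installation exposes the commands listed above is to check that each one resolves on PATH. The sketch below only tests for presence, not for correct behaviour, and is not the procedure used in these tests. The function name missing_commands is an illustrative choice.

```python
import shutil

# Commands listed in the TAR_UI test above.
UI_COMMANDS = [
    "lcg-infosites", "lcg-cr", "lcg-cp", "lcg-lr", "lcg-del",
    "rgma-client-check",
    "glite-job-list-match", "glite-job-submit", "glite-job-status",
    "glite-job-logging-info", "glite-job-output",
    "edg-job-list-match", "edg-job-submit", "edg-job-status",
    "edg-job-get-logging-info",
]


def missing_commands(commands, path=None):
    """Return the commands that cannot be found on the given search path.

    With path=None, shutil.which falls back to the PATH environment variable.
    """
    return [cmd for cmd in commands if shutil.which(cmd, path=path) is None]


if __name__ == "__main__":
    missing = missing_commands(UI_COMMANDS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All", len(UI_COMMANDS), "commands found on PATH")
```

Running it once from a bash session and once from a tcsh session after sourcing the respective environment scripts would cover both user setups mentioned above.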
TAR_WN
All edg-tests were run against the TAR_WN (ctb-wn-1): OK

Bugs fixed in gLite 3.0.2 RC1

Bug Status Checked by
#13368 FTS: reason for job status should be other ...    
#13470 fields cannot be mapped from ldap to rgma ...
#13530 add the table names to the log4j messages ...
#13798 R-GMA Servicetool log entries format ...
#14966 /opt/glite/etc/init.d/rgma-* template scripts ...
#15292 WMProxy fails in cancelling a collection...
#15579 glite-rgma-publish does not own...
#15591 No link between CE/SE and Site...
#16034 The logmonitor daemon crashed...
#16191 LogMonitor crashed - problem...
#16233 FTS: File Reason Class always...
#16281 Niether glite-LFC_mysql nor...
#16332 inadequate management of return...
#16334 too much logging due to signals...
#16487 serviceStatusDetails fails to...
#16524 Wrong management of Input...
#16547 FTS: case sensitivity of...
#16548 Need to update the RGMA server's...
#16627 incomplete query to lb in case...
#16631 misleading error messages from...
#16679 job wrapper template inadequate...
#16746 incorrect count of shallow...
#16751 make message due to prologue...
#16769 wms-client is unable to read...
#16781 voms server code doesn't reconnect OK Maria
#16813 "Blocking" site can hold up...
#16828 WMProxy unable to correctly...
#16857 wrong exit status in case of...
#16874 RetryCount attribute inside DAG...
#16970 the default value of the retry...
#16973 support for epilogue...
#17021 Integers in stream producers are...
#17065 treatment of LSF e-mails...
#17069 LSF job execution ends up in...

Bugs fixed in gLite 3.0.2 RC2

gLite Middleware

Bug Status Checked by
#9777 unable to get logging-info -2 info...   Di
#15050 job-status returns Running and...   Di
#15450 rare Unable to Register the Job...   Di
#16034 The logmonitor daemon crashed... not verified  
#16191 LogMonitor crashed - problem with...    
#16295 bulk submission fails when DN... OK Di
#16502 automatic sequence code generation...   Di
#16506 max_open files too low for a lot...    
#16507 overview: logmonitor crashes with exit status 2... not verified (see bug #16034)  
#16524 Wrong management of Input Sandbox...    
#16732 YAIM: Oracle instantclient... OK Robert
#16769 wms-client is unable to read...    
#16828 WMProxy unable to correctly handle...    
#16900 glite-config RPM software... OK (see release notes) Robert
#17116 MyProxy service type should be published ... OK Gergo
#17154 Publication of the glite-WMS... NOT FIXED Laurence
#17230 glite-SE_dcache shouldn't have dependency on pnfs rpm OK Louis/G
#17256 Harmonize on fetch-crl or... OK Gergo
#17569 problem in LFC Python interface... OK Gergo
#17616 Change of CA repository...   Robert
#17817 Error in glite ce configuration... OK Gergo

Bugs fixed in gLite 3.0.2 RC5

Bug Status Checked by
#15524 site bdii publishes using the base...    
#17391 C API crashes with a segmentation... OK Andreas
#17866 BDII rpm is not relocatable; file...    
#17967 top-level BDII overload due to...    
#17968 BDII should cache LDIF sources    
#17969 BDII must listen on INADDR_ANY    
#18130 BDII FCR filter lets some entries...    
#18204 BDII sorts LDAP records...    

Bugs fixed in gLite 3.0.2 RC6

Bug Status Checked by
#17540 Bug #13789 appearing again due to...    
#17877 Handling of VOView's...    

gLite Middleware patches

Patch Status Checked by
#746 URGENT : LFC-interfaces-1.5.7-2 OK Gergo
#747 edg-profile-2.0.9-1.noarch.rpm   Andreas
#749 R-GMA server servlet update
#750 dcache-client update    
#753 bdii-3.6.0 improves info system...    
#757 LB logging-info update    
#758 WMS update for Bug #13789...    
#759 Handling of VOView's...    
#765 bdii 3.7.0 improves performance    
#767 rgma c api build problem OK Andreas
#768 bdii 3.8.0 fixes FCR filter bug    
#774 New parameter for configuring...    
#775 overview: AuthorizationCheck...    
#776 overview: VOMS fqan plugin in WMS...    
#777 overview: max_open files too low...    
#778 overview: Attribute VOMS_FQAN not...    

Operations
Bug Status Checked by
#17116, MyProxy service type should be... OK Gergo

TAR UI bugs to be fixed

BUG Status (To be) Checked by
#15886, Errorneous default value for GLITE_SD_PLUGIN   Robert
#17309 error messages in configure_node related to condor in glite-UI   Robert
#17322 lcg-infosites and lcg-info from relocatable... OK Andreas
#17413 voms-proxy-init use only the first voms server defined   Robert
#17444 relocatable UI/WN installation doesn't containts log4j.jar OK Andreas
#17450 PERLLIB is not populated with edg/lib/perl...   Andreas
#18018, Stale files after configuration.   Robert
#18019, Too much necessary variables   Gergo
#18021, LFC_HOST not set OK Gergo
#18022, edg-* commands not configured   Robert
#18023,GGUS:#9492, Hardcoded '/bin/awk'   Robert
#18024, glite/etc/vomses not filled correctly   Robert
#17521, PATH incorrect, lcg-infosites not working OK Andreas
#18028, Misleading naming for the tarball OK Gergo
#18029, Configuration fails on Debian   Gergo
#18031, Typo in config_certs_userland OK Gergo
#18032, rgma-client-check fails OK Gergo
#18035, PYTHONPATH definition OK Gergo
#18059, PYTHON version confusion OK Gergo

=================================================================================================================

Notes for test results 3.0.2-RC2 27 Jun

R-GMA client check

Your proxy is valid until: Thu Jun 29 23:53:21 2006
-bash-2.05b$ rgma-client-check

*** Running R-GMA client tests on lxb1765.cern.ch ***

Checking C API: Success
Checking C++ API: Success
Checking CommandLine API: Success
Checking Java API: Success
Checking Python API: Success

*** R-GMA client test successful ***

Harmonize edg-system-utils and fetch-crl

Checked: all the metapackages now depend on fetch-crl instead of edg-system-utils. This should be sufficient as a check.

LCG - gLite Certification

Current roles:

  • cert release manager: Oliver
  • cert testbed manager: Louis
  • coordination of tests: Andreas, Di
  • PPS coordinator: Nick
  • contact to integration team: Joachim
==========================================================================

Test results for gLite 3.0.0 RC

Release candidate version: tbd by cert testbed manager
Cert testbed ready since: tbd by cert testbed manager
Moved to PPS: tbd by release manager
Started PPS deployment: tbd by PPS coordinator

Test SFT Information System Gilbert testsuite on lcg WMS gLite bulk submission FTS VOMS R-GMA DPM LFC
Responsible Louis   Di, Gilbert Di, Mario Di, Lin Gergo Maria Laurence Gilbert, Jean-Philippe Gilbert, Jean-Philippe
3.0.0_RC2 7 April 2:00 pm OK   OK,OK failed due to bug 15761 not tested yet (tbd after WMS is ok) Failed OK OK OK OK
3.0.0_RC3 26 April 3:45 pm OK   FAIL FAIL FAIL not tested yet (tbd after WMS is ok) FAIL FAIL FAIL FAIL
3.0.0_RC4 04/28/06 FAIL   FAIL OK (Few failures flagged wrongly by testsuite) failed due to bug 16295 Install, configure OK. Functionality not yet tested. not tested again (o.k. in prev RC) not tested again (o.k. in prev RC) FAIL (maybe OK, but misleading presentation) FAIL (maybe OK, but misleading presentation)
3.0.0_RC4+ 05/02/06 FAIL (only the CA test failed, all others are OK)   OK Fail failed due to bug 16295 Install - OK; Configure - OK; Simple submission - OK not tested again (o.k. in prev RC) OK (rgma-client-test) OK (now in Gilbert testsuite) OK (now in Gilbert testsuite)
3.0.0 update_1 06/01/06 FAIL (only the CA test failed, all others are OK) gstat OK OK (the FAIL flags seem to be errors in the framework; investigation shows that the tests are OK) FAIL (4 out of 100 jobs have running status forever) RPMs unchanged, test relaunched, result OK versions of VOMS rpms changed; the client was in use by several other tests and showed no errors rgma-client-check o.k. on UI OK (now in Gilbert testsuite) OK (now in Gilbert testsuite)

==========================================================================

Notes for test results 3.0.0_RC2 7 April 2:00 pm

Output of DPM/LFC tests run by Gilbert:

- globalSuite lxb1737 globus OVERALL Score: OK=30 FAILED=0 LOG: /var/tmp/grodid/global/lxb1737/2006-04-07/fil155652/logFile
- globalSuite lxb1737 vomsR OVERALL Score: OK=30 FAILED=0 LOG: /var/tmp/grodid/global/lxb1737/2006-04-07/fil155716/logFile
- globalSuite lxb1921 globus OVERALL Score: OK=30 FAILED=0 LOG: /var/tmp/grodid/global/lxb1921/2006-04-07/fil155724/logFile
- globalSuite lxb1727 vomsR OVERALL Score: OK=30 FAILED=0 LOG: /var/tmp/grodid/global/lxb1921/2006-04-07/fil155739/logFile
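
The summary output above has a regular shape, so the pass and fail counts can be extracted mechanically. The following sketch assumes the format shown here (globalSuite, host, mode, then OK= and FAILED= counters). The regular expression and the function parse_scores are illustrative and have not been checked against other versions of the test suite.

```python
import re

# Summary lines copied from the DPM/LFC test output above (LOG paths omitted).
SUMMARY = """\
- globalSuite lxb1737 globus OVERALL Score: OK=30 FAILED=0
- globalSuite lxb1737 vomsR OVERALL Score: OK=30 FAILED=0
- globalSuite lxb1921 globus OVERALL Score: OK=30 FAILED=0
- globalSuite lxb1727 vomsR OVERALL Score: OK=30 FAILED=0
"""

LINE_RE = re.compile(
    r"globalSuite\s+(?P<host>\S+)\s+(?P<mode>\S+)\s+OVERALL Score:\s+"
    r"OK=(?P<ok>\d+)\s+FAILED=(?P<failed>\d+)"
)


def parse_scores(text):
    """Return a list of (host, mode, ok, failed) tuples found in text."""
    return [
        (m["host"], m["mode"], int(m["ok"]), int(m["failed"]))
        for m in LINE_RE.finditer(text)
    ]


if __name__ == "__main__":
    scores = parse_scores(SUMMARY)
    for host, mode, ok, failed in scores:
        print(f"{host} {mode}: OK={ok} FAILED={failed}")
    print("total failed:", sum(failed for _, _, _, failed in scores))
```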

Result of FTS test

  • install (YAIM) - OK
  • configure (YAIM) - FAILED (not yet completely implemented)
    • fta-info.def should be sourced in site-info.def
  • configure (manual) -
    • Bug number 11216 is still there (since 2005.09)

  • simple submission - failed because of failed configuration

==========================================================================

Notes for test results 3.0.0_RC2 26 April 2:00 pm

Summary:

There is a major problem with lxb0724 (SEdcache). The nodes lxb1737 (dpmMysql) and lxb2036 (dpm_pool) also have a problem. Minor problems might be present on lxb0741 (WN32_2_1) and lxb2018 (CE).

Tests highlighted as failed:

Test  Affected nodes
5     lxb0724 (SEdcache)
6     lxb0724 (SEdcache)
7     lxb0724 (SEdcache)
8     lxb2018 (CE)
16    lxb0741 (WN32_2_1)
17    lxb0741 (WN32_2_1)
19    lxb0724 (SEdcache)
20    lxb0724 (SEdcache)
21    lxb0724 (SEdcache)
23    lxb0724 (SEdcache)
24    lxb0724 (SEdcache)
24    lxb0724 (SEdcache)
27    lxb0724 (SEdcache)
28    lxb1737 (dpmMysql), lxb2036 (dpm_pool)
29    lxb2036 (dpm_pool), lxb1737 (dpmMysql)
30    lxb1737 (dpmMysql), lxb2036 (dpm_pool)

Tests highlighted as timeout:

15 (only OKs in the detailed output)

Comment: lxb0724 (SEdcache) is mostly accessed from lxb0741 (WN32_2_1) (tests 19, 20, 21, 23, 24), but my guess is that the problem lies on lxb0724, because in tests 5, 6 and 7 lxb0724 also refuses access from other machines.
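
The guess in the comment above can be cross-checked by counting how many failed tests involve each node. The sketch below uses only the test/node pairs listed in this section. Test 24 appears twice in that list and is counted once here. The function name failures_per_node is an illustrative choice.

```python
from collections import Counter

# Failed tests and affected nodes as listed above; test 24 appears twice in
# the list and is counted once here.
FAILED_TESTS = {
    5: ["lxb0724"], 6: ["lxb0724"], 7: ["lxb0724"],
    8: ["lxb2018"],
    16: ["lxb0741"], 17: ["lxb0741"],
    19: ["lxb0724"], 20: ["lxb0724"], 21: ["lxb0724"],
    23: ["lxb0724"], 24: ["lxb0724"], 27: ["lxb0724"],
    28: ["lxb1737", "lxb2036"],
    29: ["lxb2036", "lxb1737"],
    30: ["lxb1737", "lxb2036"],
}


def failures_per_node(failed_tests):
    """Count in how many failed tests each node is involved."""
    counts = Counter()
    for nodes in failed_tests.values():
        counts.update(set(nodes))
    return counts


if __name__ == "__main__":
    for node, count in failures_per_node(FAILED_TESTS).most_common():
        print(node, count)
```

A high count only shows where failures cluster. It does not by itself distinguish a faulty storage element from a faulty client node, which is why the comment relies on tests 5, 6 and 7.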

Failed commands

Problem with lxb0724 (SEdcache)


Test: 5,6,7,19,20,21

doTransfer globus-url-copy -nodcau gsiftp://lxb0724.cern.ch//pnfs/cern.ch/data/dteam/grodid-23411_local-lxb0724.cern.ch file:///tmp/grodid-23411--grodid-23411_local-lxb0724.cern.ch FAILED ==> error: a system call failed (Connection refused)

lcg-rep --verbose --insecure --vo=dteam -d lxb0724.cern.ch lfn:grodid-24358-dstorm-serie-5.data

++++ 24358 060426-011710 CGSI-gSOAP: Could not open connection ! lcg_rep: Connection refused Using grid catalog type: lfc Using grid catalog : lxb1941.cern.ch

Problem with lxb0741 (WN32_2_1)


Test: 16,17

GfalC: gfal_test.c
GfalC: compiled fine
GfalC: cmd: ./gilb -g 56 -R -W dteam001-gfal-S11015-I0002-CElxb2035-WNlxb0741-13221-060426-031613.jpb gsidcap://lxb0724.cern.ch:22128//pnfs/cern.ch/data/dteam/dteam001-gfal-S11015-I0002-CElxb2035-WNlxb0741-13221-060426-031613.jpb
GfalC: exec failed

The test works on the other WNs (lxb0731, lxb0735)

Problem with lxb1737 (dpmMysql)


Test: 28, 29, 30

STEP: 10 OPERATION: rfcp CMD: rfcp lxb1737.cern.ch:/dpmstorage/dteam/2006-04-26/ggtglob1fil064307.371.0 /var/tmp/grodid/global/lxb1737/2006-04-26/fil064307/fcrap1r TURLr: lxb1737.cern.ch:/dpmstorage/dteam/2006-04-26/ggtglob1fil064307.371.0 : Permission denied (error 13 on lxb1737.cern.ch)

Problem with lxb2036 (dpm_pool)


Test: 28, 29, 30

STEP: 14 OPERATION: rfcp CMD: rfcp DPMsource.lxb1706 /dpm/cern.ch/home/dteam/grodid/tglob/Mlxb1737/D2006-04-26/H064307/ggtglobrfil064307 RFCP: /dpm/cern.ch/home/dteam/grodid/tglob/Mlxb1737/D2006-04-26/H064307/ggtglobrfil064307 : Permission denied (error 13 on lxb2036.cern.ch)

Problem with lxb2018 (CE)


Test: 8

This seems to be a minor problem.

Command: globus-job-get-output https://lxb2018.cern.ch:20023/19335/1146007245/ Clean up job...[FAIL]


Topic attachments
Attachment Size Date Comment
VOMS-report-gLite3.0-RC2.pdf 411.0 K 2006-04-10 13:15 VOMS test results RC2
Topic revision: r86 - 2006-07-28 - DiQing
 