PPS Pilot Follow-up Meeting Minutes Fri 13 Mar 2009

  • Date: Fri 13 Mar 2009
  • Agenda: 54498
  • Description: Pilot of Cream CE: check-point
  • Chair: Antonio Retico

Attendance

  • PPS: Antonio Retico

  • PIC: (connection problems)
  • FZK: (apologies)
  • CNAF: Daniele Cesini, Danilo Dongiovanni
  • PADOVA: Sara Bertocco

  • CMS: Absent
  • Alice: Patricia Mendez

  • JRA1/Cream/WMS: Massimo Sgaravatto
  • SA3: Gianni Pucciani, Alessio Gianelle
  • SA1: Nick Thackray

Review of action items (tasks)

SA1/SA3 tasks

Status of the subtasks of TASK:7981 (see them in the PPS tracker).

Antonio:

TASK:7427 and TASK:7157 (SAM Nagios) are in progress. Karolis pointed out that SA1 now needs a way to certify a new site running CREAM only, as SAMAP does not support CREAM yet. This should be raised at the operations meeting. Antonio pointed out that although it is not dramatic if the sites in the pilot fail SAM tests (e.g. the SE tests failing in Padova), it is much better if we can get SAM generally "green". That would help us to spot possible malfunctions introduced by new updates.

TASK:8983: as there are now two CEs running for PIC, this task should be closed and replaced with a service operation task. Raquel reported via e-mail some issues on ppsce03, which she is working on.

other tasks

Notes:

Status and results of the pilot service (by VOs and sites)

Alice

The good news is that CERN-PROD is now working in production mode with Alice.

Question to Massimo: As we have two CEs at CERN, we would like to share the load among them while doing direct submission. Does the CLI provide a feature like that (e.g. the possibility to submit to a list of CEs), or do we need to handle it at the level of the application?

  • Massimo: This is not currently done, but it is not difficult to do at any level.
  • Antonio: As it should not be difficult for Alice to manage that at the level of the application, I think that we should not request this development on the CLI for the time being.
  • Patricia agrees.
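
As an illustration of what handling the load sharing at the level of the application might look like, here is a minimal round-robin sketch. It is not Alice's actual implementation: the function name, the endpoint strings and the `submit` callback are all hypothetical, and the real call that performs the direct submission is left to the application.

```python
from itertools import cycle
from typing import Callable, List, Sequence, Tuple


def distribute_jobs(
    jobs: Sequence[str],
    ce_endpoints: Sequence[str],
    submit: Callable[[str, str], None],
) -> List[Tuple[str, str]]:
    """Assign each job to a CE in round-robin order and submit it.

    `submit(job, ce)` is a hypothetical callback supplied by the
    application; it stands in for whatever mechanism performs the
    direct submission to a given CE.
    """
    if not ce_endpoints:
        raise ValueError("at least one CE endpoint is required")

    assignments: List[Tuple[str, str]] = []
    targets = cycle(ce_endpoints)  # endless iterator over the CE list
    for job in jobs:
        ce = next(targets)         # pick the next CE in turn
        submit(job, ce)
        assignments.append((job, ce))
    return assignments


if __name__ == "__main__":
    # Hypothetical endpoint names, used only for the demonstration.
    ces = ["ce-a.example.org", "ce-b.example.org"]
    for job, ce in distribute_jobs(["job1.jdl", "job2.jdl", "job3.jdl"], ces,
                                   lambda job, ce: None):
        print(f"{job} -> {ce}")
```

A plain round-robin ignores the actual load on each CE; an application could replace the `cycle` with a choice based on the number of queued jobs if that information is available.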

Patricia: I'm preparing a talk for CHEP where I'll present the results of the four experiments using CREAM. Alice will likely have a different weight because it is more advanced in its usage.
The tests currently being done by Andrea Sciaba' using the ICE-WMS installation at CNAF are related to the same work, although the results will be presented by Patricia.

CMS

Massimo gives some info about the testing currently in progress by CMS:
Andrea Sciaba' has managed to run some jobs using the WMS at CNAF (cert-rb-01.cnaf.infn.it), which is updated to a version of ICE corresponding to PATCH:2459. He initially suffered from BUG:47996 (database corruption when ICE exits). This was immediately fixed by Alvise Dorigo and the fix was deployed in the same patch.
However, Andrea is aware that, as the WMS at CNAF is currently pointing to the CREAM nodes in production, he won't be able to run performance tests because of the version of CREAM currently deployed in production.

Status and results of the development (by developers)

Updates given by Massimo and Gianni (off-line) on the set of patches marked to be followed by the EMT for quick certification. This is the combined view:

  • patch #2845: Second update of CREAM and CEMon Clients for slc4/i386 platform
    • Ready for integration
    • Tasks for SA3 testers existing
  • patch #2459: First update of ICE
    • Update of ICE, to be installed "on top" of WMS patch#2562
    • updated with fix for BUG:47996:  moved back to "With provider"
    • finished but waiting for patch documentation to be agreed on (Andreas Unterkircher, Massimo)
  • patch #2748: Third update of CREAM CE for slc4/i386 platform
    • various fixes for CREAM and BLAH - affecting the CE
    • With Provider to test fix
    • expected by today (Friday 13 Mar)
  • patch #2750: YAIM-CREAM-CE 4th update
    • Certified
  • patch #2830: [ YAIM-WMS ] New yaim-wms to properly configure the ICE section
    • Ready for integration
    • Tasks for SA3 tester to be created

Antonio: we are expecting to have the full set above available in certification by Monday/Tuesday.

The concurrent release of SLC5 WNs is scheduled for the 23rd of March, so I don't think it is realistic to release this set before the 8th. The release should be done before mid-April though, so the sooner the better.

Gianni: Is the pre-certification being done using the test suite developed by INFN?

Massimo: yes, except the tests of the submission through the WMS to verify the regression of BUG:47996 which are done manually. All the tests done by Alessio and Paolo in pre-certification are described in the patches.

Antonio: I think that this test should be inserted as a possible regression test in the test suite.

Alessio and Gianni will talk to see how regression testing is handled in the current tools.

Open Issues (by VOs, sites, deployment teams)

List of Open bugs and relevant decisions

BUG:47911

Recommendations for release and deployment

Decision about termination/extension of the pilot

Antonio recaps the plans for the near future:

  • The new set of patches will be moved to production.
  • After that release, PIC and FZK will start running in full production mode at that version and will be formally out of this pilot.
  • The pilot will be focused on a certain number of ICE WMSs (as many as the users need).
  • The installations at PADOVA and CNAF will still be available for pre-certification.
  • There is the proposal, made at the last GDB, to dedicate a time slot on these installations to testing the performance criteria for the replacement of the lcg-CE, which are being developed by Nick. This could be done using Laurence's test suite.

Massimo: using the infrastructure in PADOVA and CNAF for this purpose would mean that we have to stop the pre-certification activity for the duration of the tests. This will cause some delays for us. The criteria have to be tested though. If there is no alternative, we'll do it.

Antonio: This is understood. There is a trade-off though. The alternative would be to set up another small infrastructure dedicated to these tests, which could lengthen the overall time-to-production. The good thing about your infrastructure is that it is based on several hosts and not on virtual machines, so the results of the tests would be more significant.

Massimo: what about those criteria which imply ~ one month of testing?

Nick: We are probably going to re-phrase it. As CREAM is now in production, we can rely on feedback provided directly by the site administrators and we won't need to run a test against a dedicated instance of CREAM. Checking the criteria, I think that a test could be carried out in about a week.

Massimo: please don't start without telling us.

Antonio: of course the activity will be coordinated, also because before running any test SA3 will need exact info about the versions installed. This will probably buy you some time to work beforehand on testing the fix for BUG:47911.

Nick: BTW which are the fixes which in your opinion we should wait for before starting the tests?

Massimo: At least PATCH:2748 and PATCH:2459. Then possibly whatever fixes BUG:47911

Nick will bring a possible timeline for these tests to the next check-point, to be discussed with Laurence.

Open question (Nick): are the criteria relevant for ICE submission or for direct submission? Do we need two sets of tests? (to be studied)

Next check point will be on the 1st of April at 10.00

AOB



Topic revision: r3 - 2009-03-16 - AntonioRetico
 