PPS Pilot Follow-up Meeting Minutes Tue 25 Jun 2009

  • Date: Tue 25 Jun 2009
  • Agenda: 62251
  • Description: Pilot of Cream CE: check-point
  • Chair: Antonio Retico

Attendance

  • PPS: Antonio Retico
  • CERN: Absent
  • FZK: Angela
  • PIC: Absent
  • CNAF: Daniele, Danilo
  • CMS: Stefano Lacaprara
  • GRNET: Absent (tried to connect)
  • JRA1/Cream/WMS: Massimo Sgaravatto
  • SA3: Gianni Pucciani, Nikolay Klopov

Review of action items (tasks)

Not covered.

Status and results of the pilot service (by VOs and sites)

Acceptance test by SA3

Gianni: GRNET has published test results (linked on the pilot wiki page). It is important to note that the tests were done using machines allegedly less powerful than the ones used in production by many sites. In particular, this means that we had to clean up the database periodically in order to save space. While doing so, we noticed that the sustained submission rate degraded as the database filled up.

Antonio suggested opening a bug to track the issue observed with the submission rate. These test results provide very good information for the developers, but we also need to understand the real impact of the observed issue on production-level machines. It is important to repeat the tests at CERN and FZK as soon as possible.

Gianni will press GRNET to get the tests exported as soon as possible.

Update received via e-mail from GRNET: It seems almost certain to us that if we start running performance tests against CREAM instances that are not under our administrative control we will have the same issues that we had when we attempted to use other site's WMS's during testing PBS/Torque some time ago. This time it is liable to be even worse, since with the tests we have run so far we have found our CREAM instance to become very unstable. The people that oversee the test execution (us) need to be able to have immediate access to the site in order to analyze what is happening in realtime and when things go wrong to restore the stability of the system or make necessary adjustments and restart the test. This is unavoidable due to the long duration and stressful nature of these tests.

In any case, as Nikos already wrote, at this time we don't have a ready for release test plan and test suite.

Best regards, Konstantinos Koukopoulos

CMS

Antonio summarised what had been asked of CMS: to set up a parallel instance of CRAB using the WMSs at CNAF and GRIF. He confirmed that those WMSs run the correct version of ICE and can also access the lcg-CE in production, and reported that CNAF T1 and PIC claim to be ready to receive CMS jobs.

Stefano: I will set up the new CRAB instance and ask CMS users to try it for submitting analysis jobs. But there is one issue: sites supporting CREAM in production are mostly T1s, and CMS users are by policy not allowed to run analysis jobs at the T1s. T1s can be used by the standard production.

Antonio will follow up this issue in two ways:

  1. chasing the few T2s currently supporting CREAM to see if they can fully support CMS (SW dir and data)
  2. meeting the production manager of CMS to see if part of the production can be diverted to use the pilot WMS

PIC

Absent

FZK

Nothing to report

Status and results of the development (by developers)

Massimo: there are two malfunctions to report, both affecting the tests.

  • PATCH:3044 has been produced to fix a problem in ICE dealing with the purger. This patch should be deployed soon.
    • Antonio: this patch is on the critical path in certification (it will likely be deployed in PPS directly). It will certainly be installed, as soon as it is available, by Michel Jouvin (GRIF), who detected the problem. So the submission chain will be fixed rapidly.

  • BUG:47152 in LCMAPS affects glexec on CREAM if the user is mapped to a static account (as is the case for CMS). As a consequence, the chain WMS3.2 --> CREAM --> production WN will not be functional for CMS until the fix (PATCH:2973, certified) is released.
    • Antonio: unfortunately there are doubts about the release to production of PATCH:2973 because of an issue seen by ATLAS at a site with this patch in the context of the SCAS/glexec pilot. This is going to be discussed further with ATLAS in a meeting on the 26th of June.

PATCH:2666 is now in certification. It will go through the standard path and won't be the subject of a pilot unless needed.

The developers are currently working on the release of CREAM on SL5.

Open Issues (by VOs, sites, deployment teams)

None

Recommendations for release and deployment

None

Decision about termination/extension of the pilot

The next check-point is fixed for Monday 6th July at 10.00. It will be chaired by Gianni.

AOB


Topic revision: r2 - 2009-06-26 - AntonioRetico
 