PPS Pilot Follow-up Meeting Minutes Tue 25 Jun 2009
- Date: Tue 25 Jun 2009
- Agenda: 62251
- Description: Pilot of CREAM CE: checkpoint
- Chair: Antonio Retico
Attendance
- PPS: Antonio Retico
- CERN: Absent
- FZK: Angela
- PIC: Absent
- CNAF: Daniele, Danilo
- CMS: Stefano Lacaprara
- GRNET: Absent (tried to connect)
- JRA1/Cream/WMS: Massimo Sgaravatto
- SA3: Gianni Pucciani, Nikolay Klopov
Review of action items (tasks)
Not covered.
Status and results of the pilot service (by VOs and sites)
Acceptance test by SA3
Gianni: GRNET has published the test results (linked on the pilot wiki page).
It is important to note that the tests were run on machines presumably less powerful than those used in production at many sites. In particular, we had to clean up the database periodically to save space. While doing so, we noticed that the sustained submission rate degraded as the database filled up.
Antonio suggested opening a bug to track the submission-rate issue. These test results give the developers very good information, but we also need to understand the real impact of the issue on production-level machines. It is important to repeat the tests at CERN and FZK as soon as possible.
Gianni will press GRNET to get the tests exported as soon as possible.
Update received via e-mail from GRNET:
It seems almost certain to us that if we start running performance tests against CREAM instances that are not under our administrative control, we will have the same issues we had some time ago when we attempted to use other sites' WMSs while testing PBS/Torque. This time it is likely to be even worse, since in the tests we have run so far our own CREAM instance became very unstable. The people overseeing the test execution (us) need immediate access to the site in order to analyze what is happening in real time and, when things go wrong, to restore the stability of the system or make the necessary adjustments and restart the test. This is unavoidable given the long duration and stressful nature of these tests.
In any case, as Nikos already wrote, at this time we do not have a release-ready test plan and test suite.
Best regards,
Konstantinos Koukopoulos
CMS
Antonio summarized what was asked of CMS: to set up a parallel instance of CRAB using the WMSs at CNAF and GRIF. He confirmed that those WMSs run the correct version of ICE and can also access the lcg-CE in production, and reported that the CNAF T1 and PIC claim to be ready to receive CMS jobs.
Stefano: I will set up the new CRAB instance and ask CMS users to try it for submitting analysis jobs. There is one issue, however: the sites supporting CREAM in production are mostly T1s, and by policy CMS users are not allowed to run analysis jobs at the T1s. The T1s can be used by standard production.
Antonio will follow up this issue in two ways:
- chasing the few T2s currently supporting CREAM to see if they can fully support CMS (SW dir and data)
- meeting the CMS production manager to see if part of the production can be diverted to the pilot WMS
PIC
Absent
FZK
Nothing to report
Status and results of the development (by developers)
Massimo: there are two malfunctions to report, both affecting the tests.
- PATCH:3044 has been produced to fix a problem in ICE related to the purger. This patch should be deployed soon.
- Antonio: this patch is on the critical path in certification (it will likely be deployed directly in PPS). It will certainly be installed, as soon as it is available, by Michel Jouvin (GRIF), who detected the problem, so the submission chain will be fixed quickly.
- BUG:47152 in LCMAPS affects glexec on CREAM when the user is mapped to a static account (which is the case for CMS). As a consequence, the chain WMS 3.2 --> CREAM --> production WN remains non-functional for CMS until the fix (PATCH:2973, certified) is released.
- Antonio: unfortunately there are doubts about releasing PATCH:2973 to production, because of an issue seen by ATLAS at a site running this patch in the context of the SCAS/glexec pilot. This will be discussed further with ATLAS in a meeting on 26 June.
PATCH:2666 is now in certification. It will go through the standard path and will not be the subject of a pilot unless needed.
Currently working on the release of CREAM on SL5.
Open Issues (by VOs, sites, deployment teams)
None
Recommendations for release and deployment
None
Decision about termination/extension of the pilot
The next checkpoint is scheduled for Monday 6 July at 10:00. It will be chaired by Gianni.
AOB