PPS Pilot Follow-up Meeting Minutes Tue 22 Jul 2008
- Date: Tue 22 Jul 2008
- Agenda: 37274
- Description: pilot of Cream CE: check-point
- Chair: Antonio Retico
Attendance
- PPS: Antonio Retico
- CNAF: Daniele Cesini, Danilo Dongiovanni
- FZK: apologies
- UPATRAS: -
- PIC: -
- Cream (Cluster of Competence): Massimo Sgaravatto, Alessio Gianelle, Sara Bertocco
- SAM: represented by Antonio
- Nagios: -
- CMS: Enzo Miccio
- Alice: Patricia Mendez
- Atlas: -
- LHCb: -
Status of the pilot service (by VOs and sites)
Progress on tasks
Status of the subtasks of TASK:7143.
Notes:
- TASK:7139: set-up of the VO SW area on the WNs is still in progress. At CNAF the SW area is on GPFS, and experts had to be called in to help. The work was also slowed down by the unscheduled downtime the site suffered. Patricia (Alice) said that they could nevertheless make progress on the other CE, installed at FZK.
- TASK:7142: everything is ready. The task is kept open and will be closed at the official start of phase 2.
- TASK:7153, TASK:7144: The SAM team is at work. We first needed to set up a new environment (SAM server DB, web service, bdii2oracle script). The existing CREAM CEs in PPS are now in our DB, and CYFRONET is working on the submission. This is the same test as proposed by CMS, except that it is done using the ops test. This is also a message for CMS that we are now ready to receive their SAM tests. In parallel, and with an eye to avoiding overlap, the SAM team is studying new tests. The new version of the bdii2oracle script, which is able to gather CREAM CEs, is going to be released in the production instance of SAM as well. In addition, it turned out that FCR also needs a bit of configuration. A new TASK:7427 has been opened for that.
- TASK:7157: no progress reported.
- TASK:7159: CREAM CEs are now available in the PPS information system, but gstat does not show them, so it probably needs extra configuration. The gstat developers will be contacted (after the meeting, GGUS:38891 was opened for this purpose).
- TASK:7274, TASK:7279: The service management tasks were updated as a consequence of the decision to extend the pilot (see later).
- TASK:7278: Done. The management of the VOBOX at FZK should be included in TASK:7274. The node name should be mentioned in the pilot description.
Feedback Alice (Patricia)
Alice is working with the site at FZK in production mode and under real conditions. The WMS-based submission using the VOBOX is failing due to BUG:37563 (number of proxy delegations limited to 9).
Massimo explains that using the VOBOX adds extra levels of delegation, which hits the limit imposed by the bug in VDT. This limit is not reached when the VOBOX submits to an lcg-CE (BLAH missing from the chain) or when the VOBOX submits directly to CREAM (no WMS in the chain). There is a fix for this on its way through certification (actually two patches: PATCH:1981, ready for certification, and PATCH:1979, in certification, for 32-bit and 64-bit architectures respectively).
Patricia: Direct submission to CREAM was successful. Thanks to Massimo Sgaravatto for his help. The submission worked very well. Changes to the Alice JDLs were necessary. We are now changing the LCG module in AliEn to enable submission to CREAM CEs.
Massimo: What exactly is Alice's idea? Do you want to submit through the WMS in the future and are doing direct submission now as a workaround, or is direct submission the option you prefer?
Patricia: Alice has always pushed to have the possibility of direct submission. Before CREAM this was impossible. So the idea is to submit directly to the CEs. At the same time, however, we want to be able to submit through the WMS as a fallback solution.
Feedback CMS (Enzo)
Enzo: Not much to report due to vacations
Antonio: As I said before, we are now ready to receive SAM tests from CMS. We will do the submission with the OPS use cases, and it would be useful if CMS could verify theirs as well.
Update from JRA1
Status of ICE WMS
Massimo: We released two patches, PATCH:1755 and PATCH:1790, with CREAM and the CLI. These patches are now certified. We still don't have an official WMS+ICE. PATCH:1841 contains ICE+WMS plus several bug fixes. We are still testing PATCH:1841 internally and, for the time being, we are not able to foresee the delivery date. The WMS+ICE installed in PPS following Alessio's instructions is usable but has known issues. Functionality and interfaces are, however, unchanged.
Initial planning of phase2
Antonio: As already agreed two weeks ago, before moving to phase 2 we will wait for this patch to be available. This will give extra value to phase 2. Considering the status of PATCH:1841, I think we should stay in this configuration for a bit longer.
Massimo: This does not seem to be impacting Alice for the time being. As far as the WMS is concerned, what exactly is the point of moving the pilot to production?
Antonio: The current configuration is OK, but it is not suitable for scalability tests, because the batch resources behind it don't allow heavy submission. SAM tests can be done anyway. Also, CMS knows how to direct their tests to the PPS system.
Massimo: As a condition to move to phase 2, do you expect PATCH:1841 to be certified beforehand, or just to be something that in our opinion is ready for these tests?
Antonio: We expect to have this patch in a status that reflects the following conditions:
* closed and delivered to certification
* installable
* not breaking the system
* sufficiently documented
This status is somewhat intermediate between "Ready for Certification" and "Certified". This status does not exist in Savannah, but most probably will be introduced.
The conditions in which this pilot was started were very particular, with an earlier version of the software in certification (according to the old process) and a newer one in PPS (following the new process).
Basically, as a pre-condition to start a pilot we expect two assertions to be made:
- JRA1: "the software is in our opinion ready to be given to the users" --> Patch in "Ready for Certification"
- SA3: "the software is in our opinion safe to install and suitable for beta testing" --> patch in "Ready for Pilot" (tentative status name)
In this pilot I played the role of SA3 and, by evaluating the status of the software and the documentation, I judged the proposed software to be stable enough to start working on it with no major pain.
The conclusion is that at this very moment we don't need to move to phase 2, because everybody seems to be able to work in this configuration. All activities (SAM, gstat, Nagios, experiments) are progressing, so there is no point in complicating the system.
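The four conditions Antonio listed for the intermediate patch status can be sketched as a simple check. This is only an illustration: the class, field and function names below are hypothetical and do not correspond to any existing Savannah field or tool, and "Ready for Pilot" is the tentative status name mentioned above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PatchState:
    """Hypothetical record of the four conditions discussed in the minutes."""
    delivered_to_certification: bool  # closed and delivered to certification
    installable: bool                 # installable
    breaks_system: bool               # True if installing it breaks the system
    sufficiently_documented: bool     # sufficiently documented


def unmet_conditions(patch: PatchState) -> List[str]:
    """Return the conditions that still block the tentative 'Ready for Pilot' status."""
    unmet = []
    if not patch.delivered_to_certification:
        unmet.append("closed and delivered to certification")
    if not patch.installable:
        unmet.append("installable")
    if patch.breaks_system:
        unmet.append("not breaking the system")
    if not patch.sufficiently_documented:
        unmet.append("sufficiently documented")
    return unmet


def ready_for_pilot(patch: PatchState) -> bool:
    """A patch qualifies only when all four conditions hold."""
    return not unmet_conditions(patch)
```

A patch record with one condition unmet would make `ready_for_pilot` return `False`, and `unmet_conditions` would name what is still missing.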
Recommendations for release and deployment
Summary of technical issues which we want fixed before moving to phase2:
Decision about termination/extension of the pilot
In consideration of what was discussed above, the decision is made to extend this phase of the pilot within the PPS infrastructure and to delay the deployment in production by five weeks from now.
The long delay is due to the fact that many of the actors will be on vacation by mid-August. On the other hand, the existing services have to be kept up and running in the meantime because they are now part of Alice's production pool.
The next check-point meeting will be on the 26th of August.
AOB
Antonio: What do we do with the version of CREAM now coming out of certification? It is now arriving in PPS, so, following the old process, it will be installed at the PPS sites and then moved to production. Between the PPS and production phases some activity from the experiments is normally expected, but in this case that does not make sense, because the experiments are already working on the newer version. Normally in such cases a pre-deployment test is done and the software is then made available to production immediately, without further staging in PPS. This is what we plan to do in this case as well. Are there any objections to this idea?
We also have to consider that the production SAM, following the modifications done in PPS, is being re-configured to see the CREAM CEs (extracting them from the BDII). The tests are not critical for the time being. With this pilot we are trying to minimise secondary effects, but we must be prepared for something to fail.
Massimo: It has to be clear that PATCH:2001, the new version of CREAM, has more bugs fixed, but all the critical bugs found by Di in the now certified version of CREAM (PATCH:1755) were fixed. So the "old" version has nothing critical and is perfectly safe to install.
Of course all the fixes in PATCH:1755 were carried over to PATCH:2001.
Daniele: It is however important to do the pre-deployment of PATCH:1755, and we'll do it. So wouldn't it be a good idea to attach these machines to the pilot and let the same tests flow in?
Massimo: It is also important to note that the installation instructions in PATCH:1755 were slightly improved in the meantime with respect to the ones originally used in the pilot, so we confirm that a pre-deployment test is necessary.
Antonio: Yes, this sounds like a good idea. We'll do it and distinguish these services on the web page.
Actions
See child tasks of https://savannah.cern.ch/task/?7143