PP Kick-Off Meeting Minutes Tue 15 Apr 2008

  • Date: Tue 15 Apr 2008
  • Agenda: 32129
  • Description: Pilot service of SL4 WMS at CERN-PROD
  • Chair: Antonio Retico

Attendance

  • PPS, CERN: Antonio Retico
  • CMS: Enzo Miccio
  • Atlas: David Rebatto

Service Introduction (by PPS)

The gLite WMS and LB services for SL4 were released to PPS. In parallel with the usual PPS cycle, a pilot service of the new WMS + LB will be run at CERN-PROD in standard production configuration. Users are invited to make use of the system and provide feedback.

CNAF (not present at this meeting) has previously expressed its availability to participate in the pilot.

Use cases (by Users)

CMS

CMS is already using several WMSs at CNAF, with satisfactory results. CMS users are starting to use the service in "test mode" for production MC analysis.

The idea is to add the pilot WMS at CERN-PROD to this set of machines and give CMS users transparent access to it. CMS could easily take the service out of production in case of problems.

Specifically, this means that the pilot service will not need to be tagged with special flags in the information system.

CMS is aware that running the service in this configuration could, in case of unexpected failures, affect the production job traffic. It is therefore clear that, as is currently the case for the other WMSs in this set, the pilot should not be used for mission-critical applications.

On top of that, in order to stress the system and verify its reliability, CMS plans to connect the submission robot and reach a consistent submission rate (see the "Metrics" section below).

The combination of the two submission methods should make it possible to verify the long-term stability and reliability of the system, as well as its ability to respond to more realistic analysis-driven use cases.

ATLAS

ATLAS agrees with the general approach of CMS.

They are already using the new WMS at CNAF and Milan, with mixed results.

Metrics (by Users)

Both VOs are requested to provide a metric to define the success of the pilot.

CMS suggests a reliability test based on 20 kjobs/day submitted continuously over a week, to be confirmed and, if necessary, discussed with the certification team.


Post Scriptum (Antonio, PPS):
After a quick check with the certification team, it turned out that the maximum rate sustained by the WMS in certification was 15 kjobs/day, which already exceeded by 50% the acceptance criterion, fixed at 10 kjobs/day. Tests with higher submission rates are possible and encouraged to verify the limits of the system, but they should not be used as a condition for acceptance.
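The figures quoted above can be cross-checked with a few lines of Python. This is only an illustrative sketch: the three rates come from these minutes, while the function names and the idea of an automated acceptance check are assumptions made for the example and are not part of any PPS or certification tooling.

```python
# Rates in jobs/day, as quoted in the minutes.
ACCEPTANCE_RATE = 10_000      # acceptance criterion
CERTIFIED_MAX_RATE = 15_000   # maximum rate sustained in certification
CMS_PROPOSED_RATE = 20_000    # rate suggested by CMS for the reliability test


def margin_over(rate: int, baseline: int) -> float:
    """Return how far `rate` exceeds `baseline`, as a fraction of `baseline`."""
    return (rate - baseline) / baseline


def meets_acceptance(sustained_rate: int) -> bool:
    """Hypothetical check: only the acceptance criterion counts, not higher rates."""
    return sustained_rate >= ACCEPTANCE_RATE


if __name__ == "__main__":
    # Margin of the certified rate over the acceptance criterion (0.5, i.e. 50%).
    print(margin_over(CERTIFIED_MAX_RATE, ACCEPTANCE_RATE))
    # Margin of the CMS proposal over the highest rate seen in certification.
    print(margin_over(CMS_PROPOSED_RATE, CERTIFIED_MAX_RATE))
    # Total jobs in one week of continuous submission at the CMS rate.
    print(CMS_PROPOSED_RATE * 7)
```

The CMS proposal is about one third above the highest rate observed in certification, which is why it is treated as a stress test rather than as an acceptance condition.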

General Agreement on Service Level and Conditions

Both VOs point out that it would be desirable to have a dedicated service per VO (as in the normal production scenario), in order to keep the service under control and avoid interference.

The point is generally taken by PPS. However, the availability of CERN-PROD to run more than one pilot service has to be verified with the service manager. If necessary, one of the two services could be run at CNAF.

Neither Atlas nor CMS has a strong preference about running the pilot at CNAF or at CERN. However, the first option would be to have both services at CERN.

Atlas points out that a custom configuration action (creation of indexes) is needed on the LB, which has to be performed by the service administrator in addition to the standard configuration.

Atlas also stresses that it expects a high level of responsiveness from the people in charge of the service during the pilot operations.

Timeline

As the service has already passed the pre-deployment test in PPS, it should be set up as soon as possible.

Unfortunately, the administrator of CERN-PROD in charge of the pilot will not be back before the end of this week. It is reasonable to expect the service at CERN to be ready by the 23rd of April. CMS/Atlas will be informed when the relevant services are ready, and they should start the operations immediately afterwards.
A very conservative deadline for the commissioning of the pilot service by the sites and the start of the operations by the VOs is, however, set to the 27th of April.

Two weeks of continuous VO operations are expected (ending in the worst case on the 12th of May), after which, if no problems arise, the service will receive the green light for wide deployment in production, with the best possible level of confidence.

A deadline for the whole pilot activity is set to the 20th of May (including a grace period for possible delays).

If the success condition is not met by that date, and no significant decisions or re-negotiations are made earlier, another meeting will be held with all the parties to decide about the future of the pilot.
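As a quick sanity check of the dates above, the intervals between the milestones can be computed with Python's standard `datetime` module. The four dates are taken from these minutes; the constant and function names are purely illustrative.

```python
from datetime import date

# Milestones as stated in the timeline.
SERVICE_READY = date(2008, 4, 23)    # expected readiness of the service at CERN
START_DEADLINE = date(2008, 4, 27)   # conservative deadline for the start of operations
WORST_CASE_END = date(2008, 5, 12)   # worst-case end of the VO operations
PILOT_DEADLINE = date(2008, 5, 20)   # deadline for the whole pilot activity


def days_between(start: date, end: date) -> int:
    """Number of calendar days from `start` to `end`."""
    return (end - start).days


if __name__ == "__main__":
    # Slack between expected readiness and the conservative start deadline.
    print(days_between(SERVICE_READY, START_DEADLINE))
    # Window available for the two weeks (14 days) of continuous operations.
    print(days_between(START_DEADLINE, WORST_CASE_END))
    # Grace period between the worst-case end and the overall deadline.
    print(days_between(WORST_CASE_END, PILOT_DEADLINE))
```

With these dates, the window between the start deadline and the worst-case end is long enough to hold two full weeks of operations, and roughly one more week remains before the overall deadline.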

AOB

Actions

  • Assigned to: EnzoMiccio. Due date: 2008-04-17. Closed: 2008-05-14.
    Description: Provide quantitative success criteria for the CMS testing on the pilot WMS.
    Reply (18/4/08): steady-state real user activity over a long period (1-2 weeks) without major misbehaviours.

  • Assigned to: DavidRebatto. Due date: 2008-04-17. Closed: 2008-05-14.
    Description: Provide quantitative success criteria for the Atlas testing on the pilot WMS.
    Reply (18/4/08): steady-state real user activity over a long period (1-2 weeks) without major misbehaviours.

  • Assigned to: AntonioRetico. Due date: 2008-04-17. Closed: 2008-05-14.
    Description: Clarify the availability of CERN-PROD to run more than one service instance and confirm CNAF availability.
    Reply (18/4/08): Availability of both sites confirmed.


Topic revision: r2 - 2008-05-14 - AntonioRetico
 