PPS all-sites meeting minutes

  • Chair: Antonio Retico, Nick Thackray

  • CERN_PPS: Danica Stojiljkovic
  • CESGA-PPS: Alvaro Simon, Javier Lopez
  • DESY-PPS: Christoph Wissing
  • PPS-CNAF: Daniele Cesini, Danilo Dongiovanni
  • PPS-UPATRAS: Vasilios Kolonias, George Goulas
  • UKI-LT2-IC-HEP-PPS

Hand-over in PPS coordination (by Nick)

Nick, who has been running the service since early 2005, has been assigned other tasks within SA1. Antonio is taking over as PPS coordinator and hopes to do as well as Nick did.

Introduction (by Antonio)

PPS is currently going through a major re-organisation. The process started during the EGEE07 conference at the end of September. To keep track of it, six tasks were defined and presented to the Regional Managers at the EGEEII-EGEEIII transition meeting. The aim of this meeting, and of the two that will follow in the next two months, is to monitor the status and progress of these tasks. Although the main concepts and motivations of the re-organisation have been discussed on several occasions, we have not had an all-sites meeting for a long time, so I'll briefly go through some of the important points, which should be clear to everybody in order to understand what we are talking about.

The goal of this meeting is to share with the sites the latest changes in the mandate and strategy of PPS for EGEEIII, to verify the status of the re-organisation tasks and, of course, to answer questions raised by PPS site administrators.

Changes in mandate and strategy (by Antonio Retico)

The first important general concept is that in EGEEIII the PPS is not supposed to run all the gLite services full-time and at quasi-production level, as it does now. The PPS resources will be dedicated to two main areas of activity: pre-deployment testing (middleware quality services, MQS) and the set-up of pilot services in production (middleware preview services, MPS). All PPS sites and site admins are therefore requested to find a place within one or both of these activities.

The general idea of the MQS is to cover with a deployment test, as far as possible, every deployment scenario of some relevance for production. The functional testing of the installed middleware service is very limited: it is restricted to SAM tests, when applicable, or to simple test cases that the administrators can implement themselves. This will be one of the two major use cases of the SAM infrastructure in PPS (we'll see the other one later on). The output of the MQS, as happens now with the pre-deployment test reports, will be taken into account in the preparation of the release to production.

The MQS works roughly the same way as the pre-deployment test works now, with a few important exceptions:

  1. There will no longer be a central test coordinator (a role currently covered by Mario David) responsible for collating and editing the service-specific update reports into a general update report. In the new model, each deployment scenario we cover will have a responsible person or coordinator who will handle the test cases, the test execution and the reporting independently, holding full responsibility for the test. The test managers will of course have to share procedures and templates, which are only partially available for the time being.
  2. The PPS SAM infrastructure will continue to be operated, but at a reduced scale (only one site). This is a natural consequence of the fact that the PPS services (namely those published in the PPS BDII) will be there only for deployment testing purposes. They will therefore be discontinuous by definition, and the need for ultra-reliable monitoring is reduced.

Questions:

  • David Colling: The MQS seems to overlap heavily with the deployment test done on the SA3 certification testbed.
  • Antonio: The test case is slightly different, because the condition of the patch when it reaches certification differs from its condition when it reaches the PPS. Normally the difference is in the release notes, which are the object of the PPS deployment test. However, the risk of duplicating work is real, and that is why people from SA3 (namely Louis Poncet) are now formally involved in the PPS activity. I envisage a future where the SA3 distributed testing and the PPS deployment testing work in close synergy. Our general guideline is to save effort, not to overspend it.

The MPS is the really new concept in PPS. This service area is based on the idea of replicating and integrating in PPS the successful interactions between users and developers typical of the so-called “pilot” or “experimental” services. The pilots have successfully dealt with the certification of new core services like the WMS and of major versions of the client tools (using the shared area set up by certification). Although effective, this effort was not spent within the framework of any formal service. That is why we thought of introducing the concept of a pilot into the offering of the pre-production service. There are important differences between the pilots and the services we have been running so far (and are actually still running):

  1. Service instances are set up on demand and not “by definition of PPS”. For example, we are now interested, as SA1, in running a pilot of Cream CE, because we want to verify the behaviour of SAM and other monitoring tools and figure out some operations scenarios. In other words, the ops VO is interested in a pilot of Cream with certain characteristics. We hope of course that other user VOs, like those of the experiments, will at some point want to try out an early deployment of Cream CE to see how it interacts with their applications. Should that not happen, though, the PPS would limit its contribution to the pre-deployment test, without betraying its mandate in any way.
  2. In most cases the services set up in preview will have to be published in the production BDII by production sites. The distinction between a PPS service and a production one will not be based on the BDII, but on a particular attribute in the published information (GlueServiceStatus). As a consequence, the distinction made in the GOCDB between production and PPS sites will make less and less sense. We will certainly need a way to register people who work at production sites and do pre-production activity, in order to have a complete inventory of PPS resources. A first case in this respect is a couple of sites in the CE region that are willing to help with release testing (checking the release notes before production). These are to all intents and purposes production sites, so they do not need to be registered in GOCDB as PPS sites.
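
As an illustration only, the idea of telling preview services apart by a published attribute rather than by the BDII they appear in could be sketched as follows. This is a minimal sketch, not the agreed mechanism: the record layout, the endpoints and the status value "PreProduction" are assumptions made for the example, since the minutes do not say which GlueServiceStatus value would mark a preview service.

```python
from dataclasses import dataclass

# Hypothetical marker value: the minutes only say that GlueServiceStatus
# will carry the distinction, not which value will be used.
PREVIEW_STATUS = "PreProduction"


@dataclass
class ServiceRecord:
    """Simplified stand-in for one service entry published in a BDII."""
    endpoint: str
    site: str
    glue_service_status: str


def split_by_status(records, preview_status=PREVIEW_STATUS):
    """Return (preview, production) lists based only on the status attribute."""
    preview = [r for r in records if r.glue_service_status == preview_status]
    production = [r for r in records if r.glue_service_status != preview_status]
    return preview, production


# Example records; endpoints and site names are made up for illustration.
records = [
    ServiceRecord("ce01.example.org", "SITE-A", "OK"),
    ServiceRecord("cream01.example.org", "SITE-A", PREVIEW_STATUS),
    ServiceRecord("wms01.example.org", "SITE-B", "OK"),
]

preview, production = split_by_status(records)
print([r.endpoint for r in preview])
print([r.endpoint for r in production])
```

Both kinds of service come from the same site list and the same information system here, which is the point made above: the site registration no longer tells preview and production apart, only the attribute does.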

Tasks for transition

To illustrate the tasks for the transition, Antonio went through some slides of the presentation about mid-term planning given at the last SA1 Coordination meeting.

The slides are attached to the agenda (http://indico.cern.ch/materialDisplay.py?materialId=0&confId=36928). Slides 11, 12 and 22-28 were presented.

A summary of the progress report and comments follows:

MQS

  1. Extend pre-deployment testing

    • Find gaps in the current testing
    • Re-convert existing PPS sites to cover gaps
    • Appoint service test managers
    • Adapt tools for test reports
    • Start Operations
    • Start date: 10 Jun
    • Due date: 21 Jul
    • Task Coordinator (proposed): Mario David
    • Team: Antonio (CERN); Esteban, Alvaro (CESGA)
    • Regions/sites concerned: ALL

    • Antonio publicly apologises to Mario David, leader of the task: although the task started a long time ago, there has not yet been an occasion to meet and discuss the requirements in depth. Mario has already produced something in terms of candidate test managers, but Antonio will give further comments only after having discussed it with Mario.
    • Question (Alvaro): Is the system set up by Pablo Rey at CESGA going to be used for deployment reports?
    • Answer (Antonio): This is not certain yet. One thing observed during that development was the need for deep synchronisation with attributes in Savannah. This gave us the idea of using Savannah more intensively to manage the pre-deployment tasks first and, more generally, all PPS technical tasks. When we are further along in the analysis of the MQS, we will be able to say whether we need this external database in support of the Savannah tracker or not.
  2. Decommission permanent service

    • Disconnect sites not used in pre-deployment
    • Notify concerned Users
    • Start date: 29th Jun
    • Due date: 13th Aug
    • Task Coordinator (proposed): Antonio
    • Team: -
    • Regions/sites concerned: ALL

    • not started yet

MPS

  1. Review Release Procedure

    • Define new use of CNAF repository
    • To be used for pilot services
    • Cut 2nd part of the procedure (Upgrade of PPS Sites)
    • Start date: 10th Jun
    • Due date: 3rd Jul
    • Task Coordinator (proposed): Danilo Dongiovanni (CNAF)
    • Team: Antonio, Esteban, Alvaro, Mario
    • Regions/sites concerned: CERN, SWE, IT

    • A draft of the new procedure was produced and it is now under review. Danilo does not want to disclose anything before this preliminary review is done
  2. Set-up Client Preview Mechanism

    • Set-up pilot instance of tool for client distribution
    • Identify sites in production for BC and NBC client updates
    • Identify site to run the tool once it's set up
    • Start operating client release at CERN_PPS
    • Export changes to concerned production sites
    • Review PPS Release Procedure accordingly
    • Integrate changes with the overall release process
    • Follow-up decommissioning of existing PPS UIs and WNs
    • Start date: 24th Jun
    • Due date: 21st Aug
    • Task Coordinator (proposed): Antonio
    • Team: Andreas (from SA3), Mario
    • Regions/sites concerned: initially CERN and LIP, then others

    • Not started yet. Karolis Eigelis, from CERN_PPS, has joined the team and will start working on the set-up within the next 15 days.
    • Question (Alvaro): How long will the release continue with the existing procedure?
    • Answer (Antonio): It will go on until the service is fully decommissioned and a replacement for the client tools is available, so probably until the end of August.

SUP

  1. Set-Up Activity Management and Reporting System

    • Create and customise Savannah project
    • Define tasks and task weights
    • Populate project users database
    • Develop tools for automation of task creation and assignment
    • Start operations and fine-tuning
    • Develop tools for reporting and accounting
    • Start date: 19th Jun
    • Due date: 5th Sep
    • Task Coordinator (proposed): Antonio
    • Team: ?
    • Regions/sites concerned: Initially CERN, ?

  2. Documentation (service description, website, EGEE08)

    • Service Description
      • Finish implementation of use cases and effort estimate (URGENT!)
    • Website
      • Changes in the mandate (urgent)
      • New usage rules (moderately urgent)
      • New service layout (later)
    • EGEE08
      • Contribution “The New PPS” (later)
    • Start date: 10th Jun
    • Due date: 16th Sep
    • Task Coordinator (proposed): Antonio
    • Team: ? Volunteers welcome
    • Regions/sites concerned: CERN

    • The task will run through the whole summer, with the overall goal of bringing the website to a consistent state and having presentations/posters/papers ready for EGEE08.

Next meeting

29th of July on EVO (to be confirmed)
Topic revision: r2 - 2008-07-02 - AntonioRetico