LCG Grid Deployment - CERN ROC - ROC Incubator

Plan and milestones for the creation of the new ROC Iniciativa de Grid da America Latina e Caribe


  • ROCName: IGALC
  • StartDate: 8-Nov-09
  • DueDate: 18-Dec-09
  • Status: In Progress

This page is used to define the requisites, plans and milestones for setting up the new ROC IGALC and bringing it into operation.
The main actors in this process are the ROC IGALC (candidate) and the ROC CERN (incubator).
In all of the following, we will assume that both parties have read the Definition of a ROC and associated functions.

References will also be made to the document describing the step-by-step creation of a new ROC.

Requisites for the new ROC personnel.

The following is the minimal set of pre-requisites that need to be fulfilled before the new ROC is put into operation.

  • Training on the job (minimum 1 month) for the future technical manager(s)
    • The training is completed by participating in the ROC CERN ROD activities in order to become familiar with the infrastructure and procedures. For maximum efficiency it is highly recommended to arrange a stay at CERN for this purpose.
    • It can be done in parallel with the technical tasks for the creation and can continue throughout the whole execution of the project. It is advised, however, to start it as soon as possible in order to profit from the experience during the design of the regional infrastructure.
    • Equivalent recent experience can be accepted if certified by another ROC.
  • A record of 4 attendances to the weekly OPS meeting

These pre-requisites are set in the interest of the sites to be operated within the scope of the future ROC.
In addition, it would be good if the following started at the same time:

  • Operating Nagios at regional level
  • Operating the top level BDII

Design of the Infrastructure

The infrastructure needs to be defined well in advance, at least three weeks to one month before the start of operations. A typical infrastructure may require:

  • If relevant, the registration of a new DNS domain name
  • The name of the new ROC, the ROC manager, the ROC manager's deputies, the ROC security contact.
  • mailing list to contact the ROC, for ROC management purposes
  • mailing list to contact the ROC for security purposes
  • mailing list to contact the ROC for user support purposes
  • Web site for the new ROC as a reference point for the public and maybe the administrators of the sites belonging to the ROC
  • Internal twiki or other collaboration tool targeted at the members of the ROC team
  • A regional top-level BDII (accessing all the sites in the EGEE infrastructure)
  • Certification infrastructure
    • A certification BDII, containing the entries that are available in production, plus the services run at the site(s) to be certified
    • A WMS able to submit to the CE(s) belonging to the sites to be certified
    • A ROC Nagios

Tasks for ROC CERN (before kick-off meeting)

  • Produce ROC start-up project wiki (this page) : done
  • send information documentation/request contacts and draft infrastructure plan: done
  • call kick-off meeting: done
  • Provide draft start-up plan: done

Tasks for ROC IGALC (before kick-off meeting)

  • Provide contacts : done
    • Mailing list
    • Responsible people
  • Definition of service infrastructure : done
    • systems layout : done
    • User support (GGUS/not GGUS):done
    • Sites to move: done

Joint tasks (meeting)

  • Review and Approve execution plan: done

Input documents (before kick-off meeting)

General Documentation (by ROC CERN )

Draft Execution Plan (by Antonio Retico)

A plan for the start-up has been drafted by the CERN ROC.

The main constraint on the schedule is the schedule of the GGUS releases. Because of the many dependencies, all the operational tools have to be reconfigured almost synchronously, which can happen only close to a GGUS release, and releases are scheduled once per month.

Since catching the November release is not feasible, we are left with two windows, around the 16th of December and the 27th of January.

The December window forces us to adopt a very tight schedule that will deliver the new ROC IGALC in operations just before the Christmas holidays. It requires very focused commitment by IGALC. To be discussed.

The January window allows us to adopt a second schedule that will deliver the new ROC IGALC in operations at the end of January. The tasks are of course the same, so this plan is not necessarily more relaxed than the first. It simply runs later, which gives the new ROC more time to get used to and digest the many new bits of information and procedures they are dealing with these days. Furthermore, it fits better with the dates proposed by IGALC for visiting CERN. Provided it is compatible with IGALC's deadlines, I would vote for this second plan.

Tentative milestones have been set accordingly in this document (dates from the January schedule in square brackets).

  • December plan StartUpIGALCDecember.PNG

  • January plan StartUpIGALCJanuary.PNG

Proposal for Infrastructure (by Diego Carvalho)


  1. DNS domain name: igalc.org
  2. Name of the new ROC, the ROC manager, the ROC manager's deputies, the ROC security contact:
    • ROC Name: IGALC
      • PT-BR: Iniciativa de Grid da America Latina e Caribe
      • SP: Iniciativa de Grid de America Latina y Caribe
      • EN: Latin America and the Caribbean Grid Initiative
      • FR: Initiative de Grille de l'Amérique Latine et Caraïbe
    • ROC Manager: Diego Carvalho ( d.carvalho@igalc.org )
    • ROC Manager Deputy: Ramon Diacovo ( ramon@igalc.org )
    • ROC Security Contact: Diego Carvalho ( d.carvalho@igalc.org )
    • ROC Services Operator: Frederico de Oliveira ( fred@igalc.org )
    • ROC System Manager: Allan Alvaro ( allan@igalc.org )
  3. Mailing list to contact the ROC, for ROC management purposes: roc@igalc.org
  4. Mailing list to contact the ROC for security purposes: security@igalc.org
  5. Mailing list to contact the ROC for user support purposes: support@igalc.org
  6. Web site for the new ROC as a reference point for the public and maybe the administrators of the sites belonging to the ROC: http://www.igalc.org
  7. Internal twiki or other means of communication targeted at the members of the ROC team: http://www.igalc.org/wiki
  8. Certification infrastructure
    • A ROC BDII, containing the entries that are available in production, plus the services run at the site(s) to be certified: cert-is.igalc.org
    • A WMS able to submit to the CE(s) belonging to the sites to be certified: cert.eela.ufrj.br
    • A ROC Nagios: nagios.igalc.org
  9. Desired date to start: ASAP

Output Documents (after kick-off meeting)

Minutes of kick-off meeting


Description: kick-off of the new ROC IGALC

Date: 11-Nov-09

Agenda:

  • presentation
  • assessment of the required infrastructure
    • services
    • sites
    • support structure
  • review and approval of schedule
    • tasks and milestones

Chair: Antonio Retico

Participants:

  • ROC IGALC: Diego Carvalho, Frederico de Oliveira, Ramon Diacovo
  • ROC CERN: Antonio Retico

Discussion: (these minutes are compiled from the discussion and a few subsequent e-mail exchanges)

*Presentation*
Short introduction of the parties. Antonio has been deputy ROC Manager at CERN since 2005 and is in charge of following up the creation of the new ROC. IGALC operates in the framework of the EELA project, where operations procedures (e.g. operators sending tickets to the sites based on the results of monitoring tools) already exist and are applied. In particular, the project has experience with SAM (running their own instance), Nagios (service running in Spain) and gLite middleware services such as BDII, VOMS and WMS. They therefore believe that starting to operate as an EGEE ROC will be just a matter of adapting procedures they already apply to the new requirements.
Antonio points out that the implementation of the regional procedures of a ROC is totally under the ROC's control, provided that the EGEE parameters are met (in terms of quality of service). Currently the QoS is defined by the minimal availability and reliability requirements, and the control mechanism is implemented in EGEE by the COD (Central Operator on Duty), which watches over the timely follow-up of alarms across the regions. For example, EGEE requires that the sites meet certain minimal availability/reliability (a/r) criteria based on the SAM results, but does not define the site certification procedure. In the framework of the start-up of the new ROC, the regional site certification procedure is to be defined, and it will be checked by ROC CERN to verify its compliance with the existing EGEE metrics.
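
As a rough illustration of the availability/reliability criteria mentioned above, the sketch below computes both figures from hours of uptime. The formulas follow the commonly cited grid definitions (availability excludes time in unknown status; reliability additionally excludes scheduled downtime). They are an assumption here, since these minutes do not spell them out. The function names and the target values passed by the caller are hypothetical.

```python
def availability(up_hours, total_hours, unknown_hours=0.0):
    """Fraction of the known time during which the site was up."""
    known = total_hours - unknown_hours
    return up_hours / known if known > 0 else 0.0


def reliability(up_hours, total_hours, scheduled_down_hours=0.0, unknown_hours=0.0):
    """Like availability, but scheduled downtime is not held against the site."""
    expected = total_hours - scheduled_down_hours - unknown_hours
    return up_hours / expected if expected > 0 else 0.0


def meets_targets(avail, rel, min_availability, min_reliability):
    """Compare measured figures with targets supplied by the caller.

    The targets are deliberately not hard-coded: the minimal a/r values
    are not stated in this page.
    """
    return avail >= min_availability and rel >= min_reliability


if __name__ == "__main__":
    # Hypothetical month: 720 hours, 648 up, 36 hours of scheduled downtime.
    a = availability(648, 720)
    r = reliability(648, 720, scheduled_down_hours=36)
    print(f"availability={a:.3f} reliability={r:.3f}")
```

A regional certification procedure could run such a check on the SAM results of a candidate site before promoting it, with the targets taken from the current EGEE requirements.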

*Assessment of the proposed infrastructure*

Antonio notices that some of the links provided (e.g. the web site and the domain) look like placeholders. He points out that, as far as EGEE is concerned, the only real requirement is that the ROC contact points are duly defined on a public web page and are responsive. The web site (like the wiki for internal documents) is not a requirement in itself but rather a best practice based on experience. A dedicated section in another existing web site is perfectly acceptable. As an example, Diego shows the EELA operations web. Antonio insists on the responsiveness of the contact points (mailing lists) provided. He will test them throughout the whole creation process.

A clarification on the requirements about BDIIs:

  • There is one non-functional requirement coming from EGEE for each region to run a top-level BDII. This BDII is a "reflector" that has to contain all the EGEE sites (list to be provided) and not only the sites pertinent to the region. It should be accessible by all EGEE users (although it will preferably be accessed by users in the region). This requirement is meant to distribute the load on the information system among different servers. This BDII is not to be considered a ROC operational service but has to be run by a site in the region.
  • For site certification purposes the ROC is recommended to run a top-level BDII as an operational service. This one can contain a subset of the sites (e.g. only sites in the region, both certified and uncertified). This is not a requirement but a best practice.
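
To make the certification BDII point concrete, here is a minimal sketch of how one might check that such a BDII publishes the sites to be certified. It assumes a standard BDII setup (LDAP on port 2170, GLUE 1.x base DN `mds-vo-name=local,o=grid`, `GlueSite` entries) and the OpenLDAP `ldapsearch` client. The host name comes from the infrastructure proposal; the function names are hypothetical, and the script only builds the command and parses LDIF text, it does not contact any server.

```python
BDII_PORT = 2170                      # assumed default BDII LDAP port
BASE_DN = "mds-vo-name=local,o=grid"  # assumed GLUE 1.x top-level base DN


def ldapsearch_command(host, port=BDII_PORT, base_dn=BASE_DN):
    """Build an anonymous ldapsearch call listing the published site names."""
    return [
        "ldapsearch", "-x", "-LLL",
        "-H", f"ldap://{host}:{port}",
        "-b", base_dn,
        "(objectClass=GlueSite)", "GlueSiteUniqueID",
    ]


def published_sites(ldif_text):
    """Extract GlueSiteUniqueID values from ldapsearch LDIF output."""
    sites = set()
    for line in ldif_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "gluesiteuniqueid":
            sites.add(value.strip())
    return sites


def missing_sites(ldif_text, expected):
    """Return the expected sites that the BDII does not publish."""
    return sorted(set(expected) - published_sites(ldif_text))


if __name__ == "__main__":
    print(" ".join(ldapsearch_command("cert-is.igalc.org")))
```

The output of the printed command can be fed to `missing_sites` together with the list of sites under certification (here UFRJ-IF and CEFET-RJ) to see which ones are not yet visible.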

Sites:
The sites identified as candidates for migration to the new ROC are UFRJ-IF and CEFET-RJ.

Support:
The ROC is fully responsible for dispatching tickets internally, provided that the GGUS workflow is satisfied. Antonio noted that the various ROCs have made different choices in that respect. A longer stay at CERN for training would give a better idea of the implications of running a regional TT system. This will, however, be out of the scope of the start-up, because Diego confirms that the visit to CERN won't last longer than one week.

Approval of the plan: The plan presented was reviewed and approved without reservation. Particular attention has to be given to participation in the CERN ROC Rota, which is difficult to fully achieve remotely. It was agreed to use IM as a fast communication method.

The following points will be attacked initially by IGALC

Implementation

Tasks for ROC IGALC (after kick-off meeting)

| Task | Start Date | Due Date | Status |
| Complete the check-list to join the CERN ROD rota | 9-Nov-09 | 16-Nov-09 | In Progress |
| CERN ROD rota | 16-Nov-09 | 20-Dec-09 | In Progress |
| Installation of the ROC Nagios | 17-Nov-09 | 27-Nov-09 | not started |
| Installation of the ROC BDII | 30-Nov-09 | 02-Dec-09 | done (to be checked) |
| Set-up of the user support infrastructure (GGUS) | | | Done |
| Installation of the certification infrastructure | 3-Dec-09 | 07-Dec-09 | In progress |
| Communication of the ROC security contact and mailing list to OSCT | 18-Nov-09 | 18-Nov-09 | not started |
| Setting up of the VOMS dteam infrastructure | 10-Dec-09 | 15-Dec-09 | not started |
| Sign SLA with sites | 16-Nov-09 | 23-Nov-09 | not started |

Tasks for ROC CERN (after kick-off meeting)

| Task | Start Date | Due Date | Status |
| Assisting in the installation of the ROC infrastructure | 03-Dec-09 | 07-Dec-09 | in progress |
| Supervision of the ROD shifts | 16-Nov-09 | 20-Dec-09 | In Progress |
| Enable the ROC IGALC over various operational tools | 04-Dec-09 | 14-Dec-09 | In Progress |
| Communicate to dteam the creation of the new ROC | dd-Mon-YY | dd-Mon-YY | not started |
| Notify SA1 management to include the new ROC representatives in the ROC managers mailing list | 10-Dec-09 | 10-Dec-09 | Done |
| E-mailing the ROC CERN sites of the change of ROC | dd-Mon-YY | dd-Mon-YY | not started |
| Run checks and tests against the new ROC infrastructure | dd-Mon-YY | dd-Mon-YY | not started |
| | dd-Mon-YY | dd-Mon-YY | not started |

Input documentation (for implementation phase)

Output documentation

Description of the infrastructure


Final assessment


Milestones

  • Start Date: 8-Nov-09 [30-Nov-09]
  • Start/finish participation in the ROC CERN ROD activities: 8-Nov-09 [30-Nov-09]
  • Start Participation to the OPS meeting: 8-Nov-09 [30-Nov-09]
  • Finish Design Phase: 12-Nov-09 [3-Dec-09]
  • Relevant GGUS releases: 16-Dec-09 [27-Jan-10]
  • Checkpoint to see that the ROC infrastructure is completely defined: 17-Dec-09 [27-Jan-10]
  • Start of Operations: 18-Dec-09 [28-Jan-10]
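
For comparing the two schedules, the following sketch parses the milestone dates listed above and computes how many days each plan leaves between the start date and the start of operations, and between the GGUS release and the start of operations. The dictionary layout and function names are hypothetical. The date format string assumes English month abbreviations, so it depends on the locale in which the script runs.

```python
from datetime import datetime

DATE_FORMAT = "%d-%b-%y"  # e.g. 8-Nov-09; month names assume an English locale

# Dates copied from the milestone list (December plan, then January plan).
PLANS = {
    "December": {"start": "8-Nov-09", "ggus_release": "16-Dec-09", "operations": "18-Dec-09"},
    "January": {"start": "30-Nov-09", "ggus_release": "27-Jan-10", "operations": "28-Jan-10"},
}


def parse_date(text):
    """Parse a date written in the dd-Mon-YY style used on this page."""
    return datetime.strptime(text, DATE_FORMAT).date()


def days_between(first, second):
    """Number of days from the first date to the second."""
    return (parse_date(second) - parse_date(first)).days


def summarise(plans):
    """Return, per plan, the total length and the margin after the GGUS release."""
    return {
        name: {
            "total_days": days_between(p["start"], p["operations"]),
            "days_after_release": days_between(p["ggus_release"], p["operations"]),
        }
        for name, p in plans.items()
    }


if __name__ == "__main__":
    for name, figures in summarise(PLANS).items():
        print(name, figures)
```

The same helper can be reused to check the task tables for due dates that fall after the relevant GGUS release.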

After the creation of the ROC (tasks for ROC CERN )

  • Check regular participation in the OPS meeting
  • Monitor the new ROC ROD activities, assist wherever possible
  • Check status of new ROC web site/twiki, assist wherever possible
  • Check at the end of the first and second month after the ROC is set up that they are correctly publishing
    • accounting
    • site availability/reliability
  • Follow closely and assist in the certification of the first new site in the new ROC

Topic revision: r6 - 2009-12-10 - unknown
 