Open Action Items from SA1 Coordination

New action items should be added directly to the SA1ActionItemsDB. Comments on that page describe how to do this.

ID Description Creation Date Due date Submitter Assigned To State
Action on all ROCs to check Gstat2 storage numbers with their sites 2009-12-03 2010-02-15 Main.MaiteBarroso all.ROCs open
000047 Action on GGUS to investigate the site support metrics defined in the SLA. Draft implementation plan by the end of October.
03/11/2009: update by Maria. Implementation is scheduled for the end-of-November GGUS release. Specifications in https://savannah.cern.ch/support/?110706
01/12/2009: pending Maite's review of the proposal and feedback
16/03/2010: ongoing implementation
2009-09-24 2009-10-30 Main.MaiteBarroso Torsten.Antoni
000058 Prepare a procedure to deal with the most urgent vulnerabilities, including timelines for starting to suspend sites, so that ROCs can easily enforce it.

This was discussed at the EGEE PMB level, with the following conclusion:

SUMMARY: the PMB decides the timeline for starting to suspend sites. For previous critical vulnerabilities, it was set to 7 days.


The PMB considers the security of the infrastructure to be of paramount importance to the operation and reputation of the project. The potential damage that such vulnerabilities could cause to the infrastructure, both in terms of loss of service and of damaging publicity, was of great concern to the PMB.

The project has mandated the Grid Security Officer to work with the ROC Security Contacts (as well as the JSPG and GSVG) to proactively manage security policy and its operational implementation. The PMB fully supported the use of the security monitoring tool to mimic normal user access patterns to sites and to discover host configuration information non-intrusively, in order to infer potential site vulnerabilities. The PMB was disappointed that, despite the established and agreed operational structures, sites were slow to respond or refused to perform the necessary routine system maintenance.

All sites are reminded that there is an established policy and procedure that allows a site to be suspended if it is deemed to pose an immediate threat to the infrastructure. This is stated in the Grid Site Operations Security Policy (https://edms.cern.ch/file/819783/2/GridSiteOperationsPolicy-v1.4a.pdf):

"When notified by the Grid of software patches and updates required for security and stability, you shall, as soon as reasonably possible in the circumstances, apply these to your systems. Other patches and updates should be applied following best practice.
[...]
The Grid may control your access to the Grid for administrative, operational and security purposes and remove your resource information from resource information systems if you fail to comply with these conditions."

The federation representatives on the PMB are following up within their own regions to understand why their sites were not patched immediately. However, the PMB noted that if the fix has not been installed at a particular site by a given deadline, the PMB reserves the right to remove the offending site from the EGEE production infrastructure in accordance with the established security policy. We ask you, as the Grid Security Officer, to inform the sites of these issues; as part of the site access agreement, sites are mandated to follow these instructions, or their access to the infrastructure could be curtailed.

We are now approaching the deadline for addressing the vulnerability that raised this issue. If sites are still exhibiting these vulnerabilities 7 days after this notice is circulated, please work with the ROC Security Contacts to curtail their access to the infrastructure or to ensure that a clear upgrade plan is in place to eliminate these vulnerabilities.


2009-11-18 2009-11-30 Main.MaiteBarroso Romain.Wartel
000118 Maite to check with the Spanish NGI whether there are any significant issues for the transition regarding O-E-2. 2010-04-20 2010-04-27 Nick.Thackray Maite.Barroso
000119 James will get information on the process for including uncertified sites in the regional Nagios instances. He will then work with Vera to get this information into the operations procedures document. 2010-04-20 2010-05-04 Nick.Thackray James.Casey
000120 All ROCs to give details on how many sites still need 32-bit middleware, which middleware services they need it for, and how many worker nodes those sites have. 2010-04-20 2010-04-27 Nick.Thackray ALL.ROCS
000121 Check on the status of the DPM bug with regard to Gstat 2. 2010-04-20 2010-04-27 Nick.Thackray All.OCC
000122 Give feedback on the updated version of the EGI VO Management specifications. 2010-04-20 2010-04-27 Nick.Thackray All.ROCs
000562 Steve and the ROCs to find a few representative sites to understand the main issues with the installed storage capacity published in gstat, work with them to solve these issues, and document the solutions (if relevant). After that, we will re-discuss it here to involve all sites.
01/12/2009: all ROCs to check Gstat2 published capacity with their sites
2009-07-28 2009-08-30 Main.MaiteBarroso Main.AllROCs

Closed Action Items from SA1 Coordination

ID Description Creation Date Due date Submitter Assigned To State
000043 Action on all ROCs to include the SLA as part of the site certification process in their procedures. 2009-09-24 2009-10-30 Main.MaiteBarroso Main.AllROCs
000044 Action on OAT deployment (James and Nick): provide a set of milestones to achieve Decision 1. It should include detailed dates for the remaining developments to be finished (central dashboard interfaced to Nagios, availability calculation, etc.), deployment testing by some regions, and releases.
01/12/2009: needs further discussion
2009-09-24 2009-10-30 Main.MaiteBarroso Nick.Thackray
000046 Action on OAT deployment to provide the regions with a draft schedule of what is expected from them to roll out Decision 1 in production.
01/12/2009: needs further discussion
2009-09-24 2009-10-30 Main.MaiteBarroso Nick.Thackray
000048 Action on OAT deployment to publish a checklist for the release of operations tools, covering packaging, documentation, repositories, licensing and wide testing BEFORE a tool is released.
01/12/2009: checklist updated, this action can be closed
2009-09-24 2009-10-30 Main.MaiteBarroso Steve.Traylen
000049 Request a change of name for the HEPSPEC06 benchmark.

Update 20/10/2009: request made to HEPiX, proposal to change it to SPEC-CPP06
Update 03/11/2009: HEPiX did not agree to the change of name

2009-09-24 2009-10-30 Main.MaiteBarroso Maite.Barroso
000050 Clarify site state transitions, including difference between suspended and uncertified.
Mail sent to Helene to find out if this is already documented somewhere.
01/12/2009: clarified today, this action can be closed
2009-10-20 2009-10-30 Main.MaiteBarroso Maite.Barroso
000051 Review the document describing the functions and responsibilities of the new ROCs, http://indico.cern.ch/getFile.py/access?sessionId=7&resId=0&materialId=2&confId=71024
03/11/2009: no feedback received
2009-10-20 2009-10-30 Main.MaiteBarroso All.ROCs
000052 Notify ROC managers of the outcome of the Gilda discussion at the security meeting on Thursday 22nd October
Done by Dave and Romain:
2. Gilda (training) security issues

Romain reported on the EGEE plans to integrate the Gilda training resources into the production infrastructure. There have been several e-mail threads on this. The issues were discussed earlier this week at the SA1 operations meeting. Romain reported that many concerns were raised about this, not just security-related ones.

The security concerns include the fact that the Gilda CA does not have robust enough identity vetting procedures to be accredited by IGTF.
Gilda has a separate VO which has been registered with EGEE, and one suggested approach has been to let sites decide whether to support the Gilda VO and also trust the Gilda CA. The sites would then have to agree to take on any risks.

SCG discussed the security issues at some length and concluded that:

a. The risks are more complex than individual sites agreeing to take on the associated risks. Any risks associated with Gilda work running at EGEE production sites potentially threaten the whole infrastructure.

b. SCG is very concerned that EGEE procedures have allowed the Gilda VO (which violates currently adopted security policies) to be officially registered. SCG recommends that the Gilda VO should not be registered with EGEE. Several NGIs handle training at the national level and already have a local training CA. Perhaps this would be the best model to follow?

c. SCG strongly recommends that Gilda certificates should not be issued online without appropriate identity vetting of the applicant.

d. The Gilda Portal should meet the requirements of the recently adopted VO Portal Policy.

2009-10-20 2009-10-30 Main.MaiteBarroso Romain.Wartel
000059 Propose a plan to migrate to HEPSPEC06
Done, proposed and agreed at SA1 coordination meeting 19/01/2010
2009-11-18 2009-11-30 Main.MaiteBarroso Maite.Barroso
000060 CIC dashboard team, James and Maite to finalize the plan to deliver a central dashboard interfaced to Nagios, documenting dependencies in testing and deployment, and acceptance criteria
01/12/2009: done, can be closed
2009-11-18 2009-11-30 Main.MaiteBarroso James.Casey
000061 Maite to inform NA4 about EGEE CREAM-CE deployment decisions
01/12/2009: done
2009-11-18 2009-11-30 Main.MaiteBarroso Maite.Barroso
000062 XXX to follow up on CREAM CE test failures to improve the situation in production
01/12/2009: excellent work done by the CCOD team, this can be closed
2009-11-18 2009-11-30 Main.MaiteBarroso Maite.Barroso
000063 Nick to publish EGEE service baseline list
Done: https://twiki.cern.ch/twiki/bin/view/EGEE/SupportedServiceVersions
2009-11-18 2009-11-30 Main.MaiteBarroso Nick.Thackray
000075 Action on Michaela to make the agreed changes to the suspended/uncertified state proposal as discussed today
Done
2009-12-03 2009-12-15 Main.MaiteBarroso Michaela.Lechner
000076 Action to summarize the Gilda proposal as agreed today, and distribute it to ROC managers and EGEE management
04/12/2009: summary sent to ROC managers
2009-12-03 2009-12-15 Main.MaiteBarroso Main.MaiteBarroso
000077 Action on Nick to document and publish the procedure to enforce the EGEE baseline with ROCs and sites
Done, see:
https://twiki.cern.ch/twiki/bin/view/EGEE/SupportedServiceVersions
https://edms.cern.ch/document/985325/1
2009-12-03 2009-12-15 Main.MaiteBarroso Nick.Thackray
000339 Find dates and a location for the next F2F SA1 coordination meeting, if possible co-located with the other SA1 meetings (OAT, COD, etc.)

UPDATE 13/01/09: Proposal to hold the next face-to-face SA1 meeting in June at CERN. It cannot be held together with the COD meeting in Finland in June, as the EGEE review rehearsals and the review itself are also in June.
Proposing to have it some time during week of 8th – 12th of June at CERN. Maite will send out proposed dates to ROC Managers.

UPDATE 03/02/09: Meeting will be on June 9th at CERN.

2008-12-16 2009-02-13 Main.Maite, Main.Barroso Main.MaiteBarroso
000340 Propose one site per ROC to deploy a CREAM CE, as proposed by Antonio Retico in slide 7 of his presentation to the SA1 coordination meeting on 16/12/08. Please send an e-mail to antonio.retico@cern.ch with the site contact information.
UPDATE 19/05/09:
Maite received a detailed list from Antonio. The meeting decided this action can be closed.
A WLCG milestone requires all sites with CREAM CEs installed to declare them at the beginning of June.

UPDATE 19/02/09:

Production grid: 17 nodes. I propose to close this action and open a new one for the missing ROCs.

* (UK-Ireland)
* RAL
1: 1000.gridpp.rl.ac.uk
1: 2000.gridpp.rl.ac.uk
1: 3000.gridpp.rl.ac.uk
1: 700.gridpp.rl.ac.uk
2: 500.gridpp.rl.ac.uk

* (Russia)
* RU-Protvino-IHEP
3: ce0004.m45.ihep.su

* (Asia Pacific)
* KR-KISTI-GCRT-01
3: ce01.sdfarm.kr

* IN-DAE-VECC-01
3: gridce01.tier2-kol.res.in
3: gridce02.tier2-kol.res.in

* (Central Europe)
* HEPHY-UIBK
3: test-lcgCE.uibk.ac.at

* (France)
* ESRF
4: grid-ce02.esrf.eu

* (South East Europe)
* HG-02-IASA
20: cream-ce01.marie.hellasgrid.gr

* (South West Europe)
* PIC
1: ppsce02.pic.es
1: ppsce03.pic.es
1: ce-test.pic.es

* (Germany-Switzerland)
* FZK-LCG2
48: cream-1-fzk.gridka.de
* GSI
1: grid29.gsi.de


UPDATE 17/02/09:
A mail will be sent (by whom?) to all ROCs to get an update. Alexander confirmed that Russia has a CREAM CE (name?) installed.

UPDATE 03/02/09:
Italy update: installation starting today
APROC update: no response. Last interaction recorded by Antonio: timeline requested on January 20, no reply
SEE update: in Greece, progress but no news, will be ready by next week
Nordic: working according to plan.
Benelux update: working according to plan.
UKI update: Antonio is in contact with the site
SWE update: Antonio is in the loop. We are deploying in pre-production; when successful, we will install it in production.
FR: Nothing changed from last time
CE: scheduled installation, will be ready in 2 weeks
RUSSIA: agreement reached, one site will install it. Will update you later this week on which site and by when it will be installed.

UPDATE 13/01/09: Which ROCs are proposing to install a CREAM CE?
• Italy: Intend to have production instance at tier-1 site within next 2 weeks.
• Asia Pacific: (no response)
• SEE: Planning to install at least one in next couple of weeks.
• Nordic: Will take at least 1 month for KTH, due to lack of staff.
• Benelux: We plan to install a CREAM CE on February 12 and 13.
• UK: 2 being installed but not working yet.
• SWE: Setting one up at PIC now. Other sites waiting for ICE WMS.
• France: Planning to have it in March but site admin is now off sick for 2 months, so unsure.
• CE: No decision which site but decision to be made Monday 19 Jan.
• Russia: (no response)

UPDATE: 5 May 09:
Antonio checked several days ago to see how many CREAM queues are available in prod. 40 for ALICE, 10 for CMS.
• Italy: Installed. 1 in production in Turin
• AP: (not present)
• SEE: (not present)
• SWE: 3 CREAM CEs: 2 in PPS (but published in production), 1 in production (but only supports OPS and dteam VOs)
• France: In progress
• CE: 1 installed
• Russia: 1 installed
• UK/I: at least one installed
• SEE: at least one installed
• Benelux: One at SARA
• NE: not sure

UPDATE: 19 May 09:
Maite will upload the document received from Antonio R. and close this action.

2008-12-16 2009-01-30 Main.Maite, Main.Barroso Main.AllROCs
000341 Definition of core services: distribute the list of core services per region so that it can be analysed by Helene, Maite, Nick and Kai (ROC representative), who should clean it up and distribute the proposal to the ROCs.

UPDATE 13/01/09: In progress.

UPDATE: 17/02/09: Not needed any more. Gilles announced that the confusing “Core” flag will be removed from GOCDB as soon as coordination with the CIC portal is done (after Catania).

2008-12-16 2009-01-30 Main.Maite, Main.Barroso Main.Gilles
000342 Definition of core services: check the list of core services in your region, to be distributed by Gilles & co, and give feedback about it

UPDATE 13/01/09: List isn't ready yet. Will distribute the list before the next meeting.

UPDATE: 17/02/09 Linked to 341 and closed.

2008-12-16 2009-02-13 Main.Maite, Main.Barroso Main.AllROCs
000343 Site suspension: Maite to distribute the proposal agreed at the SA1 coordination meeting, 16/12/08; Helene and all ROCs to comment

UPDATE 13/01/09: The proposal given by Maite will be tried for 3 months and re-evaluated at that point. Also, Maite will include a comment that the ROC of the problematic site can request that the OCC gets involved earlier than stated in the proposed process. All cases of site suspension that require the intervention of the OCC will be reported in the Quarterly Report.

2008-12-16 2009-01-30 Main.Maite, Main.Barroso Main.AllROCs
000344 Make a proposal for better coordination of regional certification after the releases, so that more sites can profit from the information

UPDATE 13/01/09: Maite will get an update from Antonio.

UPDATE: 17/01/09: this was discussed at the last meeting, together with the post-mortem of releases:
http://indico.cern.ch/getFile.py/access?sessionId=8&resId=1&materialId=0&confId=44167
The action can be closed.

2008-12-16 2009-01-30 Main.Maite, Main.Barroso Main.AntonioRetico-Maite
000345 Review and give comments to the "Proposal for a process for retiring obsolete services and old versions of services", http://indico.cern.ch/conferenceDisplay.py?confId=44163

UPDATE 13/01/09: No further comments were received so it is considered approved.

2008-12-16 2009-01-13 Main.Maite, Main.Barroso Main.AllROCs
000346 Propose the list of versions to be supported for the mw clients


UPDATE 13/01/09: Nick has been discussing with various interested groups. He will send out a list of the middleware clients that SA1 should support. The list can then be modified as required.

UPDATE 03/02/09: Nick will send the related document to the ROC managers list asap

UPDATE 17/02/09: Nick will send the related document to the ROC managers list ASAP

UPDATE 05/05/09: Done several weeks ago. Close.

2008-12-16 2009-02-13 Main.Maite, Main.Barroso Main.NickThackray
000347 First draft of regional operations model for each ROC ready

UPDATE 13/01/09: In progress. Note that the deadline is the end of January! It would be very useful to have this in time for the All Activities meeting; please can all ROCs get this information to Maite several days before that meeting if at all possible.

2008-12-16 2009-02-13 Main.Maite, Main.Barroso Main.AllROCs
000366 Need to decide what metrics will be needed for the new seed resources (e.g. what is available, what is being used per VO, per site, etc.). Also need to identify whether these metrics can already be measured or whether modifications to the tools are needed.

UPDATE 03/02/09: some progress, but nothing ready yet. Maite asked for a draft to be sent to the ROC managers as soon as possible.

UPDATE 17/02/09: Maite will evaluate document and distribute it.

UPDATE 05/05/09: The draft has now received comments from the RAG. It is close to a version for the ROC managers to review.

UPDATE 19/05/09: New requirement to be added:
- A check in the metrics on the number of jobs and CPU usage outside seed resources (if any)
- A measure of whether the VOs, over a period of more than 6 months, start using the production infrastructure and/or bring their own resources.

2009-01-13 2009-02-28 Main.NickThackray Main.Tomas
000370 Tracking of the deployment of the ROC NAGIOS instances.

UPDATE 03/02/2009:
CE: done; SEE: done; France: we have nobody assigned to it and currently have no effort for it, as we have staffing problems; UK: one site has a local instance, we are looking into deploying it for the ROC; Italy: installation is difficult; DECH: we tried to install it, but we have a problem: we cannot complete the configuration with YAIM.

UPDATE 17/02/09:
Steve will provide a status. Kai confirmed that SWE has a regional Nagios. Shu-Ting said AP also has one. Cyril: France will also have one (for dashboard testing).

UPDATE 18/05/2009:
From Wen Mei (ROC_DECH): We have ROC Nagios installed and upgraded to the latest version. The YAIM configuration now completes and the Nagios-based ROC DECH monitoring is running. All the sites and VOs are monitored. We are checking all the available services found by the Nagios probes. We had a problem creating and retrieving MyProxy credentials, which has now been fixed.

This can be closed; to be continued through the OAT.

2009-02-03 2009-03-30 Main.DianaBosio Main.SteveTraylen
000372 Comment on the proposal for Grid Configuration Data, as discussed at the SA1 coordination meeting on Feb 3

UPDATE 17/02/09:
* Not all are on the OAT discussion list. James said “we’re still debating” (about GOCDB vs the Information System). The IS approach is a good idea, but might come too late. We need estimates of how much effort each option would cost, and who would do the work (not clear in the IS case).

UPDATE 5/5/09:
* Technical discussions at the OAT F2F week between Gilles and Laurence. They have worked out the similarities between the GOCDB and GLUE information models. Work is now on the BDII side to expose the information that could be merged into the GOCDB. From Gilles:

Any info to insert in a regional GOCDB will then be inserted from a defined XML format, no matter whether it comes from a web interface, another GOCDB, an external tool or a siteBDII.
Potentially the only work needed to move towards a solution where sites publish their endpoints or downtimes in their sBDII would be on your side... in the sense that it would require:
- a separate branch for storing this info in Glue at Site level
- a publisher that formats this info into the defined XML expected by GOCDB

UPDATE 11/06/09: no further feedback received. The action can be closed.

2009-02-04 2009-02-20 Main.DianaBosio Main.AllROCs
000373 Send to the ROC managers a list of services which are not yet catered for in the managed roll out of new releases

UPDATE 17/02/09:
* Antonio to send a file. Related to 344 and should be closed

2009-02-04 2009-02-20 Main.DianaBosio Main.AntonioRetico
000374 Identify one site per service for the managed roll out of new releases

UPDATE 17/02/09:
Collapse into previous one (373).

2009-02-04 2009-02-20 Main.DianaBosio Main.AllROCs
000375 Roadmap summary for regionalisation of operations in 2009

UPDATE 03/02/2009: the schedule for Helene will be available in a document within the next two weeks.

UPDATE 17/02/09: DECH & Russia input still missing (but may be in back-log of Helene's mails).

UPDATE 05/05/09: The entry date of the last 4 ROCs is June 15. They need to give feedback by 11 May. Close.

2009-02-04 2009-02-20 Main.DianaBosio Main.ROCRussia
000376 Roadmap summary for regionalisation of operations in 2009 2009-02-04 2009-02-20 Main.DianaBosio Main.ROCDECH
000377 Roadmap summary for regionalisation of operations in 2009.

UPDATE 03/02/2009: we are ready, but we need some special implementation for our ROC, for instance a per-country view. We need to discuss the implementation details with Helene.

2009-02-04 2009-02-20 Main.DianaBosio Main.ROCSEE
000416 Operations manual: recommendation to separate the process from its implementation through the tools in the next version, so that it is easier to maintain and not tool-dependent.

UPDATE 05/05/09: In progress.
This is done in the most recent version, released on 18th September 2009, and available here: https://edms.cern.ch/document/840932

2009-04-21 2009-09-01 Main.MaiteBarroso Main.HeleneCordier
000417 Read the UMD and staged rollout documents

UPDATE 05/05/09: See discussion in this meeting's minutes.

UPDATE 11/06/09: Discussed at several meetings and at the F2F meeting. This can be closed.

2009-04-21 2009-05-05 Main.MaiteBarroso Main.AllROCs
000418 UKI, SE, DE-CH, RU to get in contact with Helene to provide names for starting the R-COD 2009-04-21 2009-05-05 Main.MaiteBarroso Main.AllROCs
000488 Create a document describing the new model for the TPM in year 2 of EGEE III.
Done, stored in https://edms.cern.ch/document/1000210/ and sent to Maite and ggus-info@cern.ch
09/06/2009: this was discussed at the F2F meeting at CERN
2009-05-05 2009-05-19 Main.NickThackray Main.Maria_D.
000489 All ROCs to describe how they see the TPM working within their region in the second year of EGEE III.
09/06/2009: this was discussed at the F2F meeting at CERN
2009-05-05 2009-05-19 Main.NickThackray Main.All_ROCs
000490 In the document titled "Changes to the EGEE‐III DoW for Year II", in the table on operations tools, clarify that the table is referring to the OAT M3 milestone. 2009-05-05   Main.NickThackray Main.Maite
000491 In the document titled "Changes to the EGEE‐III DoW for Year II", in the table on operations tools, in the "Notes" column, for the Operations Portal, clarify the meaning of "Provide a regional view." 2009-05-05   Main.NickThackray Main.Maite
000492 All ROCs to review the Installed Capacity document (https://twiki.cern.ch/twiki/pub/LCG/WLCGCommonComputingReadinessChallenges/WLCG_GlueSchemaUsage-1.8.pdf) and raise any objections before 2 June.
No objections raised.
2009-05-05 2009-06-02 Main.NickThackray Main.All_ROCs
000493 All ROCs to review and give comments on milestone MSA 1.9
11/06/09: this is now done.
2009-05-05   Main.NickThackray Main.All_ROCs
000516 Define messaging deployment scenario based on early experience, matching scale needed and regional needs.
By next F2F meeting, so regions can plan HW and expertise
Done at the SA1 F2F meeting in Barcelona.
2009-06-09 2009-09-15 Main.MaiteBarroso Main.OAT
000517 Define C-COD model to be prototyped in next 3 months (who will do it, how many FTEs)
Discussed at SA1 coordination meeting, http://indico.cern.ch/conferenceDisplay.py?confId=63028
2009-06-09 2009-06-30 Main.MaiteBarroso Main.HeleneCordier
000518 Define Key Performance Indicators for R-COD
- Format similar to MSA1.3, seed resources metrics, etc
- To be automated and displayed by the related tools/teams
Done. First results shown at the F2F meeting at EGEE'09: http://indico.cern.ch/getFile.py/access?sessionId=13&resId=0&materialId=2&confId=67238
2009-06-09 2009-07-30 Main.MaiteBarroso Main.HeleneCordier
000519 Discuss with SA3 and JRA1 to define an implementation of the "production repository always consistent" requirement
This requirement is moved to the staged rollout discussions, being followed up by Antonio Retico
2009-06-09 2009-06-30 Main.MaiteBarroso Main.AntonioRetico
000520 Discuss and agree mw support with product teams/mw providers so it is properly staffed in the (near) future 2009-06-09 2009-09-01 Main.MaiteBarroso Main.TorstenAntoni
000521 CENTRALLY COORDINATED FEW (2-3) FULL-TIME TPM TEAMS
Refine and put in place the agreed TPM model in Y2:
- More automation in ticket assignment to regions
- Role of the central TPM teams: ticket assignment (of the few not automatically routed), escalation, monitoring

Update for 2009-06-30 meeting (Maria):
new TPM model updated to reflect the decision of the f2f SA1 mgnt meeting of 2009-06-09.
- More automation will be evaluated later, as the TPM only receives about 35 tickets per week.
Work item on this is savannah #108496
- Role of the central TPM teams should be in the updated OLA.
If 'central TPM teams' refers to 'central coordination tasks', these will be in section 6 of the User Support strategy note.
Central coordination tasks are:
a. continuous training
b. documentation update
c. intervention on tickets being late or misrouted
d. ticket monitoring, analysis and reporting to the relevant project management bodies.

Update for 2009-09-01 meeting (Maria):
The following was sent to USAG, ROC-managers, TPM and VOs' mailing list on 2009-07-09:
"We need the following actions done before our next meeting on 2009-08-27:
* ROCs to bid for the TPM role in the new model
* SA1 management to allow discussion slot at their Sept. 1st meeting in
case we need a plan B,
i.e. if there are not 4 ROCs to do TPM shift according to the new model"
As of mid-August, bids had been received from IT and NE.

Update for 2009-10-06 meeting (Maria):
USAG @ EGEE'09 summary: http://indico.cern.ch/materialDisplay.py?materialId=4&confId=68477
SA1 @ EGEE'09 summary: https://savannah.cern.ch/support/?108496#comment5
Yet another step to automation: Allow assignment to the ROC by the submitter: https://savannah.cern.ch/support/?108708#comment5

Update from the SA1 coordination meeting in Barcelona:
The decision taken at today's SA1 coordination meeting about the new EGEE TPM model is to continue with the present rota involving all teams for some time longer, and then migrate directly to the NGIs that will be awarded the associated EGI global task, O-E-7: Triage of Incoming Problems. The estimated timeline is to stay with the present TPM model until the end of the year and to migrate to the EGI model by the beginning of 2010. This is only a tentative timeline, because we don't know who the awarded NGIs will be, and the transition plan needs to be agreed with them. As soon as this is known, it will be the task of the USAG to work out a feasible transition plan with them and to put it in place.

2009-06-09 2009-09-01 Main.MaiteBarroso Main.TorstenAntoni
000522 Distribute new template for "quarterly country reports"
Done
2009-06-09 2009-06-20 Main.MaiteBarroso Main.MaiteBarroso
000538 Give Maite the list of the ROCs and partners that did not contribute to DSA1.3, for escalation to the AMB.
Done and sent to Bob Jones for escalation.
2009-06-30 2009-07-02 Main.DianaBosio Main.OgnjenPrnjat
000539 At the next TMB, bring to the attention of the VOs the proposal for the new HEP spec metric for comments.
Discussion scheduled for 11th August

Agreement at the SA1 coordination meeting on 2009-10-06:
Any site serious about being in the Grid should be able to afford this. If money is a problem for a site then this information should be collected (this is up to the ROCs to do).
We will change the name to make it HEP neutral.

2009-06-30 2009-07-28 Main.DianaBosio Main.MaiteBarroso
000540 Ask the sites for comments on the proposal for the new HEP spec metric and report at the SA1 coord meeting on July 28th.
Discussion scheduled for 11th August

Agreement at the SA1 coordination meeting on 2009-10-06:
Any site serious about being in the Grid should be able to afford this. If money is a problem for a site then this information should be collected (this is up to the ROCs to do).
We will change the name to make it HEP neutral.

2009-06-30 2009-07-28 Main.DianaBosio Main.AllROCs
000541 Comment on the May availability-reliability reports. 2009-06-30 2009-07-14 Main.DianaBosio Main.AllROCs
000543 Comment on the c-COD model and metrics presented by Marcin at the SA1 coord meeting on June 30th.
No comments received
2009-06-30 2009-07-14 Main.DianaBosio Main.AllROCs
000544 Write a proposal for the SA1 transition to the staged roll-out releases.
Done and presented at SA1 coordination meeting on 28th July
2009-06-30 2009-07-14 Main.DianaBosio Main.AntonioRetico
000545 Send a reminder e-mail to the ROC managers with all the details on the services that would need to undergo SL5 deployment tests. 2009-06-30 2009-07-02 Main.DianaBosio Main.AntonioRetico
000546 ROCs with sites running 3.0 VOMS servers should upgrade to gLite 3.1. 2009-06-30 2009-07-14 Main.DianaBosio Main.AllROCs
000561 Follow up on Gridview changes to downtime procedures, and on the new report of low availability/reliability causes as reported in GOCDB

Done. The reports have been produced monthly since September. - John

2009-07-28 2009-08-30 Main.MaiteBarroso Main.JohnShade
000563 Modify ops procedures to include the new proposal on downtimes.
Done, available here: https://edms.cern.ch/document/1032984
2009-07-28 2009-08-30 Main.MaiteBarroso Main.MaiteBarroso

Topic revision: r1 - 2008-12-16 - AntonioRetico