Regionalisation of COD Service
Mandate
This working group is dedicated to the devolution of the COD service into the regions, which is considered a first step towards the EGI/NGI model. Its aims were to work out a model of operations under the new scheme and to identify requirements on other COD teams, ROCs and sites. Since all regions have migrated to the new regional model, the goal of the group is now to assess the model and to identify requirements for the EGI project regarding the COD activity.
Contact
The pole1 mailing list is:
project-eu-egee-sa1-cod-pole1-rcod@cernNOSPAMPLEASE.ch
ROCs mainly involved: FR, CE, NE, AP
Meetings
Minutes
Actions List
| no | Assigned to | Since | Description | Status |
| 1 | Malgorzata Krakowian | 10.09.09 | Add new cases when ROD could want to close alarms with status <> OK | Closed |
| 2 | Marcin Radecki | 20.08.09 | take point 3 (C-COD roles), and for each sub item try to assess who is involved in these tasks in old COD activity | Closed |
| 3 | ShuTing Liao | 20.08.09 | create workplan and check if some manpower is needed | Closed |
| 4 | Marcin Radecki | 10.09.09 | collect historical data about services | Closed |
Themes
Requirements on COD tools
Recommendation for EGI-NGI SLA ideas
- a declaration of whether the NGI will use the GOC wiki or its own knowledge base to store knowledge
- providing the NOD (NGI Operator on Duty) service with specific rules on handling alarms and tickets
Assessment of the model - ROD metrics
Knowledge Sharing
Support provided to sites by 1st Line Support teams is where knowledge about operational problems with grid services originates, so it is important to provide a means of recording it. The knowledge can either be derived from GGUS tickets (no additional effort needed) or, if a particular 1st Line Support team prefers, be recorded on a wiki page in the form of a recipe. Such a wiki page can be set up as a knowledge source in the GGUS Knowledge Search engine.
Ideas
This section identifies issues in the current model and collects ideas for improving the future model.
Remove the possibility of closing alarms with an error status
Cases in which ROD might want to close an alarm with status <> OK:
- the site/node has alarms and then goes into downtime: closing them prevents old alarms from remaining when the SD ends
- the site/node has alarms and then moves to uncertified/non-production status: the alarms stay on the dashboard
- a core services problem
- a problem with the monitoring tool
Solution proposal:
ROD should not be able to close alarms with status <> OK. In the cases listed above, ROD should transfer the alarm to C-COD with a description of the situation. C-COD should then decide whether or not to close the alarm.
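The solution proposal above can be sketched as a small permission check. This is only an illustration of the rule, not the dashboard's actual implementation: the names `Role`, `Alarm` and `request_close`, and the return values "closed" and "transferred", are hypothetical and chosen for this example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Role(Enum):
    ROD = "ROD"
    C_COD = "C-COD"


@dataclass
class Alarm:
    site: str
    status: str                        # e.g. "OK" or "ERROR" (assumed values)
    closed: bool = False
    transferred_to: Optional[Role] = None
    description: str = ""


def request_close(alarm: Alarm, role: Role, description: str = "") -> str:
    """Apply the proposed rule to a close request (hypothetical helper)."""
    # Anyone may close an alarm whose status is OK.
    if alarm.status == "OK":
        alarm.closed = True
        return "closed"
    # For status <> OK, only C-COD decides whether to close.
    if role is Role.C_COD:
        alarm.closed = True
        return "closed"
    # ROD must hand the alarm over to C-COD with a situation description.
    if not description:
        raise ValueError("ROD must describe the situation when transferring")
    alarm.transferred_to = Role.C_COD
    alarm.description = description
    return "transferred"
```

For example, a ROD request on an alarm with status "ERROR" and the description "site entered downtime" leaves the alarm open and marks it as transferred to C-COD, which can then close it with a second call.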
Other documents