Comment | Date | Version | Author |
---|---|---|---|
Complete revision accounting for new NAGIOS tools | March 2010 | 2.0 | Vera Hansper, Malgorzata Krakowian, Peter Gronbech |
Procedures for C-COD in regional operations model | 18 Sept 2009 | 1.0 | Vera Hansper, Malgorzata Krakowian, Helene Cordier, Michaela Lechner |
Split of manual into ROD and C-COD parts, including revisions | 30 June 2009 | 0.3 | Vera Hansper |
First Draft of manual | 4 March 2009 | 0.2 | Malgorzata Krakowian, Vera Hansper |
First merge of ROD model and COD ops manual | 19 January 2009 | 0.1 | Vera Hansper |
Split from COD OPS manual | 05 January 2009 | 0.0 | Ioannis Liabotis |
Step [#] | Max. Duration [work days] | Escalation procedure |
---|---|---|
1 | 3 | When an alarm appears on the ROD dashboard (>24 hours old): 1st mail to site admin and ROC |
2 | 3 | 2nd mail to site admin and ROC; at the end of this period, escalate to C-COD |
3 | 5 | Ticket escalated to C-COD. Within that week, C-COD should act on the ticket by emailing the ROC, ROD and site to request immediate action and to state that representation at the next weekly operations meeting is requested. The discussion may also include site suspension. |
4 | | (If no response is obtained from either the site or ROC) C-COD will discuss the ticket at the FIRST Weekly Operations Meeting and involve the Operation and Coordination Center (OCC) in the ticket |
5 | 5 | Discuss at the SECOND weekly operations meeting and assign the ticket to OCC |
6 | | Where applicable, C-COD will request OCC to approve site suspension |
7 | | C-COD will ask ROC to suspend the site |
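
The timed steps of the escalation table can be turned into calendar deadlines. The following is a minimal sketch, not part of any operations tool: the names `TIMED_STEPS`, `add_work_days` and `escalation_deadlines` are hypothetical, and it assumes that work days are Monday to Friday (public holidays are ignored), that each step uses its full maximum duration, and that each step starts when the previous one ends. Steps 4, 6 and 7 have no duration in the table and are left out.

```python
from datetime import date, timedelta

# (step, max duration in work days) copied from the escalation table.
# Steps 4, 6 and 7 list no duration, so they are not modelled here.
TIMED_STEPS = [(1, 3), (2, 3), (3, 5), (5, 5)]


def add_work_days(start: date, work_days: int) -> date:
    """Return the date reached after counting `work_days` Monday-Friday days past `start`."""
    current = start
    remaining = work_days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday to Friday (assumption: no holidays)
            remaining -= 1
    return current


def escalation_deadlines(first_mail: date) -> dict:
    """Map each timed step number to the latest date by which it should end."""
    deadlines = {}
    step_end = first_mail
    for step, max_days in TIMED_STEPS:
        # Assumption: a step begins on the day the previous step ended.
        step_end = add_work_days(step_end, max_days)
        deadlines[step] = step_end
    return deadlines


if __name__ == "__main__":
    for step, deadline in escalation_deadlines(date(2010, 3, 1)).items():
        print(f"step {step}: latest end {deadline.isoformat()}")
```

Under these assumptions, a first mail sent on Monday 1 March 2010 gives latest end dates of 4 March (step 1), 9 March (step 2), 16 March (step 3) and 23 March (step 5). In practice the dates for steps 3 to 5 depend on when the weekly operations meetings actually take place.
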
Handover from (old C-COD leader) to (new C-COD leader)

Issues raised during the week:
* ROD: number of alarms/tickets not handled by the ROD

Issues pending:
ROD:
* site: reason why site appears on the C-COD dashboard and last status

Other issues:
* Did you encounter any tickets that changed 'character'? (i.e. no longer a simple incident that can easily be fixed, but rather a problem that may result in a Savannah bug) -- means that the use-cases wiki has to be updated.
* Any alarms that could not be assigned to a ticket (or masked by another alarm)?
* Any tickets opened that are not related to a particular alarm
* Anything else the new leader should know?
* Instructions received from recent Weekly Operations Meeting (only for the leader taking over)

Weekly Operations Meeting:
* List unresponsive sites: note name of Site and ROC, as well as GGUS Ticket number and reason for escalation
* Report any problems with operational tools during shift
* Did you encounter any issues with the C-COD procedures, Operational Manual?
* Report encountered problems with grid core services
* Any Savannah/GGUS tickets that need more attention to a wider audience?

After the Weekly Operations Meeting the C-COD leader should send a summary to the C-COD mailing list with relevant information for the next leader on shift. Again, this should be done via the HANDOVER tool on the operations dashboard.
"You shall immediately report any known or suspected security breach or misuse of the GRID or GRID credentials to the incident reporting locations specified by the VO and to the relevant credential issuing authorities. The Resource Providers, the VOs and the GRID operators are entitled to regulate and terminate access for administrative, operational and security purposes and you shall immediately comply with their instructions." (Grid Acceptable Use Policy, https://edms.cern.ch/document/428036)
"Sites accept the duty to co-operate with Grid Security Operations and others in investigating and resolving security incidents, and to take responsible action as necessary to safeguard resources during an incident in accordance with the Grid Security Incident Response Policy." (Grid Security Policy, https://edms.cern.ch/document/428008/)
"You shall comply with the Grid incident response procedures and respond promptly to requests from Grid Security Operations. You shall inform users in cases where their access rights have changed." (Virtual Organisation Operations Policy, https://edms.cern.ch/document/853968/)