Operators' help guide
The operators receive an alarm when one of the Physics Services machines has a problem, for example :
- a filesystem is full
- a machine is not responding
When an alarm is raised, the operators follow the instructions corresponding to the machine encountering a problem.
We describe here where to find these instructions and when and how to update them.
1. Where to find the Operators' Help Guide ?
The operators use the following web page to look up the instructions for all the machines they are in charge of :
- View Operators Guide for PDB
- Edit Operators Guide for PDB (and click "Edit" after login)
All the Physics Services machines refer to the same help guide called "PDB".
Changes need to be validated by Dirk Geppert or, in Dirk's absence, by the CVS librarian (e-mail).
2. Description
This web page describes what to do in case of a problem with the Physics Services machines.
The instructions are different depending on the group of machines :
- Group 1 : IT-DB monitoring and backup master (PDB-BACKUP1)
- Group 2 : LCG/RLS production service, 24*7 availability
- Group 3 : COMPASS and HARP database servers
- Group 4 : Services without 24*7 availability and development nodes
Group 2 contains all the LCG/RLS machines (application servers and databases) that are in production with 24*7 availability,
as well as the spare machines, which can go into production at any time.
Development nodes belong to Group 4 and should not appear in Group 2 until they return to service as spare machines.
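The rule for LCG/RLS machines can be summarised in a few lines of Python. This is only an illustrative sketch: the function name and the role labels ("production", "spare", "development") are assumptions made for the example and are not part of the Help Guide itself.

```python
def lcg_rls_group(role: str) -> int:
    """Return the Help Guide group for an LCG/RLS machine.

    The role labels are hypothetical; they only mirror the wording
    of the description above.
    """
    # Production machines and spares are both covered 24*7.
    if role in ("production", "spare"):
        return 2
    # Development nodes stay in Group 4 until they become spares again.
    if role == "development":
        return 4
    raise ValueError(f"unknown role: {role!r}")
```

A spare machine is treated exactly like a production machine because it can be put into production at any time.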
3. When to change the Operators' Help Guide ?
You should update the Operators' Help Guide in the following cases :
- when a new machine needs to be watched by the operators : add the machine to the appropriate group
- when a machine is taken away from the Physics Services responsibility : remove the machine from the Help Guide
- when a test machine becomes a spare/production machine : move the machine from Group 4 to Group 2
- when a spare/production machine becomes a test machine : move the machine from Group 2 to Group 4
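As an illustration, the four cases above can be modelled as operations on a mapping from group number to machine names. This is a minimal sketch and not the actual procedure, since the guide is edited through the web page: the function name, the event labels and the machine name "example-node" are hypothetical.

```python
def apply_change(guide, event, machine, group=None):
    """Update a {group number: set of machine names} mapping in place.

    All event labels are hypothetical names for the four cases listed
    in the Help Guide update rules.
    """
    if event == "new_machine":
        if group is None:
            raise ValueError("a target group is required for a new machine")
        guide[group].add(machine)
    elif event == "no_longer_physics_services":
        # Remove the machine from whichever group currently lists it.
        for members in guide.values():
            members.discard(machine)
    elif event == "test_to_production":
        guide[4].discard(machine)
        guide[2].add(machine)
    elif event == "production_to_test":
        guide[2].discard(machine)
        guide[4].add(machine)
    else:
        raise ValueError(f"unknown event: {event!r}")
    return guide


# Example: a hypothetical test node becomes a spare/production machine.
guide = {1: {"PDB-BACKUP1"}, 2: set(), 3: set(), 4: {"example-node"}}
apply_change(guide, "test_to_production", "example-node")
```

After the call, "example-node" is listed in Group 2 and no longer in Group 4.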
Topic revision: r1 - 2005-12-07