GGUS alarm to IT-DB workflow problem

Description

The GGUS alarm to IT-DB workflow did not work properly during the night of 11 to 12 October 2011.

Impact

  • The IT-DB team did not receive the alarm during the night, although the DBA on call was already working on the problem thanks to IT-DB internal database monitoring. The DBA on call could not communicate with the ATLAS shifter.
  • The EOS best effort support person was unnecessarily called during the night.

Time line of the incident

  • 12-Oct-11 03:30 - GGUS alarm ticket (ALARM TICKET #75234) is created by the ATLAS shifter (problem: ATLAS Oracle problem: account atlas_t0 on ATLR is locked).
  • 12-Oct-11 03:40 (estimated) - EOS best effort support was probably called around this time; the exact time is not confirmed.
  • 12-Oct-11 03:48 - ticket is updated by operator: Oracle RAC for ATLAS Offline services, Piquet has been called. Please standby.
  • 12-Oct-11 10:14 - IT-DB team receives the ticket via SNOW (INC071257).
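
As a rough illustration of the notification delay in the time line above, the gap between ticket creation and reception by IT-DB can be computed from the two timestamps. This is only a minimal sketch: the variable names are illustrative, and the timestamps are copied from the time line entries.

```python
from datetime import datetime

# Timestamp format used in the time line, e.g. "12-Oct-11 03:30".
# Note: %b parses English month abbreviations under the default (C/English) locale.
FMT = "%d-%b-%y %H:%M"

created = datetime.strptime("12-Oct-11 03:30", FMT)   # GGUS alarm ticket created
received = datetime.strptime("12-Oct-11 10:14", FMT)  # IT-DB receives ticket via SNOW

delay = received - created
hours, remainder = divmod(int(delay.total_seconds()), 3600)
minutes = remainder // 60

print(f"Delay before IT-DB received the ticket: {hours} h {minutes} min")
```

The result is the time during which the alarm did not reach IT-DB through the ticketing workflow.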

Analysis

  • Massimo: checked the operator instructions. On the front page some services, such as CASTOR, are singled out. Some problems result in a very long drop-down list. Discussed with Fabio putting the DBs on the front page as well, to avoid human errors. They will add some information on how to treat emails in order to find the appropriate expert.
  • Maite: something similar should be done for the SERCO team.
  • MariaDZ: two issues: 1) operators work out from the ticket description whom to call; 2) GGUS alarms have a special workflow that goes not to 2nd line support but to 3rd line, so that experts can be notified immediately. No one from the DB group is in this 3rd line list.

Follow up

  • Update cern-prod mailing list - Done (Maite)
  • Update SNOW-GGUS triage instructions for grid second line support at CERN (SERCO) - Done (Maite)
  • Update operator procedures for alarms - Done (Eva)

-- EvaDafonte - 14-Oct-2011

Topic revision: r4 - 2011-11-01 - EvaDafonte