DB outage caused by problems in filer dbnasr1132

Description

  • Several databases (PDBR, LHCBR, ATLR, ATONR, SUSI-TEST, ZORA-TEST, TIM-DB) remained in a "suspended" state because the NFS paths to some of their volumes were no longer available. A couple of instances crashed but were restarted properly later.

Impact

  • Access to the databases was not possible during the outage. There was no data loss though.

Time line of the incident

  • 24-July-2013 18:50 - Filer dbnasr1132 shows hardware problems with a network card. The cluster software does not manage to complete a takeover by the partner (dbnasr1131), and we lose access to the volumes in dbnasr1132.
  • 24-July-2013 20:10 - Eric is already checking, Marcin is contacted by Atlas, and several support tickets are created by users (lhcb, atlas, ams)
  • 24-July-2013 20:20 - Ruben is contacted and takes the case
  • 24-July-2013 21:45 - P1 case opened (by Ruben) with NetApp support
  • 24-July-2013 23:45 - after long iterations with support, a boot process is initiated on dbnasr1132
  • 25-July-2013 00:25 - The dbnasr1132 boot process completes (~40 minutes to complete!), the partner (dbnasr1131) manages to complete a takeover, and service is restored from one filer only. Support suggests performing a giveback. Full cluster functionality is restored and DB services recover automatically.
  • 25-July-2013 01:30 - Nilo checked that all DB instances were working. The crashed instances (zora-test and atonr1) were restarted.
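
The outage length implied by the time line can be worked out with a short script. This is only a sketch: it assumes that service counts as restored at the 00:25 entry (when the takeover completed) and verified at the 01:30 entry. The names `minutes_between`, `failure`, `restored` and `verified` are illustrative and do not come from any tool used during the incident.

```python
from datetime import datetime

# Timestamp layout used in the time line, e.g. "24-July-2013 18:50".
# %B matches English month names in Python's default (C) locale.
FMT = "%d-%B-%Y %H:%M"


def minutes_between(start: str, end: str) -> int:
    """Return the whole minutes elapsed between two time line entries."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds() // 60)


failure = "24-July-2013 18:50"   # filer dbnasr1132 fails
restored = "25-July-2013 00:25"  # assumption: service restored at this entry
verified = "25-July-2013 01:30"  # all DB instances checked

outage_minutes = minutes_between(failure, restored)
verified_minutes = minutes_between(failure, verified)

print(f"Failure to restore: {outage_minutes // 60}h{outage_minutes % 60:02d}m")
print(f"Failure to verification: {verified_minutes // 60}h{verified_minutes % 60:02d}m")
```

Under these assumptions the script reports 5h35m from failure to restore and 6h40m from failure to final verification.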

Analysis

  • NetApp support discovered information in the trace files Ruben sent that points to a hardware issue. A new motherboard and network cards will be sent.

Follow up

  • Motherboard and Network card replaced
Topic revision: r3 - 2013-08-08 - NiloChinchilla
 