Description

Instance 4 of the ATLAS offline database (ATLR) was rebooted 4 times over a period of 8 days. All reboots happened between 4AM and 5AM and occurred every second day.

Impact

  • The reboots affected COOL sessions connected to instance 4 of the ATLR database. During the reboots the database remained fully accessible, and each time services were properly relocated between cluster nodes, which minimized the service impact.

Timeline of the incident

  • Sunday 28.11 4:21AM atlr4 reboot - node rebooted by cluster, services relocated to remaining nodes.
  • Tuesday 30.11 4:41AM atlr4 reboot - node rebooted by cluster, services relocated to remaining nodes.
  • Thursday 02.12 4:21AM atlr4 reboot - node rebooted by cluster, services relocated to remaining nodes.
  • Saturday 04.12 4:28AM atlr4 reboot - node rebooted by cluster, services relocated to remaining nodes.
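
The regularity of the reboots can be checked with a short sketch. The timestamps are copied from the timeline above; the year 2010 is an assumption taken from the topic revision date, and the names REBOOTS, day_gaps and all_in_window are illustrative only, not part of any script used during the investigation.

```python
from datetime import datetime

# Reboot timestamps from the timeline; the year 2010 is an assumption
# based on the topic revision date.
REBOOTS = [
    datetime(2010, 11, 28, 4, 21),
    datetime(2010, 11, 30, 4, 41),
    datetime(2010, 12, 2, 4, 21),
    datetime(2010, 12, 4, 4, 28),
]


def day_gaps(events):
    """Return the number of calendar days between consecutive events."""
    return [(b.date() - a.date()).days for a, b in zip(events, events[1:])]


def all_in_window(events, start_hour=4, end_hour=5):
    """Return True if every event falls in [start_hour:00, end_hour:00)."""
    return all(start_hour <= e.hour < end_hour for e in events)


if __name__ == "__main__":
    print("Weekdays:", [e.strftime("%A") for e in REBOOTS])
    print("Day gaps:", day_gaps(REBOOTS))
    print("All between 4AM and 5AM:", all_in_window(REBOOTS))
```

A constant gap of two days combined with a fixed one-hour window points towards something scheduled rather than a random failure, which is consistent with the jobs and scripts reviewed in the analysis.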

Analysis

Different symptoms were observed for each reboot, so the root cause was not clear for a few days. Initially the COOL application was suspected and a lot of effort went into investigating it. In parallel, all database jobs specific to this instance or to the COOL application were carefully reviewed. To provide detailed diagnostic information, OSWatcher and additional database jobs creating AWR snapshots every 2 minutes between 4AM and 5AM were deployed. Thanks to this additional monitoring information, an internal script for rotating Oracle logs and trace files was identified as the root cause of the problem. This script is deployed on every DB machine and had never caused this kind of problem before.
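
To illustrate the granularity gained from the 2-minute snapshots, the sketch below lists the snapshot times in the 4AM to 5AM window and finds the last one taken before a given reboot. It is only an arithmetic illustration: the function names are hypothetical, the window is assumed to exclude 5:00AM itself, and the actual jobs were database jobs, not Python.

```python
from datetime import datetime, timedelta


def snapshot_times(day, start_hour=4, end_hour=5, step_minutes=2):
    """Yield snapshot times from start_hour:00 up to, but not including, end_hour:00."""
    t = datetime(day.year, day.month, day.day, start_hour)
    end = datetime(day.year, day.month, day.day, end_hour)
    while t < end:
        yield t
        t += timedelta(minutes=step_minutes)


def last_snapshot_before(event):
    """Return the latest snapshot time strictly before the event, or None."""
    candidates = [t for t in snapshot_times(event) if t < event]
    return candidates[-1] if candidates else None


if __name__ == "__main__":
    # Example reboot time; the year 2010 is an assumption from the revision date.
    reboot = datetime(2010, 12, 2, 4, 21)
    print("Snapshots in window:", len(list(snapshot_times(reboot))))
    print("Last snapshot before reboot:", last_snapshot_before(reboot))
```

With this interval, the state of the database is captured at most two minutes before a reboot, compared with up to an hour at the default AWR snapshot interval of 60 minutes.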

Follow up

After a clean-up of the local directories the problem has not reappeared. We are still working on this issue to fully understand it and to prevent future occurrences.

Topic revision: r2 - 2010-12-16 - MarcinBlaszczyk
 