ACCLOG frozen instance 2, LHCLOGDB service unavailable

Description

ACCLOG instance number 2 was inaccessible due to a memory problem. The LHCLOGDB service running on this instance was inaccessible.

Impact

  • Service LHCLOGDB was inaccessible

Time line of the incident

  • 28-Feb-13 04:03 - ACCLOG2 instance froze due to problems with memory
  • 28-Feb-13 04:19 - RACMON SMS alert sent to the shift phone.
  • 28-Feb-13 04:25 - Investigation started
  • 28-Feb-13 04:44 - Service LHCLOGDB manually relocated to the surviving instance.
  • 28-Feb-13 04:48 - Instance killed by the person on shift.
  • 28-Feb-13 04:48 - Automatic restart by clusterware failed with "unable to allocate Large Pages".
  • 28-Feb-13 04:53 - Successful manual start of the instance.
  • 28-Feb-13 05:12 - Service LHCLOGDB relocated to its preferred instance.
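
The intervals implied by the time line can be worked out with a short script. This is only a sketch: the event labels are made up for illustration, and it assumes the service outage started at the moment the instance froze (04:03) and ended with the manual relocation (04:44).

```python
from datetime import datetime

# Timestamps copied from the time line; the dictionary keys are
# illustrative labels, not names used by any monitoring tool.
FMT = "%d-%b-%y %H:%M"
EVENTS = {
    "freeze": "28-Feb-13 04:03",
    "alert": "28-Feb-13 04:19",
    "service_relocated": "28-Feb-13 04:44",
    "instance_restarted": "28-Feb-13 04:53",
}


def minutes_between(start: str, end: str) -> int:
    """Return the whole minutes elapsed between two named events."""
    t0 = datetime.strptime(EVENTS[start], FMT)
    t1 = datetime.strptime(EVENTS[end], FMT)
    return int((t1 - t0).total_seconds() // 60)


if __name__ == "__main__":
    print("freeze to SMS alert:", minutes_between("freeze", "alert"), "min")
    print("service outage:", minutes_between("freeze", "service_relocated"), "min")
    print("instance down:", minutes_between("freeze", "instance_restarted"), "min")
```

Under these assumptions the alert arrived 16 minutes after the freeze, the service was unavailable for 41 minutes and the instance was down for 50 minutes.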

Analysis

  • ACCLOG2 instance was inaccessible from 04:03. Monitoring reported:
acclog: Error monitoring service lhclog: ORA-01034: ORACLE not available
acclog: Error monitoring service lhclog: ORA-27123: unable to attach to shared memory segment

  • alert.log was full of messages such as:
Process W000 died, see its trace file
Process J000 died, see its trace file
kkjcre1p: unable to spawn jobq slave process

  • Clusterware did not notice any anomalies, so the LHCLOGDB service was not automatically relocated to the surviving instance 1 and remained unavailable.
  • Connecting to the instance did not work, so it had to be killed. The restart of the instance by the clusterware then failed on huge pages allocation.
  • A manual restart a few minutes later did not encounter memory problems and succeeded.
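
The report does not say why huge pages were unavailable at 04:48 but available a few minutes later. One way to check before a restart, assuming the node runs Linux, is to read the HugePages counters from /proc/meminfo and compare them with the SGA size. The sketch below is not part of the IT-DB tooling; the function names and the SGA size are hypothetical.

```python
import re


def parse_hugepages(meminfo_text: str) -> dict:
    """Extract the HugePages_* counters and Hugepagesize (in kB)."""
    fields = {}
    for line in meminfo_text.splitlines():
        m = re.match(r"^(HugePages_\w+|Hugepagesize):\s+(\d+)", line)
        if m:
            fields[m.group(1)] = int(m.group(2))
    return fields


def available_hugepage_bytes(meminfo_text: str) -> int:
    """Bytes in huge pages that are free and not already reserved."""
    f = parse_hugepages(meminfo_text)
    pages = f["HugePages_Free"] - f.get("HugePages_Rsvd", 0)
    return max(pages, 0) * f["Hugepagesize"] * 1024


def can_fit_sga(meminfo_text: str, sga_bytes: int) -> bool:
    """True if the unreserved free huge pages can hold an SGA of sga_bytes."""
    return available_hugepage_bytes(meminfo_text) >= sga_bytes


if __name__ == "__main__":
    SGA_BYTES = 8 * 1024**3  # hypothetical SGA size, not taken from the report
    with open("/proc/meminfo") as fh:
        print("enough huge pages:", can_fit_sga(fh.read(), SGA_BYTES))
```

A restart wrapper could poll this check for a short while before starting the instance instead of failing on the first attempt.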

Follow up

  • All IT-DB monitoring tools (RACMON, EM11, legacy scripts) detected the problem very quickly.
  • Clusterware does not detect such problems.
  • Service relocation in such a case is to be implemented with scripts, which is not very straightforward.
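
A possible starting point for such a script is sketched below. It is not the implemented solution. It assumes that the two ORA errors reported by monitoring are a sufficient trigger, that the database and instances are registered as ACCLOG, ACCLOG1 and ACCLOG2, and that the 11g-style single-letter srvctl options apply; all of these should be verified before use. The function names are hypothetical, and the script defaults to a dry run that only returns the command.

```python
import re
import subprocess

# Error codes seen in the monitoring output during this incident.
# Treating them as a relocation trigger is an assumption.
FATAL_ORA = {"ORA-01034", "ORA-27123"}


def instance_looks_frozen(monitor_lines) -> bool:
    """True if any monitoring line carries one of the trigger ORA codes."""
    codes = set()
    for line in monitor_lines:
        codes.update(re.findall(r"ORA-\d{5}", line))
    return bool(codes & FATAL_ORA)


def relocate_command(db, service, old_inst, new_inst) -> list:
    """Build an 'srvctl relocate service' call (11g-style options)."""
    return ["srvctl", "relocate", "service",
            "-d", db, "-s", service, "-i", old_inst, "-t", new_inst]


def relocate_if_frozen(monitor_lines, db, service, old_inst, new_inst,
                       dry_run=True):
    """Return the relocation command if triggered, running it unless dry_run."""
    if not instance_looks_frozen(monitor_lines):
        return None
    cmd = relocate_command(db, service, old_inst, new_inst)
    if not dry_run:
        subprocess.run(cmd, check=True)  # raises if srvctl returns non-zero
    return cmd


if __name__ == "__main__":
    lines = [
        "acclog: Error monitoring service lhclog: ORA-01034: ORACLE not available",
    ]
    # Database, service and instance names below are assumptions.
    print(relocate_if_frozen(lines, "ACCLOG", "LHCLOGDB", "ACCLOG2", "ACCLOG1"))
```

Part of what makes this less than straightforward is deciding when a single monitoring error justifies moving a service; a real script would likely require several consecutive failures before acting.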
Topic revision: r1 - 2013-03-03 - AntonTopurov
 