A few short interruptions of the replication of CMS data from online to offline

Description

On Tuesday 13th July at around 1:30 AM the replication of CMS data failed due to a LogMiner error caused by memory fragmentation. The monitoring software automatically restarted the Streams processes. Later, at around 9:00 AM, the replication failed again as a result of a DBA mistake.

Impact

The impact on CMS was minimal, since in both cases the replication was restarted very quickly. The failures slightly increased replication latency (up to 15 minutes).

Timeline of the incident

  • Tuesday 13th July 1:30 AM - replication failed due to memory issues. A few minutes later it was automatically restarted.
  • Tuesday 13th July 8:15 AM - replication was intentionally stopped by the weekly job performing cleanup operations
  • Tuesday 13th July 8:30 AM - the DBA did not realize that the replication had been stopped intentionally and restarted it manually
  • Tuesday 13th July 9:00 AM - replication failed again because one of the Streams packages got invalidated by the cleanup
  • Tuesday 13th July 9:15 AM - replication restarted manually

Analysis

The memory issue that caused the first failure is a known issue and can be worked around by increasing the amount of physical memory installed on the CMSONR servers. The DBA mistake stemmed from a gap in the documentation, which did not mention the job that cleans up the Streams tables.

Follow up

A memory extension has been agreed with CMS and will be implemented soon. The documentation problem has been fixed.

-- JacekWojcieszuk - 16-Jul-2010

Topic revision: r1 - 2010-07-16 - unknown
 