Storage intervention severely affecting OVM infrastructure

Description

A clean-up action on the storage affected several production volumes used by the production OVM infrastructure.

Impact

  • All DBoD instances and application servers running on that OVM pool.

Timeline of the incident

  • Fri Feb 22 09:34:04 CET: apps_ovm_g3 set offline
  • Fri Feb 22 09:36:05 CET: apps_ovm2gen3a, apps_ovm2gen3b set offline
  • ~ Fri Feb 22 09:42:00 CET: volumes set back online.

Analysis

For safety reasons, a clean-up operation on the storage is made up of two steps:
  • offlining the volumes: NFS access is blocked.
  • destroying the volumes a few days later.

Destruction is only carried out a few days later, and only if no unexpected impact has been detected. Unfortunately, due to a misunderstanding, the admin in charge selected the wrong volumes for the clean-up.
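The two-step procedure above can be sketched as a small state machine. This is a minimal illustration under stated assumptions, not the storage team's actual tooling: the `Volume` class, the helper functions and the three-day `GRACE_PERIOD` are hypothetical (the report only says "a few days"). It shows why offlining first made this incident recoverable: an offline volume can still be set back online, whereas destruction is irreversible and is refused while an impact is reported or the grace period is still running.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class State(Enum):
    ONLINE = "online"
    OFFLINE = "offline"      # NFS access blocked, data still intact
    DESTROYED = "destroyed"  # irreversible


@dataclass
class Volume:
    """Hypothetical model of a storage volume; not a real storage API."""
    name: str
    state: State = State.ONLINE
    offlined_at: Optional[datetime] = None
    impact_reported: bool = False


# Hypothetical value: the report only says destruction happens "a few days" later.
GRACE_PERIOD = timedelta(days=3)


def offline(vol: Volume, now: datetime) -> None:
    """Step 1: block NFS access but keep the data."""
    vol.state = State.OFFLINE
    vol.offlined_at = now


def report_impact(vol: Volume) -> None:
    """Unexpected impact detected: roll back by setting the volume back online."""
    vol.impact_reported = True
    vol.state = State.ONLINE
    vol.offlined_at = None


def try_destroy(vol: Volume, now: datetime) -> bool:
    """Step 2: destroy only offline volumes whose grace period passed with no impact."""
    if vol.state is not State.OFFLINE or vol.offlined_at is None:
        return False
    if vol.impact_reported or now - vol.offlined_at < GRACE_PERIOD:
        return False
    vol.state = State.DESTROYED
    return True


# Usage mirroring this incident: a production volume is offlined by mistake,
# the impact is noticed, and it is set back online, so it is never destroyed.
t0 = datetime(2013, 2, 22, 9, 34, 4)
vol = Volume("apps_ovm_g3")
offline(vol, t0)
report_impact(vol)
print(try_destroy(vol, t0 + timedelta(days=7)))  # False
```

The design choice is that only the second step loses data, so a wrong volume selection in the first step stays a recoverable outage rather than data loss.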

Follow up

Two incidents were opened to follow up on this issue: https://itssb.web.cern.ch/service-incident/several-db-demand-instances-down/22-02-2013 and http://itssb.web.cern.ch/service-incident/itdb-virtualisation-not-available/22-02-2013

Downtime for the affected services varied from 30 to 60 minutes.

Topic revision: r2 - 2013-02-22 - RubenGaspar