XLDAP / nscd / nss_ldap issue

Description

A standard Quattor node reconfig run on lxbatch caused a very high load on the XLDAP servers, which triggered problems with the nscd daemon on other SLC5 nodes.

Impact

  • Logins to lxplus were not possible for a period of around 2 hours
  • Checkins to SVN and CVS were affected for a period of around 2 hours
  • Lxbatch blackholed a large number (tbd) of jobs - batch queues were set inactive for around 1.5 hours.

Background

...

Time line of the incident

All times CEST, May 24th.

10.55 - Standard Quattor reconfig applied to lxbatch:

nc-client --cl lxbatch --tag spma_ncm

Analysis

Follow up

Meeting:

  • ncm-authconfig will be changed to avoid flushing its cache immediately after restart
  • Stuck 'nscd' daemons should be escalated to Linux support for analysis and reported to Red Hat (nothing in Red Hat's public tracker).
  • Batch reconfig (from not.d) should be spread over a longer time to reduce potential impact
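
One possible way to spread a reconfig over a longer window is to give each node a deterministic delay derived from its hostname, so that the nodes do not all hit the XLDAP servers at the same moment. The sketch below is illustrative only: the function name splay_delay, the one-hour window and the host names are assumptions, and it does not describe how not.d or Quattor actually schedule reconfigs.

```python
import hashlib


def splay_delay(hostname: str, window_seconds: int) -> int:
    """Return a deterministic delay in [0, window_seconds) for this host.

    Hypothetical helper: hashing the hostname gives each node a stable,
    roughly uniformly distributed offset inside the window.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % window_seconds


if __name__ == "__main__":
    # Hypothetical node names and a one-hour window (both assumptions).
    for host in (f"lxbatch{i:04d}" for i in range(5)):
        print(host, splay_delay(host, 3600))
```

Each node would wait for its own delay before running the reconfig. Because the delay depends only on the hostname, a given node always lands in the same slot, which makes the behaviour reproducible when debugging.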

Links

Topic revision: r2 - 2011-05-23 - GavinMcCance