Kerberos issue


An upgrade to the Kerberos KDC that supports the Linux services caused several authentication-related problems on the Batch service, Castor and the interactive Linux services (lxadm and VOBOX).


  • Several users had problems logging into lxvoadm, lxadm and the VO boxes (misc tickets).
  • CMS and ATLAS had problems accessing xrootd files on Castor from the batch nodes (GGUS alarm ticket).
  • ATLAS T0 batch jobs were unable to obtain a Kerberos token (jobs failed) for around 30 minutes (GGUS alarm ticket).


The Kerberos services for CERN are based on Microsoft's Active Directory following the migration from the Heimdal Kerberos service earlier in the year. Following the completion of the migration of the Exchange 2003 mail service to Exchange 2010, it was possible to raise the security level to Windows 2008 levels in order to increase functionality (such as authenticated-only access to group members for privacy improvements). One of the new features is support for the Advanced Encryption Standard (AES 128 and 256) in Kerberos authentication.

This change was scheduled with the online community to be performed during a technical stop, since an issue with the Active Directory service would have a significant impact on the environment in the technical network.

Timeline of the incident

All times are in Swiss time.

08:10 Active Directory settings modified

Functional levels for the domain and forest were moved from 2003 to 2008 levels.

  • TBD Add issue with ssh details

14:07 xrootd access from cmsRun not working since this morning

CMS raised a GGUS alarm ticket (

14:21 xrootd access from cmsRun not working since this morning

CMS decreased the importance of their GGUS alarm ticket ( The priority was changed from top priority to less urgent since they suspected an issue in their own code.

17:41 Massive failures due to lost AFS token on ATLAS Tier-0 LSF batch nodes

ATLAS raised a GGUS alarm ticket (

18:06 xrootd access from cmsRun not working since this morning

First workaround verified and proposed to the experiment ( It required adding one line to the user's job.

19:37 xrootd access from cmsRun not working since this morning

Final workaround verified and proposed to the experiment ( This does not need the extra line (although the extra line can remain in place in most cases).

22:53 xrootd access from cmsRun not working since this morning

Ticket closed ( The ticket was closed after observing successful reading (xrdcp) from LSF jobs running as cmsprd.


The Kerberos tokens for the Batch system are retrieved using master certificate authentication, through the kinit command. To differentiate the Active Directory tokens from the Heimdal tokens during the migration phase, where both KDCs coexist, the various scripts check the token encryption type (as the realm is the same on both KDCs).

The Active Directory upgrade caused an unexpected change in certificate-based token issuance through kinit, providing AES256-encrypted tokens when ArcFour tokens were expected. As a result, Active Directory tokens are not identified properly and cannot be renewed, preventing batch jobs from running properly. The kinit command allows an encryption type to be forced, but this gives an 'invalid password' error, which is definitely not expected here and which we are not yet able to explain.
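The encryption-based discrimination described above can be illustrated with a minimal Python sketch. This is not the actual batch scripts: the function names are hypothetical, the input is assumed to be text in the style of MIT `klist -e` output, and the rule "ArcFour means Active Directory, anything else means Heimdal" is an assumption made only to show how such a check could break.

```python
import re

# Assumption: the ticket listing contains a line in the style of MIT
# `klist -e`, e.g. "Etype (skey, tkt): arcfour-hmac, arcfour-hmac".
ETYPE_LINE = re.compile(r"Etype \(skey, tkt\):\s*([^,\n]+),\s*([^\n]+)")


def ticket_etype(klist_output):
    """Return the ticket encryption type from the first Etype line, or None."""
    match = ETYPE_LINE.search(klist_output)
    return match.group(2).strip().lower() if match else None


def guess_kdc(klist_output):
    """Guess the issuing KDC from the encryption type alone (hypothetical rule)."""
    etype = ticket_etype(klist_output)
    if etype is None:
        return "unknown"
    # Hypothetical rule mirroring the text: ArcFour is taken to mean
    # Active Directory, anything else is assumed to come from Heimdal.
    if etype.startswith("arcfour") or etype.startswith("rc4"):
        return "active-directory"
    return "heimdal"
```

Under this assumed rule, an AES256 token issued by Active Directory would be classified as a Heimdal token, which is the kind of misidentification the incident exposed.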

Follow up

  • Prepare test case for issue and raise a support call with Microsoft: Created 'Urgent' Incident 111051122694692 (
  • Agree on a simple regression test for Kerberos token extension, which can be used when testing new configurations of AD.
  • Understand the delay between the initial change at 08:30 and the alarm tickets being created (around 5 hours later)
  • Review the plan for the Heimdal Kerberos decommissioning to see if we can remove the dual Kerberos configuration in batch/lxplus and simplify it.
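As a starting point for the regression test mentioned in the follow-up list, a renewal check could look like the sketch below. It is only an illustration under stated assumptions, not the agreed test: the function name is hypothetical, a renewable ticket is assumed to have been obtained beforehand in the default credential cache, and the check relies only on the exit codes of `klist -s` and `kinit -R`.

```python
import subprocess


def renewal_works(run=subprocess.run):
    """Return True if a valid ticket is present and can be renewed."""
    # `klist -s` prints nothing and exits non-zero when the credential
    # cache holds no valid ticket.
    if run(["klist", "-s"]).returncode != 0:
        return False
    # `kinit -R` asks the KDC to renew the existing ticket-granting ticket.
    return run(["kinit", "-R"]).returncode == 0
```

The `run` parameter is injectable so that the logic can be exercised without a KDC; on a real node it would be called as `renewal_works()` after each change to the AD configuration.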


Topic revision: r9 - 2011-05-12 - TimBell