LHCb Nightly Build System Troubleshooting and Operation

It is the responsibility of the Deployment Shifters to check that the nightly builds are functional.

Introduction

First of all, make sure you have read LHCbNightliesImplementation to understand how the various pieces fit together.

Jenkins jobs should not fail, but in practice they can, for a number of reasons. Shifters receive a mail for every failed Jenkins job related to the nightly builds, as well as for the first successful job after a failure.

In the most common cases, the failure is due to a glitch in the infrastructure (communication between Jenkins and its slaves, or the connection to the git/svn servers), and the automatic retry we use for most jobs is enough to recover.

How to Read Jenkins Mails

There are two types of mails: failure and back to normal.

Failed Builds

In case of failure, the shifter will get a mail with a subject like:

Build failed in Jenkins: <job name> <build name or id>

where <job name> is the name of one of the jobs described in LHCbNightliesImplementation and <build name or id> is either a numeric id (if the job failed very early) or a human-readable name for the build, like <flavour>.<slot>.<id>.

The body of the mail starts with a link to the failed build in the Jenkins web interface, followed by an excerpt of the console output of the build.
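When many mails arrive at once, it can help to sort them by job and slot. The following is a minimal Python sketch that parses the failure subject described above. It is not part of the nightly build system; the function name, the example job and slot names, and the assumptions that neither the job name nor the build name contains spaces and that a numeric id may be prefixed with "#" are all illustrative.

```python
import re

# Subject format quoted in the text; assumes (hypothetically) that the job
# name and the build name contain no whitespace.
FAILED_RE = re.compile(r"^Build failed in Jenkins: (?P<job>\S+) (?P<build>\S+)$")


def parse_failed_subject(subject):
    """Return a dict describing a failure mail, or None if the subject does not match."""
    match = FAILED_RE.match(subject.strip())
    if match is None:
        return None
    build = match.group("build")
    info = {
        "job": match.group("job"),
        "build": build,
        # A purely numeric id (optionally written as "#42", an assumption)
        # means the job failed before the build received a readable name.
        "early_failure": build.lstrip("#").isdigit(),
    }
    if not info["early_failure"] and build.count(".") >= 2:
        # <flavour>.<slot>.<id>: split on the first and the last dot so that
        # a slot name containing dots would still be kept in one piece.
        flavour, rest = build.split(".", 1)
        slot, build_id = rest.rsplit(".", 1)
        info.update(flavour=flavour, slot=slot, id=build_id)
    return info


if __name__ == "__main__":
    # Hypothetical job and build names, for illustration only.
    print(parse_failed_subject("Build failed in Jenkins: some-job nightly.lhcb-head.123"))
    print(parse_failed_subject("Build failed in Jenkins: some-job #42"))
```

A subject that does not follow the failure format (for example a back to normal mail) yields None, so the function can also be used as a simple filter.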

Build Back to Normal

For the first successful build after a failure, the shifter will receive a mail with a subject like:

Jenkins build is back to normal : <job name> <build name or id>

The body of the mail consists only of a link to the successful build in the Jenkins web interface.

It is important to follow the link and look at the preceding build ("Previous Build" link) to find out why it failed, because sometimes a build fails so early that the failure mail is not sent.
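As a rough aid for this check, the sketch below recognises a back to normal subject and guesses the address of the preceding build from the link in the mail. It assumes the usual Jenkins URL layout, <base>/job/<job name>/<build number>/, and that build numbers are consecutive; neither is stated on this page, and the host name in the example is made up. If the preceding build has been deleted, the guessed address will not exist, so the "Previous Build" link in the web interface remains the reference.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Subject prefix quoted in the text (note the space before the colon).
BACK_TO_NORMAL_PREFIX = "Jenkins build is back to normal : "

# Assumed Jenkins layout: .../job/<job name>/<build number>/
BUILD_PATH_RE = re.compile(r"^(?P<prefix>.*/job/[^/]+/)(?P<number>\d+)/?$")


def is_back_to_normal(subject):
    """True if the mail subject announces the first success after a failure."""
    return subject.strip().startswith(BACK_TO_NORMAL_PREFIX)


def previous_build_url(build_url):
    """Guess the URL of the build preceding build_url, or return None.

    This only decrements the build number; it does not check that the
    resulting build exists.
    """
    parts = urlsplit(build_url)
    match = BUILD_PATH_RE.match(parts.path)
    if match is None:
        return None
    number = int(match.group("number"))
    if number <= 1:
        return None  # there is no earlier build to look at
    new_path = "{}{}/".format(match.group("prefix"), number - 1)
    return urlunsplit(parts._replace(path=new_path))


if __name__ == "__main__":
    # Hypothetical host and job name, for illustration only.
    print(previous_build_url("https://jenkins.example.org/job/some-job/58/"))
```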

Problems with the Machines

Jenkins relies on a master node and various slave nodes.

Slaves are used (in our configuration) for CPU-intensive tasks.

-- MarcoClemencic - 2015-07-06

Topic revision: r1 - 2015-07-06 - MarcoClemencic
 