Technical Description of the LHCb Nightly Build System

This is a brief summary of the technical implementation details of the LHCb Nightly Build System, useful for understanding the various components of the system and how they interact with each other.

Elements

The LHCb Nightly Build System is constituted of a few main components, each described in the sections below: the Jenkins instance that schedules the operations, the pool of build machines, the scripts collected in LbNightlyTools, the artifacts' archive, and the dashboard (backed by CouchDB).

These components interact so that, for each slot, project and platform, we check out, build and test the software, and report the results of the builds and tests.

The Scheduling

The scheduling of the operations (checkout, build and test) is achieved through the continuous integration (CI) system Jenkins, which can be seen as a cron on steroids.

Jenkins Basics

Jenkins is an extensible continuous integration system written in Java. Its main goal is to allow developers to regularly build and validate their projects. The great flexibility of Jenkins makes it possible to use the system in very different scenarios, ranging from the simple build and test of a project to very complex workflows, as in the LHCb case.

The key component of Jenkins is the Job: the prototype for a unit of work run by the system. When Jenkins executes a Job, the operations defined in its configuration are run. For example, in a simple case, a Job can be configured to check out the sources of a project from a repository, build the project and run the tests. In more complex scenarios, Jobs can be executed in different environments, or trigger other Jobs according to some conditions.

To effectively help in the testing of projects, Jenkins allows the execution of Jobs on other machines (slaves), identified by name or by labels. For example, a build-and-test Job could be run on a Linux, a Windows and a MacOS machine to ensure that the code works on all of them.

Jobs in Jenkins can be triggered by different conditions, for example by a change in the source code repository, or a change in the content of a URL, or regularly (like with the UNIX cron daemon).

Scheduling of LHCb Jobs in Jenkins

The LHCb Nightly Build System uses a few interdependent Jobs in a Jenkins instance, divided into two groups: build and test.

The builds are handled by four generic Jobs that, depending on their parameters, operate on one slot or another:

[Figure nightly-builds-workflow.svg: Execution flow of the Jenkins Jobs used for the builds.]

nightly-slot
run once a night to start the checkout and build of every enabled slot

nightly-slot-checkout
triggered once per slot by nightly-slot, checks out the projects defined in the slot configuration; if there are preconditions to check (e.g. wait for the LCG nightly builds), the Job nightly-slot-precondition is triggered (once per requested platform), otherwise it directly triggers nightly-slot-build-platform (once per requested platform; see the sketch after this list)

nightly-slot-precondition
triggered only if needed by nightly-slot-checkout, it checks the preconditions defined in the slot configuration (e.g. waiting for the LCG builds); when/if the precondition is met, it triggers nightly-slot-build-platform

nightly-slot-build-platform
when started, declares to the test Jobs which builds are expected (slot, project, platform), then starts the build of all the projects in the specified slot for the specified platform
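
For illustration, the per-platform fan-out mentioned above can be pictured as a parameterized trigger through the Jenkins remote API. The following minimal sketch uses the python-jenkins module; the server URL and the parameter names are assumptions for illustration, not the actual Job configuration:

    # Sketch: trigger one nightly-slot-build-platform Job per platform
    # through the Jenkins remote API (python-jenkins). The server URL
    # and the parameter names are illustrative assumptions.
    import jenkins

    server = jenkins.Jenkins('https://lhcb-jenkins.cern.ch',
                             username='builder', password='api-token')

    for platform in ['x86_64-slc6-gcc48-opt', 'x86_64-slc6-gcc48-dbg']:
        server.build_job('nightly-slot-build-platform',
                         parameters={'slot': 'lhcb-head',
                                     'platform': platform})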

[Figure nightly-tests-workflow.svg: Execution flow of the Jenkins Jobs used for the tests.]

The tests are run asynchronously with respect to the builds, so that they can be started as soon as the build of the project to be tested is ready. This is achieved using two dedicated Jobs:

nightly-slot-test-poll
run every 10 minutes, starting from 4:00 AM, checks which of the builds declared by nightly-slot-build-platform are ready to be tested, and triggers one nightly-slot-test for each (slot, project, platform) combination, with a maximum of 20 triggered Jobs per execution to avoid overloading Jenkins

nightly-slot-test
triggered by nightly-slot-test-poll, takes the project binaries (produced by nightly-slot-build-platform) to be tested, and runs the tests

All Jobs are configured to retry the execution in case of failure and to send a mail to lhcb-core-soft-alarms@cern.ch (a mail is also sent when a Job succeeds after a failed execution). See LHCbNightliesTroubleshooting for more information on the failure conditions as well as on manual operation of the builds.

Distributing the Load on Different Machines

As already mentioned, Jenkins has a mechanism to execute Jobs on different machines, either to access more computing power or to run on different configurations and architectures.
[Figure cpus-to-executors.svg: Mapping between virtual machines and Jenkins slaves in the LHCb Nightly Build System.]

This functionality relies on the concepts of slaves and executors: in the simplest configuration, slaves map to machines and executors to CPUs, but nothing prevents having several slaves on one machine or mapping executors (logically) to multiple CPUs. Slaves can be grouped logically with labels, which can be used to declare where a Job should be executed.

The LHCb Nightly Build System uses a pool of 12 8-core SLC6 virtual machines and 3 8-core SLC5 ones (with more platforms to come), plus another machine to run Jenkins and CouchDB. The machines are organized in:

  • 12 slaves * 1 executor, labeled slc6-build
  • 12 slaves * 6 executors, labeled slc6
  • 3 slaves * 1 executor, labeled slc5-build
  • 3 slaves * 6 executors, labeled slc5

Build Jobs are sent to the slaves labeled slc6-build or slc5-build, while checkouts and tests are sent to the slaves with the plain slc6 and slc5 labels.

Since the release procedure uses the Nightly Build System, to ensure that release builds can be done even when the system is very busy we added an extra label, -release, which is attached to all the -build slaves plus one more, so that, for example, release builds for SLC6 have access to 12 (shared) + 1 (exclusive) executors.

The Artifacts' Archive

The archive of artifacts is essentially a storage space that can serve the contained files via HTTP. We store there all the products of the nightly builds, from source and binary tarballs to test result summaries. This archive has a crucial role in the system because it is used for the deployment of the builds to AFS and CVMFS, as well as to exchange files between the different Jenkins Jobs.

The implementation is very simple: we use ssh to write to an HTTP-served directory on lhcb-archive.cern.ch. There is one directory per nightly build flavour, containing one directory per slot. Each slot directory hosts the build data in a directory named after the unique numeric id of the slot build, with moving symbolic links to these directories named after the corresponding date (YYYY-MM-DD), the day name (three-letter abbreviation), and Today.
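
As a minimal sketch of how the moving links could be maintained (the helper name and the exact conventions are assumptions, not the actual archive code):

    # Sketch: repoint the date, day-name and Today links of a slot
    # directory at the latest numbered build directory.
    import datetime
    import os

    def update_links(slot_dir, build_id, day=None):
        day = day or datetime.date.today()
        target = str(build_id)                   # e.g. "1042"
        for name in (day.strftime('%Y-%m-%d'),   # e.g. "2015-07-01"
                     day.strftime('%a'),         # e.g. "Wed"
                     'Today'):
            link = os.path.join(slot_dir, name)
            if os.path.lexists(link):
                os.remove(link)
            os.symlink(target, link)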

The Nightly Build Steps

The Scripts

The Jenkins Jobs use a collection of scripts to perform their tasks. These scripts are kept in a Git repository hosted at gitlab.cern.ch: https://gitlab.cern.ch/lhcb-core/LbNightlyTools.

The package is organized in a few subdirectories, including:

python
Python modules containing the actual code of the scripts
scripts
mainly small scripts delegating the actions to the Python modules (see the sketch after this list)
jenkins
wrapper scripts used to call the main scripts from the Jenkins Jobs
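
As an illustration of the delegating pattern, a script in scripts could look like the following sketch; the module and function names (LbNightlyTools.Scripts.Checkout and run) are hypothetical, not the actual entry points:

    #!/usr/bin/env python
    # Sketch of a small delegating script: the logic lives in a Python
    # module, the script only forwards the command line.
    # LbNightlyTools.Scripts.Checkout and its run() are hypothetical.
    import sys

    def main():
        from LbNightlyTools.Scripts import Checkout
        return Checkout.run(sys.argv[1:])

    if __name__ == '__main__':
        sys.exit(main())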

Each Jenkins Job with a role in the LHCb Nightly Build System is configured to get the latest version of LbNightlyTools from the main repository before executing its main actions. Then, depending on the operation it must perform, it calls the corresponding script in the jenkins directory (checkout.sh for the checkout, build.sh for the build, etc.).

To simplify maintenance, reduce code duplication and improve readability, the action scripts in jenkins rely on (bash) functions defined in scripts under jenkins/utils.d.

An important feature of the scripts used in the Nightly Build System is that they can be run by hand to build and test a private copy of the LHCb software stack or of part of it.

Start

Every night, just after midnight, a Jenkins Job is run to start all the enabled slots.

First we download (from Git) the files describing the configuration of the slots, then we identify the enabled slots and trigger a checkout job for each of them.
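
This selection step can be pictured with the following minimal sketch, assuming a JSON slot description with a disabled flag (the actual configuration format may differ):

    # Sketch: load the slot configurations, keep the enabled ones and
    # trigger one checkout job per slot. The 'disabled' flag and the
    # trigger_checkout helper are illustrative assumptions.
    import json

    def start_enabled_slots(config_files, trigger_checkout):
        for path in config_files:
            with open(path) as config_file:
                slot = json.load(config_file)
            if not slot.get('disabled', False):
                trigger_checkout(slot['slot'])  # 'slot' holds the name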

The checkout jobs themselves are described in the next section.

Checkout

For each slot we run a checkout job, which retrieves (again) the slot configuration files, then checks out from the repositories the source code of the projects to be built in the current slot.

After the checkout, the sources are patched to form a consistent set (fixing the dependencies), and the applied changes are stored in a patch file for reference.

Sources are finally packed in compressed tarballs and sent to the artifacts' archive together with a copy of the relevant configuration file and some checksum files.
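
The packing step can be sketched as follows; the tarball naming scheme and the checksum type are assumptions for illustration:

    # Sketch: pack the checked-out sources of a project in a compressed
    # tarball and write a checksum file next to it.
    import hashlib
    import tarfile

    def pack_sources(project, version, src_dir):
        tarball = '%s.%s.src.tar.bz2' % (project, version)  # assumed name
        with tarfile.open(tarball, 'w:bz2') as tar:
            tar.add(src_dir, arcname=project)
        with open(tarball, 'rb') as f:
            digest = hashlib.md5(f.read()).hexdigest()
        with open(tarball + '.md5', 'w') as f:
            f.write('%s  %s\n' % (digest, tarball))
        return tarball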

If the slot requires some preconditions to be checked, then we trigger jobs to check/wait for the preconditions (one per platform to be built), otherwise we directly trigger the build jobs (one per platform).

Preconditions Check (optional)

At the moment this step is used only to wait for the completion of the SFT (LCG) builds, but it is a very flexible mechanism that could be used to throttle builds, or to disable slots/platforms depending on some special conditions.

This job gets from the artifacts' archive the configuration files of the current slot and runs the precondition function, which defines when (or if) to trigger the build job.
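
A precondition function could look like the following minimal sketch; the signature, the marker file and the base path are hypothetical, for illustration only:

    # Sketch of a precondition hook: block until the LCG nightly build
    # for the requested platform looks complete. The marker file and
    # the base path are illustrative assumptions.
    import os
    import time

    def waitForLCG(platform, base='/cvmfs/sft-nightlies.cern.ch/lcg'):
        marker = os.path.join(base, platform, 'isDone')  # hypothetical
        while not os.path.exists(marker):
            time.sleep(300)  # poll every 5 minutes
        return True  # precondition met: trigger the build job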

Build

When the build job is started, it gets the configuration and source tarballs for the current slot from the artifacts' archive. Then we record in a special directory the list of expected build artifacts, which is needed to know which test jobs should be started.

The projects are built in dependency order and, project by project, the binary files are packed and sent to the artifacts' archive together with build log summaries.
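
Building in dependency order amounts to a topological sort of the projects, as in this minimal sketch (the dependency map in the example is hypothetical):

    # Sketch: compute a build order such that every project comes after
    # the projects it depends on (depth-first topological sort).
    def build_order(deps):
        done, order = set(), []
        def visit(project):
            if project in done:
                return
            done.add(project)
            for dep in deps.get(project, []):
                visit(dep)
            order.append(project)
        for project in deps:
            visit(project)
        return order

    # Example with hypothetical dependencies:
    # build_order({'Gaudi': [], 'LHCb': ['Gaudi'], 'Rec': ['LHCb']})
    # -> ['Gaudi', 'LHCb', 'Rec']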

Test Polling

Every 10 minutes, starting from 4:00 AM, we check whether any of the projects declared by the build step is ready to be tested, and we trigger a test job for each slot/project/platform combination ready for testing.

To avoid congestion in Jenkins, we trigger at most 20 test jobs every 10 minutes.
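
One polling cycle can be summarized by the following sketch; the is_ready and trigger_test helpers are hypothetical:

    # Sketch of one polling cycle: trigger at most MAX_TRIGGERS test
    # jobs among the declared builds that are ready and not yet tested.
    MAX_TRIGGERS = 20

    def poll_once(declared, already_triggered, is_ready, trigger_test):
        # 'declared' lists (slot, project, platform) tuples; 'is_ready'
        # and 'trigger_test' are hypothetical helper functions.
        triggered = 0
        for key in declared:
            if triggered >= MAX_TRIGGERS:
                break  # leave the rest for the next cycle
            if key in already_triggered or not is_ready(*key):
                continue
            trigger_test(*key)
            already_triggered.add(key)
            triggered += 1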

The reason why we wait until 4:00 AM before starting the tests is that in this way we essentially dedicate the full CPU power of the build farm to the builds for the first few hours. By then, a good fraction of the builds are done and we can use (almost) all the CPU power for the tests. This scheme gave a better build throughput than the other scheduling policies we tested.

Test

The test job gets the build artifacts for the project and platform to be tested (using the same tool used to install the nightly builds on AFS and CVMFS) and runs the tests.

The test result summaries are sent to the artifacts' archive, from where they can be checked by the developers.

The Dashboard

The results of the Nightly Builds are reported on a web page, the dashboard. The information is organized in one table per slot, with one row per built project and two columns per platform (one for the build and one for the tests).

Jenkins jobs store details about the build and test results in a CouchDB database in the form of JSON documents. Map-reduce views in the CouchDB instance extract and format the information in these documents so that it can be used to fill the tables of the dashboard.
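
For illustration, recording a result from a Jenkins job could look like the following sketch; the database URL and the document fields are assumptions, not the actual schema:

    # Sketch: store a build result in CouchDB as a JSON document with
    # a plain HTTP POST. URL and fields are illustrative assumptions.
    import json
    import urllib2

    doc = {'type': 'build', 'slot': 'lhcb-head', 'build_id': 1042,
           'project': 'LHCb', 'platform': 'x86_64-slc6-gcc48-opt',
           'warnings': 3, 'errors': 0}

    req = urllib2.Request('https://lhcb-nightlies-db.cern.ch/nightlies',
                          json.dumps(doc),
                          {'Content-Type': 'application/json'})
    reply = json.load(urllib2.urlopen(req))  # {"ok": true, "id": ..., "rev": ...}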

The HTML, JavaScript and CouchDB views are available in the couchdb directory of https://gitlab.cern.ch/lhcb-core/LbNightlyTools, in the form of CouchApps managed with the erica tool.

More Details

Configuration Details of Jenkins Jobs

-- MarcoClemencic - 2015-07-01
-- MarcoCattaneo - 2016-08-26 - updated obsolete URLs, fixed typos
