LHCbDIRAC Release and Certification Process

Introduction

This document defines a policy for LHCbDIRAC releases and certification. Most of its content is common sense and standard best practice for software processes.

LHCbDirac release process

Whenever new functionality or bug fixes are considered worth putting into production, the following process should take place. This applies to all four projects: Dirac, LHCbDirac, DiracWeb and LHCbDiracWeb.

There are three types of changes:

  • Functional releases: new functionality, refactoring, performance improvements, etc. These changes may affect more than one system (and changes are often correlated between systems).
  • Patch releases: bug fixes, small fixes in logic, typos, etc. These affect a single system, or group a few small, uncorrelated changes across several systems.
  • Emergency fixes (see below)

Functional releases

The process for a functional release is the following (in each system):

  • Test the changes locally (unit tests if feasible, use the development setup)
  • Commit to SVN and advertise the change to the system responsible
  • If/when needed, the system is tagged in SVN by the system responsible, and the tag is added to the tag collector.
  • Prepare release notes, document the changes internally (in the code), and update the documentation.
  • Preparation of a candidate release:
    • Increment the "v" field (major release, if the interfaces change or the behavior is significantly different) or the "r" field. Create a candidate (-pre) release (a.k.a. pre-release).

The release manager should build the release from the advertised tags (in the tag collector, e.g. versions.cfg) for all systems. There is no need to involve developers at this stage. The candidate release must be deployed and certified on the certification setup.
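The version increment for a functional release can be sketched as follows. This is only an illustration: the tag layout (v<major>r<release>, optional p<patch>, optional -pre<n> suffix), the reset of "r" to 0 on a major release, the numbering of candidates starting at -pre1, and the function names are all assumptions, not something this policy prescribes.

```python
import re

# Assumed tag layout: v<major>r<release>[p<patch>][-pre<n>],
# e.g. "v5r10p2" or "v6r0-pre1".
_TAG = re.compile(r"^v(\d+)r(\d+)(?:p(\d+))?(?:-pre(\d+))?$")


def parse_tag(tag):
    """Split a tag into (v, r, p, pre); a missing p is 0, a missing pre is None."""
    match = _TAG.match(tag)
    if match is None:
        raise ValueError("unrecognised tag: %r" % tag)
    v, r, p, pre = match.groups()
    return int(v), int(r), int(p or 0), (int(pre) if pre is not None else None)


def next_candidate(production_tag, interfaces_changed):
    """Return the first candidate (-pre) tag for the next functional release."""
    v, r, _p, _pre = parse_tag(production_tag)
    if interfaces_changed:
        # Major release: bump "v". Resetting "r" to 0 is an assumption.
        v, r = v + 1, 0
    else:
        r += 1
    return "v%dr%d-pre1" % (v, r)
```

For example, next_candidate("v5r10p2", False) gives "v5r11-pre1", and next_candidate("v5r10p2", True) gives "v6r0-pre1".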

Patch releases

The process for a patch release is the following:

  • Test the change locally, although this step may be skipped for trivial changes, e.g. fixing a typo
  • Commit to the SVN branch and advertise the fix to the system responsible. The fix should also be ported to the trunk.
  • The system is tagged in SVN by the system responsible (in their absence this can be done by the person who committed the fix/change).
  • Preparation of a patch release:
    • Increment the "p" field. Only the patched system(s) should be included in the new release: no new functionality or unnecessary changes should be included.
    • All unaffected systems should keep the same local tag as in the production release being patched.
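The composition rule for a patch release (only the patched systems change, everything else keeps its production tag) can be sketched as below. The helper names, the tag layout and the system names in the usage note are hypothetical and serve only as an illustration.

```python
import re

# Assumed tag layout: v<major>r<release>[p<patch>], e.g. "v5r10" or "v5r10p2".
_PATCHABLE = re.compile(r"^(v\d+r\d+)(?:p(\d+))?$")


def bump_patch(tag):
    """Increment the "p" field of a tag; a tag without "p" becomes p1."""
    match = _PATCHABLE.match(tag)
    if match is None:
        raise ValueError("unrecognised tag: %r" % tag)
    base, patch = match.groups()
    return "%sp%d" % (base, int(patch or 0) + 1)


def compose_patch_release(production_tags, patched_tags):
    """Return the per-system tags of a patch release.

    production_tags: {system: tag} of the production release being patched.
    patched_tags:    {system: new tag} for the patched system(s) only.
    """
    unknown = set(patched_tags) - set(production_tags)
    if unknown:
        # A patch release must not bring in systems absent from production.
        raise ValueError("not in the production release: %s" % sorted(unknown))
    release = dict(production_tags)  # unaffected systems keep their tag
    release.update(patched_tags)     # only the patched systems change
    return release
```

With a production release {"SystemA": "v1r2", "SystemB": "v3r4p1"} and a fix in SystemB only, compose_patch_release(production, {"SystemB": bump_patch("v3r4p1")}) leaves SystemA at "v1r2" and moves SystemB to "v3r4p2".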

Emergency fixes

Under normal circumstances, no hot fix should be applied on either the production or the certification setup. Hot fixes can, however, be applied on the development system under the responsibility of the developer(s). This policy must be followed by everybody, which may mean that all developers should be able to make a patch release and deploy it, once advertised or requested. It ensures that a later patch release does not cause a regression by leaving out a hot fix that had only been applied in place.

In exceptional circumstances, however, when minutes matter for bringing the whole system back into operation, a hot fix may be applied directly on the production setup and the relevant component restarted. Within hours of such an emergency fix, a patch release containing that fix should be made and deployed.

LHCbDirac certification

The certification process is mandatory for all releases except patch releases. Note, however, that complex changes in a patch release should be tested carefully, either with a unit test or by restricting the new code to special cases before enabling it for all jobs.

The pre-release should be installed using standard installation methods on lhcb-cert-dirac.cern.ch. The LHCbDirac client should be installed on LHCBDEV.

LHCbDirac certification is meant to assert that the functionality of the new release is at least as good as that of previous releases. All components of the system should be involved and tested: Production system, WMS, DMS, BK, etc.

The certification manager, in consultation with the developers, will deploy a series of tests at different levels. Tests whose results can be checked automatically will be automated. Other tests have to be made interactively (such as testing the web portal behavior), in which case a checklist should be followed and the results reported manually in a certification form.

The certification tests will be run by the certification manager and produce a web-based report, either automatically or manually (see above).
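One possible way to collect automatic and manual results into a single report is sketched below. The class and function names are hypothetical, and the report is rendered as plain text lines for brevity; publishing it as a web page is left out of this sketch.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    automatic: bool
    note: str = ""


def run_automatic(name, check):
    """Run a callable check; any exception (including AssertionError) is a failure."""
    try:
        check()
    except Exception as exc:
        return CheckResult(name, False, True, repr(exc))
    return CheckResult(name, True, True)


def record_manual(name, passed, note=""):
    """Record the outcome of an interactive checklist item."""
    return CheckResult(name, passed, False, note)


def summarise(results):
    """Return (certified, report lines); an empty result list is not certified."""
    lines = []
    for res in results:
        kind = "auto" if res.automatic else "manual"
        status = "PASS" if res.passed else "FAIL"
        line = "[%s] %s (%s)" % (status, res.name, kind)
        if res.note:
            line += ": " + res.note
        lines.append(line)
    certified = bool(results) and all(res.passed for res in results)
    return certified, lines
```

Treating a single failed check as blocking certification is a design choice of this sketch; the policy itself only requires that developers react to negative results.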

Tests should involve (non-exhaustive):

  • Simple job submission, monitoring, output retrieval (e.g. using ganga)
  • DMS basic tests (data movement, registration, deletion, including FTS transfer, monitoring and accounting)
  • Web portal functionality for monitoring, accounting, etc. (a set of URLs to be defined)
  • SAM job submission
  • Production creation using a well established workflow
    • Basic job interactive test (optionally, only if the creation of the job test was changed)
    • Production approval
    • Production creation, job submission, monitoring and accounting. The full correct behavior of the whole chain (including merging and distribution, data registration and accessibility) should be tested.

Any new functionality should be thoroughly tested using dedicated tests. These same tests, possibly simplified, should then be integrated into the certification suite and act as regression tests to ensure that a new release does not break existing functionality.

The certification results should be broadcast to the developers, who must react to negative results, investigate and fix the problems before a new iteration takes place.

Workflow modules / template validation

This validation should happen whenever a new workflow is to be created (e.g. new steps, new output policies for upload or distribution) or changes have taken place in LHCbDirac modules. It should take place on the production system, as the workflows are meant to be used immediately (unless they are linked with new functionality, in which case they should be run as additional tests in the certification system). Care should be taken that the output data is registered as certification/test data in the bookkeeping, so that users do not try to use it. If the outcome of this validation requires modifications in LHCbDirac (other than workflow templates), a patch release of LHCbDirac should be deployed on the production system.

The current deployment of workflow templates (in a developer's AFS public directory) is not sustainable and very dangerous! A release and deployment procedure of workflow templates must be put in place with highest priority.

The workflow validation is done jointly by the workflow developers. The success of the validation will be determined by the following criteria (again, non-exhaustive):

  • Jobs are created, submitted and run without any crash
  • Output data should be checked (presence, size and content), which may imply the involvement of outside members of the collaboration.
  • Data registration and accessibility (from BK and DMS) should be checked
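The presence and size part of the output data check can be sketched as follows, assuming the outputs have been retrieved to a local directory. The function name and the idea of a per-file minimum size are assumptions made for illustration; content checks and the BK/DMS registration checks are not covered here.

```python
import os


def check_outputs(expected, directory):
    """Check that each expected output file exists and is large enough.

    expected:  {file name: minimum size in bytes}
    directory: local directory holding the retrieved job outputs
    Returns a list of problem descriptions; an empty list means all checks passed.
    """
    problems = []
    for name, min_size in sorted(expected.items()):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            problems.append("%s: missing" % name)
            continue
        size = os.path.getsize(path)
        if size < min_size:
            problems.append(
                "%s: %d bytes, expected at least %d" % (name, size, min_size)
            )
    return problems
```

An empty return value only says that the files are present and not suspiciously small; whether their content is correct still has to be judged by the people who know the data.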

Production validation

Whenever a new set of applications has to be used in a production, the full workflow should be validated following the production validation procedures. This validation is performed jointly by the production managers' team and the application developers.

-- PhilippeCharpentier - 13-Jan-2011

Topic revision: r3 - 2011-01-17 - PhilippeCharpentier
 