LHCb Nightly builds setup and configuration

The system expects to find build machines (both server and clients) configured in the way described here: LHCbNightliesPrerequirements. That page also lists the currently available machines and their status.

System bootstrap

Nightly builds are started every night using the acrontab of the lhcbsoft user. The configuration is kept in the SVN repository:

https://svnweb.cern.ch/trac/lhcb/browser/LHCbNightlyConf/trunk/configuration.xml

Nightlies summary webpage:

http://cern.ch/lhcb-nightlies

Configuration editor:

https://cern.ch/lhcb-nightlies/editor.html

Setting up a new machine

How to set up a new machine as a build machine.

Server

The updated version of the LHCb Nightlies runs in a client-server architecture. The server reads the configuration file once, when it is started; we start it every night, just after midnight. The configuration is automatically checked out from the SVN repository. To be able to start the server on the same port the following day, we also kill the server just before midnight:

58 23 * * * buildlhcb07 ~/bin/killtree.py kill `ps xf | grep "nightliesServer.sh" | grep -v "grep" | awk '{print $1}'` > /build/nightlies/OldLogs/KillServer.log 2>&1
20 0 * * * buildlhcb07 ~/bin/nightliesServer.sh > /afs/cern.ch/lhcb/software/nightlies/www/logs/nightliesServer_`date +\%a`_buildlhcb07.txt 2>&1
The server is set up to use buildlhcb07, aka lxbuild171. The server name is hard-coded in the start-up scripts nightliesServer.sh and nightliesClient.sh; when moving the server, the corresponding scripts and the acron settings have to be updated.

restart/test mode

By default the server picks up the configuration.xml present at midnight of the new day, to have a well-defined environment, and uses port 61007 (for new machines, check that the firewall settings allow access to these ports). To use the most current version of configuration.xml in SVN, the server can be started with the restart flag, i.e. ~/bin/nightliesServer.sh restart. Port 61009 is used for serving this configuration; clients have to be started with the same flag to access the restart server.
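For example (a minimal sketch; the server command is documented above, while the client syntax is an assumption, analogous to the test invocation shown further down this page):

 ~/bin/nightliesServer.sh restart
 ~/bin/nightliesClient.sh x86_64-slc5-gcc43-opt lhcb-head restart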

As a test environment, the server can be started with the test flag; it pushes the most recent configuration-test.xml as configuration to the clients that were started with test as well. Here, port 61008 is used. Be aware that when the test flag is set, the start-up scripts also switch the paths to the nightlies files to the development versions.

Clients

The client must be given at least one parameter on the command line: the CMTCONFIG it is supposed to run with. After startup, each client connects to the server and gets one (slot, platform) pair which matches the given CMTCONFIG (dbg platforms can be built with opt cmtconfigs...). When it finishes the build, it takes another (slot, platform) pair, as long as there is still one ready to be built. If not, the client quits the loop. Some platforms wait for specific conditions to be fulfilled; if those are not met, they may not be taken by the client when the client is ready. In that case it may happen that the client finishes its work while there are still platforms to be built. Therefore more than one client should be started during the night - some of them maybe even in the afternoon.
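Conceptually, the client's work loop behaves like the following sketch (the helper names are made up for illustration, this is not the actual implementation):

 # conceptual sketch of the client loop; the helpers are hypothetical
 CMTCONFIG="$1"
 while pair=$(request_work_from_server "$CMTCONFIG"); do  # hypothetical: ask the server for a matching (slot,platform) pair
     checkout_and_build "$pair"                           # hypothetical: run checkout, build and tests for that pair
 done
 # no matching (slot,platform) pair ready -> the client quits the loop

The acrontab entries below start such clients during the night: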

7 1 * * * lxbuild111 /afs/cern.ch/user/l/lhcbsoft/bin/nightliesClient.sh slc4_amd64_gcc34 | tee /afs/cern.ch/lhcb/software/nightlies/www/logs/client_lxbuild111_1_`date | awk '{print $1}'`_mainlog.txt
8 1 * * * lxbuild135 /afs/cern.ch/user/l/lhcbsoft/bin/nightliesClient.sh x86_64-slc5-gcc43-opt | tee /afs/cern.ch/lhcb/software/nightlies/www/logs/client_lxbuild135_1_`date | awk '{print $1}'`_mainlog.txt
9 1 * * * lxbuild156 /afs/cern.ch/user/l/lhcbsoft/bin/nightliesClient.sh slc4_ia32_gcc34 | tee /afs/cern.ch/lhcb/software/nightlies/www/logs/client_lxbuild156_1_`date | awk '{print $1}'`_mainlog.txt
10 1 * * * volhcb27 /afs/cern.ch/user/l/lhcbsoft/bin/nightliesClient.sh x86_64-slc5-gcc43-opt | tee /afs/cern.ch/lhcb/software/nightlies/www/logs/client_lxbuild157_1_`date | awk '{print $1}'`_mainlog.txt 

Clients can be started for a specific slot and platform with

 /afs/cern.ch/user/l/lhcbsoft/bin/nightliesClient.sh PLATFORM SLOT

client restart

As with the server, clients can be started with the restart flag to use the most current configuration.xml (a server started with the restart flag has to be running; the most current version of configuration.xml is the one present when the server is started, which then serves it to the clients).

If the restart of a client is intended to supersede a job that is already running, several steps have to be considered. Most of them are taken care of when using the ~/bin/restart.py --slot SLOT --platform PLATFORM script. The individual steps are:

  • a running job creates a control file $LHCBNIGHTLIES/SLOTNAME/DAY/isRunning-PLATFORM
    • if the file is present for a given slot&platform restart, the job will not start, to avoid collisions with its predecessor
      • if the predecessor is known to have died, the file can be deleted
      • else the predecessor has to be killed manually on its build machine; to kill all daughter processes the killtree script should be used with something like: ~/bin/killtree.py kill `ps xf | grep "PLATFORM" | grep "SLOT"  | grep -v "grep" | awk '{print $1}'`
  • a finished job creates a control file $LHCBNIGHTLIES/SLOTNAME/DAY/isDone-PLATFORM -- it does not influence a restart
  • a predecessor's build files have to be purged locally (if running on the same build machine); the copied files on AFS are overwritten when the new job copies its files.
  • the same holds for the status and test files accessible from the status webpage. These files are located at $LHCBNIGHTLIES/www/logs/SLOTNAME/SLOTNAME.DAY_PROJECT_HEAD-PLATFORM(/.log/-log.html/-log.summary/.qmtest/qmtest.log)
    • if not (re)moved at the start of a restart, the new status files overwrite the old ones one by one -- which can cause confusion when two status versions are mixed up

client test

To run a test job, the client can be started with the test flag, i.e.

/afs/cern.ch/user/l/lhcbsoft/bin/nightliesClient.sh PLATFORM SLOT test

Summary web page generation

The webpage script has to run on the same build machine that is used for the Coverity builds, since it accesses its files locally (if the webpage script is run on another machine, the Coverity related information will be missing).

*/15 * * * * lhcb-coverity export AFSROOT=/afs && export PYTHONPATH=/afs/cern.ch/user/l/lhcbsoft/PRODUCTION/prod-LCG:/afs/cern.ch/lhcb/software/releases/LBSCRIPTS/prod/InstallArea/python:$PYTHONPATH && /afs/cern.ch/user/l/lhcbsoft/PRODUCTION/prod-LHCb/LbRelease/LHCbNightliesWebpage.py /afs/cern.ch/lhcb/software/nightlies/www/index-LHCb-cache.html > /afs/cern.ch/lhcb/software/nightlies/www/index-LHCb-cache.out 2>&1

An HTML page index-LHCb-cache.html and a corresponding XML file index-LHCb-cache.xml are created. When AFS is lagging, the webpage script can run for quite some time, since it accesses all status files. To avoid collisions, a webpage instance checks whether another one is still running. The check can be bypassed with the force flag, i.e. LHCbNightliesWebpage.py OUTPUTNAME.html force

Cleaning steps

AFS

Build files copied to AFS

5 0 * * * lxplus rm -rf /afs/cern.ch/lhcb/software/nightlies/SLOTNAME/`date | awk '{print $1}'`/** 

15 0 * * * lxplus find /afs/cern.ch/lhcb/software/nightlies/www/logs/lhcb-* -iname '*' -mtime +6 -print -exec rm -rf '{}' \; > /afs/cern.ch/lhcb/software/nightlies/www/logs/nightliesClient/removedFiles.log

Log files

Log files in $LHCBNIGHTLIES/www/logs/SLOTNAME/, picked up by the webpage scripts

10 0 * * * lxplus rm -rf /afs/cern.ch/lhcb/software/nightlies/www/logs/SLOTNAME/SLOTNAME.`date | awk '{print $1}'`_*

/build directories

local build directories
10 0 * * * BUILDMACHINE rm -rf /build/nightlies/lhcb-*/`date | awk '{print $1}'`

Monitoring

SLS sensors have been set up to monitor different aspects of the nightlies system:
  • the local build machines' disk usage
  • the AFS volume usage, i.e. for each mounted volume below /afs/cern.ch/lhcb/software/nightlies
  • the webpage update intervals
  • the RSS feed status, as the sqlite database has sometimes been corrupted during AFS errors
  • the Coverity processes running on lhcb-coverity, i.e. the database status, the web interface status and the general Coverity status
    • currently the monitor shows an error on Mondays: the LCG nightlies built each Monday are used as input, so the corresponding files are missing while they are being built

Configuration

The current configuration file is taken from the SVN repository (no longer from AFS).

Unless you are sure what you are doing, use only the web editor to edit the configuration:

 https://cern.ch/lhcb-nightlies/editor.html 

XML file format:

<configuration>
   <general>
      <parameters>
         <parameter name="" value="" />
         <!-- ... -->
      </parameters>
      <ignore>
         <error value="" />
         <error value="" type="" />
         <warning value="" />
         <warning value="" type="" />
      </ignore>
      <mailto>
         <project name="lhcb">
            <mail address="marco.cattaneo@cern.ch" edit="false" parent="false" />
            <repository path="isscvs.cern.ch:/local/reps/lhcb" type="cvs" />
         </project>
      </mailto>
   </general>
   <slot name="slot1">
      <paths>
         <path name="builddir" value="/build/nightlies/%SLOT%/%DAY%/%CMTCONFIG%" />
         <path name="buildersdir" value="/build/builders/%SLOT%" />
         <path name="releasedir" value="/afs/cern.ch/lhcb/software/nightlies/%SLOT%/%DAY%" />
         <path name="wwwdir" value="/afs/cern.ch/lhcb/software/nightlies/www/logs" />
      </paths>
      <cmtprojectpath>
         <path value="/afs/cern.ch/lhcb/software/DEV/nightlies" />
      </cmtprojectpath>
      <platforms>
         <platform name="slc4_amd64_gcc34" />
         <platform name="x86_64-slc5-gcc43-opt" />
      </platforms>
      <waitfor
         flag="/afs/cern.ch/sw/lcg/app/nightlies/dev1/%DAY%/isDone-%PLATFORM%" />
      <cmtextratags value="use-distcc,no-pyzip" />
      <lblogin linux="echo hello" />
      <runafter linux="" />
      <days mon="true" tue="true" wed="true" thu="true" fri="true" sat="true"
         sun="true" />
      <projects>
         <project name="Lbcom" tag="LBCOM_v9r5p1" />
         <project name="Phys" tag="PHYS_HEAD" headofeverything="true" />
         <project name="Online" tag="ONLINE_v4r43">
            <dependence project="Gaudi" tag="GAUDI_v21r10p1" />
            <change package="Online/Presenter" value="v0r21" />
            <addon package="AccessPath/Package" value="version" />
         </project>
      </projects>
   </slot>
   <slot name="slot2"> <!-- slot contents -->
   </slot>
   <!-- ... -->
</configuration>

configuration/general/parameters currently available:

<parameter name="wwwurl" value="http://cern.ch/lhcb-nightlies" />
<parameter name="logurl" value="http://cern.ch/lhcb-nightlies/logs" />
<parameter name="rrdurl" value="http://cern.ch/lhcb-nightlies/rrd" />
<parameter name="wwwtitle" value="Summaries of nightly builds for LHCb" />
<parameter name="mailfrom" value="karol.kruzelecki@cern.ch" />
<parameter name="mailsubjectprefix" value="[LHCb Nightlies]" />
<parameter name="shownotfinishedplatforms" value="true" /> 
  • wwwurl, logurl, rrdurl - URLs to find the files (index.html, log files, test statistic graphs)
  • wwwtitle - header of the Nightlies summary page
  • mailfrom - (cern) e-mail address - sender of the build/test results
  • mailsubjectprefix
  • shownotfinishedplatforms - if set to true, platforms which are not finished will also be shown on the summary page (the isStarted file flags are then used instead of isDone)

Schema constraints:

  • parameter - 'name' and 'value' are obligatory
  • parameters - The tag 'parameters' must exist but may be empty

configuration/general/ignore

Each line of the log files is compared with all of the given error/warning statements; if any of the error/warning values can be found in the line, the line is not marked as one containing an error/warning. Errors and warnings are considered separately, but the values are common for all of the slots. Regular expressions (type="regex") or fnmatch/glob expressions (type="fnmatch") can be used here. In the case of regex, remember about .* at the beginning and the end of the expression (unless you are sure what you are doing).

<error value="" />
<error value="" type="" />
<warning value="" />
<warning type="regex" value=".*example.*" /> 
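A sketch of the matching logic for one log line (illustrative only; the real check is done inside the nightlies scripts, and the array names here are made up):

 # illustrative sketch: decide whether one log line counts as an error
 plain_values=("this error can be ignored")          # <error value="..."/> entries (hypothetical example)
 regex_values=(".*example.*")                        # <error type="regex" value="..."/> entries
 line="$1"
 status=error
 for v in "${plain_values[@]}"; do
     case "$line" in *"$v"*) status=ignored ;; esac  # a plain value behaves like a substring match
 done
 for r in "${regex_values[@]}"; do
     echo "$line" | grep -qxE "$r" && status=ignored # presumably matched against the whole line, hence the .* advice above
 done
 echo "$status"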
Schema constraints:
  • error - 'value' is obligatory, 'type' is optional
  • warning - 'value' is obligatory, 'type' is optional
  • ignore - The tag must exist but may be empty, order is errors first and then warnings

configuration/general/mailto

All of the projects which are used in one or more slots must have a section here as follows:

<project name="lhcb">
   <mail address="marco.cattaneo@cern.ch" edit="false" parent="false" />
   <repository path="isscvs.cern.ch:/local/reps/lhcb" type="cvs" />
</project> 
Project names should be given as they are (LHCb and DaVinci, not LHCB, DAVINCI). The mail tag is used to send a summary of the builds to the responsible person after the build is finished. The section can contain zero, one, or more mail tags. The edit and parent attributes are not in use at the moment. The repository tag is not in use at the moment.

Schema constraints:

  • mail - 'address' is obligatory
  • repository - 'path' and 'type' are obligatory
  • project - 'name' is obligatory, order is mails first and then repositories. Must be configured for every project which is used in one or more of the slots, can be empty (no mail, repository tags)
  • mailto - this tag must exist

configuration/general/builders

<builders>
   <project name="Geant4" path="%LHCBRELEASES%/../nightlies/plugins/Geant4" />
</builders>
This section configures which builders should be copied from the given location instead of being generated automatically (environment variables can be used as shown - case sensitive).

Schema constraints:

  • project - 'path' and 'name' are obligatory
  • builders - this tag is optional

per slot configuration

defined values, usage of environment variables

  • %SLOT% - slot name
  • %DAY% - today - three letter weekday (Mon, Tue, ...)
  • %YESTERDAY% - yesterday
  • %PLATFORM% - platform name
  • Environment variables can be used in the same way - between two percent characters (case sensitive)

main slot settings


<slot name="lhcb2"
   description="lhcb 2 HEAD of everything on top of GAUDI_v21r0 and LCG_56"
   mails="true" hidden="false" computedependencies="false" renice="+19">  
  • name - to create a new slot, AFS space has to be allocated for releasedir. lhcb[1-6] are production slots, lhcb-test[1-2] are test slots started from a different (not lhcbsoft) account.
  • description - extra information about the slot
  • mails - if set to false, no summary e-mails will be sent after each of the projects in this slot is finished
  • hidden - if set to true, the slot will not be visible on the summary page (used by LCG Nightlies for the release slot)
  • computedependencies - if set to false, QUICK=2 is passed to make
  • renice - nice value used for the processes that run nightlies for the slot

Schema constraints:

  • slot - 'name' is obligatory.

The order of the child tags is:

  • paths obligatory
  • cmtprojectpath obligatory
  • platforms obligatory
  • waitfor optional
  • ...
  • cmtextratags optional
  • ...
  • lblogin optional
  • ...
  • runbefore optional
  • ...
  • runafter optional
  • ...
  • days obligatory
  • projects obligatory

paths configuration

<paths>
   <path name="builddir" value="/build/nightlies/%SLOT%/%DAY%" />
   <path name="buildersdir" value="/build/builders/%SLOT%" />
   <path name="releasedir" value="/afs/cern.ch/lhcb/software/nightlies/%SLOT%/%DAY%" />
   <path name="wwwdir" value="/afs/cern.ch/lhcb/software/nightlies/www/logs" />
</paths> 
all paths in this section are obligatory

  • builddir - local build directory (must exist in the filesystem, with write permission set for the lhcbsoft user)
  • buildersdir - local directory for storing the building scripts (plugins, which in the case of LHCb software are common and generated automatically)
  • releasedir - AFS directory to which the configuration.xml file is copied (the Nightlies copy the file once, and all the other servers use exactly the same version of configuration.xml from AFS). The directory is also used for copying the software to after the build is finished. Must exist, with write permission set for the lhcbsoft account
  • wwwdir - where to copy html log files

Schema constraints:

  • path - 'value' and 'name' are obligatory

CMTPROJECTPATH

<cmtprojectpath>
   <path value="/afs/cern.ch/user/m/marcocle/public" />
   <path value="/afs/cern.ch/sw/Gaudi/releases" />
   <path value="/afs/cern.ch/sw/lcg/app/nightlies/dev1/%DAY%" />
   <path value="/afs/cern.ch/sw/lcg/app/releases" />
   <path value="/afs/cern.ch/lhcb/software/releases" />
</cmtprojectpath> 
The CMTPROJECTPATH variable; the order is important, and the local build directory is prepended.
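For the example above, with a builddir of /build/nightlies/%SLOT%/%DAY%, the effective value would look like this (illustrative):

 CMTPROJECTPATH=/build/nightlies/%SLOT%/%DAY%:/afs/cern.ch/user/m/marcocle/public:/afs/cern.ch/sw/Gaudi/releases:/afs/cern.ch/sw/lcg/app/nightlies/dev1/%DAY%:/afs/cern.ch/sw/lcg/app/releases:/afs/cern.ch/lhcb/software/releases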

Schema constraints:

  • path - 'value' is obligatory

CMTEXTRATAGS

 <cmtextratags value="use-distcc" /> 
Setup of CMTEXTRATAGS value.

Schema constraints:

  • cmtextratags - 'value' is obligatory

"wait for" flag

<waitfor flag="/afs/cern.ch/sw/lcg/app/nightlies/dev1/%DAY%/isDone-%PLATFORM%" /> 
If the waitfor flag is configured, the compilation of each platform (see below) is not started until the file is available in the given location.
<platforms>
   <platform name="slc4_amd64_gcc34_dbg" />
   <platform name="x86_64-slc5-gcc34-dbg" />
</platforms> 
This section defines the platforms to be built in the slot. If waitfor is used, this is the order in which the flag files (isDone files) are looked for; the one which is found (if they are different for each of the platforms) makes the system start the checkout and build stages for the platform it describes. If waitfor is not used, this is the order in which the platforms are built. If a platform is specified as a parameter to the start script (nightliesClient.sh), only the platforms found there will be taken into consideration for the machine.
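The effect of waitfor can be pictured as polling the flag files in the order of the platforms section (a sketch, not the actual implementation):

 # sketch: look for the isDone flag files in the order given by the platforms section
 for p in slc4_amd64_gcc34_dbg x86_64-slc5-gcc34-dbg; do
     flag=/afs/cern.ch/sw/lcg/app/nightlies/dev1/`date +%a`/isDone-$p
     if [ -f "$flag" ]; then
         echo "flag found: checkout and build can start for $p"
     fi
 done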

Schema constraints:

  • waitfor - 'flag' is obligatory
  • platform - 'name' is obligatory
  • platforms - must contain at least one platform tag

days

<days mon="true" tue="true" wed="true" thu="true" fri="true" sat="true" sun="true" /> 
Setting any of the weekdays to false will make the Nightlies not run on that day.

Schema constraints:

  • days - 'mon', 'tue', 'wed', 'thu', 'fri', 'sat', 'sun' are obligatory

list of projects

<projects>
   <project name="Gaudi" tag="GAUDI_v21r0">
      <dependence project="lcgcmt" tag="LCGCMT_56" />
   </project>
   <project name="LHCb" tag="LHCB_HEAD" headofeverything="true">
   </project>
   <project name="Lbcom" tag="LBCOM_HEAD" headofeverything="true" />
   <project name="Gauss" tag="GAUSS_HEAD" headofeverything="true">
      <dependence project="geant4" tag="GEANT4_v91r2p1" />
   </project>
   <project name="Panoptes" tag="PANOPTES_HEAD" headofeverything="true" />
</projects> 
List of the projects to be built in the slot; the order is important. For each of the projects the following keys are obligatory:
  • name - real name of the project (LHCb, DaVinci, not: LHCB, davinci)
  • tag

Additional keys (not obligatory):

  • headofeverything - if set to true, the versions of all the packages from the project container requirements file will be changed to head.
  • docs - if set to true, Doxygen documentation will be generated after the build of the project

Schema constraints:

  • project - 'name' and 'tag' are obligatory
  • projects - May be empty (no project tags)

tuning the project configuration

<project name="Gauss" tag="GAUSS_HEAD" headofeverything="true">
   <dependence project="geant4" tag="GEANT4_v91r2p1" />
</project>
<project name="LHCb" tag="LHCB_HEAD" headofeverything="true">
   <dependence project="gaudi" tag="GAUDI_HEAD" />
   <change package="Phys/LoKiArrayFunctors" value="vanya_20090424" />
   <change package="Det/DetDescSvc" value="v2r2" />
   <addon package="Online/RootCnv" value="v1r8" />
</project> 
The following can be set additionally for each of the projects in the slot:
  • dependence - modify the project.cmt file to use a different project version (for the projects one depends on). IMPORTANT: for the time being, lowercase project names should be used when setting the dependence. Usage of the real project names must be tested.
  • change - change the specified package version; it overrides headofeverything ("headofeverything but...").
  • addon - adds a package which is not present in the requirements of the project

Schema constraints:

  • dependence - 'project' and 'tag' are obligatory
  • change - 'package' and 'value' are obligatory
  • addon - 'package' and 'value' are obligatory
  • project - order is dependences first and then changes

Troubleshooting

Logs

  • each acrontab job's output is sent to the lhcbsoft e-mail account unless the output is empty. In the case of the LHCb Nightlies there should be an e-mail for every job, because some diagnostic information is always printed. This log contains the checkout log, which is not stored in any separate file. Build output is not printed to this log.
  • the build log can be found for each project on the summary webpage and on the build node's local disk (files are kept on local disks for one day only!); see the example after this list:
/build/nightlies/<slot>/<day>/<project>/<tag>/logs/<platform>.log
  • the test log can be found for each project on the build node's local disk only (files are kept on local disks for one day only!):
/build/nightlies/<slot>/<day>/<project>/<tag>/logs/<platform>-tests.log
  • the test log summary can be found for each project on the summary webpage and on the build node's local disk (files are kept on local disks for one day only!):
/build/nightlies/<slot>/<day>/<project>/<tag>/logs/<platform>-qmtest.log
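For example, today's build log of Gaudi in the lhcb-head slot could be inspected on the build node with (assuming Gaudi is built there as GAUDI_HEAD):

 less /build/nightlies/lhcb-head/`date +%a`/Gaudi/GAUDI_HEAD/logs/x86_64-slc5-gcc43-opt.log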

Possible problems

checkout problem

visible symptoms

  • some of the projects' builds are "red" on the summary page, with an almost empty build log (only the summary at the top of the page)
  • different platforms are broken at different projects, and starting from these projects everything is "red"

confirm

action

  • if only some platforms were broken, check the machines they were started on (lxbuild111/120/135), for example by consulting the acrontab - the process can be restarted only for the selected machines by performing the clean and run actions for the given slot on the build server. Do not launch the cleanAFS action unless you want to restart the slot on all platforms. IMPORTANT: if you do not clean the AFS release area, the configuration.xml file from the night will be taken as the configuration for the restarted platforms.

How to restart the Nightlies

Production slots of the Nightlies run from the lhcbsoft account. One has to log in as lhcbsoft to be able to restart the Nightlies.

Restart by script

  • to restart the nightlies, a running server instance is necessary
    • start a server on machine lxbuild135 with ~/bin/nightliesServer.sh restart
    • it will pick up the most recent version of the configuration
  • on a build node, a slot on a specific platform can be restarted with the script /afs/cern.ch/user/l/lhcbsoft/bin/restart.py
    • it takes the slot and platform as arguments, i.e.
      restart.py --slot SLOT --platform PLATFORM

Restart procedure by hand

  • a nightlies slot produces, for a specific platform, several files and directories that have to be cleaned by hand before a restart, to avoid collisions with the fresh restart:
    • on the build machine, the build directory /build/nightlies/SLOT/DAY/PLATFORM
    • the copied log files on AFS of the form $LHCBNIGHTLIES/www/logs/SLOT.DAY_PROJECTs_PLATFORM, plus the qmtest directories with the same naming scheme
    • the empty files indicating whether a build has been started or has finished
      • a finished build touches a file $LHCBNIGHTLIES/SLOT/DAY/isDone-PLATFORM
      • if a build is still running, a file $LHCBNIGHTLIES/SLOT/DAY/isStarted-PLATFORM exists
        • in this case one has to find the build on one of the build machines buildlhcb0[1-6] and kill it before restarting the build

the following steps are necessary when restarting a slot/platform by hand:

only specific (platform, slot) pair

If you skip the optional steps, the old log files will still be visible on the summary webpage and will be overwritten one by one while the process goes on. This may confuse people checking the webpage.

  • the platform you want to restart must not still be running (in that case the build must be stopped before going on - you may use the killtree LHCb script)
  • (optional) copy and remove the old log files for the specific platform/slot from /afs/cern.ch/lhcb/software/nightlies/www/logs/, for example these files: lhcb-head.Wed_*x86_64-slc5-gcc43-opt*
  • (optional) remove the isDone file for the slot/platform from the AFS slot directory (for example: /afs/cern.ch/lhcb/software/nightlies/lhcb-head/Wed/isDone-x86_64-slc5-gcc43-opt)
  • remove the local build files for the specific slot/platform (for example: /build/nightlies/lhcb-head/Wed/x86_64-slc5-gcc43-opt)
  • start (you may consider using screen for that): ~/bin/nightliesClient.sh x86_64-slc5-gcc43-opt lhcb-head. Check that you are using the build server (slc4/slc5) you really intend to use.
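Put together, the steps above for the lhcb-head / x86_64-slc5-gcc43-opt example boil down to something like this sketch (double-check the machine, the day and the paths before running, and copy the old log files away first if they are still needed):

 # manual restart of lhcb-head on x86_64-slc5-gcc43-opt, Wednesday builds
 rm -rf /afs/cern.ch/lhcb/software/nightlies/www/logs/lhcb-head.Wed_*x86_64-slc5-gcc43-opt*   # optional
 rm -f /afs/cern.ch/lhcb/software/nightlies/lhcb-head/Wed/isDone-x86_64-slc5-gcc43-opt        # optional
 rm -rf /build/nightlies/lhcb-head/Wed/x86_64-slc5-gcc43-opt
 screen ~/bin/nightliesClient.sh x86_64-slc5-gcc43-opt lhcb-head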

all platforms in one slot

  • for the moment, the only solution is to repeat the procedure for "only specific (platform, slot) pair" from above for each platform; a loop sketch follows
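For example, the per-pair steps can simply be looped over the platforms of the slot (a sketch; in production the clients are usually spread over several build machines instead of running sequentially):

 # sketch: restart all platforms of lhcb-head, Wednesday builds, one after another
 for p in slc4_amd64_gcc34 x86_64-slc5-gcc43-opt; do
     rm -rf /build/nightlies/lhcb-head/Wed/$p
     ~/bin/nightliesClient.sh $p lhcb-head
 done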

all platforms in all slots

  • make sure that all nightliesClient instances on all build servers are stopped
  • stop the nightliesServer (usually running on lxbuild135)
  • start a new nightliesServer
  • start one or more nightliesClient instances

By default, nightliesServer takes the configuration from SVN, the last one from yesterday. If you want to restart the Nightlies because of a change in the configuration, you have to force the server to take another (for example the most recent) configuration revision.

how to use a non-default configuration (other than yesterday's last from SVN)

Configuration is handled by the nightliesServer. If you want to use another configuration source:

  • local file
  • URL
  • SVN, but not last from yesterday

you need another nightliesServer instance (another port number) and dedicated nightliesClients connecting to it.

For the moment the easiest solution is to use the nightliesServer-restart and nightliesClient-restart scripts, modifying all the necessary parameters (the source of the configuration) in the scripts themselves.

Restarting

For a restart or a start with a different configuration, several points have to be taken into account:
  • on which port the server is listening
  • to which port the client for a specific slot connects
  • whether there are still clients running a slot and writing to the slot directories
  • whether there is an idle client waiting for work

A normal start of the server nightliesServer.py will pick up yesterday's configuration.xml, to have a well-defined configuration, and will listen on the standard port 61007. If nightliesServer.py is started with the restart argument, it will pick up the most recent version of configuration.xml for all slots and serve it on port 61009. Started with the test argument, it will use port 61008 and the most recent version of configuration-test.xml from SVN (configuration-test.xml has to be edited via SVN). In any case, when (re)starting one or all nightlies slots, one has to take care (a port check example follows this list):

  • that no other client is still running for the same slot/architecture; if necessary use killtree.py or ps xf | grep lhcb-what2kill | grep -v whatnot2kill | awk '{print $1}' | xargs kill
  • that the work directories of the slot(s) to run are clean, otherwise old files will probably be picked up
  • that the server is running/started on the desired port with the desired configuration
  • that the client is started with the desired slot and platform on the right port
    • if the client is run without a specific platform/slot, it will wait for work - and will probably pick up an undesired slot/platform from the server
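To check which of the three ports already has a server listening, something like the following can be used on the server machine (assuming net-tools is available there):

 # standard (61007), test (61008) and restart (61009) ports
 netstat -tln | egrep '61007|61008|61009'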

Windows Nightlies

The support for Windows has been dropped.

Legacy information

The nightlies for Windows have to be run on their own. The builds are started by a Windows port of crontab.
  • the build partition is F:\; the build directory contains the crontab directory, the actual build directory in nightlies, and the Windows scripts in scripts
  • the AFS token is (in principle) renewed every hour with the script C:\Program Files\OpenAFS\Client\Program\afscreds.exe
  • everything is built locally on the machines, so the necessary files are (re)copied from AFS by python F:\build\scripts\cleanBuildDirs.py
  • the interface to the actual nightlies scripts is f:\build\scripts\nightliesClient9.bat

Setup

The builds for the different slots/architectures are distributed over the Windows machines:
  • cerntslhcblhcb03: i686-winxp-vc9-dbg lhcb-head
  • cerntslhcblhcb04: i686-winxp-vc9-dbg lhcb-gaudi-head
  • cerntslhcblhcb05: i686-winxp-vc9-dbg lhcb-prerelease

Structure of LHCb Nightlies

File structure

  • nightlies related files
    • Nightlies: scripts necessary for running the nightlies, e.g. LHCbNightliesOld.py...
      • bin: helper programs around the Nightlies, e.g. nightliesClient.sh...
    • Webpage: scripts for displaying the results, e.g. LHCbNightliesWebpage.py,...
      • webpage helpers/cgi-scripts: nightlies.py,... (reside in /afs/cern.ch/lhcb/software/nightlies/www/cgi-bin)
      • javascript/css helpers reside in /afs/cern.ch/lhcb/software/nightlies/www/js/ and /afs/cern.ch/lhcb/software/nightlies/www/css/
      • rss db (resides in /afs/cern.ch/lhcb/software/nightlies/db not in .../www/... !!!)
  • acrontab: configuration of the acrontab
  • sensors: sensor scripts monitoring the system (https://sls.cern.ch/sls/service.php?id=LHCb_Nightlies)

Relations

  • working/output directories
    • local
      • build directories
        • /build/nightlies/SLOTS/WEEKDAYS
    • afs ($LHCBNIGHTLIES --> /afs/cern.ch/lhcb/software/nightlies/)
      • build directories
        • /afs/cern.ch/lhcb/software/nightlies/SLOTS/WEEKDAYS
      • output/web
        • /afs/cern.ch/lhcb/software/nightlies/www/
          • summaries index-LHCb-cache.html and index-LHCb-cache.xml (generated regularly by acron: LHCbNightliesOld)
            • refer to job-outputs
              • merged by LHCbNightliesOld from the local dirs to /afs/cern.ch/lhcb/software/nightlies/www/logs/
              • SLOT.DAY_PROJECT_PLATFORM{.log,-log.html,-log.summary,-qmtest.log,-qmtest (DIR) }
          • tests index-LHCb-cache.html and index-LHCb-cache.xml (generated by hand)
          • acron outputs: /afs/cern.ch/lhcb/software/nightlies/www/logs/client.DAY_mainlog_MACHINE.cern.ch.txt

Stable/Testing

  • stable
    • started by acrontab daily
  • testing
    • not started by acron: run by hand

Guides

adding a new slot

  • for new slot with name lhcb-newslot
    • if configuration-parameters are set via environment variables, check if they are getting set properly, i.e. $BUILDROOT, $AFSROOT,...
      • environment variables are exported by LbLogin, nightliesClient.sh
    • depending on the slot configuration, build and output paths have to be set and created accordingly (or vice versa)
    • 'standard' directories are
      • build directory:
        • on each work node buildlhcb0?: working directory /build/nightlies/lhcb-newslot (the base directory should be writable, in case the system has to create it)
          • corresponding parameters in the slot config:
              <path value="%BUILDROOT%/nightlies/%SLOT%/%DAY%/%CMTCONFIG%" name="builddir"/>
            <path value="%BUILDROOT%/builders/%SLOT%" name="buildersdir"/>
          • the directories for each day are created by LHCbNightliesOld.py when running on a node /build/nightlies/lhcb-newslot/Mon, /build/nightlies/lhcb-newslot/Tue, ...
      • release directory:
        • the directory/AFS volume containing the build outputs
          • the parameter in the slot's configuration for setting the directory looks like:
             <path value="%AFSROOT%/cern.ch/lhcb/software/nightlies/%SLOT%/%DAY%" name="releasedir"/>
        • due to the size limits of AFS volumes it is advisable to create a small general directory/volume for the slot and to add for each day its own AFS volume with enough space allocated
          • $LHCBNIGHTLIES/lhcb-newslot aka /afs/cern.ch/lhcb/software/nightlies/lhcb-newslot
          • for smaller slots one volume for all days is sufficient and one can add the days as subdirectories
            • volume/mount point creation: afs_admin create -q 20000 /afs/cern.ch/lhcb/software/nightlies/lhcb-newslot q.lhcb.nght_new
            • subdirectories for each day: mkdir /afs/cern.ch/lhcb/software/nightlies/lhcb-newslot/Mon, ... (see the loop sketch after this list)
          • for larger projects it is reasonable to create a directory in the AFS-base directory for the slot and create new AFS-volumes for each day
            • e.g. create a volume called q.lhcb.nght_new_Mon and mount it to the directory Mon
              • afs_admin create -q 80000000 /afs/cern.ch/lhcb/software/nightlies/lhcb-newslot/Mon q.lhcb.nght_new_Mon
      • www directory:
        • the general www directory is located at $LHCBNIGHTLIES/www
          • the log files themselves are located in $LHCBNIGHTLIES/www/logs
        • if the path is on AFS, check that it is sufficiently large for the log files
        • corresponding parameter in the slot config (for the standard location):
            <path value="%AFSROOT%/cern.ch/lhcb/software/nightlies/www/logs" name="wwwdir"/>
    • for production add new jobs to acron if necessary
      • add cleaning jobs to acrontab
        • 10 0 * * * lxplus rm -rf /afs/cern.ch/lhcb/software/nightlies/www/logs/lhcb-newslot.`date +\%a`_*
        • 5 0 * * * lxplus rm -rf /afs/cern.ch/lhcb/software/nightlies/lhcb-newslot/`date +\%a`/*
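The per-day subdirectories mentioned in the list above can be created in one go, e.g. for the smaller-slot layout:

 # create the per-day subdirectories of a new small slot
 for d in Mon Tue Wed Thu Fri Sat Sun; do
     mkdir /afs/cern.ch/lhcb/software/nightlies/lhcb-newslot/$d
 done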

setting up a test system

  • check out the main system LHCbNightliesOld.py and LHCbNightliesWebpage.py
    • including helper NightliesXML.py
    • depend on LCG nightlies
    • depend on LbScripts: LbConfiguration, LbUtils
      • e.g. /afs/cern.ch/lhcb/software/releases/LBSCRIPTS/prod/InstallArea/python/
  • checkout scripts for clients and server
    • nightliesClient.sh & nightliesServer.sh
      • set up in both scripts the paths which are exported and used in the system
  • prepare the base directories
  • start server with nightliesServer.sh
  • set in nightliesClient.sh the machine where the server is running
    • (have to use a parameter for that!)
    • or start LCG-client directly (check for environment variables)

setting up a production system

  • basic steps as for a test system
  • check out helpers such as killTree.sh
  • (semi-)automation
    • checkout the acrontab
      • adapt the server and client starts as necessary
    • or set up a new acrontab
      • kill the server from the previous day and start a new one on a specific machine
        58 23 * * * lxbuild135 ~/bin/killtree.py kill `ps xf | grep "nightliesServer.sh" | grep -v "grep" | awk '{print $1}'` 2>&1
        20 0 * * * lxbuild135 ~/bin/nightliesServer.sh > /afs/cern.ch/lhcb/software/nightlies/www/logs/nightliesServer_`date +\%a`_lxbuild135.txt 2>&1
      • kill on each client machine any running nightlies
        3 0 * * * buildlhcb01 ~/bin/killTree.sh nightlies `date --date='1 day ago' +\%a` 2>&1
      • add starts for clients
        01 01 * * * buildlhcb01 ~/bin/nightliesClient.sh i686-slc5-gcc43-opt > /afs/cern.ch/lhcb/software/nightlies/www/logs/nightliesClient_`date +\%a`_1.txt 2>&1
      • update the webpage
        */15 * * * * lhcb-coverity export AFSROOT=/afs && export PYTHONPATH=/afs/cern.ch/user/l/lhcbsoft/PRODUCTION/prod-LCG:/afs/cern.ch/lhcb/software/releases/LBSCRIPTS/prod/InstallArea/python:$PYTHONPATH && /afs/cern.ch/user/l/lhcbsoft/PRODUCTION/prod-LHCb/LbRelease/LHCbNightliesWebpage.py /afs/cern.ch/lhcb/software/nightlies/www/index-LHCb-cache.html SVN > /afs/cern.ch/lhcb/software/nightlies/www/index-LHCb-cache.out 2>&1
        • if coverity is used, the webpage script has to run on the same machine as coverity
        • adapt paths accordingly
      • add cleaning job on each node
        2 0 * * * buildlhcb01 rm -rf /build/nightlies/lhcb-*/`date --date='1 days ago' +\%a`/*
  • checkout sensor scripts and add starts to acrontab in sync (more or less) with the settings in SDB
    • parameters should hopefully all be parseable now
      */25 * * * * buildlhcb06 export AFSROOT=/afs && export PYTHONPATH=/afs/cern.ch/lhcb/software/releases/LBSCRIPTS/prod/InstallArea/python:$PYTHONPATH && /afs/cern.ch/user/l/lhcbsoft/PRODUCTION/prod-LHCb/LbRelease/NightliesSensors/WebpageSensor.py --ft 90 --fw /afs/cern.ch/lhcb/software/nightlies/www/LHCb_Nightlies_Webpage.xml --verbose /build/nightlies/OldLogs/SensorWebpageLog > /build/nightlies/OldLogs/SensorWebpage.out 2>&1
      */25 * * * * buildlhcb06 export AFSROOT=/afs && export PYTHONPATH=/afs/cern.ch/lhcb/software/releases/LBSCRIPTS/prod/InstallArea/python:$PYTHONPATH && /afs/cern.ch/user/l/lhcbsoft/PRODUCTION/prod-LHCb/LbRelease/NightliesSensors/RSSStatus.py --rw /afs/cern.ch/lhcb/software/nightlies/www/LHCb_Nightlies_RSS.xml --ri /afs/cern.ch/lhcb/software/nightlies/db/nightlies.results --verbose /build/nightlies/OldLogs/SensorRSSLog > /build/nightlies/OldLogs/SensorRSS.out 2>&1
      */25 * * * * buildlhcb06 export AFSROOT=/afs && export PYTHONPATH=/afs/cern.ch/lhcb/software/releases/LBSCRIPTS/prod/InstallArea/python:$PYTHONPATH && /afs/cern.ch/user/l/lhcbsoft/PRODUCTION/prod-LHCb/LbRelease/NightliesSensors/AFSSensor.py --aw /afs/cern.ch/lhcb/software/nightlies/www/LHCb_Nightlies_AFS.xml > /build/nightlies/OldLogs/AFSSensor.out 2>&1
      # sensor for coverity needs to run on lxbuild161 aka lhcb-coverity
      */25 * * * * lhcb-coverity export AFSROOT=/afs && export PYTHONPATH=/afs/cern.ch/lhcb/software/releases/LBSCRIPTS/prod/InstallArea/python:$PYTHONPATH && /afs/cern.ch/user/l/lhcbsoft/PRODUCTION/prod-LHCb/LbRelease/NightliesSensors/CoveritySensor.py --cw /afs/cern.ch/lhcb/software/nightlies/www/LHCb_Nightlies_Coverity.xml --verbose /build/nightlies/OldLogs/SensorCoverityLog > /build/nightlies/OldLogs/SensorCoverity.out 2>&1

setting up Coverity

  • to do

Development

See LHCbNightliesDevelopment

test system

  • checkout
    • base system (trunk/tag)
      • LHCbNightliesOld.py
    • helper scripts (trunk/tag)
    • configuration
    • LCG nightlies
    • LbScripts
  • create test slot
  • apply changes
    • start test server
    • run test client for the test slot
  • ...

production system

  • in addition to a test system
    • checkout/prepare acrontab
    • checkout sensors