LHCbNightliesTips

General

  • login details for the user lhcbsoft are known to Marco Cl., Ben and Thomas
  • the nightlies server is running on lxbuild135 aka buildlhcb04
  • Coverity is running on lxbuild161 aka buildlhcb01
  • a job can be started by hand with ~lhcbsoft/bin/nightliesClient.sh PLATFORM SLOT (this requires a running server on lxbuild135)
  • a job for a specific platform can be restarted as follows:
    • make sure a 'restart' nightlies server is running, i.e. a server in addition to the standard one, which takes its configuration from HEAD of svn and runs on a different port
      • it is started on lxbuild135 with ~lhcbsoft/bin/nightliesServer.sh restart
    • start the slot on the platform with ~lhcbsoft/bin/restart.py --slot SLOT --platform PLATFORM
      • it takes care of old build directories (locally and on AFS)
      • it renames the previous local build directory to PLATFORMOLD for debugging (beware: these copies can fill up the local build disk)
      • if the previous build dir is not needed, call it with --purge
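
The local build-directory handling of restart.py described above can be illustrated with a small Python sketch. It is not the actual restart.py code: the layout (one directory per platform directly under a build root) and the function name prepare_build_dir are hypothetical, the AFS side is not covered, and in this sketch an existing PLATFORMOLD copy is simply replaced, which restart.py may or may not do.

```python
import shutil
from pathlib import Path


def prepare_build_dir(build_root, platform, purge=False):
    """Hypothetical sketch of the local build-dir handling, NOT restart.py itself.

    Assumes the build for a platform lives in build_root/<platform>.
    Returns the fresh, empty build directory.
    """
    build_root = Path(build_root)
    current = build_root / platform
    old = build_root / (platform + "OLD")

    if current.exists():
        if purge:
            # like --purge: the previous build is not needed, delete it
            shutil.rmtree(current)
        else:
            # default: keep the previous build as <platform>OLD for debugging;
            # an older OLD copy is removed first so only one backup is kept
            if old.exists():
                shutil.rmtree(old)
            current.rename(old)

    current.mkdir(parents=True)
    return current
```

Keeping a single OLD copy per platform is a deliberate choice here to limit the disk clogging mentioned above.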

System monitoring

  • sometimes the system glitches and the monitoring marks a (transient) error
    • this happens especially for sensors depending on AFS, e.g. the RSS feed monitor checking the db on AFS
    • if an error persists for more than 20 minutes (the update interval), it is probably not a transient one
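
The 20-minute rule of thumb can be written down as a small check. A minimal sketch, assuming the history of one sensor is available as a list of (timestamp, ok) pairs; this data format and the function name error_is_persistent are hypothetical and not part of the nightlies monitoring.

```python
from datetime import datetime, timedelta

# Update interval of the monitoring, as quoted above.
UPDATE_INTERVAL = timedelta(minutes=20)


def error_is_persistent(samples, now, threshold=UPDATE_INTERVAL):
    """Hypothetical helper: samples is a list of (timestamp, ok) tuples for
    one sensor, oldest first.

    Returns True if the sensor has been in error continuously for longer
    than `threshold`, i.e. the error survived at least one monitoring update.
    """
    streak_start = None
    for ts, ok in samples:
        if ok:
            streak_start = None      # a good reading ends any error streak
        elif streak_start is None:
            streak_start = ts        # first bad reading of the current streak
    if streak_start is None:
        return False                 # currently fine (or no data at all)
    return now - streak_start > threshold
```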

Tips for using the LHCb Nightlies System

  • to fix a corrupted RSS database file in $LHCBNIGHTLIES/db, i.e. /afs/cern.ch/lhcb/software/nightlies/db
    • dump the existing db into a temporary file with sqlite3 nightlies.results ".dump" > temp.sql
    • move the corrupted file aside (e.g. mv nightlies.results nightlies.results.bak), then read the dump back into a fresh db with sqlite3 nightlies.results < temp.sql
  • AFS
    • to create a new volume on AFS
      • afs_admin create -q SIZE /afs/cern.ch/lhcb/PATH/WHERE/TO/MOUNT q.lhcb.NAME-OF-VOLUME (e.g. SIZE = 100000)
    • to list the access rights: fs listacl PATH/
      • change ACLs with fs setacl -dir . -acl USER:GROUP RIGHTS (see the fs setacl man page)
  • Coverity services on lxbuild161
    • to get the current status (also shown in the monitoring): in /build/coverity/coverity-integrity-manager/bin run ./cov-im-ctl status
    • to start the processes: in /build/coverity/coverity-integrity-manager/bin run ./cov-im-ctl start
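
The dump-and-reload fix for the RSS db can also be done with Python's built-in sqlite3 module, whose Connection.iterdump() produces the same kind of SQL text as the sqlite3 ".dump" command. A minimal sketch: the function name rebuild_sqlite_db and the .bak/temp.sql file names are only illustrative. On a badly damaged file the dump may stop with an error, in which case only part of the data can be recovered.

```python
import shutil
import sqlite3
from pathlib import Path


def rebuild_sqlite_db(db_path):
    """Hypothetical helper: dump a SQLite db to SQL text and reload it into a
    fresh file, keeping the original as <name>.bak. Returns the backup path."""
    db_path = Path(db_path)
    backup = db_path.with_name(db_path.name + ".bak")
    dump_file = db_path.with_name("temp.sql")

    # 1. dump everything readable to SQL text (like sqlite3 DB ".dump" > temp.sql)
    src = sqlite3.connect(db_path)
    try:
        dump_file.write_text("\n".join(src.iterdump()) + "\n")
    finally:
        src.close()

    # 2. keep the damaged file aside instead of replaying into it
    shutil.move(str(db_path), str(backup))

    # 3. replay the dump into a new, empty db (like sqlite3 DB < temp.sql)
    dst = sqlite3.connect(db_path)
    try:
        dst.executescript(dump_file.read_text())
    finally:
        dst.close()
    return backup
```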

Known unpleasantnesses

  • the server process for the Coverity web interface (java / tomcat) can run for days or weeks without problems and then suddenly start to hog memory and swap
    • this will probably cause lxbuild161 aka buildlhcb01 to die and make Joel unhappy
    • normally this should not happen much anymore, since the Coverity processes are restarted every two days via acron
    • if it does occur and lxbuild161 has to be restarted, the Coverity processes have to be started by hand (see above)
  • Coverity on lxbuild161 uses the RAM disk on /dev/shm/ for the actual build and copies the output to disk at the end
    • occasionally the RAM disk is purged (by an unknown process :-/), which ruins the currently running and all following projects
      • the creation date of the RAM disk is available in the monitoring or in the file /dev/shm/date.log
      • if the RAM disk was created later than the Coverity start time on a run day (Coverity starts on Wednesdays and Fridays around 3:06), the Coverity nightlies have to be stopped and restarted by hand
  • dev builds in particular can get quite large, so the disk usage on the build nodes has to be monitored
  • sometimes the RSS feed database gets corrupted during AFS hiccups; if the monitoring shows a corrupted-db error for more than 20 minutes, fix the db as described above
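
The RAM-disk check can be sketched in Python. Assumptions: the RAM-disk creation time has already been read from /dev/shm/date.log (or the monitoring) into a datetime, since the format of that file is not described here; the schedule is taken as Wednesdays and Fridays at 03:06; the function names are hypothetical. Because the duration of a Coverity run is not given, the sketch flags any recreation after the latest start, so it errs on the side of a restart.

```python
from datetime import datetime, time, timedelta

# Coverity nightlies start on Wednesdays and Fridays around 03:06 (see above).
COVERITY_DAYS = {2, 4}          # datetime.weekday(): Monday == 0
COVERITY_START = time(3, 6)


def last_coverity_start(now):
    """Hypothetical helper: most recent scheduled Coverity start at or before `now`."""
    for days_back in range(8):
        day = (now - timedelta(days=days_back)).date()
        start = datetime.combine(day, COVERITY_START)
        if day.weekday() in COVERITY_DAYS and start <= now:
            return start
    raise ValueError("no scheduled start found")  # cannot happen with a weekly schedule


def coverity_needs_restart(ramdisk_created, now):
    """True if the RAM disk was (re)created after the latest Coverity start,
    i.e. the running and following projects have probably lost their build area."""
    return ramdisk_created > last_coverity_start(now)
```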

Useful stuff

  • to copy today's (here: Wednesday's) qmtest directories by hand to the AFS directory (run in /build/nightlies on the build machine)
    • find ./ -type d -iname "*Wed*qmtest" -mtime -1 -print -exec cp -a '{}' $LHCBNIGHTLIES/www/logs/. \;
  • or to copy just today's logs (run in /build/nightlies on the build machine)
    • cp -a lhcb-*/Wed/*/www/* $LHCBNIGHTLIES/www/logs/
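
A rough Python equivalent of the find/cp one-liner, e.g. for use on other weekdays. The function name and arguments are hypothetical. It matches directory names case-insensitively like -iname, uses a 24 h age cut like -mtime -1, and copies with shutil.copytree, which keeps timestamps and symlinks like cp -a but not necessarily ownership. It needs Python 3.8 or later for dirs_exist_ok.

```python
import fnmatch
import os
import shutil
import time
from pathlib import Path

DAY_ABBR = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")


def copy_todays_qmtest_dirs(build_dir, logs_dir, day=None, max_age_s=24 * 3600):
    r"""Hypothetical helper, roughly equivalent to (run from build_dir)
       find ./ -type d -iname "*Wed*qmtest" -mtime -1 -exec cp -a '{}' $LHCBNIGHTLIES/www/logs/. \;
    Returns the list of copied directories."""
    if day is None:
        day = DAY_ABBR[time.localtime().tm_wday]   # today's weekday, Monday == 0
    pattern = f"*{day}*qmtest".lower()             # -iname: case-insensitive name match
    cutoff = time.time() - max_age_s               # -mtime -1: modified within the last 24 h
    logs_dir = Path(logs_dir)
    copied = []
    for root, dirs, _files in os.walk(build_dir):
        for name in dirs:
            path = Path(root) / name
            if fnmatch.fnmatchcase(name.lower(), pattern) and path.stat().st_mtime > cutoff:
                # cp -a: recursive copy keeping symlinks and timestamps
                shutil.copytree(path, logs_dir / name, symlinks=True, dirs_exist_ok=True)
                copied.append(path)
    return copied
```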
-- ThomasHartmann - 28-Feb-2012
Topic revision: r7 - 2012-10-12 - ThomasHartmann
Copyright © 2008-2019 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.