Difference: LHCbNightliesTips (1 vs. 9)

Revision 9 - 2013-01-30 - ThomasHartmann

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"

related topics

Line: 49 to 49
 
  • especially dev builds can get quite large, so that the disk usage on the build nodes has to be monitored
  • sometimes the rss feed database gets corrupted during AFS hiccups. If a corrupted db error stays in the monitoring for more than 20 minutes, fix the db as described above
Added:
>
>

occurring errors

  • a new project is not built
    • a newly created project is added to a slot, but the build fails very early, around cmt and checkout
      • check whether the project is known to LbScripts (via LbLogin)
      • maybe it is only known in dev: run the dev version of LbLogin for the given platform and see if it works
 

Useful stuff

  • copy qmtest dir from today (Wednesday) by hand onto the AFS-dir (in /build/nightlies on the build machine)
    • find ./ -type d -iname "*Wed*qmtest" -mtime -1 -print -exec cp -a '{}' $LHCBNIGHTLIES/www/logs/. \;
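The copy command above hard-codes the weekday. A minimal sketch of the same idea as a reusable function, assuming the directory names contain the abbreviated English weekday as in the example; the function name `copy_qmtest_dirs` and the added `-prune` are illustrative and not part of the nightlies scripts:

```shell
#!/bin/sh
# Copy the qmtest directories of one weekday into a destination directory.
# Usage: copy_qmtest_dirs SRC_ROOT DEST_DIR [DAY]
copy_qmtest_dirs() {
    src_root=$1
    dest_dir=$2
    # Abbreviated weekday name (Mon, Tue, ...); the C locale keeps it English.
    day=${3:-$(LC_ALL=C date +%a)}
    # -prune stops find from descending into a directory it has just matched,
    # so a nested match is not copied a second time.
    find "$src_root" -type d -iname "*${day}*qmtest" -mtime -1 \
        -print -prune -exec cp -a '{}' "$dest_dir"/ \;
}
```

On the build machine this would be called as `copy_qmtest_dirs /build/nightlies "$LHCBNIGHTLIES/www/logs"`. Using `-exec` rather than a pipe into `xargs` keeps directory names with unusual characters intact.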

Revision 8 - 2012-12-19 - ThomasHartmann

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"

related topics

general

  • login details for the user lhcbsoft are known by Marco Cl., Ben and Thomas
Changed:
<
<
  • the nightlies server is running on lxbuild135 aka buildlhcb04
>
>
  • the nightlies server is running on lxbuild171 aka buildlhcb07
 
  • Coverity is running on lxbuild161 aka buildlhcb01
Changed:
<
<
  • a job can be started by hand with ~lhcbsoft/bin/nightliesClient.sh PLATFORM SLOT (requiring a running server on lxbuild135)
>
>
  • a job can be started by hand with ~lhcbsoft/bin/nightliesClient.sh PLATFORM SLOT (requiring a running server on lxbuild171)
 
  • a job for a specific platform can be restarted with
    • a running 'restart' nightlies server, i.e. a server in addition to the standard server, which takes the configuration from HEAD of svn and runs on a different port
      • i.e. started on build135 with ~lhcbsoft/bin/nightliesServer.sh restart

Revision 7 - 2012-10-12 - ThomasHartmann

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"
Added:
>
>

related topics

 

general

  • login details for the user lhcbsoft are known by Marco Cl., Ben and Thomas
  • the nightlies server is running on lxbuild135 aka buildlhcb04

Revision 6 - 2012-08-29 - ThomasHartmann

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"

general

  • login details for the user lhcbsoft are known by Marco Cl., Ben and Thomas
Line: 48 to 48
 

Useful stuff

  • copy qmtest dir from today (Wednesday) by hand onto the AFS-dir (in /build/nightlies on the build machine)
Changed:
<
<
    • find ./ -type d -iname "*Wed*qmtest" -mtime -1 -print | xargs cp -a {} $LHCBNIGHTLIES/www/logs/.
>
>
    • find ./ -type d -iname "*Wed*qmtest" -mtime -1 -print -exec cp -a '{}' $LHCBNIGHTLIES/www/logs/. \;
 
  • or just today's logs (in /build/nightlies on the build machine)
    • cp -a lhcb-*/Wed/*/www/* $LHCBNIGHTLIES/www/logs/
-- ThomasHartmann - 28-Feb-2012

Revision 5 - 2012-08-29 - ThomasHartmann

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"

general

  • login details for the user lhcbsoft are known by Marco Cl., Ben and Thomas
Line: 46 to 46
 
  • especially dev builds can get quite large, so that the disk usage on the build nodes has to be monitored
  • sometimes the rss feed database gets corrupted during AFS hiccups. If a corrupted db error stays in the monitoring for more than 20 minutes, fix the db as described above
Added:
>
>

Useful stuff

  • copy qmtest dir from today (Wednesday) by hand onto the AFS-dir (in /build/nightlies on the build machine)
    • find ./ -type d -iname "*Wed*qmtest" -mtime -1 -print | xargs cp -a {} $LHCBNIGHTLIES/www/logs/.
  • or just today's logs (in /build/nightlies on the build machine)
    • cp -a lhcb-*/Wed/*/www/* $LHCBNIGHTLIES/www/logs/
 -- ThomasHartmann - 28-Feb-2012

Revision 4 - 2012-07-06 - ThomasHartmann

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"
Changed:
<
<

Tips for using the LHCb Nightlies System

>
>

general

  • login details for the user lhcbsoft are known by Marco Cl., Ben and Thomas
  • the nightlies server is running on lxbuild135 aka buildlhcb04
  • Coverity is running on lxbuild161 aka buildlhcb01
  • a job can be started by hand with ~lhcbsoft/bin/nightliesClient.sh PLATFORM SLOT (requiring a running server on lxbuild135)
  • a job for a specific platform can be restarted with
    • a running 'restart' nightlies server, i.e. a server in addition to the standard server, which takes the configuration from HEAD of svn and runs on a different port
      • i.e. started on build135 with ~lhcbsoft/bin/nightliesServer.sh restart
    • starting a slot on a platform with ~lhcbsoft/bin/restart.py --slot SLOT --platform PLATFORM
      • it takes care of old build directories (locally and on AFS)
      • it renames the previous local build directory to platformOLD for debugging (beware of clogging the local build dir)
      • if the previous build dir is not needed call with --purge

System monitoring

  • Sometimes the system glitches and the monitoring marks a (transient) error
    • this happens especially for sensors depending on AFS, e.g. the rss feed monitor checking the db on AFS
    • if an error persists for more than 20 minutes (the update frequency), it is probably not a transient one
 
Changed:
<
<
  • fix corrupted rss-database in $LHCBNIGHTLIES/db, i.e. /afs/cern.ch/lhcb/software/nightlies/db
>
>

Tips for using the LHCb Nightlies System

  • to fix a corrupted rss database file in $LHCBNIGHTLIES/db, i.e. /afs/cern.ch/lhcb/software/nightlies/db
 
    • dump the existing db into a temporary file with sqlite3 nightlies.results ".dump" > temp.sql
    • read it back with sqlite3 nightlies.results < temp.sql
Changed:
<
<
  • create a new volume on AFS
>
>
  • AFS
    • to create a new volume on AFS
 
    • afs_admin create -q SIZEAS100000 /afs/cern.ch/lhcb/PATH/WHERE/TO/MOUNT q.lhcb.NAME-OF-VOLUME
Changed:
<
<
  • list the access rights: fs listacl PATH/
>
>
    • to list the access rights fs listacl PATH/
 
    • change ACLs with fs setacl -dir . -acl USER:GROUP RIGHTS (see the man page)
Changed:
<
<
  • coverity services on lxbuild161
    • /build/coverity/coverity-integrity-manager/bin $ ./cov-im-ctl status
>
>
  • Coverity services on lxbuild161
    • to get the current status, i.e. the one shown in the monitoring: /build/coverity/coverity-integrity-manager/bin $ ./cov-im-ctl status
    • to start the processes /build/coverity/coverity-integrity-manager/bin $ ./cov-im-ctl start
 
Added:
>
>

Known unpleasantnesses

  • the server process for the Coverity web interface (java / tomcat) can run for days/weeks without problems and then suddenly start to hog memory/swap
    • this will probably cause lxbuild161 aka buildlhcb01 to die and make Joel unhappy
    • normally, this should rarely happen anymore, since the Coverity processes are restarted every two days via acron
    • if it occurs and lxbuild161 has to be restarted, the Coverity processes have to be started by hand (see above)
  • Coverity on lxbuild161 uses the RAM disk on /dev/shm/ for the actual build and the output is copied at the end onto disk
    • occasionally the RAM disk is purged (by an unknown process :-/) so that the currently running and following projects get ruined
      • the creation date of the RAM disk is available in the monitoring or in file /dev/shm/date.log
      • if the creation date of the RAM disk is later than the start time of the current Coverity run (runs start Wednesdays & Fridays around 3:06), the Coverity nightlies have to be stopped and restarted by hand
  • especially dev builds can get quite large, so that the disk usage on the build nodes has to be monitored
  • sometimes the rss feed database gets corrupted during AFS hiccups. If a corrupted db error stays in the monitoring for more than 20 minutes, fix the db as described above
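The disk-usage point above can be checked by hand with `df`. A minimal sketch, assuming a simple percentage threshold is what matters; the function names and the default limit of 90% are illustrative and not part of the existing monitoring:

```shell
#!/bin/sh
# Print the used percentage (0-100) of the filesystem holding the given path.
disk_usage_percent() {
    # -P forces the POSIX output format: one line per filesystem,
    # with the capacity in column 5.
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Report usage and return 1 when the limit is reached, 2 when df gave nothing.
# Usage: check_disk_usage PATH [LIMIT_PERCENT]
check_disk_usage() {
    path=$1
    limit=${2:-90}
    used=$(disk_usage_percent "$path")
    [ -n "$used" ] || return 2
    if [ "$used" -ge "$limit" ]; then
        echo "WARNING: $path is ${used}% full (limit ${limit}%)"
        return 1
    fi
    echo "OK: $path is ${used}% full"
}
```

For example, `check_disk_usage /build 85` would flag the local build area once it is 85% full.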
  -- ThomasHartmann - 28-Feb-2012

Revision 3 - 2012-05-09 - ThomasHartmann

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"

Tips for using the LHCb Nightlies System

Line: 11 to 10
 
    • afs_admin create -q SIZEAS100000 /afs/cern.ch/lhcb/PATH/WHERE/TO/MOUNT q.lhcb.NAME-OF-VOLUME
  • list the access rights: fs listacl PATH/
    • change ACLs with fs setacl -dir . -acl USER:GROUP RIGHTS (see the man page)
Added:
>
>
  • coverity services on lxbuild161
    • /build/coverity/coverity-integrity-manager/bin $ ./cov-im-ctl status
  -- ThomasHartmann - 28-Feb-2012

Revision 2 - 2012-03-19 - ThomasHartmann

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"
Deleted:
<
<
 

Tips for using the LHCb Nightlies System

Line: 8 to 7
 
    • dump the existing db into a temporary file with sqlite3 nightlies.results ".dump" > temp.sql
    • read it back with sqlite3 nightlies.results < temp.sql
Added:
>
>
  • create a new volume on AFS
    • afs_admin create -q SIZEAS100000 /afs/cern.ch/lhcb/PATH/WHERE/TO/MOUNT q.lhcb.NAME-OF-VOLUME
  • list the access rights: fs listacl PATH/
    • change ACLs with fs setacl -dir . -acl USER:GROUP RIGHTS (see the man page)
 -- ThomasHartmann - 28-Feb-2012

Revision 1 - 2012-02-28 - ThomasHartmann

Line: 1 to 1
Added:
>
>
META TOPICPARENT name="LHCbComputing"

Tips for using the LHCb Nightlies System

  • fix corrupted rss-database in $LHCBNIGHTLIES/db, i.e. /afs/cern.ch/lhcb/software/nightlies/db
    • dump the existing db into a temporary file with sqlite3 nightlies.results ".dump" > temp.sql
    • read it back with sqlite3 nightlies.results < temp.sql
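The two steps above can be wrapped into one function. This is only a sketch under assumptions: it moves the damaged file aside before reloading, because loading the dump into a file that still holds the old tables would normally collide with them, and it adds an integrity check at the end. The function name `rebuild_sqlite_db` and the timestamped file names are illustrative:

```shell
#!/bin/sh
# Rebuild an SQLite database file from its own dump.
# Usage: rebuild_sqlite_db DB_FILE
rebuild_sqlite_db() {
    db=$1
    stamp=$(date +%Y%m%d%H%M%S)
    dump="${db}.${stamp}.sql"
    # 1. Dump whatever is still readable into a plain SQL file.
    sqlite3 "$db" ".dump" > "$dump" || return 1
    # 2. Keep the old file under a new name instead of overwriting it.
    mv "$db" "${db}.${stamp}.corrupt" || return 1
    # 3. Load the dump into a fresh database file under the original name.
    sqlite3 "$db" < "$dump" || return 1
    # 4. Let SQLite verify the rebuilt file; it prints "ok" when it is sound.
    sqlite3 "$db" "PRAGMA integrity_check;"
}
```

Run from `$LHCBNIGHTLIES/db`, the call would be `rebuild_sqlite_db nightlies.results`. The dump and the renamed original stay in place, so nothing is lost if the reload turns out incomplete.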

-- ThomasHartmann - 28-Feb-2012

 