How to test Moore in the online environment on a single node

Because DIM is required, Moore online tests can only run in the online environment, e.g. on a plus or HLT node. MooreOnline must be used to make the required online components available.

Set up the build environment

Since subversion is only available on plus nodes, perform these steps there. They are only needed once.

   $> lb-dev MooreOnline/latest
   $> cd MooreOnlineDev_vXrY
   $> getpack PRConfig head

You may need to do this instead:

   $> getpack -p anonymous -s PRConfig head

Use a new shell to build local packages

   $> cd $User_release_area/MooreOnlineDev_vXrY
   $> make -j 5 install

In most cases it is advisable to use the latest version of Moore. The nightlies are available on CVMFS and can also be used:

   $> lb-dev --nightly-cvmfs --nightly lhcb-head Today MooreOnline HEAD

Find a suitable machine to run on

Since plus nodes are quite limited in terms of CPU capacity, dedicated HLT performance testing nodes are available:

  1. hltperf-asus-amd6272
  2. hltperf-dell-x5650
  3. hltperf-intel-e5-2630 (faster node with 32 cores)
  4. hltperf-action-x5650
  5. hltperf-quanta01-e52630v4 (This node is used to run daily tests, do not use it.)

When you login to one of these nodes, you won't immediately have the standard LHCb software environment. To get it:

   $> source /cvmfs/lhcb.cern.ch/lib/LbLogin.sh

Make sure the data is available

The Moore test will read data from files. These files must be available on the local disk of the node that is used, so you will have to copy them there.

The directory /localdisk/hlt1 is writable, so if files are not already present, create a new subdirectory and put them there.
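If you need to copy files over, a sketch of what that might look like (the subdirectory name and the source host/path are placeholders):

   $> mkdir -p /localdisk/hlt1/<your_subdir>
   $> scp <source_host>:/path/to/your/files/*.mdf /localdisk/hlt1/<your_subdir>/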

The data must have the appropriate L0 TCK and must be in MDF format with the same compression as used in the Online farm. An example script, Juggle.py, which converts L0-accepted events into the right format, is attached.

Test your setup

The timing test is more difficult to debug, so before moving to it, run your setup with the data and options you want to use later in a normal Gaudi job.
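For example, a minimal gaudirun.py options file along the following lines could serve as such a check. This is only a sketch: the property names are assumed from the standard Run 2 Moore configuration, and the TCK, database tags and input path are placeholders that you must adapt.

   # Minimal sketch of a plain Gaudi options file for a first check of the setup.
   from Moore.Configuration import Moore
   from GaudiConf import IOHelper

   moore = Moore()
   moore.EvtMax = 1000                 # a few hundred events are enough for a first check
   moore.UseTCK = True
   moore.InitialTCK = '<your TCK>'     # placeholder: the TCK you want to test
   moore.DDDBtag = '<dddb tag>'        # placeholder: must match the data
   moore.CondDBtag = '<cond tag>'      # placeholder: must match the data

   # read the same local MDF files that the timing test will use later
   IOHelper('MDF').inputFiles(
       ['/localdisk/hlt1/<your_subdir>/file1.mdf'],  # placeholder path
       clear=True)

It can then be run from the project environment, e.g. with ./run gaudirun.py options.py.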

Set up the running environment and run the test

Once you have found a node, log in to it, check that nobody else is using it, set up the running environment and run:

   $> cd $User_release_area/MooreOnlineDev_vXrY
   $> ./run bash
   $> export PYTHONPATH=/scratch/jenkins/benchmark/python:$PYTHONPATH
   $> export PYTHONPATH=/home/raaij/software/lib/python2.7/site-packages:$PYTHONPATH  # only on CentOS7
   $> cd PRConfig/scripts
   $> python -i MooreOnlineTest.py --numa --moore-output-level=3 --moore-log=log/moores.log --warmup=300 --runtime=2000 --viewers --input=/localdisk1/hlt1/sstahl/L0Filtered160F/ HLT1_2017_160F

For an explanation of the arguments to MooreOnlineTest.py, run

   $> python MooreOnlineTest.py --help

Warning: If you use PRConfig v1r18 or earlier, you have to change --input to --directory.

The final argument should be the name of a python module in PRConfig/python/MooreTests that contains the required information to configure Moore. Make sure that the database tags are correct. Look at the existing ones to get an idea of what to run. In case the events are not ordered by time, e.g. when they were merged from several grid jobs, set the option:

   from Configurables import UpdateAndReset
   UpdateAndReset().abortRetroEvents = False

Looking at the Results

The log files of the individual worker tasks are split at the end of the test and contain the timing table that you are most likely interested in. An additional right-most column is added to the table, containing the time per event normalised to the measured event rate.

A file named averages.log is also created, which contains the average normalised time per event of all entries in the timing table, averaged over all worker processes.

Some results are also reported by the test and stored in the test_results.db file, which is a python shelve database. It contains the memory used per process, as reported by the psutil package, the average instantaneous rates, and the CPU usage of all cores on the machine during the test. All entries are timestamped.
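The database can be inspected interactively with the standard shelve module. A minimal sketch, which only lists the stored keys since their exact names depend on the test version:

   import shelve

   # open the results database read-only; depending on the dbm backend the name
   # to pass may be 'test_results' (the .db suffix being added automatically)
   # or 'test_results.db'
   db = shelve.open('test_results', flag='r')
   for key in db.keys():
       print(key)
   db.close()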

Cleaning up

If the test completes successfully, all processes will be terminated, except the LogViewer (if started), which can be exited using Ctrl+C. If the test is interrupted, not all processes may have been terminated; clean them up using:

   $> killall GaudiCheckpoint.exe

The temporary directory is not removed and should be manually removed once the files it contains are no longer needed for debugging or analysis.

Once you are done, remove the file /tmp/logSrv.fifo; if left behind, it blocks other users from running a timing test on that machine.
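For example:

   $> rm /tmp/logSrv.fifo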

Installing a nightly build on the online system

If you want to run from the latest nightly, you can install it with the following commands in the online system.
   $> source /scratch/jenkins/setenv.sh
   $> installLast
   $> source /scratch/jenkins/bin/setupSearchPath.sh $(getLastBuildId)
   $> lb-run MooreOnline HEAD runtest.sh

FAQ

Which data set should I use? To get a proper estimate of the timing, you need a sample filtered with an appropriate L0 TCK. For the start of the 2016 commissioning you can use the files in /localdisk/hlt1/sstahl/tck_0x0050_25ns/.

The test does not start to run:

  1. Go to /tmp and remove logSrv.fifo and every file matching bm_* (see the cleanup sketch after this list). If you do not have the permissions, ask the person who created them.
  2. Do the same for /dev/shm.
  3. Alternatively, use the option --partition=Something to create a partition that did not exist before.
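A sketch of the cleanup commands, assuming the stale files are in the default locations mentioned above:

   $> rm -f /tmp/logSrv.fifo /tmp/bm_*
   $> rm -f /dev/shm/bm_*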

You have a lot of error messages related to L0:

  1. Check that the L0 configuration in the data sample is consistent with the setting in your Hlt configuration.

If you are running on data which have not been processed by Hlt2, you need the run change handler to pick up the xml files from the online system instead of using the Online conditions database:

  1. Add these lines to your configuration file:
     from Configurables import CondDB
     conddb = CondDB()
     conddb.Online = True
     conddb.UseDBSnapshot = True
     conddb.EnableRunChangeHandler = True
     conddb.Tags['ONLINE'] = 'fake'
     # 'All' is provided by /group/online/hlt/conditions/RunChangeHandler (see the next step)
     import All
     conddb.RunChangeHandlerConditions = All.ConditionMap
     from Configurables import MagneticFieldSvc
     MagneticFieldSvc().UseSetCurrent = True
     conddb.EnableRunStampCheck = False
     from Moore.Configuration import Moore
     Moore().CheckOdin = False

  2. After ./run bash, do export PYTHONPATH=/group/online/hlt/conditions/RunChangeHandler:$PYTHONPATH so that the All module used above can be found.

You get the message "(ERROR) Client Connecting to DIM_DNS on localhost: Connection refused":

  1. Check that the DIM DNS service is running. Otherwise, start it (as root: /usr/local/bin/dns).
-- RoelAaij - 2015-02-03
Topic attachments

  Juggle.py.txt (2.9 K, 2015-09-30, SaschaStahl)