Note that Roel has since prepared a new installation script for piquets

Installing Moore at the pit

  • lb-dev Moore v23r7
  • cd MooreDev_v23r7
  • getpack TCK/HltTCK
  • getpack Hlt/HltPiquetScripts ## contains the RedoL0 and Moore_CreateTCK scripts (Roel has since created this package, see Notes)
  • getpack Hlt/HltSettings
  • change the Hlt TCK
  • make -j 10
  • ./run bash
  • #Run the RedoL0.py script on some data
  • gaudirun.py RedoL0.py
  • # change the config label in CreateTCK script -- this will be important for the online and run control, and will also appear in TCKsh
  • # it has Moore.createConfig = True
  • gaudirun.py Moore_CreateTCK.py
  • Then run TCKsh
  • > TCKsh
  • ##
  • listConfigurations()
  • # the new one will appear in the list, identified by its hex ID
  • Now create configuration
  • creating mapping TCK: 0x00900032 -> ID: 24e17751af7447fed73ec35b108218b6
  • createTCKEntries( { 0x00900032 : id }, cas = cas_rw )
  • Can also write some python scripts to do many TCKs at the same time for you. TCKsh is also just a python script.
  • Note: PrivateTCK page needs updating.
  • Now
  • log in as hlt_oper
  • go to the satellite area
  • cd /group/hlt/sattelite/MooreOnlinePit_v23r7
  • source InstallArea/x86....-opt/setupMoore.sh
  • cd TCK/HltTCK
  • svn up
  • ./scripts/createTCKmanifest config.cdb manifest
  • edit this file manifest/MOORE_v23r7
  • and copy it to here
  • /group/online/hlt/conditions/manifest/MOORE_v23r7
  • Install in the run control
  • Make sure to make an entry in the HLT elog
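
The TCKsh step above notes that a Python script can handle many TCKs at once. A minimal sketch of that idea follows. It assumes that configuration IDs are 32-character lowercase hex strings, as in the example mapping. The helpers build_tck_mapping and describe are hypothetical and only prepare and check the dictionary; createTCKEntries and cas_rw are the TCKsh names quoted above and are not imported here.

```python
import re

# Assumption: config IDs look like the example above (32 lowercase hex characters).
CONFIG_ID = re.compile(r"^[0-9a-f]{32}$")


def build_tck_mapping(pairs):
    """Return {tck: config_id} after checking each (tck, config_id) pair."""
    mapping = {}
    for tck, config_id in pairs:
        if not 0 < tck <= 0xFFFFFFFF:
            raise ValueError("TCK does not fit in 32 bits: %r" % (tck,))
        if not CONFIG_ID.match(config_id):
            raise ValueError("unexpected config ID format: %r" % (config_id,))
        if tck in mapping:
            raise ValueError("TCK listed twice: 0x%08x" % tck)
        mapping[tck] = config_id
    return mapping


def describe(mapping):
    """Format the mapping as one 'TCK -> ID' line per entry, sorted by TCK."""
    return ["0x%08x -> %s" % (tck, mapping[tck]) for tck in sorted(mapping)]


if __name__ == "__main__":
    mapping = build_tck_mapping(
        [(0x00900032, "24e17751af7447fed73ec35b108218b6")]
    )
    print("\n".join(describe(mapping)))
    # Inside TCKsh one would then call: createTCKEntries(mapping, cas=cas_rw)
```

Checking the dictionary before calling createTCKEntries catches duplicated TCKs and mistyped IDs before anything is written.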

Run control

  • ssh -X -Y ui01
  • /group/online/ecs/Shortcuts311/LHCb/ECS/ECS_UI_FSM.sh
  • Right click on LHCb or LHCb_HLT2 to bring up their respective control panel

Tools to figure out what's going on

  • log into plus
  • /group/online/presenter/presenter.sh
  • /group/online/dataflow/scripts/farmStatus
  • /group/online/dataflow/scripts/farmMon
  • /group/online/dataflow/scripts/mbmon
  • Any storage problem: call online piquet.
  • Get to know the online piquet for your week.
  • logs live here /clusterlogs/
  • errorLog
  • Look at the monitoring histograms in locations like /hist/Savesets/2015/LHCb/MooreHistAdder/05/20/MooreHistAdder-0-20150520T150517.root
  • E.g. 2015-06-04, run 153603:
  • /hist/Savesets/2015/LHCb/Moore1HistAdder/06/04/Moore1HistAdder-153603-20150604T004543.root
  • mass histograms live in: /hist/Savesets/2015/LHCb/HltMonitor/06/04/HltMonitor-153603-20150604T000846-EOR.root
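
The saveset paths above appear to follow the layout root/YEAR/PARTITION/TASK/MONTH/DAY/TASK-RUN-TIMESTAMP.root, optionally with an -EOR suffix. A minimal sketch that builds a glob pattern for one task, run and day under that assumption; the function name saveset_pattern and its parameters are hypothetical, and the layout is inferred only from the three example paths.

```python
import datetime
import fnmatch
import posixpath


def saveset_pattern(task, run, day, partition="LHCb", root="/hist/Savesets"):
    """Glob pattern for the savesets of one task and run written on a given day."""
    directory = posixpath.join(
        root,
        "%04d" % day.year,
        partition,
        task,
        "%02d" % day.month,
        "%02d" % day.day,
    )
    # The wildcard covers the time of day and an optional -EOR suffix.
    filename = "%s-%d-%s*.root" % (task, run, day.strftime("%Y%m%d"))
    return posixpath.join(directory, filename)


if __name__ == "__main__":
    pattern = saveset_pattern("HltMonitor", 153603, datetime.date(2015, 6, 4))
    print(pattern)
    example = (
        "/hist/Savesets/2015/LHCb/HltMonitor/06/04/"
        "HltMonitor-153603-20150604T000846-EOR.root"
    )
    # Check the pattern against the example path quoted above.
    print(fnmatch.fnmatch(example, pattern))
    # On plus, glob.glob(pattern) would list the files that actually exist.
```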

New database snapshots

  • look at http://lhcb-release-area.web.cern.ch/LHCb-release-area/DOC/dbase/conddb/release_notes.html to find the right tags for dddb and/or conddb
  • create an LHCb project user area as yourself on plus:
         $> ssh plus
         $> lb-dev LHCb vXrY
         
  • Create the snapshots, the filename should be PARTITION_TAG.db, where PARTITION is either LHCBCOND or DDDB and TAG is a valid global tag.
         $> cd LHCbDev_vXrY
         $> ./run bash
         $> CondDBAdmin_MakeSnapshot.py -T dddb-20150724 DDDB sqlite_file:DDDB_dddb-20150724.db/DDDB
         $> CondDBAdmin_MakeSnapshot.py -T cond-20150724 LHCBCOND sqlite_file:LHCBCOND_cond-20150724.db/LHCBCOND
         
  • Copy the new snapshot files to /group/online/hlt/conditions as user online
  • Make sure to update/create the manifest file(s) for the appropriate Moore versions in /group/online/hlt/conditions/manifest
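
The PARTITION_TAG.db naming rule and the two CondDBAdmin_MakeSnapshot.py calls above can be generated rather than typed by hand. A minimal sketch, assuming the command line always has the shape shown in the examples; the helper names snapshot_filename and snapshot_command are hypothetical, and the tag itself is not validated.

```python
# Partitions named in the instructions above.
VALID_PARTITIONS = ("DDDB", "LHCBCOND")


def snapshot_filename(partition, tag):
    """File name following the PARTITION_TAG.db convention."""
    if partition not in VALID_PARTITIONS:
        raise ValueError("partition must be one of %s" % (VALID_PARTITIONS,))
    return "%s_%s.db" % (partition, tag)


def snapshot_command(partition, tag):
    """Argument list mirroring the CondDBAdmin_MakeSnapshot.py examples above."""
    target = "sqlite_file:%s/%s" % (snapshot_filename(partition, tag), partition)
    return ["CondDBAdmin_MakeSnapshot.py", "-T", tag, partition, target]


if __name__ == "__main__":
    for partition, tag in [("DDDB", "dddb-20150724"), ("LHCBCOND", "cond-20150724")]:
        print(" ".join(snapshot_command(partition, tag)))
```

The printed lines can be pasted into the ./run bash shell, or the argument list can be handed to subprocess.check_call inside that environment.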

Creating new HLT1 checkpoints

Always test the checkpoints on the farm with some data. HLT1 can be tested when there is no beam, and HLT2 on the nonsense output of the HLT1 test run. It'll be obvious if it doesn't work.
  • Open the run control
  • Go to RunInfo-> Under Trigger Configurations open Create/Edit.
  • Select the correct configuration from the drop-down box.
  • Click on Checkpointing at the bottom of the HLT1 Settings panel
  • In this panel, each action runs a process and pops up a window with an Ok button. Do not hit Ok until the process has completed or failed. Always click Ok before leaving the Checkpointing panel, otherwise there will be problems next time.
  • Click "Create Checkpoint" and wait. This takes about as long as the HLT1 initialize step alone.
  • When it's done, you should see "Checkpoint finished. Process now exiting." in the log window.
  • Only then hit Ok in the popup.
  • Click on "Test Checkpoint" and check if you get "RESTORE TEST WAS SUCCESSFUL." in the log window.
  • Only then hit Ok in the popup.
  • Click on "Create gzip and md5" and check if you see " All Done." in the log window.
  • Only then hit Ok in the popup.
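
The three success messages above can be kept as a small lookup when checking log text copied out of the log window. A minimal sketch, assuming the log text is available as a plain string per action; the names SUCCESS_MARKERS, step_succeeded and first_failed_step are hypothetical, and only the quoted messages come from the steps above.

```python
# Success messages quoted in the steps above, in the order the buttons are clicked.
SUCCESS_MARKERS = [
    ("Create Checkpoint", "Checkpoint finished. Process now exiting."),
    ("Test Checkpoint", "RESTORE TEST WAS SUCCESSFUL."),
    ("Create gzip and md5", "All Done."),
]


def step_succeeded(action, log_text):
    """True if the log text contains the success message for this action."""
    return dict(SUCCESS_MARKERS)[action] in log_text


def first_failed_step(logs):
    """Return the first action whose success message is missing, or None.

    logs maps an action name to the text copied from its log window;
    an action with no log text counts as failed.
    """
    for action, _marker in SUCCESS_MARKERS:
        if not step_succeeded(action, logs.get(action, "")):
            return action
    return None


if __name__ == "__main__":
    logs = {
        "Create Checkpoint": "... Checkpoint finished. Process now exiting.",
        "Test Checkpoint": "... restore timed out",
    }
    print(first_failed_step(logs))
```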

Creating new HLT2 checkpoints

  • Open the run control
  • Go to RunInfo-> Under Trigger Configurations open Create/Edit.
  • Select the correct configuration from the drop-down box.
  • Click on Checkpointing at the bottom of the HLT2 Settings panel
  • In this panel, each action runs a process and pops up a window with an Ok button. Do not hit Ok until the process has completed or failed. Always click Ok before leaving the Checkpointing panel, otherwise there will be problems next time.
  • Under "Checkpoint application name" select "Moore2".
  • Create the checkpoint as for HLT1

Switching Moore startup mode to Checkpointing

  • If the checkpoints don't work, or you've created a new one, the startup mode has to be changed.
  • Check with the shift leader if now is a good moment.
  • Open the LHCb (or LHCb_HLT2 in case of HLT2) panel and make sure the run is stopped; stop it if needed.
  • Next to the drop-down box to select the activity (top right of main panel), click on "View..."
  • Click on "More..."
  • In the drop-down box next to "Moore startup mode" select the required mode.
  • Click "Apply and Close"; all the DAQ will be reset.
  • Start a run as normal.
  • On plus, start /group/online/dataflow/scripts/torrentMon. The checkpoints are distributed using BitTorrent, and torrentMon shows the distribution status.
  • When configure is sent the first time, all nodes will go yellow on the torrentMon and back to green when they have received the checkpoint.
  • If a few nodes are slow, the HLT state might go to ERROR because it hits the timeout. Just wait until everything has gone green on the torrentMon.
  • You can switch on the Autopilot to take it from here.
  • If all nodes go to error or the rate is 0, switch the startup mode back to Forking.

Notes:

  • Roel has created the Hlt/HltPiquetScripts package with the scripts.
  • We should give our ssh keys to Roel so we can log in as hlt_oper and online
  • PrivateTCK page needs updating
  • How do I launch the run control?
  • What happens when some nodes get full? They stop accepting events. E.g. old nodes have 2 TB versus 4 TB. Are they 1/2 the speed?
-- MikaVesterinen - 2015-06-04
Topic revision: r6 - 2015-11-26 - SaschaStahl
 