The HLT piquet guide

First Steps

If you are a new HLT piquet shifter, read the setup instructions here. Some of the steps listed should be completed before your first shift.

General Information and useful telephone numbers

  • Piquet assignment: Please declare your (non) availability to the LHCb shift database at: https://lbshiftdb.cern.ch/login.php. Piquet duty is for one week, starting Monday afternoon at ~14.00 - please arrange the handover with the next piquet. The piquet assignments are also tabulated on this page.
  • Phones: HLT piquet phone: 16-1859 (from outside CERN: +41 76 487-1859). See this page for instructions on forwarding the HLT piquet phone to your CERN phone. Don't forget to reset the forwarding when handing over to the new piquet. Expert phone numbers are found here under General Information.
  • Mailing lists: All piquets should be signed up to the lhcb-hlt-operations@cern.ch mailing list (https://e-groups.cern.ch/e-groups/Egroup.do?egroupName=lhcb-hlt-operations) and to lhcb-hlt-piquet.
  • Meetings and reports: All piquets should attend the HLT operations meeting at 15:00 on Friday to be aware of the current situation of the trigger in the pit. They should attend and report at the daily run meetings, and a 2-3 slide summary will be requested by Silvia for the Tuesday meeting report.

Tasks to do every day

  • Every day, inform yourself of what has been happening in the pit recently by consulting the shift logbook: http://lblogbook.cern.ch/Shift
  • Check the daily Trigger operations plots at http://lbtriggerreport.cern.ch/reports/
  • Attend the run meetings on Tuesday and Friday at 9:30 in the pit.
  • Write a daily piquet report for the Run Chief in the HLT logbook before 9.30
  • Check what configuration we are running: from the LHCb run control panel, click on the button marked "View..." above Trigger Config:, or ask the shift leader.
  • Check the HLT histograms with the presenter (including rates)
  • Check the error logger of Hlt1 and Hlt2 for warnings and errors (error logger). If you see a warning which is not listed in the known problems JIRA (old known problems page), contact the author of the HLT line in question to verify whether it is harmless. If it is, add it to the known problems to be ignored page and, if it is very frequent, mask it in the error logger.
  • If you are called to solve a problem and you don't know how to solve it after 5 minutes, call another expert (HLT colleague, online piquet, or Rosen (16-1378)).
  • Check the problem database, HLT section, to see if there are any open HLT problems; it is the responsibility of the piquet to follow up any problem that occurs during the week, and to make sure it is closed when the problem is fixed.
  • Check the rate when in HLT2_EOF. If the rate is low (~14-15kHz expected) call the online piquet.
  • Make a note for everything that occurs during the week in the HLT elog-book: http://lblogbook.cern.ch/HLT+Trigger/ (register at same url if first time).
  • Make a note of known problems in the known problems page.
  • You should also update this page, or other trigger pages if documentation is missing.

What to do when going into or out of a technical stop?

  • We will almost always make a new release of Moore after technical stops. There should be a corresponding JIRA task for the new Moore release and its corresponding TCKs. The HLT piquet should be carefully following this, and will be responsible for producing the TCKs.
  • Turn off the automatic starting of HLT2 runs in BigBrother for new runs.
  • When coming out of a technical stop: be ready to verify that the new TCKs work at the first fill, and perform the HLT2 checklist to ensure that we are ready to run HLT2.
  • Once the HLT2 checklist is completed, give the green light for HLT2 processing and turn on the automatic starting of HLT2 runs in BigBrother.

Which monitoring histograms should I look at?

In the presenter (when you are in the control room: on the console to the right of the operator's console) open the Shift: HLT pages 1-8. To start the presenter from the hlt_shift account, run the command ./presenter.sh, or from plus /group/online/presenter/presenter.sh

  • check that the histograms are updated correctly (choose the partition and click on the green triangle if necessary). If not: tell the shift leader to call the online piquet.
  • Page 1: L0 rates
  • Page 2: Hlt1 rates
  • Page 3: Hlt2 rates
  • Page 4: System info
  • Page 5: Mass plots
  • Page 6: Online conditions
  • Page 7: Hlt trigger rates
  • Page 8: Routing bits

How to install new versions of Moore and new TCKs at the pit?

Read this page. The procedure should only be carried out after discussion in the run meeting, with agreement from the Run Chief and the Hlt group.

How to create/change the DB tags used by the HLT?

  • In case the database tags (snapshot) need to be updated, follow the instructions here.

What to do if HLT goes into ERROR?

At the start of run:

  • when LHCb goes into ERROR after CONFIGURE, do RECOVER then RESET then CONFIGURE
  • when LHCb goes into NOT_READY after CONFIGURE, due to the HLT going into NOT_READY, first RESET the HLT, then CONFIGURE
  • if RESET is not done under these circumstances, the histograms (in the presenter) and the rates (in the rate-presenter) will not be correct
  • if HLT keeps on going into ERROR, click on HLT, and on the unit in ERROR on each subsequent panel until you know what caused the ERROR:
    1. The Moore tasks go into ERROR (HLT->PARTxx->HLTnxx->HLTnxxyy->TRGnxxyy->HLTnxxyy_GauchoJob_m).
      1. If we are in PassThrough mode, contact the online piquet. During PassThrough, Moore is not run; instead, a dummy program maintained by the online group is run. So any error in this mode of running cannot be caused by the HLT.
      2. Look at the messages in the error logger.
      3. If some files are not found, it could be a problem with nfs. Contact the online piquet.
      4. If you can see there is a crash in the error logger, there could be a problem with Moore. This is very unlikely as normally this would be seen at the installation of Moore. You could go back to a previous version, or contact Eric.
    2. If the problem is not with Moore, contact the Online piquet.
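
The decision steps above can be condensed into a small lookup, which may be handy as a personal checklist. This is only an illustrative sketch: the function names and the boolean flags are hypothetical, and nothing here talks to the run control.

```python
def recovery_sequence(state_after_configure):
    """Return the commands listed above for a state seen after CONFIGURE."""
    if state_after_configure == "ERROR":
        # LHCb went into ERROR after CONFIGURE.
        return ["RECOVER", "RESET", "CONFIGURE"]
    if state_after_configure == "NOT_READY":
        # LHCb is NOT_READY because the HLT is NOT_READY: reset the HLT first.
        return ["RESET HLT", "CONFIGURE"]
    return []


def triage_moore_error(pass_through, files_missing, crash_in_log):
    """Suggest the next step when the Moore tasks are in ERROR.

    The three flags are hypothetical inputs that the piquet fills in by hand
    after looking at the run mode and the error logger.
    """
    if pass_through:
        # Moore is not run in PassThrough, so the HLT cannot be the cause.
        return "contact the online piquet"
    if files_missing:
        # Missing files point to a possible nfs problem.
        return "contact the online piquet"
    if crash_in_log:
        # Unlikely, but could be a problem with Moore itself.
        return "go back to a previous Moore version or contact Eric"
    # Not a Moore problem.
    return "contact the online piquet"


print(recovery_sequence("ERROR"))
print(triage_moore_error(pass_through=False, files_missing=False, crash_in_log=True))
```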

What to do if the histograms are empty?

  • The histograms and rates come from the top-level Adders
  • Check if the top-level Adder is included in the partition. The shift leader should be able to do this; if not, call Eric (or continue the procedure below with did). If it is not included, include it by clicking on the red cross next to it in PVSS.
  • If there are no rate histograms (trends), the Saver may not be running. The shift leader should be able to fix this.
  • If there are no histograms:
    1. This is not a serious problem as far as data taking is concerned, however, it means we are running blind.
    2. ssh to mona08 and type did. If did immediately exits, there is a problem with the dns. Send an email to lbonsupp requesting them to restart dnsd on mona08.
    3. In did, do view->servers by node->hlt01.lbdaq.cern.ch. Now you should see PARTxx_Adder_1. Then click on it, choose services. Now you should see the list of dim services with histograms.
  • In case of no histograms, the PartAdder can be reset (LHCb top -> HLT -> PART01 -> PART01_AdderEtc). The reset can be done at any time; there is no need to wait for the end of the run.
  • The mass plots, rates per pp int, chi^2/dof plots are empty:
    1. This could be because the HLT is using a TCK that is unknown to Vandermeer. To fix this, edit /group/hlt/VANDERMEER/VANDERMEER_vxry/Monitor/CommonMonitor/job/SetupVanDerMeer_v5r0.sh, and modify the definition of the HLTTCKROOT variable (to point to the place where the latest TCK can be found, e.g. /group/hlt/MOORE/Moore_vxry/TCK/HltTCK).

How to troubleshoot Hlt2 (HLT2 Piquet Guide)?

  • Please refer to this page
  • Description of how to start and stop Hlt2
  • mark runs good to process or change alignment versions
  • The page also shows instructions how to change the number of Hlt1 and Hlt2 processes.

How to interpret the L0 and HLT rates on the run control panel?

  • The L0 rate dial on the run control panel gives the L0 rate; the HLT rate is shown next to it. In PassThrough mode, these should be the same. If there is a discrepancy, the difference can be calculated by clicking on the TFC Control button. In particular, Calibration A, B, Random (otherwise known as "ODIN Technical"), and Lumi should be added to L0DU (otherwise known as "ODIN Physics").
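
As a worked example of the bookkeeping, the sketch below reads the bullet above as saying that, in PassThrough mode, the HLT rate should equal L0DU plus the Calibration A, Calibration B, Random and Lumi rates. That reading, the function names and the numbers are assumptions made for illustration; the real values come from the TFC Control panel.

```python
def expected_hlt_rate_passthrough(l0du, calib_a, calib_b, random, lumi):
    """Sum of the trigger types listed above, all in the same unit (e.g. Hz)."""
    return l0du + calib_a + calib_b + random + lumi


def rate_discrepancy(hlt_rate, l0du, calib_a, calib_b, random, lumi):
    """Difference between the displayed HLT rate and the expected sum."""
    expected = expected_hlt_rate_passthrough(l0du, calib_a, calib_b, random, lumi)
    return hlt_rate - expected


# Made-up rates in Hz, only to show the arithmetic.
print(rate_discrepancy(hlt_rate=1100, l0du=1000, calib_a=10, calib_b=10, random=50, lumi=30))
```

A result of zero means the two dials agree once the extra trigger types are accounted for.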

How to adjust the number of Hlt1 and Hlt2 processes?

The number of Hlt1 and Hlt2 processes can be changed to optimize the farm load and the balancing of slow, medium, fast and faster nodes. A table which shows which node is of which type can be found on plus at `/group/online/ecs/FarmNodes.txt`.

  • Go to BigBrother
  • Click on Settings
  • Click on DataFlow scenarios
On the left you see the settings for Hlt1, on the right for Hlt2. You can choose between three different scenarios:
  • Physics: Applied during data taking. There have to be enough Hlt1 processes to not create dead time from the farm. In the LHCb top panel, you can click on the dead time bar. Look for MEP request credit. The throttle time should be 0.
  • EOF: Applied out of fill. The Hlt2 throughput should be maximised in this configuration.
  • Ramp: Applied during RAMP. Hlt1 processes are created to be ready for data taking, number of Hlt2 processes is slightly reduced.
You can see the number of Moore1 and Moore2 tasks if you click on the symbol with a graph in the Hlt2 part of the LHCb/LHCb2 top panel. The bottom right plot shows the number of tasks.
  • To change the number of processes, change a number and press Enter. You can then Apply the scenario. If you click without saving, the counter jumps back to the original value but the scenario is still applied.

When significantly changing the number of HLT1 processes, and the result is not what you expect, the reason is most likely the competition for resources between HLT1 and HLT2 processes. To get to the target rates, follow this recipe:

  • Continuously look at the dead time from the farm while optimizing the number of processes. The farm dead time is shown when you click on the Dead Time bar in the Run Control and then under MEP Request Credit. To avoid dead time, the current credit should be above 2000.
  • Make an initial estimate of the number of processes wanted on the different categories of nodes;
  • Reduce HLT2 such that the total number of processes (HLT1 + HLT2) is somewhat below the number of logical cores on the machine (24 for slow, 32 for medium and fast and 40 for faster);
  • Check the HLT1 rates;
  • Iterate on the number of HLT1 processes if needed, keeping the number of HLT1 + HLT2 processes below the number of logical cores per type of node;
  • Once the desired HLT1 rates have been achieved, look at the busy/idle percentages. To do this ssh into a node of that type and check the idling with `top`. Then increase the number of HLT2 processes again until you have reached 28 or 29 on the slow nodes, 38 on the medium and fast nodes and 47 on the faster nodes;
  • While adding HLT2 processes keep an eye on the HLT1 rates and add HLT1 processes as necessary to get back to the desired rates;
  • If you're close to where you want to be, but there is still a bit of deadtime, the medium and faster nodes can be used to soak up a bit extra rate by reducing HLT2 and/or increasing HLT1. They can be retuned most easily later.
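
While iterating, it can help to check each proposed setting against the two numeric constraints in the recipe: the total number of processes stays below the number of logical cores, and the MEP request credit stays above 2000. The sketch below only encodes those two rules; the function name and the example numbers are hypothetical, and it does not cover the final step of adding HLT2 processes back.

```python
# Logical cores per node type, as quoted in the recipe above.
LOGICAL_CORES = {"slow": 24, "medium": 32, "fast": 32, "faster": 40}

# The current MEP request credit should be above this value to avoid dead time.
MIN_MEP_CREDIT = 2000


def check_tuning_step(node_type, n_hlt1, n_hlt2, mep_credit):
    """Return a list of problems with a proposed setting (empty if none)."""
    problems = []
    total = n_hlt1 + n_hlt2
    cores = LOGICAL_CORES[node_type]
    if total >= cores:
        problems.append(
            "HLT1 + HLT2 = %d is not below the %d logical cores of a %s node"
            % (total, cores, node_type)
        )
    if mep_credit <= MIN_MEP_CREDIT:
        problems.append(
            "MEP request credit %d is not above %d" % (mep_credit, MIN_MEP_CREDIT)
        )
    return problems


# Made-up numbers for a slow node.
for problem in check_tuning_step("slow", n_hlt1=12, n_hlt2=12, mep_credit=1500):
    print(problem)
```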

To plot the disc occupancy for the node types:

  • log into a plus node as hlt_oper
  • do:
    cd /group/hlt/balancing/
    lb-run MooreOnline latest python filling.py
    lb-run MooreOnline latest python plot_filling.py
  • The plot is saved in your working directory and is stamped with the date and time.

How to change monitoring histograms in the presenter?

The instructions are found at presenter. You have to ask someone for the password to login to the webinterface of the histogram db as HIST_WRITER. Only the HIST_WRITER account is allowed to change histograms. Then go to View pages to edit.

How to calculate the HLT rates?

  • Set up a Moore Online environment on plus. You can use the installation in the satellite area by sourcing setupMoore.sh:
    . /group/hlt/sattelite/MooreOnlinePit_vXrY/InstallArea/x86_64-slc6-gcc48-opt/setupMoore.sh
  • Alternatively, set up a Moore environment, and getpack the heads of Online/Hlt2Monitoring and Online/ZeroMQ.
  • Then get the monitor.py script from Hlt/HltMonitoring
    getpack Hlt/HltMonitoring
    cd Hlt/HltMonitoring/scripts
  • To calculate HLT1 rates, do
    ./monitor.py -s nodes --nprocesses=10 --nfiles=HOWEVER_MANY_FILES_YOU_WANT --debug Rate:Mass THE_RELEVANT_RUN_NUMBER
    • This picks up HLT1 processed files on the farm, and loops over the dec reports to fill histograms with the relevant rates.
    • It can only be run before HLT2 has completed (i.e. the files produced by HLT1 have to be still available).
  • To calculate HLT2 rates, you should run from the daqarea
    ./monitor.py -s daqarea --stream FULL --nprocesses=10 --nfiles=HOWEVER_MANY_FILES_YOU_WANT --debug Rate:Mass THE_RELEVANT_RUN_NUMBER
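
The two command lines differ only in the source and the stream option, so a small helper can assemble either one. This is a sketch that uses only the options shown above; the helper name, its arguments and the run number in the example are hypothetical.

```python
def monitor_command(run_number, n_files, hlt2=False, n_processes=10):
    """Build the monitor.py argument list for HLT1 (nodes) or HLT2 (daqarea)."""
    cmd = ["./monitor.py", "-s", "daqarea" if hlt2 else "nodes"]
    if hlt2:
        # HLT2 rates are read from the FULL stream in the daqarea.
        cmd += ["--stream", "FULL"]
    cmd += [
        "--nprocesses=%d" % n_processes,
        "--nfiles=%d" % n_files,
        "--debug",
        "Rate:Mass",
        str(run_number),
    ]
    return cmd


# 123456 is a placeholder run number.
print(" ".join(monitor_command(123456, n_files=5)))
print(" ".join(monitor_command(123456, n_files=5, hlt2=True)))
```

The list form can be passed directly to `subprocess.call` from the scripts directory if you prefer not to paste the command by hand.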

How to calculate the farm node idle rates?

You can either use the 'top' command line utility or this script. The latter will print the average idle time, in percent, for each category of farm node (currently "Slow", "Medium", "Fast", and "Faster"), computed over some random sample of nodes from within each category.
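
If the script is not at hand, the same average can be approximated by hand from `top` output. The sketch below is not the script referred to above: it assumes you have collected the CPU summary line of `top -b -n 1` from a few nodes per category, and the helper names and sample lines are hypothetical. The regular expression accepts both the older `99.5%id` and the newer `99.5 id` layouts.

```python
import re

# Matches the idle field of the top CPU summary line in either layout.
_IDLE_RE = re.compile(r"(\d+(?:\.\d+)?)\s*%?\s*id\b")


def idle_percent(cpu_line):
    """Extract the idle percentage from one CPU summary line of top."""
    match = _IDLE_RE.search(cpu_line)
    if match is None:
        raise ValueError("no idle field found in: %r" % cpu_line)
    return float(match.group(1))


def average_idle(cpu_lines_by_category):
    """Average the idle percentage over the sampled nodes of each category."""
    return {
        category: sum(idle_percent(line) for line in lines) / len(lines)
        for category, lines in cpu_lines_by_category.items()
    }


# Made-up sample lines, one per sampled node.
samples = {
    "Slow": [
        "Cpu(s): 8.0%us, 2.0%sy, 0.0%ni, 90.0%id, 0.0%wa",
        "%Cpu(s): 18.0 us,  2.0 sy,  0.0 ni, 80.0 id,  0.0 wa",
    ],
}
print(average_idle(samples))
```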

How to inspect raw files with LHCbApp?

See this page HltRawExamples

How to change the ODIN no bias rate?

In case we want to take some extra ODIN no bias data, the no bias rate can be changed in the run control.

  • Click on RunInfo, then go to Create/Edit under Trigger Configurations.
  • Select the Physics configuration.
  • In the panel you can change the NoBias trigger rate; click Save after changing it.
  • To take effect, stop the run, and reload the Trigger Config (Select it and click on it again).
Be aware that the no bias rate is only prescaled in Hlt2 and every increase of rate will also go offline. To save the data on the local disks in the farm, the BWDivision Alignment writer has to be enabled.
  • Go to RunInfo -> Alignment, View/Edit.
  • Enter the number of events to collect and check the box on the right.
The Alignment writer will stop writing when the number of events is reached. The NoBias rate will not change.

How to silence warnings?

For example, you want to silence the following warning:

 EcalShareForHlt.PhotonShowerOverlap: CaloShowerOverlapTool:: The WARNING message is suppressed : ' E,X and Y of cluster could not be evaluated!'  

To do so, the name of the algorithm (r".*EcalShareForHlt.PhotonShowerOverlap") needs to be added to the list of silent algs in

/group/hlt/sattelite/MooreOnlinePit_v26r5/MooreScripts/python/MooreScripts/silent.py

(don't forget to put a comment about why this particular warning can be masked)

Before adding the regex name to the list, check that it will only affect the OutputLevel of the algorithm that you need:

lb-run Moore/latest iTCKsh 

fully_qualified_name_regex = ... # for instance r".*EcalShareForHlt.PhotonShowerOverlap"
property_regex = 'OutputLevel'
tck = ...
# The $ is needed so that we emulate the behaviour of boost::regex_match
getProperties(tck, fully_qualified_name_regex + '$', property_regex) 
# Check that the output matches what you want

After adding new algorithm(s) to the list of silent algs, do
cd /group/hlt/sattelite/MooreOnlinePit_vXYrZ
make install
 

Hlt1/2 needs to be reset for the changes to take effect.
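
Before touching silent.py, the whole-string matching behaviour can also be tried offline. The sketch below uses Python 3's `re.fullmatch`, which, like boost::regex_match, requires the entire name to match. Apart from the name taken from the warning above, the algorithm names in the list are hypothetical.

```python
import re


def affected_algorithms(pattern, algorithm_names):
    """Return the names that the pattern matches in full."""
    return [name for name in algorithm_names if re.fullmatch(pattern, name)]


pattern = r".*EcalShareForHlt.PhotonShowerOverlap"

names = [
    "EcalShareForHlt.PhotonShowerOverlap",        # from the warning above
    "Hlt2.EcalShareForHlt.PhotonShowerOverlap",   # hypothetical parent prefix
    "EcalShareForHlt.PhotonShowerOverlapExtra",   # hypothetical, longer name
]

# The longer name is not silenced, because the match must cover the whole name.
print(affected_algorithms(pattern, names))
```

Note that the unescaped `.` in the pattern matches any character, so writing `\.` makes the pattern stricter if that matters for the names in your TCK.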

Alignment and Calibration FAQ

Possible questions to the Hlt piquet regarding the online alignment and calibration.

Generally useful pages

Topic revision: r100 - 2018-09-12 - ConstantinWeisser1
 