DQM on call F.A.Q.

This page collects well-known technical and operational problems and questions related to DQM Online, together with detailed answers and procedures for DQM on-call shifters.

GENERAL INFORMATION

What is the LHC doing? What are the beam conditions?

ANSWER:

The following page shows the status of the LHC: the type of beam (proton-proton, heavy ions, cosmic), the energy, and the beam conditions (stable beam, etc.).

http://op-webtools.web.cern.ch/op-webtools/Vistar/vistars.php

What is CMS doing? Which parts of the detector are in? What are the stream rates?

ANSWER:

The following pages give more detail than the LHC status page. There you can check:

  • The beam conditions.
  • The physics type (proton-proton, heavy ions, cosmic).
  • The energy.
  • The DAQ state.
  • The run number.
  • The subsystems and their status (keep in mind that in order to have histograms in the DQM GUI the DQM subsystem should be included in the current run).

http://cmsonline.cern.ch/portal/page/portal/CMS%20online%20system/DAQ/DAQstatus

https://cmswbm.web.cern.ch/cmswbm/cmsdb/servlet/Page1

https://cmswbm.web.cern.ch/cmswbm/cmsdb/servlet/PageZero

Where to look for the HLT menu definition?

ANSWER:

The full definition of any HLT menu is contained in the ConfDB web application:

http://j2eeps.cern.ch/cms-project-confdb-hltdev/browser/

Please keep in mind that production HLT menus should usually be looked up in the ORCOFF tab. The main things to look at are:

  • The DQM stream definition and its composition in terms of HLT paths: this is the stream used to feed all online DQM applications, with the notable exception of the Calibration ones, which use the Calibration stream.
  • The HLT path names, to be cross-checked against any selection done at the application level.

How to find out which HLT menu we are using for a specific run?

ANSWER:

1. Go to the DQM GUI page. To reach it, first open a SOCKS proxy to the P5 network:

  • from within the CERN network:
    • ssh [USERNAME]@cmsusr -D [PORT] (e.g. ssh batinkov@cmsusr -D 12345) // connect to P5 network.
  • from outside CERN:
    • ssh -L 11080:localhost:[PORT] [USERNAME]@lxplus.cern.ch ssh cmsusr -D [PORT]

  • Configure your browser to use a SOCKS 5 proxy and tunnel the traffic via the ssh connection opened above, where the target port is [PORT] (12345 in the example). From outside CERN, the local end of the tunnel is port 11080.

2. DQM GUI page -> Workspace -> Everything -> Info/ProvInfo/hltKey - here is the name of the menu for the current run.

3. Then go to the WBM main page.

4. WBM Main Page -> RunSummary, enter the run you are interested in.

5. Search for "HLT Key"; the value after it should match the value displayed in the histogram in step 2.
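
The two tunnel commands from step 1 can be wrapped in a small helper that prints the right one for your location. This is only a sketch: the function name tunnel_command is hypothetical, and the commands it prints are the ones listed in step 1, including 11080 as the local forwarding port for the off-CERN case.

```shell
# Print the ssh command that opens a SOCKS proxy to the P5 network.
# Arguments: <inside|outside> <username> <port>
# "inside" means inside the CERN network, "outside" means off-CERN.
tunnel_command() {
    where="$1"; user="$2"; port="$3"
    case "$where" in
        inside)
            echo "ssh ${user}@cmsusr -D ${port}"
            ;;
        outside)
            # Local port 11080 is forwarded to the SOCKS proxy opened on lxplus.
            echo "ssh -L 11080:localhost:${port} ${user}@lxplus.cern.ch ssh cmsusr -D ${port}"
            ;;
        *)
            echo "usage: tunnel_command <inside|outside> <username> <port>" >&2
            return 1
            ;;
    esac
}
```

For example, tunnel_command inside batinkov 12345 prints ssh batinkov@cmsusr -D 12345, which can then be pasted into a terminal.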

DQM/DQM GUI PROBLEMS

How to check the DQM GUI services and restart them if needed?

ANSWER:

Keep in mind that when a problem occurs, in most cases the DQM GUI itself is working and the problem comes from somewhere else.

There are actually two DQM GUI servers:

  • The online server running at dqm-C2D07-02
  • The online server at P5 running at dqm-C2D07-01

For the online server you simply need to do the following:

1. Visit the link https://cmsweb.cern.ch/dqm/online.

2. If the top frame of the Web page appears, the DQM GUI server is working OK and the problem is somewhere else.

For the online server at P5 you need to perform the following steps:

1. ssh [USERNAME]@cmsusr -D [PORT] (e.g. ssh batinkov@cmsusr -D 12345) // connect to P5 network.

2. Configure your browser to use a SOCKS 5 proxy and tunnel the traffic via the ssh connection from step 1, where the target port is [PORT] (12345 in the example).

3. Visit http://dqm-prod-local:8030/dqm/online/.

4. If the frame of the Web page appears, the DQM GUI server is working OK and the problem is somewhere else.
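
The browser check for the P5 server can also be done from a terminal through the same SOCKS tunnel. The sketch below assumes that curl is installed on your machine and that the tunnel from step 1 is already open. The function names gui_verdict and probe_gui are hypothetical, and treating any 2xx or 3xx answer as "working" is an assumption.

```shell
# Turn the HTTP status code reported by curl into a short verdict.
gui_verdict() {
    case "$1" in
        2??|3??) echo "GUI is answering" ;;
        000)     echo "no answer (tunnel down or server unreachable)" ;;
        *)       echo "GUI answered with HTTP $1" ;;
    esac
}

# Fetch only the status code of a URL through the SOCKS proxy
# opened with "ssh -D <port>", then print the verdict.
# Arguments: <url> <local SOCKS port>
probe_gui() {
    url="$1"; port="$2"
    # curl reports 000 when it gets no HTTP answer at all.
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 \
                --socks5-hostname "localhost:${port}" "$url")
    gui_verdict "$code"
}
```

With the tunnel from the example open, the call would be: probe_gui http://dqm-prod-local:8030/dqm/online/ 12345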

If you nevertheless need to restart the DQM GUI server, do the following:

1. Log in as the dqm user on the DQM machine where the DQM GUI problem is (the machines are dqm-C2D07-02 and dqm-C2D07-01):

    ssh [MACHINE]

    sudo -u dqm -H bash

2. The following will restart the Web server with the layouts:

    for s in dqm-c2d07-{01,02,11}; do
      ssh "$s" '/home/dqmlocal/current/config/dqmgui/manage xrestart webserver "I did read documentation"'
    done
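
Because the loop above restarts the web server on every listed machine, it can help to print the commands before running them. The following is a sketch only: restart_webservers and the DRY_RUN variable are hypothetical names, not part of the DQM tools, and the restart command is the one shown above.

```shell
# Print (default) or run the GUI web server restart on each given host.
# Set DRY_RUN=0 to actually execute the restart over ssh.
restart_webservers() {
    cmd='/home/dqmlocal/current/config/dqmgui/manage xrestart webserver "I did read documentation"'
    for s in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Only show what would be executed.
            echo "ssh $s '$cmd'"
        else
            ssh "$s" "$cmd"
        fi
    done
}
```

Calling restart_webservers dqm-c2d07-01 only prints the ssh command; DRY_RUN=0 restart_webservers dqm-c2d07-01 runs it on that single machine.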
    

How to check if a specific online application is running?

ANSWER:

1. Connect to the P5 network by opening a SOCKS proxy:

ssh -l [USER] -L [PORT1]:localhost:[PORT2] lxplus.cern.ch -t ssh -l [USER] -D [PORT2] cmsusr1

(e.g. ssh -l rovere -L 11080:localhost:11081 lxplus.cern.ch -t ssh -l rovere -D 11081 cmsusr1)

2. Connect to any dqm machine:

ssh dqm-c2d07-29

3. Check which application is running on which machine:

  • With these commands the URLs of the applications are not shown; you simply get a list of the applications.

for the production machines:

~dqmpro/bin/ExtractAppInfoFromXML -spa ~dqmpro/xml/Production/CurrentConfiguration_524p4.xml  | awk -F "  " '{if (a!=$1){print $1 ":\n    " $2"  "$3;a=$1}else{print " "$2""$3}}'

for the playback machines:

~dqmdev/bin/ExtractAppInfoFromXML -spa ~dqmdev/xml/Integration/CurrentConfiguration_524p4.xml | awk -F "  " '{if (a!=$1){print $1 ":\n    " $2"  "$3;a=$1}else{print " "$2""$3}}'

  • With these commands the URLs of the applications are displayed as well:

for the production machines:

~dqmpro/bin/ExtractAppInfoFromXML -spa ~dqmpro/xml/Production/CurrentConfiguration_524p4.xml  | awk '{print $0" URL --> http://"$1":"$2"/urn:xdaq-application:lid=50"}'

for the playback machines:

~dqmdev/bin/ExtractAppInfoFromXML -spa ~dqmdev/xml/Integration/CurrentConfiguration_524p4.xml | awk '{print $0" URL --> http://"$1":"$2"/urn:xdaq-application:lid=50"}'

Note that the above commands do not give you the actual status of the applications; they only list all DQM applications together with the machine and port where each one should run.
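
To see what the URL-building awk filter does, it can be run on a hand-written line instead of the real ExtractAppInfoFromXML output. This sketch assumes only what the awk program itself implies: each input line starts with the machine name and the port as whitespace-separated fields. The function name app_urls and the sample host, port and application name are made up for illustration.

```shell
# Append the xdaq application URL to every input line.
# Field 1 is taken as the machine name and field 2 as the port,
# exactly as in the awk one-liner used with ExtractAppInfoFromXML.
app_urls() {
    awk '{print $0" URL --> http://"$1":"$2"/urn:xdaq-application:lid=50"}'
}

# Example with a made-up line (not real ExtractAppInfoFromXML output):
#   printf 'dqm-c2d07-21 22100 ExampleApp\n' | app_urls
```

The same function can be placed after the real command, for example: ~dqmpro/bin/ExtractAppInfoFromXML -spa ~dqmpro/xml/Production/CurrentConfiguration_524p4.xml | app_urls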

Once you have the list of applications (and possibly the corresponding URLs), you can check the status of the applications via a web browser or via a terminal.

4. Via Web browser:

  • Configure your browser to use a SOCKS 5 proxy and tunnel the traffic via the ssh connection from step 1.
  • Paste the URL of the application you want to check into the browser; you will get the page with its status.
  • If you want to restart the application, click "halt" and wait, then click "configure" and wait, and finally click "start".

  • For quick access, you can also do the following:
    1. Create a new Firefox profile "TunnelCMS" set up as described above (SOCKS 5 proxy on port 1081).
    2. In about:config, set network.proxy.socks_remote_dns to true.
    3. Import the bookmarks from bookmarksCMS.html into it.
    4. Run the little bash script.
    5. Wait for Firefox to pop up, then enter your online password (it might differ from your AFS one) in the terminal (nothing will appear to happen there, though).
    6. Open the DQM applications directly via the bookmarks (Ctrl+B opens them in the side bar).
    7. Do what needs to be done.
    8. Exit with a simple Ctrl+C in the terminal.

5. Via terminal:

  • Connect to the proper machine - you have the name from the list from step 3

ssh [MACHINENAME]

  • Check whether the proper xdaq application is running on the correct port with the command:

ps -u dqmpro -F

  • You will get a list of all DQM processes running on the current machine.
  • Check /tmp/xdaqcjPID#[APP_PID].log, the log file with the same APP_PID as the application, for any problems.
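
A quick first pass over such a log file could look like the sketch below. The function name scan_xdaq_log is hypothetical, and the keywords error, exception and fatal are an assumption about what a problem looks like in these logs, so adjust the pattern to what the application really writes.

```shell
# Show the last 20 suspicious lines of a log file, with line numbers.
# Argument: path to the log file.
scan_xdaq_log() {
    log="$1"
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 2
    fi
    # Case-insensitive search; the keyword list is only a guess.
    grep -n -i -E 'error|exception|fatal' "$log" | tail -n 20
}
```

Quote the file name when calling it, since the path contains a # character: scan_xdaq_log "/tmp/xdaqcjPID#[APP_PID].log", with [APP_PID] replaced by the PID found with ps.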

6. To check which configuration is running for a specific application:

  • Go to the directory with the configuration files:

cd ~dqmpro/prod/src/DQM/Integration/python/test

  • Check for the configuration file of a specific application:

less specific_file.py

  • Look in the specific section (pp, cosmic, HLT) for something like:

process.DQMEventStreamHttpReader.SelectEvents = cms.untracked.PSet(SelectEvents = cms.vstring('HLT_L1TrackerCosmics*', 'HLT_L1SingleMuOpen_AntiBPTX*') )

  • Then check the rate of the selected HLT paths in WBM.
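
The selected HLT paths can also be pulled out of a configuration file from the command line. This is a minimal sketch: list_selected_paths is a hypothetical name, and it assumes that the quoted path names sit on the same line as SelectEvents, as in the example line above. A selection split over several lines would need a different approach.

```shell
# Print the quoted HLT path patterns found on SelectEvents lines,
# one per line. Argument: path to the configuration file.
list_selected_paths() {
    grep 'SelectEvents' "$1" | grep -o "'[^']*'" | tr -d "'"
}
```

For the example line above this prints HLT_L1TrackerCosmics* and HLT_L1SingleMuOpen_AntiBPTX*, one per line; these are the names to look up in WBM.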

How to remove a DQM histogram from the blacklist?

ANSWER:

1. ssh [USER]@cmsusr -D [PORT]

2. Find out the exact machine:

  • Have a look at the DQM GUI where the blacklisted histogram appears. The name of the machine is in the upper right corner of the page.

  • You should see something like:

CMS DQM GUI (dqm-C2D07-02) - for the online

CMS DQM GUI (dqm-C2D07-01) - for the online at P5

3. Connect to the machine:

ssh dqm-c2d07-02 (or ssh dqm-c2d07-01)

sudo -u dqm -H bash

4. Empty the blacklist file:

cat /dev/null > /home/dqmlocal/state/dqmgui/online/blacklist.txt

5. Restart the DQM GUI Web server and then check the status:

/home/dqmlocal/current/config/dqmgui/manage xrestart webserver  "I did read documentation"

6. Check the DQM GUI server again to make sure that everything is OK.
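
Emptying the blacklist cannot be undone, so a cautious variant of step 4 keeps a copy first. The backup step is an addition of this sketch and not part of the procedure above, and the function name clear_blacklist is hypothetical.

```shell
# Copy the blacklist to a time-stamped backup, then empty it.
# Prints the name of the backup file. Argument: path to the blacklist.
clear_blacklist() {
    file="$1"
    backup="${file}.$(date +%Y%m%d-%H%M%S).bak"
    cp -p "$file" "$backup" || return 1
    # Same effect as: cat /dev/null > "$file"
    : > "$file"
    echo "$backup"
}
```

On the GUI machine this would be called as the dqm user: clear_blacklist /home/dqmlocal/state/dqmgui/online/blacklist.txt, followed by the restart in step 5.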

How to check the disk usage of the critical machines?

ANSWER:

The critical machines are: dqm-prod-local (dqm-c2d07-01), dqm-prod-offsite (dqm-c2d07-02), dqm-test (dqm-c2d07-12) and dqm-c2d07-11 (the playback GUI runs here).

The usage of the root directory (/) should be no more than 80%; otherwise the DQM agents might stop working, which will cause problems with the DQM GUI.

To check the disk usage of a machine, run:

ssh [MACHINE_NAME] 'df -h'

ssh dqm-prod-local 'df -h' - for the online DQM GUI at P5

ssh dqm-prod-offsite 'df -h' - for the online DQM GUI

ssh dqm-test 'df -h' - for the test machine 

ssh dqm-c2d07-11 'df -h' - for the playback GUI machine
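
The 80% rule can be checked for all four machines in one go. The sketch below uses df -P / instead of df -h so that the output is easy to parse, and it assumes the file system name contains no spaces. The function names are hypothetical.

```shell
# Read "df -P /" output on stdin and print the usage of / as a number.
root_usage_percent() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Compare a usage value with the limit (80 by default).
# Arguments: <host> <usage in percent> [limit]
check_usage() {
    host="$1"; pct="$2"; limit="${3:-80}"
    if [ "$pct" -gt "$limit" ]; then
        echo "WARNING: $host uses ${pct}% of / (limit ${limit}%)"
    else
        echo "OK: $host uses ${pct}% of /"
    fi
}

# Run the check on every critical machine (needs the P5 network).
check_critical_machines() {
    for h in dqm-prod-local dqm-prod-offsite dqm-test dqm-c2d07-11; do
        check_usage "$h" "$(ssh "$h" 'df -P /' | root_usage_percent)"
    done
}
```

Running check_critical_machines from a machine inside the P5 network prints one OK or WARNING line per host.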

-- Main.erosales@cernNOSPAMPLEASE.ch atanas.batinkov@cernNOSPAMPLEASE.ch - 05-Jul-2012

Topic attachments
  • bookmarksCMS.html (HTML, r1, 23.1 K, 2012-08-03 16:24, VolkerAdler)
  • tunnelCMS (unknown file format, r1, 0.1 K, 2012-08-03 16:24, VolkerAdler)
  • tunnelRemoteCMS (unknown file format, r1, 0.2 K, 2012-08-03 16:24, VolkerAdler)
Topic revision: r8 - 2012-08-03 - VolkerAdler
 