LHCb RICH Data Quality
This page summarises the RICH Data Quality effort and gives instructions for the RICH piquet and the RICH DQ piquet.
The RICH DQ piquet duty can be carried out remotely from institutes outside CERN.
As this is a TWiki page, please feel free to add and correct information here...
LHCb is in its start-up phase and instructions will likely evolve with more experience.
As a RICH DQ piquet
Before you start
- Subscribe to the relevant mailing lists
- Make sure you're aware of all meetings detailed below
- You need a valid Grid certificate to access the book-keeping database
- Get accounts for the ShiftDB, eLogs, ticket- and Savannah systems, as well as the online cluster
- If you're working remotely, it is useful to set up your ssh connections and NX client to log into the online systems
- Familiarise yourself with the
- Make sure that you execute the script /group/rich/sw/scripts/setup.sh in your login script on the online PCs (e.g. add the line . /group/rich/sw/scripts/setup.sh to your ~/.bashrc) and that the CMT directory User_release_area points to /group/rich/sw/cmtuser
Duties
General
- Follow data-quality related issues for the RICH, contact experts if necessary, liaise with general LHCb DQ shifter
- Attend the general DQ meeting at 2pm (CERN time, weekdays only)
If we're not running in "global"
The duties of the LHCb RICH Data-Quality piquet are aimed at physics data taking, i.e. when the RICH detectors are included in global running under the LHCb partition.
- If the RICH detectors are in "local" mode, experts are usually testing or making dedicated measurements (e.g. ion feedback). In this case, not much needs to be done, but it helps to keep an eye on CAMERA and the Presenter to make sure everything looks OK.
- If the RICH detectors are in "global" mode (included in the LHCb partition) but there is no beam, the detectors are usually run to "exercise the system" and keep everything alive. Again, not much needs to be done in this case, but keep an eye on CAMERA and the Presenter to spot any issues as early as possible.
Online checks
This part is aimed at assisting the RICH piquet by keeping an eye on the overall detector behaviour during data taking.
These duties only need to be performed if the RICH detectors are running in "global", i.e. in the LHCb partition.
If the detectors are taken into "local" control (i.e. partitions RICH, RICH1 or RICH2), experts are usually working with them.
- Familiarise yourself with the LHC(b) plans for the day by attending the daily run meeting or reading the minutes, and by looking at the entries in the General Shift eLog and the RICH eLog
- Log into the online systems at IP8 and keep a Presenter and CAMERA window open
- Regularly look at the various online Presenter pages
- Keep an eye on the messages in CAMERA. During a normal run only (green) information messages should appear. Occasionally some (yellow) warning messages may occur, but they should not persist. No (red) error messages should be on the screen.
- Keep an eye on the MonitoringMon and check that RichDAQMon and RichRingMon keep analysing events
- If necessary, look at a specific run by checking the output of the online monitoring. For example, this should be done for a run taken in unusual conditions (e.g. high number of average collisions) or when a specific run is suspected to be problematic (several disabled HPDs, etc)
Offline checks
This part is based on the Brunel and DaVinci histograms obtained from the reconstruction of the raw data ("production"). An email should be sent automatically to the mailing list lhcb-dataquality-shifters when new histograms are available. The following should be done for both the FULL and the Express stream:
- If not already done by the central DQ shifter, merge the Brunel and DaVinci histograms from the individual reconstruction jobs into one file with the full statistics (place them in your area to avoid confusion)
- Look at the provided Presenter pages (for the offline DQ part)
- Run the global alignment monitor
The status of the reconstruction of the latest runs (Express and Full stream), as well as the re-processing, stripping, etc., can be monitored from the Production Monitoring Dirac page.
Some remarks:
- You need to be logged into the PCs in the Offline Control Room to be able to connect to the database. The machines are called pclbocr0N with 0<N<5
- The instructions for the general LHCb DQ shifter can be found here: Shift Instructions. They include instructions on how to merge histograms (which should be done by the general DQ shifter) and where the relevant histogram files can be found.
- We don't expect any quick changes, hence it should be sufficient to look only at high-statistics runs unless there are indications of potential issues (e.g. errors from the online monitoring in CAMERA, "odd" features in the online Presenter pages, etc). Low-statistics runs will probably not allow many definite statements.
- The Express stream is currently being validated, i.e. it is quite likely that this part is not yet fully operational
- The instructions will evolve with time and we need to gain experience with the detector behaviour in realistic running conditions.
In case of issues:
- Make an entry in the RICH eLog with as many details as possible (e.g. the warning messages from CAMERA, a screen-shot, etc).
- Send an email to lhcb-rich-dataquality AT cern.ch and lhcb-rich-operations AT cern.ch
- In case of a severe issue affecting the operations and on-going data-taking, try to contact the RICH piquet (e.g. when several HPDs are being disabled, an "odd" stripe-pattern appears in the hitmap displays, etc)
- Use the Savannah Data-Quality issue tracker or Online ProblemDB to track any issues and follow them up with the experts.
Mailing list
Subscribe to the mailing lists (...@cern.ch):
- lhcb-rich-dataquality
- lhcb-rich-operations
- lhcb-rich-software
- lhcb-data-quality
- lhcb-dataquality-shifters
- lhcb-online-users
- lhcb-online
- lhcb-run-news
Ticket / issues tracking systems
Several systems are being used to track issues, request information or actions, etc.
Please register with all of them. You will also need an account on the online cluster.
ELog
Several electronic log-books are used to keep track of any information relevant to operational issues.
Meetings
- Daily run meeting at IP8, 9am (CERN time)
In this meeting the current plan for the day is discussed by the Run Chief and the sub-detector experts. The News slides usually give a good overview of the plan for the LHC and LHCb for the day or longer term.
- RICH Operations: Wednesdays, 10:35 (CERN time) in the Rich section of Indico. All operational aspects of the RICH detector, as well as longer-term planning, are discussed here.
- RICH Software: Fridays 15:00 (CERN time) in the Rich section of Indico. All RICH aspects related to software issues are discussed (e.g. Data-Quality, alignment, reconstruction, ...)
- LHCb Data-Quality meeting, weekdays 13:30 (CERN time), EVO link
- Online Data-Quality task-force: Wednesdays 10:00 (CERN time), LHCbInternal.DataQuality section of Indico
Previous Trainings, etc.
Several old training slides are available. Please have a look at them as most information is still valid.
More detailed TWiki pages, web-links etc.
- When working remotely, logging into the online systems can be very slow because forwarding windows such as the Presenter or CAMERA over the X protocol is slow. Follow the instructions at Log into online systems to use the NX client. The NX client software can be obtained from NoMachine
- HistogramDB. For completeness: it is unlikely you will need to edit the display properties of any histogram. However, here is where you can...
CAMERA
The CAMERA error reporting tool is used to display all information gathered by the online monitoring.
The user-interface showing these messages can be started via startCameraGui.
Messages showing some detailed information are presented in green (e.g. monitoring algorithm started, found a ring, etc), warnings are shown in yellow and errors in red. By clicking on a particular message, more information (if available) is shown.
The monitoring algorithms announce themselves at the end of the initialisation (during the Configure phase when a new run is started) and most of them report that they have seen the first event as well. This indicates that they are correctly set up and also receive events from the central buffer of the monitoring farm.
The example below shows the GUI with a hitmap where two isolated rings have been identified by the trackless ring-finding algorithm.
In case of issues with CAMERA
If you don't see any info messages arriving in CAMERA for more than one minute, do the following checks:
- Check in MonitoringMon that the RICH monitoring algorithms are running and processing data.
- Check in the logviewer whether or not you get messages like:
- May27-235252[WARN] mona0806: Gaudi.exe(LHCb_MONA0806_RichDAQMon_00): CameraTool: Could not connect to any camera server! -> Aborting message 'CentralTrigger4NHitMonitorLHCb/1/Initialized
- May27-235252[WARN] mona0806: Gaudi.exe(LHCb_MONA0806_RichDAQMon_00): CameraTool: Above message repeated 5 > times. Aborting further messaging of this type. An issue with the PC hist01 is possible.
- Read the line at the very bottom left of the CAMERA GUI. It must show "Connected to hist01:45124 , / NOT Connected to". If instead it shows "Connected to , / NOT Connected to: hist01:45124", the GUI has lost the connection to the server. Try starting another GUI.
- If the RICH monitoring algorithms are running and processing data, check that the PC hist01 is active by typing: ssh -X hist01. If the system hangs without giving back the shell prompt, hist01 is not responding. hist01 is the PC on which the CAMERA servers physically run (together with many other tasks). It has crashed in the past and had to be restarted by hand. CAMERA will be back as soon as hist01 is back, although you need to open a new CAMERA GUI once hist01 has come back. To make sure the LHCb Online team knows that hist01 is unavailable, call the Online piquet (or the shift leader if the Online piquet does not answer).
- If the RICH monitoring algorithms are running and processing data AND hist01 is up, send an email to nicola.mangiafave@gmail.com_NO_SPAM_ with the details of the checks made so far.
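The status-line check above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the code assumes the status line always has the two-part form quoted above, with servers written as host:port.

```python
import re

# host:port token, e.g. hist01:45124 (format assumed from the example status line)
SERVER = re.compile(r"[\w.-]+:\d+")

def parse_camera_status(status_line):
    """Return (connected, not_connected) server lists from the GUI status line."""
    marker = "/ NOT Connected to"
    connected_part, _, lost_part = status_line.partition(marker)
    return SERVER.findall(connected_part), SERVER.findall(lost_part)

def gui_is_connected(status_line):
    """True if at least one server is connected and none is listed as lost."""
    connected, lost = parse_camera_status(status_line)
    return bool(connected) and not lost
```

If gui_is_connected returns False for the line shown in your GUI, start a new GUI before moving on to the hist01 checks.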
Presenter
Starting the Presenter
Online in the control room
The Presenter is the official LHCb tool to look at histograms produced by the various online and offline monitoring algorithms.
The official script to start the Presenter is /group/online/presenter/run_presenter.sh. We have provided an alias to this in the online system and the Presenter can be started with startOfficialMonitor. Normally, this alias points to the official script; however, at times it may be better to use an updated version...
For Offline checks
Follow Step 5 of the offline-DQ procedures.
N.B. Currently the histogram files from the reconstruction jobs (Brunel and DaVinci) are downloaded and merged manually by the central DQ shifter. Hence the RICH DQ Piquet can only act once pointed to the relevant files by the central DQ shifter. Alternatively, use the hadd tool provided by ROOT and merge the histograms yourself.
Newly downloaded histogram files are announced on lhcb-dataquality-shifters@cern.ch.
To start the Presenter (e.g. on the PCs in the offline control room) do the following:
SetupProject Online
presenter.exe -C /afs/cern.ch/lhcb/group/dataquality/ROOT/presenter_1.cfg &
and ignore the warning message about DIM_DNS_NODE.
N.B. There is no NX server available offline at the moment. To log in remotely, use the procedures to log into the online cluster and then log back into lxplus.
Reference plots
Several of the Presenter pages can be shown together with a reference plot. This helps to judge whether the current situation can be regarded as "normal". The reference plots can be switched on by clicking on the icon in the top right part of the Presenter (if available).
N.B. As we're in the early phase of the start-up of the LHCb experiment, we still need more experience of what should be regarded as "good" or "bad". Hence the reference plots are taken from a run at 3.5 TeV which didn't show any obvious "oddities", but any judgement based on them should be taken with a grain of salt.
Furthermore, the Presenter doesn't yet support different running conditions, e.g. as the reference plot is taken from a 3.5 TeV run, the average hit occupancy might differ at a different beam energy or lower/higher beam intensity.
Online in IP8
The following pages are used in the control room (or remotely) to monitor the current state of the RICH detectors.
Several pages are (being) added to the Shift section of the list of available Presenter pages. These pages are mainly for the central Data-Manager shift crew but should also be monitored on a regular basis by the RICH crew.
The two screen-shots below show the overview pages for RICH1 and RICH2. In nominal conditions both pages should look
similar to these examples. The page contains two 2D histograms (one per detector panel) showing an integrated hitmap of the light detected by
the HPDs. Some of the central HPDs have a higher hit-count than the others as they receive more light from the
denser environment in the collision. A 1D histogram shows the inclusive distribution of the number of hits in the recorded events.
The remaining 2D histograms (again, one per detector panel) indicate which HPD (if any) has been disabled by the
online monitoring software due to some issue. These histograms are empty in nominal conditions - if any entry is found,
at least one HPD shows "odd" behaviour which needs to be investigated.
For RICH1:
For RICH2:
Disabled HPDs
In order to prevent any loss of time in global data-taking, individual HPDs becoming "upset" (e.g. because of synchronisation issues or other hiccups) can be disabled by the online monitoring. These "upset" HPDs may e.g. show a large number of HPD pixels permanently on, or the data recorded by them may be inconsistent with the rest of the event, etc. A detailed warning message explaining why a particular HPD has been disabled is sent to CAMERA, and the location of these HPDs is shown in a dedicated page in the Presenter. All 4 histograms (one per detector panel) are empty in normal conditions as shown in the example below.
Trackless Rings
A trackless ring-finding algorithm based on an Elastic Neural Network runs on the Monitoring Farm. This approach does not rely on input from any other sub-detector and gives a rough performance estimate of the RICH. The Presenter page shows the distribution of ring radii (sensitive to refractive index, and hence to temperature and pressure) and the photon yield for rings found in the gas radiators of RICH1 and RICH2.
(N.B. the second peak in the radii distribution in the top left plot is likely an artifact from the cut off of the maximum allowed ring size)
Hits per HPD
Several pages are available in the presenter to cross-check the current occupancy.
This page shows the pull of the expected vs actual number of hits per HPD in each RICH panel. The expected number of hits is calculated from a slowly moving average (over many events) whereas the actual number of hits is given by a fast moving average (over few events). Any large outliers indicate that an HPD does not see the number of hits expected for the current luminosity.
This 1D plot contains the same information and shows the number of hits (from the fast moving average) per HPD. The 1D representation has been added as it is easier to compare to a reference plot. The HPDs are arranged as columnNr*NrOfColums+NrOfHPDInColum.
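To make the idea of the pull more concrete, here is a minimal sketch for a single HPD. It is not the code used by the online monitoring: the exponential moving averages, the two smoothing constants and the Poisson-style normalisation by the square root of the expected value are all assumptions chosen for illustration.

```python
import math

class HitPull:
    """Toy pull of a fast vs a slow moving average of hits in one HPD."""

    def __init__(self, slow_alpha=0.001, fast_alpha=0.1):
        # Smoothing constants are illustrative assumptions, not the online values.
        self.slow_alpha = slow_alpha  # small weight: averages over many events
        self.fast_alpha = fast_alpha  # large weight: averages over few events
        self.slow = None
        self.fast = None

    def update(self, n_hits):
        """Feed the number of hits seen by this HPD in one event."""
        if self.slow is None:
            self.slow = self.fast = float(n_hits)
            return
        self.slow += self.slow_alpha * (n_hits - self.slow)
        self.fast += self.fast_alpha * (n_hits - self.fast)

    def pull(self):
        """(actual - expected) / sqrt(expected); 0.0 if nothing is expected."""
        if self.slow is None or self.slow <= 0.0:
            return 0.0
        return (self.fast - self.slow) / math.sqrt(self.slow)
```

With this toy model, an HPD that suddenly stops sending hits develops a strongly negative pull within a few tens of events, because the fast average drops while the slow average barely moves.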
UKL1 with truncated events
The UKL1 boards truncate events if the occupancy gets too high in order to keep the multi-event packet (MEP) below the maximum allowed size required by the readout. The part of the RICH event data that is handled by the affected L1 board may be either partially or wholly truncated. This means that data corresponding to blocks of individual HPDs are dropped while the HPD header information in the data is retained. Once truncation is activated for an event, all remaining data for that event and for any events following in the MEP are discarded for the affected L1 board.
In normal running conditions, this is expected to happen only sporadically when there is a burst of pathologically large events. During injection, however, it appears to be rather common. If the monitoring plot shows any entry, make an entry (with screen-shot) to the eLog, note the machine running condition (e.g. MD, injection, physics data taking) and send an email to the RICH Data-Quality mailing list.
HPD Efficiency
During the long abort gap between the bunch trains the 4 corner pixels of the HPD silicon chip are activated. This special data is
recorded and analysed on the Calibration Farm. Ideally, each pixel activated should be seen in the monitoring process. One might
expect that not each pixel is read out with 100% efficiency at all times - but in general the HPD efficiency should be close to 100%.
However, note that the timing of the HPD is not yet optimised as shown in the picture from run 70681 below. This will be done soon;
until then HPD efficiencies deviating from 100% are not a cause for concern.
Alignment monitoring (online)
The alignment of each of the 4 HPD panels is monitored online using OnlineBrunel. The page shown below shows the output of the monitoring. In the next step, automated fits and alerts will be added.
(N.B. This plot is based on an old version of the database which does not reflect the latest alignment)
Checking the Online-Monitoring plots offline for a specific run
All histograms produced by the online monitoring are saved, which makes it possible to check the output e.g. for a specific run.
The histograms are saved to /hist/SaveSets which is accessible from the plus cluster in IP8.
Below this directory, the structure is organised by
- year / partition / Monitoring-Task / Month / Day
- ByRun / Monitoring-Task / run-number (1st digit) / run-number (2nd digit)
where again the RICH tasks are RichDAQMon, RichRingMon and RichCalibMon.
The histograms are saved at regular intervals (currently every 10 minutes), which makes it possible to look at a specific part of the run.
If you want to look at the whole run, the following command is helpful (e.g. to look at run 74286 for the trackless rings)
cd /hist/Savesets/2010/LHCb/RichRingMon
hadd ~/MergedHistos_TracklessRings_run74286.root `find . -name 'RichRingMon*74286*.root'`
where
- cd will bring you to the directory containing all histograms for the trackless ring monitor (if you know the day when the run was taken, changing into that day's sub-directory will shorten the search for all files)
- hadd is from the ROOT package and merges a set of files. The syntax is: hadd output.root list-of-input.root
- find will start from the current directory (.) and return a list of all files matching the search criteria.
- note the backquotes around the find command: the shell replaces them with the output of the command so that it can be used as input to hadd
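The same file selection can be sketched in Python, which is handy if you want to inspect the list of inputs before merging. The helper names are invented for this example, and the sketch only builds the hadd command line; it does not run it. The file-name pattern is the one used in the find command above.

```python
import fnmatch
import os

def find_saveset_files(top_dir, task, run_number):
    """Walk top_dir and return all saveset files of one task for one run."""
    pattern = f"{task}*{run_number}*.root"  # same pattern as in the find command
    matches = []
    for dirpath, _dirnames, filenames in os.walk(top_dir):
        for name in fnmatch.filter(filenames, pattern):
            matches.append(os.path.join(dirpath, name))
    return sorted(matches)

def build_hadd_command(output_file, input_files):
    """Return the argument list 'hadd output.root in1.root in2.root ...'."""
    if not input_files:
        raise ValueError("no input files found, nothing to merge")
    return ["hadd", output_file] + list(input_files)
```

The returned list can be printed for inspection or passed to subprocess.run once you are happy with the selection.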
You can then look at the merged histograms in the same way as the Offline DQ pages described below. Note that you need to load separate files if you want to look at the output of the trackless ring-finder, the low-level checks (e.g. hit-maps and disabled HPDs) or the HPD efficiency (measured by the calibration farm).
Offline Data-Quality pages
The histograms from the reconstruction (Brunel and DaVinci) should already have been downloaded by the central DQ shifter as detailed in the Offline DQ Procedures. All files should be below /afs/cern.ch/lhcb/group/dataquality/ROOT/Collision10 organised by
- run type, e.g. Beam3500GeV-VeloClosed-MagDown
- run number, e.g. 70732
- processing, e.g. Real_Data_RecoStripping-01
- stream, e.g. 90000000
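As a small illustration, the directory for a given run could be assembled as follows. This sketch assumes that the four levels are nested in exactly the order listed above; the function name is made up for the example, so check the result against the actual directory tree.

```python
import posixpath

# Base directory quoted in the text above.
DQ_ROOT = "/afs/cern.ch/lhcb/group/dataquality/ROOT/Collision10"

def offline_histogram_dir(run_type, run_number, processing, stream, root=DQ_ROOT):
    """Join root / run type / run number / processing / stream (assumed nesting order)."""
    return posixpath.join(root, run_type, str(run_number), processing, str(stream))
```

For the example values above this gives /afs/cern.ch/lhcb/group/dataquality/ROOT/Collision10/Beam3500GeV-VeloClosed-MagDown/70732/Real_Data_RecoStripping-01/90000000.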
The central DQ shifter will merge all histogram files into one file covering the whole run.
Once this is done, the Presenter pages can be looked at in the following way:
- If not already done, change the Presenter to History mode from Tools -> History mode.
- Then use the drop-down list to select a histogram file to investigate:
- A dialog window will open, choose the merged histogram file from Brunel or DaVinci
- The pages relevant for data-quality checks are in the LHCbInternal.DataQuality section of the list:
The screen-shots below show what the pages look like for a typical run for the Brunel based histograms.
The pages based on DaVinci will be added soon (as there is currently a new version being deployed).
The same procedure can of course be used for the individual reconstruction jobs (i.e. prior to the histogram merging) with
the caveat that the files will only contain a fraction of the events recorded in the run.
Note:
- As histograms are booked on demand, some plots may not be available due to low statistics. In this case the message Error missing source... appears in the histogram title.
- Reference plots are currently being prepared and the pages will likely be updated in this early phase of running.
RICH Data-Quality Page 1: Occupancy and Decoding (Brunel)
RICH Data-Quality Page 2: Long tracks selection efficiency (Brunel)
RICH Data-Quality Page 3: Trackless rings (Brunel)
RICH Data-Quality Page 4: Photon reconstruction with tracks (Brunel)
RICH Data-Quality Page 5: PID Monitoring with Ks0 (DaVinci)
RICH Data-Quality Page 6: PID Monitoring with Lambdas (DaVinci)
RICH Data-Quality Page 7: PID Monitoring with D* (DaVinci)
RICH Data-Quality Page 8: PID Monitoring with J/psi (DaVinci)
Alignment monitoring (offline)
The procedure to verify the global RICH alignment is described in detail in the Alignment TWiki.
The global alignment should be verified regularly for both RICH1 and RICH2. We don't expect the alignment to change quickly or often. Therefore it should not be necessary to verify it for each run. Furthermore, analysing low-statistics runs will probably not be beneficial. As a first step (until we have more experience) it should be sufficient to analyse a high-statistics run every day or every other day.
Follow these steps:
- Identify a high-statistics run to analyse
- If not already available, merge all Brunel histogram files from this run using hadd (from the ROOT package) and place the merged file into your area
- Follow the TWiki to download and run the latest version of the alignment check for both RICH1 and RICH2
- If either of the fit parameters SinAmp or CosAmp (significantly) exceeds 0.001 rad, do the following:
- post a comment to the RICH eLog
- send an email to the mailing lists lhcb-rich-dataquality AT cern.ch and lhcb-rich-operations AT cern.ch
- add an entry to the Online ProblemDB
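The threshold test in the last step can be written down as a short check. This is a sketch only: the function name is invented, and the way "significantly" is handled (requiring the amplitude to exceed the threshold by a chosen number of fit uncertainties) is an assumption of this example, not a prescription from the alignment TWiki.

```python
THRESHOLD_RAD = 0.001  # limit on SinAmp and CosAmp quoted in the text

def alignment_needs_followup(sin_amp, cos_amp, sin_err=0.0, cos_err=0.0,
                             n_sigma=0.0, threshold=THRESHOLD_RAD):
    """True if |SinAmp| or |CosAmp| exceeds the threshold.

    With n_sigma > 0 the amplitude must exceed the threshold by n_sigma times
    its fit uncertainty (assumed reading of "significantly").
    """
    def exceeds(value, error):
        return abs(value) - n_sigma * error > threshold

    return exceeds(sin_amp, sin_err) or exceeds(cos_amp, cos_err)
```

Calling alignment_needs_followup(0.002, 0.0) returns True, whereas alignment_needs_followup(0.0002, -0.0003) returns False.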
Tools in the online environment
Several tools exist to check that the online monitoring is working.
The MonitoringMon monitors the event buffers used by the various monitoring algorithms and can be started via /group/online/dataflow/scripts/monitoringMon. The buffers "Events_LHCb" read the raw data from the detector and deliver it to the monitoring tasks which then perform the quality checks. The monitoring tasks for the RICH are RichDAQMon and RichRingMon. RichDAQMon runs the monitors checking the data integrity, number of hits and hitmaps, whereas RichRingMon runs the trackless ring-finding algorithm. Both tasks should consume a significant fraction of the data, well above 80%. The task RichRingMon is a little slower than RichDAQMon and will hence consume fewer events as each algorithm works on a best-effort basis.
The error-logger is the central tool to keep an eye on the various monitoring tasks and can be started via /group/online/dataflow/scripts/errorLog LHCb for the LHCb partition (for other partitions, replace the partition name in the argument). All output from all tasks (HLT, monitoring, reconstruction) is redirected to this central tool, hence it can be a bit overwhelming. To restrict the output, use the cursor keys to navigate around and disable the parts you are not interested in. The key players for the RICH are
- MONA08: Online Monitoring
- CALD07: Calibration Farm
- MONA09: Online reconstruction (OnlineBrunel)
The screenshot shows the 3 windows opened by the error logger. The actual messages are shown in the big window. The window on the right is used to control the error logger (enable / disable seeing messages from various farms, change the verbosity level from Warning to Info, etc). A third window (hidden here) is used for the history (not used).
Log files
The log files preserving the messages shown in the error logger are written to /clusterlogs/partitions/< partition >/daq where < partition > is the partition of interest such as LHCb or RICH
Useful hints
Get information about a specific run
kerzel@pchy 0 $ SetupProject LHCbDirac
kerzel@pchy 0 $ dirac-bookkeeping-run-informations 70768
Run Informations:
Run Start: 2010-04-26 17:56:00
Run End: 2010-04-26 17:57:00
Configuration Name: VELO
Configuration Version: NZS
FillNumber: 0
Data taking description: BeamOff-VeloOpen-MagOff
Processing pass: Real Data
Stream: [90000001]
FullStat: [9671] Total: 9671
Number of events: [9671] Total: 9671
Number of file: [3] Total: 3
File size: [2929765916] Total: 2929765916
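If you need these values in a script, the printed text can be turned into a dictionary. The sketch below only parses text of the form shown above (one "key: value" pair per line); the function name is invented for this example and it does not call any Dirac API.

```python
def parse_run_information(text):
    """Map each 'key: value' line of the printed run information to a dict entry."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split at the first colon only
        if not sep or not value.strip():
            continue  # skip prompts and headers such as "Run Informations:"
        info[key.strip()] = value.strip()
    return info
```

Values are kept as strings, e.g. the entry for Run Start is "2010-04-26 17:56:00"; convert them as needed.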
Per-event display of RICH hits
- Can be done using Panoramix
- Instructions from Thomas Ruf:
There is a framework for this implemented. You need to define your
favored view, page layout, with whatever you want to display. This
should be done with a python script. This python script gets registered
in a dictionary
onlineViews[title] = 'Rich_2dView'
Then, the nextEvent loop script checks if for the current page title
there exists an associated python script, which is then executed for the
next event.
All this is in place. Under Online, there is a first version for Rich
display. See also Rich_viewer.py in the Panoramix scripts/Python
directory.
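The registration mechanism described above can be pictured with a plain Python dictionary. This is not the Panoramix code: the names online_views, register_view, next_event and run_script are stand-ins invented for this illustration, and only the idea of looking up a script by page title is taken from the description.

```python
# Registry mapping a page title to the name of the view script (stand-in).
online_views = {}

def register_view(title, script_name):
    """Associate a page title with the script that draws the view."""
    online_views[title] = script_name

def next_event(page_title, run_script):
    """Run the script registered for the current page, if there is one.

    run_script is a callable supplied by the caller (hypothetical hook);
    returns its result, or None if no script is registered for the title.
    """
    script_name = online_views.get(page_title)
    if script_name is None:
        return None
    return run_script(script_name)
```

For example, after register_view("RICH hits", "Rich_2dView"), every call to next_event("RICH hits", run_script) hands "Rich_2dView" to run_script, while pages without a registered script are left untouched.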
--
UlrichKerzel - 11 Jul 2008