-- ParasNaik - 2017-05-16 -- ClaireProuve - 2015-07-18

Running the Online RICH mirror alignment

How to run the Alignment from the PVSS panel

Before running

XML-file to start with

If you want to start from a specific xml-file (i.e. a specific mirror alignment) you need to copy it to the right location. Just follow the instructions here.

[Paras will remove later, as this is no longer needed, but first a backup may need to be made] The alignments that were put into the database should also be kept at this location: /group/rich/AlignmentFiles/databaseAlignments, where you can pick them up whenever you want to.

Check Configuration file

There is only one configuration file, Configuration.py. If you want to use the same configuration as used in a previous alignment, since 2017 you can just go into that alignment's directory and retrieve the version of Configuration.py used. Of course, if you wish to repeat the alignment, you need to make sure that new Configurables have not been added since!

Configuration.py in the master branch of Panoptes is the configuration that is actually used. In the directory where Configuration.py lives, we have therefore created a subdirectory called SavedConfigurations that contains .txt versions of various configurations, which you can copy over Configuration.py before you start your alignments. Please read README.txt. Always create new configurations in the SavedConfigurations directory, update SavedConfigurations/README.txt, then overwrite Configuration.py with your new configuration.
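As a minimal sketch of the "overwrite Configuration.py with a saved configuration" step, the bash function below copies one file from SavedConfigurations over Configuration.py. The function name use_saved_configuration and the example file name MyConfig.txt are hypothetical; only the SavedConfigurations directory and Configuration.py come from the text above.

```shell
# bash sketch; use_saved_configuration is an illustrative name, not a Panoptes tool.
use_saved_configuration() {
    local dir="$1"     # directory that contains Configuration.py
    local name="$2"    # file inside SavedConfigurations, e.g. MyConfig.txt (hypothetical)
    local saved="$dir/SavedConfigurations/$name"

    # Refuse to touch Configuration.py if the saved configuration does not exist.
    if [ ! -f "$saved" ]; then
        echo "No such saved configuration: $saved" >&2
        return 1
    fi
    cp "$saved" "$dir/Configuration.py"
}
```

It would be called as use_saved_configuration /path/to/the/directory MyConfig.txt. The code still has to be recompiled afterwards for the change to take effect.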

It is necessary to check if all parameters (especially starting-iteration, though we almost always start from Iteration 0) are set correctly. Make sure all your variables are set consistently.

You can read more about the Configuration file.

NOTE after changing the Configuration file locally, you have to recompile the code (including a git fetch Panoptes and a git lb-checkout your branch of Rich/RichMirrorAlignmentOnline first) for the changes to be implemented in the next mirror alignment.

Magnification factors

You can choose between calculating the magnification factors on-the-fly for each iteration or using predetermined ones. To choose, modify the value of magnifCoeffMode in the Configuration file.

Find out more about using predetermined Magnification factors.

Magnification factors on-the-fly

Set the value of magnifCoeffMode in the Configuration file to 2.

Predetermined magnification factors

Set the value of magnifCoeffMode in the Configuration file to 0.
Then you have to provide the files containing the predetermined magnification factors in a directory and point towards it in magnifDir in the Configuration file.

Creating a new directory for predetermined magnification factors

Choices for fixed magnification factors should always be stored in
/group/rich/AlignmentFiles/MagnifFactors/Rich1/ and /group/rich/AlignmentFiles/MagnifFactors/Rich2/ and their subdirectories.

If you want to use a new set of magnification factors, make sure that the files have the same names as those that can currently be found within the substructure of those directories.

Say you determine new magnification factors for RICH1, by allowing them to float in a test alignment. If you want to install them, follow the following procedure on a plus machine (substituting your alignment name for 20170612_004007, and the appropriate nameString instead of "_Mp6Wi8.0Fm5Mm2Sm0_online_Collision17"):

  • cd /group/rich/AlignmentFiles/MagnifFactors/Rich1/
  • ls /group/online/AligWork/MirrorAlignments/Rich1/20170612_004007
    • Look at the last iteration number and remember it.
  • mkdir 20170612_004007
  • cd 20170612_004007
  • cp /group/online/AligWork/MirrorAlignments/Rich1/20170612_004007/summary.txt .
  • (substituting the last iteration number you noted above for the 4 in _i4)
    • cp /group/online/AligWork/MirrorAlignments/Rich1/20170612_004007/Rich1MirrMagnFactors*_i4.txt .
    • for file in Rich1*; do mv "$file" ${file//_i4/_predefined}; done
  • for file in Rich1*; do mv "$file" ${file//_Mp6Wi8.0Fm5Mm2Sm0_online_Collision17/}; done
  • rm Rich1MirrMagnFactors_predefined.txt
  • change the magnifDir in the Configuration file

For RICH2 of course you want to substitute "Rich2" for "Rich1" in the above procedure.
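The procedure above can be sketched as a single bash function. This is only an illustration: the function name install_magnif_factors and its argument order are made up here, and it assumes the file naming shown in the steps above (Rich1MirrMagnFactors*_i4.txt plus summary.txt).

```shell
# bash sketch; install_magnif_factors is an illustrative name, not an existing tool.
install_magnif_factors() {
    local rich="$1"   # "Rich1" or "Rich2"
    local src="$2"    # alignment directory, e.g. .../MirrorAlignments/Rich1/20170612_004007
    local dest="$3"   # new directory, e.g. .../MagnifFactors/Rich1/20170612_004007
    local iter="$4"   # last iteration number, e.g. 4
    local name="$5"   # nameString, e.g. _Mp6Wi8.0Fm5Mm2Sm0_online_Collision17

    mkdir -p "$dest" || return 1
    cp "$src/summary.txt" "$dest/" || return 1
    cp "$src/${rich}MirrMagnFactors"*"_i${iter}.txt" "$dest/" || return 1

    local f base new
    for f in "$dest/${rich}"*; do
        base="${f##*/}"                          # rename on the file name only
        new="${base//_i${iter}/_predefined}"     # _i4 -> _predefined
        new="${new//"$name"/}"                   # drop the nameString
        [ "$base" = "$new" ] || mv "$f" "$dest/$new"
    done

    # Same as the "rm" step in the list above.
    rm -f "$dest/${rich}MirrMagnFactors_predefined.txt"
}
```

After running it, magnifDir in the Configuration file still has to be changed by hand to point at the new directory.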


You will need to look at two different errorloggers:

1. For the analyzers:
errorLog LHCbA OR in a separate terminal window less +F --follow-name  /clusterlogs/partitions/LHCbA/daq/LHCbA.log

If using the errorLog, you can change the output level in the little settings window: navigate (with the arrow keys) up to "Severity for messages" and cycle through the options with the ">" key. When you have found the one you want, simply hit "enter". This can be done at any time during the running (the message window might need a while to catch up with the command though).

2. For the iterator:
ssh -Y hlt02
source /group/online/dataflow/scripts/shell_macros.sh
errlog -m hlt02

NOTE: here you want the errlog command and NOT the errorLog command!!!

This panel is a bit moody and will sometimes not output the messages at the right time. The messages might take a while to appear, or might only show up at the next iteration or once the program has finished.

Alternatively, in a separate terminal window less +F --follow-name  /group/rich/AlignmentFiles/Logging/Rich1_hlt02.log should track the RICH1 mirror alignment when it is running, and pause otherwise. Change 'Rich1' to 'Rich2' for RICH2.

Run the mirror alignment

0. Create an FSM launch script on ui. You only need to do this once (unless there is a new version of WinCC)

ssh -Y ui

then create a new file called fsm.sh and put the following into it:

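if [[ "$WCCOA_DIR" =~ "WinCC_OA/3.11" ]]; then
    echo "Starting the FSM on WinCC-OA 3.11"
    # launch command for this WinCC version (not shown on this page)
elif [[ "$WCCOA_DIR" =~ "WinCC_OA/3.15" ]]; then
    echo "Starting the FSM on WinCC-OA 3.15"
    # launch command for this WinCC version (not shown on this page)
elif [[ "$WCCOA_DIR" =~ "WinCC_OA/3.16" ]]; then
    echo "Starting the FSM on WinCC-OA 3.16"
    # launch command for this WinCC version (not shown on this page)
else
    echo "You need to edit fsm.sh and add functionality for this WinCC version:"
    echo "$WCCOA_DIR"
fi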

1. Open the panel from an ui or plus machine:

ssh -Y ui (if you haven't already)

source fsm.sh

2. Right-click on LHCb_Align when the FSM panel shows up

3. Take the partition: Only take the partition if no one else is using the alignment!

Click "take" and then wait for a smaller new panel to show up (this may take a while, just be patient). In the new panel just click "dismiss".

When you have taken the partition, a complete loss of internet connection will log you off the FSM panel.

During data taking, you will not need to take the partition. The partition will be set up at the pit. The bad news is that you now effectively share the partition with everyone else. The good news is that if you lose internet connection, your alignments will still run (unless one of the automated alignments overrides your alignment; automated alignments have priority).

4. Choose your alignment: select the activity from the menu on the right of the Run Info panel. This will always be Alignment|Rich1 or Alignment|Rich2.

5. If the state is NOT ALLOCATED, then Allocate:

6. Reserve alignment farm: Put your name into the panel and click the button to reserve the alignment. This is so other people know you are there and that they shouldn't touch anything until you are done. Also, if they need the alignment they can contact you. During commissioning periods we use a TimeTable to let everyone know when we have booked the farm (the link may need to be updated each year).

7. Select runs: Select the run range to use in the alignment by clicking on "Choose runs for alignment". Note that the fill numbers are also displayed for your convenience. The runs to use are the first ones of each fill (if they have enough events), unless the event count is being cut off (currently 3M) in which case use all of them in a fill.

Information about all runs and fills can be found in the run database.

Make sure to pick "Rich" from the list at the top! Now select the runs you want to run over and click "Ok". The number of events for each run listed in this window is the combination of events provided from the RICH1 and RICH2 mirror HLT lines. It is not quite the perfect mix: right now it is 57% RICH1 and 43% RICH2, but with substantial error bars (we aimed for 50-50, so not so bad). To get the number of events actually processed, use the numTriggers.C script or just look in AlignmentView.

8. Verify / select farms to run on: You don't have to do this usually. However, to check the status of the HLT subfarms, click on "HLT", which will open up an overview of the HLT subfarms and nodes.

The selection panel on the left shows the included/excluded nodes and subfarms. In order to include/exclude subfarms/nodes, select them in the panels and then click the button with the arrow pointing toward the left/right. Then also click the "Include"/"Remove" button. This might take a while, just be patient.

The subfarms marked in red are not working right now (for whatever reason) and cannot and must not be included.

When all available nodes are included their number should be between 1200 and 1800. If it is less than 1200, you need to DEALLOCATE and re-ALLOCATE the partition. This almost always solves the problem, and can also solve other problems with the mirror alignment (e.g. if LHCbA gets into a bad state and nothing seems to work exactly right). If the problem persists go and complain to an Online piquet (or expert [Beat Jost, Clara Gaspar, Markus Frank]) and/or the Alignment Piquet. See the ShiftDB for who's on shift.

9. Look at the status of the subfarms and nodes: Also something you don't have to do usually. If need be though, click the "PARTAlign" button in the HLT panel.

Another panel will open that will show you the state of the subfarms. By clicking on a certain subfarm you will get the state of its individual nodes.

Choose steps 10a AND 11a if an alignment has not been performed in a while on the farm. Otherwise try step 10-11b.

10a. Start alignment by configuring: The configuring process might take a couple of minutes and some nodes might take significantly longer than others. That's just normal. If one node seems extremely slow or fails you can remove it as shown in step 8.

There is a time-limit set of how long the configuring is allowed to take (6-8 minutes). It is possible this will take longer, and if nodes take longer their status will appear as ERROR, but other nodes in the HLT panel will already be READY. Give it time and verify in the LHCbA errorlogger that something is still happening. If some node(s) take(s) too long or seems stuck for some reason, just remove it.

Do not change the status of things anywhere else than in this specific panel (unless you are an expert). Don't give commands to subfarms or nodes directly! (But also don't worry, it's very unlikely a DEALLOCATE and ALLOCATE won't fix it)

11a. Start the run: From the same dropdown panel select "start run".

The "RunInfo" button on the left will change to "running". Wait a few seconds and see what happens to the "HLT" button. Sometimes it goes straight to running and sometimes it will be back at "Ready". In that case just select "start run" again.

10-11b. Autopilot: In lieu of the two steps (or three, if the farm is not allocated yet), if the runs and alignment are selected you can run the alignment in one step by switching the AUTOPILOT from OFF to ON. However, if there is a bug in the code or some other problem this may cause weird loops to occur; if you see this happening, switch AUTOPILOT from ON to OFF, then try to diagnose and repair the problem following the guidelines below, and next time don't use the AUTOPILOT.

12. Now lean back and relax.

If you want to you can go into the working directory and see the files being written, or watch the log files scroll by =)

13. If the alignment is successful, everything goes into READY. However once this happens, it is very important that you execute a RESET command as soon as possible after the alignment goes into READY. We have no idea why this is not done automatically, but since it is not, you have to do it.

14a. If and only if you want to perform another alignment, you MUST RECONFIGURE by following these steps:

  • You should have already done this, but if not: The alignment will have gone into READY, click on READY at the top and pick RESET
  • Wait until the alignment goes into NOT READY
  • Now do whatever you need to do to prepare for the next alignment: it may be changing run numbers, changing configuration and/or changing the starting XML file (REMEMBER YOU MUST COMPILE if you change any of the code, including the Configuration)
  • Steps 10-11a or 10-11b

14b. If you want to stop doing mirror alignments, send a RESET, and free the alignment farm again

  • You should have already done this, but if not: The alignment will have gone into READY, click on READY at the top and pick RESET
  • DEALLOCATE the farm
  • If you took the farm, then you should click on the lock and release the farm.

Monitoring the procedure

The results of each iteration are saved; this includes XML files and histograms produced at the end of each iteration (more details below).

For the moment, monitoring must be done manually by looking at the output of the fits, and especially at the plots. See AlignmentView; you should also be familiar with all of the information on LHCbRichMirrorAlignShiftInfo to help you evaluate an alignment.

The output of each performed alignment will be saved under /group/online/AligWork/MirrorAlignments/Rich{whichRich}/{YYYYMMDD_hhmmss}, where whichRich is 1 or 2.

Xml files location

  • The xml files produced in the job you are running will be saved to /group/online/AligWork/MirrorAlignments/Rich{whichRich}/{YYYYMMDD_hhmmss}
  • The xml file the alignment starts with is picked up from /group/online/alignment/Rich{whichRich}/MirrorAlign/
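Because the alignment directories are named by timestamp, the most recent one sorts last. A minimal bash sketch for finding it is given below; the function name latest_alignment_dir is made up for illustration, and it assumes directory names of the form 20170612_004007 as used elsewhere on this page.

```shell
# bash sketch; latest_alignment_dir is an illustrative name.
latest_alignment_dir() {
    local base="$1"   # e.g. /group/online/AligWork/MirrorAlignments
    local rich="$2"   # "Rich1" or "Rich2"
    local d latest=""

    # Globs expand in sorted order, so the last match is the newest timestamp.
    for d in "$base/$rich"/[0-9]*_[0-9]*/; do
        [ -d "$d" ] || continue
        latest="${d%/}"
    done

    # Print nothing and return non-zero if no alignment directory was found.
    [ -n "$latest" ] && echo "$latest"
}
```

For example, latest_alignment_dir /group/online/AligWork/MirrorAlignments Rich1 would print the newest RICH1 alignment directory, if any exists.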

Histograms location

The histograms are produced automatically when running the alignment, with one ROOT file per iteration. They are stored under /hist/Savesets/2015/LHCbA/. Naming convention, e.g. /hist/Savesets/2015/LHCbA/AligWork_Rich1/07/10/AligWrk_Rich1-1569160001-20150710T100630-EOR.root:

  • Rich1 is the name of the activity (Rich1, Rich2, Muon, Tracker or Velo)
  • 156916 is the first run number in the run list
  • 0001 is the number of iteration
  • 20150710T100630 is the time when the file has been generated.
The histograms will be copied into the work directory for fitting etc. All histograms will also be saved at the end of the alignment procedure to /group/online/AligWork/MirrorAlignments/Rich{whichRich}/{YYYYMMDD_hhmmss}
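To make the naming convention concrete, here is a small bash sketch that splits a saveset file name into its parts. The function name parse_saveset_name is hypothetical, and the sketch assumes (as in the example above) that the iteration is always the last four digits of the middle field.

```shell
# bash sketch; parse_saveset_name is an illustrative name.
parse_saveset_name() {
    local base="${1##*/}"      # strip the directory part
    base="${base%.root}"       # strip the extension

    # Fields are separated by "-": task, run+iteration, timestamp, suffix (EOR).
    local task runiter stamp rest
    IFS=- read -r task runiter stamp rest <<< "$base"

    local activity="${task#*_}"    # AligWrk_Rich1 -> Rich1
    local iter="${runiter: -4}"    # last four digits: iteration
    local run="${runiter%????}"    # everything before them: first run number

    # 10# forces base 10 so that 0008 is not read as octal.
    echo "activity=$activity run=$run iteration=$((10#$iter)) time=$stamp"
}

parse_saveset_name AligWrk_Rich1-1569160001-20150710T100630-EOR.root
# prints: activity=Rich1 run=156916 iteration=1 time=20150710T100630
```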

To run root on plus: lb-run root root.exe FILENAME.root

Known issues and workarounds

Everything takes ages. That's normal.

Actual errors

When taking the LHCbA partition, you get a warning that says that PARTAlign is owned by ECS:LHCbARunControl (or someone else).

If you have to take LHCbA and you get a warning like this, chances are that LHCbA has been taken by someone else. Typically this partition should not be taken by someone else unless they are using it or the alignment is in automatic mode (otherwise you should have been warned that LHCbA is occupied and you shouldn't use it).

If none of the subfarms are able to be included and PARTAlign remains OFFLINE, you should try to force exclude PARTAlign, deallocate everything and then you can try to allocate yourself LHCbA from top level. To force exclude, you can right click on the lock to go into expert mode and then select the option.

Alignment goes into state “READY,” but the alignment hasn’t converged (ALWAYS CHECK! You should get an email if it converged, and in the ELOG there should be a summary report)

  • In this case you probably followed Step 10a but not Step 11a. Just click on "start run" again (no reconfiguring, etc.!!!)

PARTAlign_Master RUNNING, nodes READY

It is possible that the job is stuck with the PARTAlign_Master in RUNNING and the various nodes (e.g. HLTA05_A) in READY.
In this case select again START_RUN (as mentioned above). This can happen after configuring but also between iterations.

Only few nodes included

If only few nodes are in the "Included Nodes and Removed Nodes", try to DEALLOCATE and then ALLOCATE again.

HLT goes into state error: Check whether it is all nodes or just one

  • If it's all nodes then there is an actual error in the code; try checking the error loggers.

  • If only individual nodes are in error it could be
    • Time-out during configuring
      • give it time ~20 min
      • There is a time-limit set of how long the configuring is allowed to take.
      • It is possible to take longer and if nodes take longer their status will appear as error.
      • Give it time and verify in the errorlogger that something is still happening.
    • a crashed node during running
      • In LHCbA_HLT: TOP
        • Select the node (e.g. HLTE1028_A)
        • Click “arrow” to exclude area
        • Click “remove” to exclude (might take a while, just wait)

Nodes going into error while running

Sometimes a node goes into error state for no apparent reason. This will prevent the others from getting to the next iteration. Just exclude the node and click "start run" again.

Red herrings

something with Options "../../jobs " something ==> a problem in your python-configurations

  • use tail -n 20 /clusterlogs/partitions/LHCbA/daq/LHCbA.log to see more details of the error (can increase the number 20 if need be)

something complaining about some Configuration.py file in the OnlineDev something something directory ==> something in our own Configuration file is wrong

Specific errors

In LHCbA log, or when running on a single node, if you see
Aug29-182253[ERROR]hltc0507: CTRL(HLTC0507_A_Controller): start: FAILED Summary:FINISHED:1 TIMEOUT:1 NodeAdder_0 .....
you may want to mention it to Beat and Clara, but it is usually not a problem.

In LHCbA log, or when running on a single node, if you see something like
Aug29-165429[ERROR]hltc0222: GaudiOnlineExe.exe(LHCbA_HLTC0222_AligWrk_0): RawDecoder: Rich::DAQ::RawDataFormatTool:: 'Unknown Level0 hardware ID 181' | L1HardID=10 Ingress=1 Input=1 [DecodeRawRichHLT] StatusCode=FAILURE
this means that you are using the wrong DB tags.

In LHCbA log, or when running on a single node, if you see
Aug30-144215[WARN] hlt02: cp: cannot stat ‘/group/online/dataflow/options/LHCbA/HLT/LHCbA_HLT01_HLT.py’: No such file or directory
Aug30-144215[WARN] hlt02: cp: cannot stat ‘/group/online/dataflow/options/LHCbA/LHCbA_HLT01_HLT.opts’: No such file or directory
just ignore it.

Errors and warnings that are not a/our problem

There will be several harmless warnings from the analyzers during configuring and running such as:

Warning: using CKThetaQuarzRedractCorrections = [0,-0.0001,0]

UpdateManagerSvc: Override condition for path 'Conditions/Environment/Rich1/RefractivityScaleFactor' is defined more than once

FinalTrackClones:TrackBuildCloneTable:: The WARNING message is suppressed: 'Probleme extrapolating state'

TrackBestTrackCreator.Fitter:TrackMasterFitter:: The WARNING message is suppressed: 'unable to fit the track' ...

and others discussed on LHCbRichMirrorAlignShiftInfo

After the alignment has run

After the alignment has run check by hand in the working directory if it has converged and/or look at the output of the iterator in the hlt02 error logger / logging files (discussed in LHCbRichMirrorAlignCodeOnline).

In order to make sure that no nonsense happened in the alignment you should:

1. Have a look at the output plots for each iteration and see if the fits make sense. Maybe also skim for changes between iterations that seem to have gotten worse rather than better.

2. Look at the amount the mirrors were tilted and check for abnormally high values.

Log files

There are the hlt02 error logger / logging files (discussed in LHCbRichMirrorAlignCodeOnline). They should show up in the YYYYMMDD_HHMMSS directory after a successful alignment, otherwise they are in the Logging directory.

There will be a log file produced each day (being written to all the time) that contains the output of all analyzers that ran from the LHCbA partition ( = the alignment farm).

This log file will be at /clusterlogs/partitions/LHCbA/daq/LHCbA.log (under plus). Older log files are in /clusterlogs/partitions/LHCbA/daq/old.

Note that these log files contain the entire output from all analyzers of this day (including other alignments etc). They are extremely large and probably not very practical. However, you can filter the log using grep to isolate one node (e.g. hltf0119) to see how that node is configured and if everything ran (Brunel produces a table of processed events).
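As a sketch of the grep filtering described above, the bash function below keeps only the lines of a log file that mention a given node, optionally restricted to one severity. The function name filter_node_log is made up for illustration, and the severity match assumes tags written as in the log excerpts on this page, e.g. [ERROR] or [WARN].

```shell
# bash sketch; filter_node_log is an illustrative name.
# Usage: filter_node_log LOGFILE NODE [SEVERITY]
filter_node_log() {
    local log="$1" node="$2" severity="${3:-}"

    if [ -n "$severity" ]; then
        # Case-insensitive match on the node, then a literal match on "[SEVERITY".
        grep -i -- "$node" "$log" | grep -F -- "[$severity"
    else
        grep -i -- "$node" "$log"
    fi
}
```

For example, filter_node_log /clusterlogs/partitions/LHCbA/daq/LHCbA.log hltf0119 ERROR | less would page through the error messages of that node only.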

Explanation of files in the alignment folder (from Claire's email, 28 September 2015)

In /group/online/AligWork/MirrorAlignments/ there are folders for different dates which in turn contain alignments for Rich1 and/or Rich2.

The alignment procedure starts from the xml file like


and from this making the histograms in


If not using predetermined magnification factors: the starting xml-file (Rich1CondDBUpdate_Mp6Wi4.0Fv3Cm2Sm1_online_Collision15_i0.xml) is taken, and tilts of +0.3 and -0.3 (+/-0.7 for RICH1) are applied around the y- and the z-axis for the primary and secondary mirrors respectively, which gives 8 new xml files with endings like (these are for the calculation of the magnification factors):


The 8 .xml files are used to make the corresponding histograms stored in files like:


The 2D fits are performed on each of the 8 (+1 for untilted) root-files and the output is stored in the folders which also contain the plots for each mirror pair with the line of the 2D fit:


and with the ending corresponding to the tilted mirrors:


Then the magnification factors are calculated for each tilt and each mirror and stored in files like:


If using predetermined magnification factors: files like the one just above are already pre-made.

After that the actual mirror-corrections are determined and summarised in the file (here only one per iteration):


(If all numbers under "only this iteration's additional corrections to compensations: truncated, not rounded" are .0 the alignment has converged.)

If the alignment has not converged the determined mirror-tilts are applied to the "old" xml-file (Rich1CondDBUpdate_Mp6Wi4.0Fv3Cm2Sm1_online_Collision15_i0.xml in this case) to make the next one:


and the whole procedure continues with the next iteration...

To figure out how many events were actually run over in the alignment

This info should be in AlignmentView now. You can also use the following:

It can be very useful to figure out how many events were actually run over in the alignment for each RICH detector. It is hard to get this number directly, because our HLT Line provides both Rich1 and Rich2 samples to the online RICH mirror alignment together.

First, create a file called numTriggers.C

In it put the following:

// Print the number of entries of the histogram TH1name
// in one .root file found in the directory dirname.
void numTriggers(TString dirname, TString TH1name)
{
  TString ext=".root";
  TSystemDirectory dir(dirname, dirname);
  TList *files = dir.GetListOfFiles();
  if (files) {
     TSystemFile *file;
     TString fname;
     TIter next(files);
     int onlyOne = 1;   // cleared after the first matching file, so only one file is read
     while ( onlyOne == 1 && (file=(TSystemFile*)next()) ) {
        fname = file->GetName();
        if (!file->IsDirectory() && fname.EndsWith(ext)) {
           TFile *f = TFile::Open(dirname+"/"+fname.Data());
           if (!f) continue;   // skip files that cannot be opened
           TH1 * h1 = (TH1*)f->Get(TH1name);
           if (h1) cout << fname.Data() << " has " << h1->GetEntries() << " entries in TH1 "<< TH1name << "." << endl;
           f->Close();
           onlyOne = 0;
        }
     }
  }
}

Save it, and then to find the number of entries in the TH1 histogram of your choice, in the latest .root file in a directory, just run (for example):

lb-run root root.exe -b -l -q 'numTriggers.C("/group/online/AligWork/MirrorAlignments/Rich1/20160103_232542/","RICH/RichODIN/TriggerType")'

So in this case, it spits out the actual number of events the alignment ran over for the alignment Rich1/20160103_232542:

RichRecQCHistos_rich1_Mp6Wi8.0Fv3Cm0Sm1_online_Collision15_i2.root has 2.06806e+06 entries in TH1 RICH/RichODIN/TriggerType.

So you know this particular alignment ran over about 2 million RICH1 triggered events.

As long as the alignment runs over only one TriggerType (it should due to the HLT filter), this will be accurate and of course much faster than looking in a TBrowser.

Back to main Mirror alignment TWiki
