-- ParasNaik - 2017-05-16 -- ClaireProuve - 2015-07-18

Running the Online RICH mirror alignment

How to run the Alignment from the PVSS panel

Before running

XML-file to start with

If you want to start from a specific xml-file (i.e. a specific mirror alignment), you need to copy it to the right location. Just follow the instructions here. Please remove the alignment afterwards, or make sure the alignment that is currently in the database gets put back into the correct folder with the latest version number.

The alignments that were put into the database should also be kept at this location: /group/rich/AlignmentFiles/databaseAlignments
where you can pick them up whenever you want to.

Check Configuration file

There is only one configuration file, Configuration.py, and everybody uses it. For this reason the directory that holds the configuration file has a subdirectory called SavedConfigurations, which contains .txt versions of various configurations that you can write over Configuration.py before you start your alignments. Please read README.txt. Always create new configurations in the SavedConfigurations directory, update SavedConfigurations/README.txt, then overwrite Configuration.py with your new configuration. Check that all parameters (especially starting-iteration) are set correctly, and make sure all your variables are set consistently.
More info about the Configuration file.
After changing the Configuration file, run do_configure and do_install so that the changes are picked up by the package.

Magnification factors

You can choose between calculating the magnification factors on-the-fly for each iteration or using predetermined ones. To choose, modify the value of magnifCoeffMode in the Configuration file.

Magnification factors on the fly

Set the value of magnifCoeffMode in the Configuration file to 2.

Predetermined magnification factors

Set the value of magnifCoeffMode in the Configuration file to 0.
Then you have to provide the files containing the predetermined magnification factors in a directory and point towards it in magnifDir in the Configuration file.

Creating a new directory for predetermined magnification factors

Choices for fixed magnification factors should always be stored in
/group/rich/AlignmentFiles/MagnifFactors/Rich1/ and /group/rich/AlignmentFiles/MagnifFactors/Rich2/.

If you want to use a new set of magnification factors make sure that the files have the same names as those that can currently be found within the substructure of those directories.

Say you determine new magnification factors for RICH1 by allowing them to float. To install them, use the following procedure on a plus machine (substituting your alignment name for 20170612_004007, and the appropriate nameString for "_Mp6Wi8.0Fm5Mm2Sm0_online_Collision17"):

  • cd /group/rich/AlignmentFiles/MagnifFactors/Rich1/
  • ls /group/online/AligWork/MirrorAlignments/Rich1/20170612_004007
    • Look at the last iteration number and remember it.
  • mkdir 20170612_004007
  • cd 20170612_004007
  • cp /group/online/AligWork/MirrorAlignments/Rich1/20170612_004007/summary.txt .
  • (substituting the last iteration number you noted above for the 4 in _i4)
    • cp /group/online/AligWork/MirrorAlignments/Rich1/20170612_004007/Rich1MirrMagnFactors*_i4.txt .
    • for file in Rich1*; do mv "$file" "${file//_i4/_predefined}"; done
  • for file in Rich1*; do mv "$file" "${file//_Mp6Wi8.0Fm5Mm2Sm0_online_Collision17/}"; done
  • rm Rich1MirrMagnFactors_predefined.txt
  • change the magnifDir in the Configuration file

For RICH2 of course you want to substitute "Rich2" for "Rich1" in the above procedure.
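
The effect of the two renaming loops above can be previewed before any file is touched. The following is a minimal Python sketch, not part of the alignment package: the function name predefined_name is hypothetical, and the example file names are assumptions built from the Rich1MirrMagnFactors*_i4.txt pattern and the nameString used above.

```python
def predefined_name(filename, iteration_tag, name_string):
    """Return the name a file would have after both mv loops (hypothetical helper)."""
    # First loop: ${file//_i4/_predefined}
    renamed = filename.replace(iteration_tag, "_predefined")
    # Second loop: ${file//<nameString>/}
    return renamed.replace(name_string, "")


if __name__ == "__main__":
    name_string = "_Mp6Wi8.0Fm5Mm2Sm0_online_Collision17"
    # Assumed example names; check them against the ls output of your alignment.
    examples = [
        "Rich1MirrMagnFactors" + name_string + "_pri_negYzerZ_i4.txt",
        "Rich1MirrMagnFactors" + name_string + "_i4.txt",
    ]
    for old in examples:
        print(old, "->", predefined_name(old, "_i4", name_string))
```

Under these assumptions the second example ends up as Rich1MirrMagnFactors_predefined.txt, which is the file the rm step deletes.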

Errorloggers

You will need to look at two different errorloggers:

1. For the analyzers:
errorLog LHCbA

You can change the output level in the little settings window: navigate (with the arrow keys) up to "Severity for messages" and cycle through the options with the ">" key-combination. When you have found the one you want, simply hit "enter". This can be done at any time during the running (the message window might need a while to catch up with the command though).

2. For the iterator:
ssh -Y hlt02
source /group/online/dataflow/scripts/shell_macros.sh
errlog -m hlt02

NOTE: here you want the errlog command and NOT the errorLog command!!!

This panel is a bit moody and will sometimes not output the messages at the right time. The messages might take a while to appear, or only show up at the next iteration or once the program has finished.

Run the alignment

0. Create an FSM launch script on ui. You only need to do this once (unless there is a new version of WinCC)

ssh -Y ui

then create a new file called fsm.sh and put the following into it:

if [[ "$WCCOA_DIR" =~ "WinCC_OA/3.11" ]]; then
    echo "Starting the FSM on WinCC-OA 3.11"
    /group/online/ecs/Shortcuts311/LHCb/ECS/ECS_UI_FSM.sh 
elif [[ "$WCCOA_DIR" =~ "WinCC_OA/3.15" ]]; then
    echo "Starting the FSM on WinCC-OA 3.15"
    /group/online/ecs/Shortcuts315/LHCb/ECS/ECS_UI_FSM.sh 
else
    echo "You need to edit fsm.sh and add functionality for this WinCC version:"
    echo $WCCOA_DIR
fi

1. Open the panel from a ui or plus machine:

ssh -Y ui (if you haven't already)

source fsm.sh

2. Right-click on LHCb_Align

3. Take the partition: Only take the partition if no one else is using the alignment!

Click "take" and then wait for a smaller new panel to show up (this may take a while, just be patient). In the new panel just click "dismiss".

4. Choose your alignment: select the activity from the menu on the right of the Run Info panel.

5. Allocate:

6. Reserve alignment farm: Put your name into the panel and click the button to reserve the alignment.

7. Select runs: Select the run range to use in the alignment. This procedure will be fully automated after an initial testing period. The runs to use are the first ones of each fill (if they have enough events). Click on "Choose runs for alignment".

You can look at which runs you want in the run database.

Now select the runs you want to run over and click "Ok". Make sure to pick "Rich" from the list at the top! The number of events for each run listed in this window is the combination of events provided from the RICH1 and RICH2 mirror HLT lines. So it is not exact but should give an order-of-magnitude (approximate split was intended to be 50% RICH1 and 50% RICH2, but with substantial error bars). To get the number of events actually processed, use the numTriggers.C script.

8. Verify / select farms to run on: Click on "HLT" which will open up an overview of the HLT subfarms and nodes.

The selection panel on the left shows the included/excluded nodes and subfarms. To include/exclude subfarms/nodes, select them in the panels and then click the button with the arrow pointing toward the left/right. Then click the "Include"/"Remove" button. This might take a while, just be patient.

The subfarms marked in red are not working right now (for whatever reason) and must not be included.

When all available nodes are included their number should be between 1200 and 1800. If it is significantly less (~500 or so) you need to deallocate and reallocate the partition. If the problem persists go and complain to an online expert.

9. Look at the status of the subfarms and nodes: Click the "PARTAlign" button in the HLT panel

Another panel will open that will show you the state of the subfarms. By clicking on a certain subfarm you will get the state of its individual nodes.

10. Start alignment by configuring: The configuring process might take a couple of minutes and some nodes might take significantly longer than others. That's just normal. If one node seems crazy slow or fails you can remove it as shown in step 8.

There is a time limit on how long the configuring is allowed to take. Nodes can take longer than that, and if they do their status will appear as error. Give it time and verify in the errorlogger that something is still happening. If a node takes too long or seems stuck, just remove it.

Do not change the status of things anywhere else than in this specific panel (unless you are an expert). Don't give commands to subfarms or nodes directly!

11. Start the run: From the same dropdown panel select "start run".

The "RunInfo" button on the left will change to "running". Wait a few seconds and see what happens to the "HLT" button. Sometimes it goes straight to running and sometimes it will be back at "Ready". In that case just select "start run" again.

12. Now lean back and relax ;)

If you want to you can go into the working directory and see the files being written =)

13a. If you want to stop aligning, free the alignment farm again

13b. If you want to perform another alignment, you MUST RECONFIGURE by following these steps

  • The alignment will have gone into READY, click on READY at the top and pick RESET
  • Wait until the alignment goes into NOT READY
  • Now do whatever you need to do to prepare for the next alignment: it may be changing run numbers, changing configuration (REMEMBER YOU MUST COMPILE), or changing the starting XML file
  • CONFIGURE the alignment
  • when it is all back in READY, START_RUN

Monitoring the procedure

There is no monitoring in place yet.

The results of each iteration are saved; this includes xml files and histograms (more details below). For the moment the monitoring should be performed manually by looking at the output of the fits, especially at the plots!

The output of each performed alignment will be saved under /group/online/AligWork/MirrorAlignments/Rich{whichRich}/{YYYYMMDD_hhmmss}, where whichRich is 1 or 2 (for example Rich1/20170612_004007).

Xml files location

  • The xml files produced in the job you are running will be saved to /group/online/AligWork/MirrorAlignments/Rich{whichRich}/{YYYYMMDD_hhmmss}
  • The xml file the alignment starts with is picked up from /group/online/alignment/Rich{whichRich}/MirrorAlign/
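
As an illustration only, the two locations above can be assembled as in the following Python sketch. The helper names work_dir and start_xml_dir are hypothetical and not part of the alignment package. The timestamp is passed in as a string, because its format here is only inferred from existing directory names such as 20170612_004007.

```python
import posixpath

ALIGN_WORK = "/group/online/AligWork/MirrorAlignments"
START_XML = "/group/online/alignment"


def work_dir(which_rich, stamp):
    """Directory where one alignment job writes its xml files (hypothetical helper)."""
    return posixpath.join(ALIGN_WORK, "Rich" + str(which_rich), stamp)


def start_xml_dir(which_rich):
    """Directory the starting xml file is picked up from (hypothetical helper)."""
    return posixpath.join(START_XML, "Rich" + str(which_rich), "MirrorAlign")


if __name__ == "__main__":
    print(work_dir(1, "20170612_004007"))
    print(start_xml_dir(2))
```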

Histograms location

The histograms are produced automatically when running the alignment, with one root file for each iteration. They are at /hist/Savesets/2015/LHCbA/. Nomenclature convention, e.g.: /hist/Savesets/2015/LHCbA/AligWork_Rich1/07/10/AligWrk_Rich1-1569160001-20150710T100630-EOR.root

  • Rich1 is the name of the activity (Rich1, Rich2, Muon Tracker or Velo)
  • 156916 is the first run number in the run list
  • 0001 is the iteration number
  • 20150710T100630 is the time when the file has been generated.
The histograms will be copied into the work directory for fitting etc. All histograms will also be saved at the end of the alignment procedure to /group/online/AligWork/MirrorAlignments/Rich{whichRich}/{YYYYMMDD_hhmmss}
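
The nomenclature above can be unpacked mechanically. The following Python sketch is not part of the alignment package: the function name parse_saveset_name is hypothetical, and the pattern assumes that the iteration always occupies the last four digits of the number block and that the file name always starts with AligWrk_ and ends with -EOR.root, which is only what the single example above shows.

```python
import re
from datetime import datetime

# Assumed layout: AligWrk_<activity>-<run><4-digit iteration>-<YYYYMMDD>T<hhmmss>-EOR.root
SAVESET_RE = re.compile(
    r"^AligWrk_(?P<activity>\w+)-"
    r"(?P<run>\d+)(?P<iteration>\d{4})-"
    r"(?P<stamp>\d{8}T\d{6})-EOR\.root$"
)


def parse_saveset_name(filename):
    """Split a saveset file name into its parts; return None if it does not match."""
    match = SAVESET_RE.match(filename)
    if match is None:
        return None
    return {
        "activity": match.group("activity"),
        "run": int(match.group("run")),
        "iteration": int(match.group("iteration")),
        "time": datetime.strptime(match.group("stamp"), "%Y%m%dT%H%M%S"),
    }


if __name__ == "__main__":
    print(parse_saveset_name("AligWrk_Rich1-1569160001-20150710T100630-EOR.root"))
```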

To run root on plus: lb-run root root.exe FILENAME.root

Known issues and workarounds

Everything takes ages. That's normal :)

Actual errors

When taking the LHCbA partition, you get a warning that says that PARTAlign is owned by ECS:LHCbARunControl (or someone else).

If you have to take LHCbA and you get a warning like this, chances are that LHCbA has been taken by someone else. Typically this partition should not be taken by someone else unless they are using it or the alignment is in automatic mode (otherwise you should have been warned that LHCbA is occupied and you shouldn't use it).

If none of the subfarms are able to be included and PARTAlign remains OFFLINE, you should try to force exclude PARTAlign, deallocate everything and then you can try to allocate yourself LHCbAlign from top level. To force exclude, you can right click on the lock to go into expert mode and then select the option.

Alignment goes into state “READY,” but the alignment hasn’t converged (ALWAYS CHECK!)

  • Just click on “start run” again (no reconfiguring, etc.!!!)

PARTAlign_Master RUNNING, nodes READY

It is possible that the job is stuck with the PARTAlign_Master in RUNNING and the various nodes (e.g. HLTA05_A) in READY.
In this case select START_RUN again (as mentioned above). This can happen after configuring but also between iterations.

Only a few nodes included

If only a few nodes are in the "Included Nodes and Removed Nodes", try to DEALLOCATE and then ALLOCATE again.

HLT goes into state error: Check whether it is all nodes or just one

  • If it's all nodes then there is an actual error in the code; try checking the error loggers.

  • If only individual nodes are in error it could be
    • Time-out during configuring
      • give it time ~20 min
      • There is a time-limit set of how long the configuring is allowed to take.
      • It is possible to take longer and if nodes take longer their status will appear as error.
      • Give it time and verify in the errorlogger that something is still happening.
    • a crashed node during running
      • In LHCbA_HLT: TOP
        • Select the node (e.g. HLTE1028_A)
        • Click “arrow” to exclude area
        • Click “remove” to exclude (might take a while, just wait)

Nodes going into error while running

Sometimes a node goes into error state for no apparent reason. This will prevent the others from getting to the next iteration. Just exclude the node and click "start run" again.

Red herrings

something with Options "../../jobs " something ==> a problem in your python-configurations

  • use tail -n 20 /clusterlogs/partitions/LHCbA/daq/LHCbA.log to see more details of the error (can increase the number 20 if need be)

something complaining about some Configuration.py file in the OnlineDev something something directory ==> something in our own Configuration file is wrong

Errors and warnings that are not a/our problem

There will be several harmless warnings from the analyzers during configuring and running such as:

Warning: using CKThetaQuarzRedractCorrections = [0,-0.0001,0]

UpdateManagerSvc: Override condition for path 'Conditions/Environment/Rich1/RefractivityScaleFactor' is defined more than once

FinalTrackClones:TrackBuildCloneTable:: The WARNING message is suppressed: 'Probleme extrapolating state'

TrackBestTrackCreator.Fitter:TrackMasterFitter:: The WARNING message is suppressed: 'unable to fit the track' ...

After the alignment has run

After the alignment has run check by hand in the working directory if it has converged and/or look at the output of the iterator in the hlt02 errorlogger.

To make sure that no nonsense happened in the alignment you should:

1. Have a look at the output plots for each iteration and see if the fits make sense. Maybe also skim for changes between iterations that seem to have gotten worse rather than better.

2. Look at the amount the mirrors were tilted and check for abnormally high values.

Log files

There will be a log file produced each day that contains the output of all analyzers that ran from the LHCbA partition ( = the alignment farm).

This log file will be at /clusterlogs/partitions/LHCbA/daq/LHCbA.log (under plus). Older log files are in /clusterlogs/partitions/LHCbA/daq/old.

Note that these log files contain the entire output from all analyzers of this day (including other alignments etc). They are extremely large and probably not very practical.

Explanation of files in the alignment folder (from Claire's email, 28 September 2015)

For the moment all alignments and alignment-attempts are being stored in plus under /home/cprouve/RichAlignment/alignments

In there are folders for different dates which in turn contain alignments for Rich1 and/or Rich2.

The alignment procedure starts from the xml file like

Rich1CondDBUpdate_Mp6Wi4.0Fv3Cm2Sm1_online_Collision15_i0.xml

and from this making the histograms in

RichRecQCHistos_rich1_Mp6Wi4.0Fv3Cm2Sm1_online_Collision15_i0.root

Then the starting xml-file (Rich1CondDBUpdate_Mp6Wi4.0Fv3Cm2Sm1_online_Collision15_i0.xml) is taken and tilts of +0.3 and -0.3 are applied around the y- and the z-axis for the primary and secondary mirrors respectively. This gives 8 new xml files, which are used for the calculation of the magnification factors and have endings like:

Rich1CondDBUpdate_Mp6Wi4.0Fv3Cm2Sm1_online_Collision15_pri_negYzerZ_i0.xml

The 8 .xml files are used to make the corresponding histograms stored in files like:

RichRecQCHistos_rich1_Mp6Wi4.0Fv3Cm2Sm1_online_Collision15_pri_negYzerZ_i0.root

The 2D fits are performed on each of the 9 root-files and the output stored in the folders which also contain the plots for each mirrorpair with the line of the 2D fit:

Rich1MirrCombinFit_Mp6Wi4.0Fv3Cm2Sm1_online_Collision15_i0

and with the ending corresponding to the tilted mirrors:

Rich1MirrCombinFit_Mp6Wi4.0Fv3Cm2Sm1_online_Collision15_pri_negYzerZ_i0

Then the magnification factors are calculated for each tilt and each mirror and stored in files like:

Rich1MirrMagnFactors_Mp6Wi4.0Fv3Cm2Sm1_online_Collision15_pri_negYzerZ_i0.txt

After that the actual mirror-corrections are determined and summarised in the file (here only one per iteration):

Rich1MirrAlignOut_i0.txt

(If all numbers under "only this iteration's additional corrections to compensations: truncated, not rounded" are .0 the alignment has converged.)

If the alignment has not converged the determined mirror-tilts are applied to the "old" xml-file (Rich1CondDBUpdate_Mp6Wi4.0Fv3Cm2Sm1_online_Collision15_i0.xml in this case) to make the next one:

Rich1CondDBUpdate_Mp6Wi4.0Fv3Cm2Sm1_online_Collision15_i1.xml

and the whole procedure continues with the next iteration...
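
Since every per-iteration file or folder carries an _i<N> tag at the end of its name, the last iteration reached by an alignment can be read off a directory listing. The following is a minimal Python sketch under that assumption. The function names iteration_of and last_iteration are hypothetical and not part of the alignment package.

```python
import re

# Matches a trailing _i<N>, optionally followed by one file extension.
ITERATION_RE = re.compile(r"_i(\d+)(?:\.\w+)?$")


def iteration_of(name):
    """Return the iteration number encoded in a file or folder name, or None."""
    match = ITERATION_RE.search(name)
    return int(match.group(1)) if match else None


def last_iteration(names):
    """Return the highest iteration number found in a list of names, or None."""
    found = [i for i in map(iteration_of, names) if i is not None]
    return max(found) if found else None


if __name__ == "__main__":
    listing = [
        "Rich1CondDBUpdate_Mp6Wi4.0Fv3Cm2Sm1_online_Collision15_i0.xml",
        "Rich1CondDBUpdate_Mp6Wi4.0Fv3Cm2Sm1_online_Collision15_i1.xml",
        "Rich1MirrCombinFit_Mp6Wi4.0Fv3Cm2Sm1_online_Collision15_i0",
        "Rich1MirrAlignOut_i0.txt",
        "summary.txt",
    ]
    print(last_iteration(listing))
```

In practice the listing would come from os.listdir on the alignment folder. Names without an iteration tag, such as summary.txt, are ignored.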

To figure out how many events were actually run over in the alignment

It can be very useful to figure out how many events were actually run over in the alignment for each RICH detector. It is hard to get this number directly, because our HLT Line provides both Rich1 and Rich2 samples to the online RICH mirror alignment together.

First, create a file called numTriggers.C

In it put the following:

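// Print the number of entries of the histogram TH1name for one .root file
// in the directory dirname, then stop.
void numTriggers(TString dirname, TString TH1name)
{
  TString ext=".root";
  TSystemDirectory dir(dirname, dirname);
  TList *files = dir.GetListOfFiles();
  if (!files) {
     cout << "Could not list directory " << dirname.Data() << endl;
     return;
  }
  TSystemFile *file;
  TString fname;
  TIter next(files);
  while ( (file=(TSystemFile*)next()) ) {
     fname = file->GetName();
     // Skip subdirectories and anything that is not a .root file.
     if (file->IsDirectory() || !fname.EndsWith(ext)) continue;
     TFile *f = TFile::Open(dirname+"/"+fname.Data());
     if (!f || f->IsZombie()) {
        cout << "Could not open " << fname.Data() << endl;
        break;
     }
     TH1 * h1 = (TH1*)f->Get(TH1name);
     if (h1) {
        cout << fname.Data() << " has " << h1->GetEntries() << " entries in TH1 "<< TH1name << "." << endl;
     } else {
        cout << fname.Data() << " does not contain TH1 " << TH1name << "." << endl;
     }
     f->Close();
     break; // report only one file
  }
}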

Save it, and then to find the number of entries in the TH1 histogram of your choice, in the latest .root file in a directory, just run (for example):

lb-run root root.exe -b -l -q 'numTriggers.C("/group/online/AligWork/MirrorAlignments/Rich1/20160103_232542/","RICH/RichODIN/TriggerType")'

So in this case, it spits out the actual number of events the alignment ran over for the alignment Rich1/20160103_232542:

RichRecQCHistos_rich1_Mp6Wi8.0Fv3Cm0Sm1_online_Collision15_i2.root has 2.06806e+06 entries in TH1 RICH/RichODIN/TriggerType.

So you know this particular alignment ran over about 2 million RICH1 triggered events.

As long as the alignment runs over only one TriggerType (it should due to the HLT filter), this will be accurate and of course much faster than looking in a TBrowser.
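
If you collect these numbers for several alignments, the output line can be turned back into values. The following Python sketch only assumes the line format printed by the macro above ("<file> has <N> entries in TH1 <name>."). The function name parse_num_triggers_line is hypothetical.

```python
import re

LINE_RE = re.compile(
    r"^(?P<file>\S+) has (?P<entries>\S+) entries in TH1 (?P<hist>\S+)\.$"
)


def parse_num_triggers_line(line):
    """Return (file name, number of entries, histogram name), or None if no match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    # ROOT prints the entry count as a floating-point number, e.g. 2.06806e+06.
    return match.group("file"), float(match.group("entries")), match.group("hist")


if __name__ == "__main__":
    line = ("RichRecQCHistos_rich1_Mp6Wi8.0Fv3Cm0Sm1_online_Collision15_i2.root "
            "has 2.06806e+06 entries in TH1 RICH/RichODIN/TriggerType.")
    print(parse_num_triggers_line(line))
```

Note that the printed value is rounded to six significant digits, so the parsed number is approximate.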

Back to main Mirror alignment TWiki

Topic revision: r32 - 2017-07-09 - ParasNaik
 