-- ParasNaik - 2017-05-16
-- ClaireProuve - 2015-07-18
Running the Online RICH mirror alignment
How to run the Alignment from the PVSS panel
Before running
XML-file to start with
If you want to start from a specific xml file (i.e., a specific mirror alignment), you need to copy it to the right location. Just follow the instructions
here.
[Paras will remove later, as this is no longer needed, but first a backup may need to be made] The alignments that were put into the database should also be kept at /group/rich/AlignmentFiles/databaseAlignments, where you can pick them up whenever you want.
Check Configuration file
There is only one configuration file, Configuration.py. If you want to use the same configuration as in a previous alignment, since 2017 you can simply go into that alignment's directory and retrieve the version of Configuration.py that was used. Of course, if you wish to repeat the alignment, you need to make sure that no new Configurables have been added since!
Since the Configuration file in the master branch of Panoptes is the Configuration file, we have created a subdirectory called SavedConfigurations, next to Configuration.py, that contains .txt versions of various configurations that you can copy over Configuration.py before you start your alignments.
Please read README.txt. Always create new configurations in the SavedConfigurations directory, update SavedConfigurations/README.txt, then overwrite Configuration.py with your new configuration.
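The SavedConfigurations rule above can be enforced with a small helper. This is only a bash sketch under stated assumptions: use_saved_config is a hypothetical name (it is not part of Panoptes), it expects to be given the directory that contains Configuration.py and SavedConfigurations/, and it only copies SavedConfigurations/NAME.txt over Configuration.py if that file exists and NAME is mentioned somewhere in SavedConfigurations/README.txt.

```shell
# Hypothetical helper (not part of Panoptes): install a saved configuration.
# Usage: use_saved_config NAME [DIR]
#   DIR is the directory containing Configuration.py (default: current directory).
use_saved_config() {
    local name="$1" dir="${2:-.}"
    local saved="$dir/SavedConfigurations/$name.txt"
    # The saved configuration must exist...
    if [ ! -f "$saved" ]; then
        echo "ERROR: $saved does not exist" >&2
        return 1
    fi
    # ...and must be documented in the README before it is used.
    if ! grep -qF "$name" "$dir/SavedConfigurations/README.txt"; then
        echo "ERROR: $name is not documented in SavedConfigurations/README.txt" >&2
        return 1
    fi
    cp "$saved" "$dir/Configuration.py" || return 1
    echo "Installed $name; remember to recompile before the next alignment."
}
```

Usage, from the directory containing Configuration.py: use_saved_config MyNewConfig. Recompiling is still a separate, manual step.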
Check that all parameters (especially the starting iteration, though we almost always start from iteration 0) are set correctly, and that all your variables are set consistently.
You can read more about the
Configuration file.
NOTE: after changing the Configuration file locally, you have to recompile the code (first doing a git fetch Panoptes and a git lb-checkout of your branch of Rich/RichMirrorAlignmentOnline) for the changes to take effect in the next mirror alignment.
Magnification factors
You can choose between calculating the magnification factors on-the-fly for each iteration or using predetermined ones. To choose, set the value of
magnifCoeffMode in the
Configuration file.
Find out more about using
predetermined Magnification factors.
Magnification factors on-the-fly
Set the value of
magnifCoeffMode in the
Configuration file to 2.
Predetermined magnification factors
Set the value of
magnifCoeffMode in the
Configuration file to 0.
Then you have to put the files containing the predetermined magnification factors in a directory and
point to that directory with magnifDir in the
Configuration file.
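Before starting, it can help to check that these two settings agree. The bash sketch below is illustrative only: check_magnif_config is a hypothetical name, and it ASSUMES that both settings appear in Configuration.py as simple Python assignments on their own lines (for example magnifCoeffMode = 0 and magnifDir = "/some/dir"). If the real file is laid out differently, the sed patterns will need adjusting.

```shell
# Hypothetical sanity check of the magnification-factor settings.
# ASSUMPTION: Configuration.py contains lines like
#   magnifCoeffMode = 0
#   magnifDir = "/group/rich/AlignmentFiles/MagnifFactors/Rich1/..."
check_magnif_config() {
    local config="${1:-Configuration.py}"
    local mode dir
    # First assignment of each setting; strip spaces and quotes.
    mode=$(sed -n 's/^[[:space:]]*magnifCoeffMode[[:space:]]*=[[:space:]]*\([0-9]*\).*/\1/p' "$config" | head -n 1)
    dir=$(sed -n 's/^[[:space:]]*magnifDir[[:space:]]*=[[:space:]]*["'\'']\([^"'\'']*\)["'\''].*/\1/p' "$config" | head -n 1)
    case "$mode" in
        2) echo "on-the-fly" ;;
        0)
            # Predetermined factors need an existing magnifDir.
            if [ -d "$dir" ]; then
                echo "predetermined: $dir"
            else
                echo "ERROR: magnifCoeffMode is 0 but magnifDir '$dir' is not a directory" >&2
                return 1
            fi
            ;;
        *)
            echo "ERROR: unexpected magnifCoeffMode '$mode'" >&2
            return 1
            ;;
    esac
}
```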
Creating a new directory for predetermined magnification factors
Choices for fixed magnification factors should always be stored in /group/rich/AlignmentFiles/MagnifFactors/Rich1/ and /group/rich/AlignmentFiles/MagnifFactors/Rich2/.
If you want to use a new set of magnification factors make sure that the files have the same names as those that can currently be found within the substructure of those directories.
Say you have determined new magnification factors for RICH1 by allowing them to float. To install them, follow this procedure on a plus machine (substituting your alignment name for 20170612_004007, and your nameString for "_Mp6Wi8.0Fm5Mm2Sm0_online_Collision17"):
- cd /group/rich/AlignmentFiles/MagnifFactors/Rich1/
- ls /group/online/AligWork/MirrorAlignments/Rich1/20170612_004007
- Look at the last iteration number and remember it.
- mkdir 20170612_004007
- cd 20170612_004007
- cp /group/online/AligWork/MirrorAlignments/Rich1/20170612_004007/summary.txt .
- In the next two commands, substitute your last iteration number for _i4:
- cp /group/online/AligWork/MirrorAlignments/Rich1/20170612_004007/Rich1MirrMagnFactors*_i4.txt .
- for file in Rich1*; do mv "$file" "${file//_i4/_predefined}"; done
- for file in Rich1*; do mv "$file" "${file//_Mp6Wi8.0Fm5Mm2Sm0_online_Collision17/}"; done
- rm Rich1MirrMagnFactors_predefined.txt
- change magnifDir in the Configuration file to point to the new directory
For RICH2 of course you want to substitute "Rich2" for "Rich1" in the above procedure.
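The same steps can be wrapped in a bash function. This is a sketch, not an official tool. The names last_iteration and install_magnif_factors are hypothetical. The default paths are the ones used above, and the "last iteration" is taken to be the highest N for which a RichXMirrMagnFactors*_iN.txt file exists, which is an assumption standing in for reading it off the ls output by hand. The function refuses to overwrite an existing directory and prints the new directory, which is the value magnifDir should point to.

```shell
# Hypothetical helpers (bash). Usage:
#   install_magnif_factors RICH ALIGNMENT NAMESTRING [SRC_BASE] [DST_BASE]
#   e.g. install_magnif_factors Rich1 20170612_004007 _Mp6Wi8.0Fm5Mm2Sm0_online_Collision17

# Print the highest N for which DIR/RICHMirrMagnFactors*_iN.txt exists.
last_iteration() {
    local dir="$1" rich="$2" f n max=-1
    for f in "$dir"/"$rich"MirrMagnFactors*_i*.txt; do
        [ -e "$f" ] || continue              # glob matched nothing
        n="${f##*_i}"                        # e.g. "4.txt"
        n="${n%.txt}"
        case "$n" in ''|*[!0-9]*) continue ;; esac
        if [ "$n" -gt "$max" ]; then max="$n"; fi
    done
    [ "$max" -ge 0 ] || return 1
    echo "$max"
}

install_magnif_factors() {
    local rich="$1" align="$2" namestring="$3"
    local src_base="${4:-/group/online/AligWork/MirrorAlignments}"
    local dst_base="${5:-/group/rich/AlignmentFiles/MagnifFactors}"
    local src="$src_base/$rich/$align" dst="$dst_base/$rich/$align"
    local iter f base
    iter=$(last_iteration "$src" "$rich") || { echo "No magnification-factor files in $src" >&2; return 1; }
    mkdir "$dst" || return 1                 # never overwrite an existing set
    cp "$src/summary.txt" "$dst/" || return 1
    cp "$src"/"$rich"MirrMagnFactors*_i"$iter".txt "$dst/" || return 1
    for f in "$dst"/"$rich"*; do
        base="${f##*/}"
        base=${base//"_i$iter"/_predefined}  # e.g. _i4 -> _predefined
        base=${base//"$namestring"/}         # drop the nameString
        [ "$dst/$base" = "$f" ] || mv "$f" "$dst/$base"
    done
    rm -f "$dst/${rich}MirrMagnFactors_predefined.txt"
    echo "$dst"                              # new value for magnifDir
}
```

For RICH2, pass Rich2 as the first argument.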
Errorloggers
You will need to look at two different errorloggers:
1. For the analyzers: errorLog LHCbA
You can change the output level in the little settings window: navigate (with the arrow keys) up to "Severity for messages" and cycle through the options with the ">" key combination. When you have found the one you want, simply hit "enter". This can be done at any time during the running (the message window might need a while to catch up with the command, though).
2. For the iterator: ssh -Y hlt02
source /group/online/dataflow/scripts/shell_macros.sh
errlog -m hlt02
NOTE: here you want the errlog command and NOT the errorLog command!!!
This panel is a bit moody and does not always output the messages promptly: they may be delayed, appear only at the next iteration, or appear only when the program has finished.
Run the alignment
0. Create an FSM launch script on ui. You only need to do this once (unless there is a new version of WinCC).
ssh -Y ui
then create a new file called
fsm.sh
and put the following into it:
if [[ "$WCCOA_DIR" =~ "WinCC_OA/3.11" ]]; then
echo "Starting the FSM on WinCC-OA 3.11"
/group/online/ecs/Shortcuts311/LHCb/ECS/ECS_UI_FSM.sh
elif [[ "$WCCOA_DIR" =~ "WinCC_OA/3.15" ]]; then
echo "Starting the FSM on WinCC-OA 3.15"
/group/online/ecs/Shortcuts315/LHCb/ECS/ECS_UI_FSM.sh
else
echo "You need to edit fsm.sh and add functionality for this WinCC version:"
echo $WCCOA_DIR
fi
1. Open the panel from a ui or plus machine:
ssh -Y ui
(if you haven't already)
source fsm.sh
2. Right-click on LHCb_Align
3. Take the partition: only take the partition if no one else is using the alignment!
Click "take" and then
wait for a smaller new panel to show up (this may take a while, just be patient). In the new panel just click "dismiss".
4. Choose your alignment: select the activity from the menu on the right of the Run Info panel.
5. Allocate:
6. Reserve alignment farm: Put your name into the panel and click the button to reserve the alignment.
7. Select runs: Select the run range to use in the alignment. This procedure will be fully automated after an initial testing period. The runs to use are the first ones of each fill (if they have enough events). Click on "Choose runs for alignment".
You can look at which runs you want in the
run database.
Now select the runs you want to run over and click "Ok".
Make sure to pick "Rich" from the list at the top! The number of events listed for each run in this window is the combined number of events provided by the RICH1 and RICH2 mirror HLT lines. So it is not exact, but it gives the order of magnitude (the split was intended to be roughly 50% RICH1 and 50% RICH2, but with substantial error bars).
To get the number of events actually processed, use the
numTriggers.C script.
8. Verify / select farms to run on: Click on "HLT" which will open up an overview of the HLT subfarms and nodes.
The selection panel on the left shows the included/excluded nodes and subfarms. To include/exclude subfarms/nodes, select them in the panels and then click the button with the arrow pointing to the left/right. Then click the "Include"/"Remove" button. This might take a while, just be patient.
Subfarms marked in red are not working at the moment (for whatever reason) and cannot and should not be included.
When all available nodes are included, there should be between 1200 and 1800 of them. If there are significantly fewer (~500 or so), you need to deallocate and reallocate the partition. If the problem persists, go and complain to an online expert.
9. Look at the status of the subfarms and nodes: Click the "PARTAlign" button in the HLT panel
Another panel will open that will show you the state of the subfarms. By clicking on a certain subfarm you will get the state of its individual nodes.
10. Start alignment by configuring: the configuring process might take a couple of minutes and some nodes might take significantly longer than others. That's normal. If one node seems extremely slow or fails, you can remove it as shown in step 8.
There is a time limit on how long configuring is allowed to take. Configuring can exceed it, and nodes that do will show an error status. Give them time and verify in the errorlogger that something is still happening. If a node takes too long or seems stuck, just remove it.
Do not change the status of things anywhere else than in this specific panel (unless you are an expert). Don't give commands to subfarms or nodes directly!
11. Start the run: From the same dropdown panel select "start run".
The "RunInfo" button on the left will change to "running". Wait a few seconds and see what happens to the "HLT" button. Sometimes it goes straight to running and sometimes it will be back at "Ready". In that case just select "start run" again.
12. Now lean back and relax
If you want to you can go into the working directory and see the files being written =)
13a. If you want to stop aligning, free the alignment farm again
13b. If you want to perform another alignment, you MUST RECONFIGURE by following these steps
- The alignment will have gone into READY, click on READY at the top and pick RESET
- Wait until the alignment goes into NOT READY
- Now do whatever you need to do to prepare for the next alignment: this may be changing run numbers, changing the configuration (REMEMBER YOU MUST COMPILE), or changing the starting XML file
- CONFIGURE the alignment
- when it is all back in READY, START_RUN
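Step 7 says the runs to use are the first ones of each fill, provided they have enough events. As an illustration only, the bash sketch below applies that rule to a hypothetical plain-text list with one "FILL RUN NEVENTS" line per run (for example typed in by hand from the run database). The input format, the function name first_runs_per_fill and the minimum-events threshold are all assumptions. The sketch reads the rule as: if the first run of a fill has too few events, skip that fill.

```shell
# Hypothetical helper. Input (stdin): lines "FILL RUN NEVENTS" in any order.
# Output: for each fill, its lowest-numbered run, if that run has >= MIN_EVENTS events.
first_runs_per_fill() {
    local min_events="$1"
    sort -k1,1n -k2,2n | awk -v min="$min_events" '
        $1 != last_fill {           # first (lowest) run of a new fill
            last_fill = $1
            if ($3 >= min) print $2
        }'
}
```

For example: first_runs_per_fill 100000 < runs.txt prints one run number per line, which you can then tick in the "Choose runs for alignment" panel.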
Monitoring the procedure
There is no monitoring in place yet.
The results of each iteration are saved, including xml files and histograms (more details below). For now, monitoring has to be done manually by looking at the output of the fits, especially the plots!!!
The output of each performed alignment will be saved under
/group/online/AligWork/MirrorAlignments/Rich{1,2}/{YYYYMMDD_hhmmss}
.
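Because the output directory names are timestamps of the form YYYYMMDD_hhmmss (e.g. 20170612_004007), sorting them as strings also sorts them in time. The bash sketch below uses that to find the most recent alignment for RICH1 or RICH2. latest_alignment_dir is a hypothetical helper name, and the optional second argument only exists so that the base directory can be overridden.

```shell
# Hypothetical helper: print the newest alignment directory for RICH1 or RICH2.
# Usage: latest_alignment_dir 1|2 [BASE]
latest_alignment_dir() {
    local which_rich="$1"
    local base="${2:-/group/online/AligWork/MirrorAlignments}"
    local d latest=""
    # Only match names shaped like YYYYMMDD_hhmmss; glob results come back sorted,
    # so the last match is the most recent alignment.
    for d in "$base/Rich$which_rich"/[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]_[0-9][0-9][0-9][0-9][0-9][0-9]; do
        [ -d "$d" ] || continue
        latest="$d"
    done
    [ -n "$latest" ] || return 1
    echo "$latest"
}
```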
Xml files location
- The xml files produced in the job you are running will be saved to
/group/online/AligWork/MirrorAlignments/Rich{1,2}/{YYYYMMDD_hhmmss}
- The xml file the alignment starts with is picked up from
/group/online/alignment/Rich{1,2}/MirrorAlign/
Histograms location
The histograms are produced automatically when running the alignment. There is one ROOT file per iteration, saved under
/hist/Savesets/2015/LHCbA/
Naming convention, e.g.:
/hist/Savesets/2015/LHCbA/AligWork_Rich1/07/10/AligWrk_Rich1-1569160001-20150710T100630-EOR.root
- Rich1 is the name of the activity (Rich1, Rich2, Muon Tracker or Velo)
- 156916 is the first run number in the run list
- 0001 is the number of iteration
- 20150710T100630 is the time when the file has been generated.
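The naming convention above can be decoded mechanically. In this bash sketch (parse_saveset_name is a hypothetical name), the middle field is split so that its last four digits are the iteration and everything before them is the first run number, exactly as in the example. It assumes every saveset name follows the AligWrk_<activity>-<run><iteration>-<timestamp>-EOR.root pattern shown.

```shell
# Hypothetical helper: decode a saveset file name such as
#   AligWrk_Rich1-1569160001-20150710T100630-EOR.root
parse_saveset_name() {
    local name="${1##*/}"       # drop any directory part
    local re='^AligWrk_([^-]+)-([0-9]+)([0-9]{4})-([0-9]{8}T[0-9]{6})-EOR\.root$'
    if [[ "$name" =~ $re ]]; then
        # 10# forces base 10 so "0001" becomes 1 (and "0008" is not read as octal).
        echo "activity=${BASH_REMATCH[1]} first_run=${BASH_REMATCH[2]} iteration=$((10#${BASH_REMATCH[3]})) created=${BASH_REMATCH[4]}"
    else
        echo "not a saveset name: $name" >&2
        return 1
    fi
}
```

Applied to the example above, it prints activity=Rich1 first_run=156916 iteration=1 created=20150710T100630.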
The histograms will be copied into the work directory for fitting etc. All histograms will also be saved at the end of the alignment procedure to
/group/online/AligWork/MirrorAlignments/Rich{1,2}/{YYYYMMDD_hhmmss}
To run root on plus:
lb-run root root.exe FILENAME.root
Known issues and workarounds
Everything takes ages. That's normal.
Actual errors
When taking the LHCbA partition, you get a warning saying that PARTAlign is owned by ECS:LHCbARunControl (or someone else).
If you have to take LHCbA and you get a warning like this, chances are that LHCbA has been taken by someone else.
Typically this partition should not be taken by someone else unless they are using it or the alignment is in automatic mode (otherwise you should have been warned that LHCbA is occupied and you shouldn't use it).
If none of the subfarms can be included and PARTAlign remains OFFLINE, try to force-exclude PARTAlign, deallocate everything, and then allocate LHCbAlign yourself from the top level.
To force exclude, you can right click on the lock to go into expert mode and then select the option.
Alignment goes into state “READY,” but the alignment hasn’t converged (ALWAYS CHECK!)
- Just click on “start run” again (no reconfiguring, etc.!!!)
PARTAlign_Master RUNNING, nodes READY
It is possible that the job is stuck with PARTAlign_Master in RUNNING and the various nodes (e.g. HLTA05_A) in READY.
In this case select START_RUN again (as mentioned above). This can happen after configuring but also between iterations.
Only a few nodes included
If only a few nodes are in the "Included Nodes and Removed Nodes" panel, try to DEALLOCATE and then ALLOCATE again.
HLT goes into state error: Check whether it is all nodes or just one
- If it's all nodes, then there is an actual error in the code; try checking the error loggers.
- If only individual nodes are in error it could be
- Time-out during configuring
- give it time ~20 min
- There is a time-limit set of how long the configuring is allowed to take.
- It is possible to take longer and if nodes take longer their status will appear as error.
- Give it time and verify in the errorlogger that something is still happening.
- a crashed node during running
- In LHCbA_HLT: TOP
- Select the node (e.g. HLTE1028_A)
- Click “arrow” to exclude area
- Click “remove” to exclude (might take a while, just wait)
Nodes going into error while running
Sometimes a node goes into an error state for no apparent reason. This will prevent the others from getting to the next iteration. Just exclude the node and click "start run" again.
Red herrings
something with Options "../../jobs " something ==> a problem in
your python-configurations
- use tail -n 20 /clusterlogs/partitions/LHCbA/daq/LHCbA.log to see more details of the error (increase the number 20 if need be)
something complaining about some
Configuration.py
file in the
OnlineDev
something something directory ==> something in
our own Configuration file is wrong
Errors and warnings that are not a/our problem
There will be several harmless warnings from the analyzers during configuring and running, such as:
Warning: using CKThetaQuarzRedractCorrections = [0,-0.0001,0]
UpdateManagerSvc: Override condition for path 'Conditions/Environment/Rich1/RefractivityScaleFactor' is defined more than once
FinalTrackClones:TrackBuildCloneTable:: The WARNING message is suppressed: 'Probleme extrapolating state'
TrackBestTrackCreator.Fitter:TrackMasterFitter:: The WARNING message is suppressed: 'unable to fit the track' ...
After the alignment has run
After the alignment has run, check by hand in the working directory whether it has converged and/or look at the output of the iterator in the hlt02 errorlogger.
To make sure that nothing nonsensical happened in the alignment, you should:
1. Have a look at the output plots for each iteration and see if the fits make sense. Maybe also skim for changes between iterations that seem to have gotten worse rather than better.
2. Look at the amount the mirrors were tilted and check for abnormally high values.
Log files
There will be a log file produced each day that contains the output of all analyzers that ran from the
LHCbA partition ( = the alignment farm).
This log file will be at /clusterlogs/partitions/LHCbA/daq/LHCbA.log (under plus). Older log files are in /clusterlogs/partitions/LHCbA/daq/old.
Note that these log files contain the entire output from all analyzers of that day (including other alignments etc.). They are extremely large and probably not very practical.
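Since the daily log is so large, it can help to pull out only the most serious messages. This bash sketch is hedged: recent_errors is a hypothetical name, and it ASSUMES the severity appears in the log as a separate word ERROR or FATAL, as in Gaudi-style message lines. Lines from other alignments running the same day will still be mixed in.

```shell
# Hypothetical helper: show the last N ERROR/FATAL lines of the LHCbA partition log.
# ASSUMPTION: severities appear as separate words (Gaudi-style "ERROR", "FATAL").
recent_errors() {
    local n="${1:-20}"
    local log="${2:-/clusterlogs/partitions/LHCbA/daq/LHCbA.log}"
    # Match ERROR/FATAL only as whole words (portable, no GNU \b needed).
    grep -E '(^|[^[:alnum:]_])(ERROR|FATAL)([^[:alnum:]_]|$)' "$log" | tail -n "$n"
}
```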
Explanation of files in the alignment folder (from Claire's email, 28 September 2015)
For the moment all alignments and alignment-attempts are being stored in plus under
/home/cprouve/RichAlignment/alignments
In there are folders for different dates which in turn contain alignments for Rich1 and/or Rich2.
The alignment procedure starts from an xml file such as
Rich1CondDBUpdate_Mp6Wi4.0Fv3Cm2Sm1_online_Collision15_i0.xml
and uses it to make the histograms in
RichRecQCHistos_rich1_Mp6Wi4.0Fv3Cm2Sm1_online_Collision15_i0.root
Then the starting xml file (Rich1CondDBUpdate_Mp6Wi4.0Fv3Cm2Sm1_online_Collision15_i0.xml) is taken and tilts of +0.3 and -0.3 are applied around the y- and the z-axis for the primary and secondary mirrors respectively, which gives 8 new xml files (used to calculate the magnification factors) with endings like:
Rich1CondDBUpdate_Mp6Wi4.0Fv3Cm2Sm1_online_Collision15_pri_negYzerZ_i0.xml
The 8 .xml files are used to make the corresponding histograms, stored in files like:
RichRecQCHistos_rich1_Mp6Wi4.0Fv3Cm2Sm1_online_Collision15_pri_negYzerZ_i0.root
The 2D fits are performed on each of the 9 ROOT files, and the output is stored in folders that also contain the plots for each mirror pair with the line of the 2D fit:
Rich1MirrCombinFit_Mp6Wi4.0Fv3Cm2Sm1_online_Collision15_i0
and with the ending corresponding to the tilted mirrors:
Rich1MirrCombinFit_Mp6Wi4.0Fv3Cm2Sm1_online_Collision15_pri_negYzerZ_i0
Then the magnification factors are calculated for each tilt and each mirror and stored in files like:
Rich1MirrMagnFactors_Mp6Wi4.0Fv3Cm2Sm1_online_Collision15_pri_negYzerZ_i0.txt
After that the actual mirror-corrections are determined and summarised in the file (here only one per iteration):
Rich1MirrAlignOut_i0.txt
(If all numbers under "only this iteration's additional corrections to compensations: truncated, not rounded" are .0 the alignment has converged.)
If the alignment has not converged the determined mirror-tilts are applied to the "old" xml-file (
Rich1CondDBUpdate_Mp6Wi4.0Fv3Cm2Sm1_online_Collision15_i0.xml
in this case) to make the next one:
Rich1CondDBUpdate_Mp6Wi4.0Fv3Cm2Sm1_online_Collision15_i1.xml
and the whole procedure continues with the next iteration...
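When an iteration seems to stall, a quick check is whether all 9 histogram files for it exist. The bash sketch below lists the missing ones. It is based on the naming above, but only the pri_negYzerZ suffix is confirmed by this page: the other seven tilt suffixes (posYzerZ, zerYnegZ, zerYposZ, and the sec_ versions) are an ASSUMPTION about how the +/-0.3 tilts around y and z are named, so check them against a real alignment directory first. missing_histos is a hypothetical name, and the nameString is passed with its leading underscore, as in the magnification-factor procedure above.

```shell
# Hypothetical helper: list expected-but-missing histogram files for one iteration.
# Usage: missing_histos DIR rich1|rich2 NAMESTRING ITERATION
# ASSUMPTION: tilt suffixes other than pri_negYzerZ are guessed, not confirmed.
missing_histos() {
    local dir="$1" rich="$2" ns="$3" iter="$4"
    local prefix="$dir/RichRecQCHistos_${rich}${ns}"
    local m t f
    local files=("${prefix}_i${iter}.root")          # untilted histograms
    for m in pri sec; do
        for t in negYzerZ posYzerZ zerYnegZ zerYposZ; do
            files+=("${prefix}_${m}_${t}_i${iter}.root")
        done
    done
    for f in "${files[@]}"; do
        [ -e "$f" ] || echo "${f##*/}"
    done
}
```

For example: missing_histos . rich1 _Mp6Wi4.0Fv3Cm2Sm1_online_Collision15 0 prints nothing if all 9 files for iteration 0 are present.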
To figure out how many events were actually run over in the alignment
It can be very useful to figure out how many events were actually run over in the alignment for each RICH detector. It is hard to get this number directly, because our HLT Line provides both Rich1 and Rich2 samples to the online RICH mirror alignment together.
First, create a file called
numTriggers.C
In it put the following:
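// numTriggers.C: print the number of entries of a TH1 in the first .root file
// found in a directory.
#include "TCollection.h"
#include "TFile.h"
#include "TH1.h"
#include "TList.h"
#include "TString.h"
#include "TSystemDirectory.h"
#include "TSystemFile.h"
#include <iostream>

void numTriggers(TString dirname, TString TH1name)
{
  const TString ext = ".root";
  TSystemDirectory dir(dirname, dirname);
  TList *files = dir.GetListOfFiles();
  if (!files) {
    std::cout << "Cannot read directory " << dirname << std::endl;
    return;
  }
  TIter next(files);
  TSystemFile *file;
  while ((file = (TSystemFile*)next())) {
    TString fname = file->GetName();
    if (file->IsDirectory() || !fname.EndsWith(ext)) continue;  // skip non-ROOT files
    TFile *f = TFile::Open((dirname + "/" + fname).Data());
    if (f && !f->IsZombie()) {
      TH1 *h1 = dynamic_cast<TH1*>(f->Get(TH1name));
      if (h1) std::cout << fname << " has " << h1->GetEntries() << " entries in TH1 " << TH1name << "." << std::endl;
      else    std::cout << "No TH1 called " << TH1name << " in " << fname << std::endl;
    } else {
      std::cout << "Cannot open " << fname << std::endl;
    }
    delete f;  // also closes the file
    break;     // only look at the first .root file found
  }
  delete files;
}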
Save it. Then, to find the number of entries in the TH1 histogram of your choice in the first .root file found in a directory, run (for example):
lb-run root root.exe -b -l -q 'numTriggers.C("/group/online/AligWork/MirrorAlignments/Rich1/20160103_232542/","RICH/RichODIN/TriggerType")'
In this case it prints the actual number of events that the alignment Rich1/20160103_232542 ran over:
RichRecQCHistos_rich1_Mp6Wi8.0Fv3Cm0Sm1_online_Collision15_i2.root has 2.06806e+06 entries in TH1 RICH/RichODIN/TriggerType.
So you know this particular alignment ran over about 2 million RICH1 triggered events.
As long as the alignment runs over only one TriggerType (which it should, due to the HLT filter), this will be accurate, and of course much faster than looking in a TBrowser.