Alignment Position Error (APE) Estimator

Goal of this page

Description of the configuration and operation of the tool for estimating the Alignment Position Error (APE).

Introduction

The tool is called ApeEstimator and is located in /UserCode. In addition to the module which actually calculates the APE parameters, it contains several auxiliary subtools, and several scripts to run the procedure and to produce validation plots.

If you want to run the tool quickly with the current configuration without knowing the details, you can read only Setting up the Tool, Scripts for Automated Workflow and Scripts producing Validation Plots, and read the other parts only if you need them.

Setting up the Tool

The current setup is tested and used with CMSSW_4_2_5. The most recent tag is V02-01-03. After setting up the CMSSW area, do the following:

  • cvs co -r V02-01-03 -d Alignment UserCode/JohannesHauk/ApeAndCpeStudies/FullVersion/Alignment
  • cvs co -r V02-01-03 -d ApeEstimator UserCode/JohannesHauk/ApeAndCpeStudies/FullVersion/ApeEstimator
  • scram b
  • bash ApeEstimator/ApeEstimator/scripts/initialise.bash

The last command creates all folders needed for the outputs, and thus for the automated procedures in the scripts. It also copies some scripts for easier handling.

Now it is necessary to create a .root-file containing the TTree with all relevant information about the silicon modules of the tracker. This is done by a simple standalone tool, placed in Alignment/TrackerTreeGenerator, which uses the ideal geometry. The ideal geometry is chosen because it guarantees that selections of modules via their spatial coordinates always choose the same modules. For example, TOB modules on the same rod have the same design position in phi, but with a misaligned geometry a cut that happens to lie at the nominal position would select only some of them. The file is created with:

  • cmsRun Alignment/TrackerTreeGenerator/test/trackerTreeGenerator_cfg.py

The .root-file containing the TTree can be found and browsed in Alignment/TrackerTreeGenerator/hists/TrackerTree.root, from where it is read in the APE calculation.
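The reason for using the ideal geometry can be illustrated with a small standalone sketch. It is not part of the tool; the phi values, the misalignments and the helper select are hypothetical and only show how a cut placed at the nominal position splits a rod once the modules are misaligned.

```python
# Illustration only: hypothetical phi values for five modules on one rod.
NOMINAL_PHI = 0.50  # design position shared by all modules on the rod

# Hypothetical small misalignments in phi (radians), one per module.
MISALIGNMENT = [0.002, -0.001, 0.0005, -0.003, 0.0]


def select(phis, low, high):
    """Return the indices of modules whose phi lies in [low, high)."""
    return [i for i, phi in enumerate(phis) if low <= phi < high]


ideal = [NOMINAL_PHI] * len(MISALIGNMENT)
misaligned = [NOMINAL_PHI + shift for shift in MISALIGNMENT]

# A cut whose lower edge happens to sit exactly at the nominal position.
cut = (0.50, 0.60)

ideal_selection = select(ideal, *cut)
misaligned_selection = select(misaligned, *cut)

print("ideal geometry:     ", ideal_selection)
print("misaligned geometry:", misaligned_selection)
```

With the ideal geometry all five modules pass the cut, while with the misaligned positions only three of them do.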

Now, the tool is set up and the procedure for calculating APEs can be configured and started. However, in order to allow fast iterations and parallelisation, a private skim of the files which should be used is created, as explained in the following step.

Creation of Private Skim

Default Creation

In order to allow parallelisation and fast iterations, a private skim of files is created from the AlCaReco files. The event content is reduced to what the ApeEstimator needs, a tighter preselection is applied, and the files are split in size to balance the number of files against the number of events per file. To do so, the file ApeEstimator/ApeEstimator/test/batch/skimProducer.bash is run on the batch farm. It uses the configuration in ApeEstimator/ApeEstimator/test/SkimProducer/skimProducer_cfg.py. The track selection used there is defined in ApeEstimator/ApeEstimator/python/AlignmentTrackSelector_cff.py, and the trigger selection in ApeEstimator/ApeEstimator/python/TriggerSelection_cff.py. The event content is defined in ApeEstimator/ApeEstimator/python/PrivateSkim_EventContent_cff.py.

Which dataset to process is steered via a configurable parameter using VarParsing, which also allows a local test run.

In order to have the correct file names for the automated workflow, it is necessary to run the script ApeEstimator/ApeEstimator/test/SkimProducer/cmsRename.sh after the skim production.

After having run those two scripts for each dataset that one wants to process, the skim is ready for the automated workflow of the APE estimation. However, the choice of the folder on the CAF user diskpool where the output is stored, and later read from, is not automated. This needs to be adjusted by the user.

Specific Creation using in addition a List of Good Tracks

In order to allow a different event and track selection based on event content that is already dropped in the AlCaReco files, the configuration defined in ApeEstimator/ApeEstimator/test/SkimProducer/skimProducer_cfg.py allows reading in an event-and-track list in a specific format. Thus, one can apply an event and track selection on the corresponding RECO file (if available) or on the AOD file if its content is sufficient for the selection (this should always be available), which can of course be done via CRAB.

The list is a TTree in a .root-file; several outputs from parallel processing can simply be merged using hadd. The corresponding tools for generating the list, and for reading it in and selecting only the chosen tracks, are placed in ApeEstimator/Utils.

In fact, this was used to produce the recent private skims on 2011 data placed in /store/caf/user/hauk/data/DoubleMu/ (the folder name is misleading: the dataset is obtained using SingleMu triggers, not DoubleMu triggers) and on 2011 MC placed in /store/caf/user/hauk/mc/Summer11/. The old AlCaReco selection was not optimal (especially the selection on "isolated muons", which had a large fake rate), so the new selection, which was later applied to the official streams, is applied using this workaround.

The APE Estimation Tool

In order to allow parallel processing, the tool is based on two different modules. The first one (ApeEstimator) reads the events and gathers all relevant information in several .root-files. The second one (ApeEstimatorSummary) then calculates the APE values, which requires merging the files from the first step. The tool is automated and based on four scripts, which need to be run sequentially, starting the next one only after all actions initiated by the previous one have finished successfully. Since the method is a local method, iterations are necessary, so the chain needs to be repeated. The configuration of the two modules is explained in the following; the scripts to run are explained in a later section.

In general, all configuration should be done in your final blah_cfg.py. The files blah_cfi.py define the configurable parameters, mainly without doing any selection; they are only templates and should never be changed. The files blah_cff.py give the default settings and should only be changed in exceptional cases.

Configuration of ApeEstimator

The ApeEstimator module is coded in ApeEstimator/ApeEstimator/plugins/ApeEstimator.cc, having the configuration template in ApeEstimator/ApeEstimator/python/ApeEstimator_cfi.py with documentation of the configurable parameters, and the default settings in ApeEstimator/ApeEstimator/python/ApeEstimator_cff.py.

For testing purposes there is one configuration file ApeEstimator/ApeEstimator/test/testApeestimator_cfg.py, but the general configuration used in the automated workflow can be found elsewhere as explained later.

Event, Track and Hit Selections

The module offers a dedicated hit and cluster selection. However, the cluster selection is common to all pixel modules and, separately, common to all strip modules. Some selections are applied to both pixel and strip hits. These selections are based on intervals: you always need to specify pairs of numbers to select specific intervals, e.g. (0.3, 0.4) for one interval or (0.3, 0.4, 1.8, 1.9, -1.7, -1.5) for three intervals. In case of integers, a single number can be selected by e.g. (3, 3). If no number is given, no selection is applied.

The track selection is hardcoded and can only be switched on and off. This has historical reasons and should probably be removed; instead, the official AlignmentTrackSelector should be used before the applied refit. This would also guarantee that the track selection is identical during the iterations, since the change of the APE values might lead to small migrations of the track parameters into or out of the selection window due to the refit.
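The interval convention can be sketched in a few lines of plain Python. This is not the tool's implementation: the function names to_intervals and passes are hypothetical, and treating both interval edges as inclusive is an assumption (it is what makes a pair like (3, 3) select the single integer 3).

```python
def to_intervals(values):
    """Group a flat sequence of numbers into (low, high) pairs."""
    if len(values) % 2 != 0:
        raise ValueError("interval bounds must be given in pairs")
    pairs = list(zip(values[0::2], values[1::2]))
    for low, high in pairs:
        if low > high:
            raise ValueError("lower bound %r exceeds upper bound %r" % (low, high))
    return pairs


def passes(value, values):
    """Return True if value lies in any interval.

    An empty sequence means that no selection is applied.
    Assumption: both edges of an interval are inclusive.
    """
    intervals = to_intervals(values)
    if not intervals:
        return True
    return any(low <= value <= high for low, high in intervals)


three_intervals = (0.3, 0.4, 1.8, 1.9, -1.7, -1.5)
print(passes(0.35, three_intervals))  # inside the first interval
print(passes(-1.6, three_intervals))  # inside the third interval
print(passes(1.0, three_intervals))   # outside all intervals
print(passes(3, (3, 3)))              # single integer selected
print(passes(42, ()))                 # no numbers given: no selection
```

Rejecting an odd number of bounds early is a design choice of this sketch; it catches the most likely configuration mistake, a forgotten upper or lower edge.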

Some additional selections and configurations can be applied.

Choose between APE Calculation and Control Plots

Furthermore, there are two important switches, since the module can be used for the calculation of APE values, but also as analyzer only, producing zillions of control plots, including plots for track parameters. Calculating APEs is defined in the cff.py as the module ApeEstimator, setting the switch calculateApe = True. Using the analyzer is defined in the cff.py as the module ApeAnalyzer, setting the switch analyzerMode = True. In principle, one could use both things simultaneously in one module, but this often makes no sense due to the sector definitions explained later: the APEs should be calculated for the whole tracker, while the huge amount of detailed validation plots should be chosen for some exemplary regions. The APE calculation also contains some validation plots which are in principle not necessary for the calculation, but since these are the most important basic validation plots, they are implemented there in order to understand the general quality of the estimated APEs.

Granularity of APEs: Sector Definition

A group of modules which should be analysed together, and for which the same APE value is calculated and assigned, is called a "sector". The sectors can be defined based on all module information stored in the TTree produced by the standalone tool mentioned above. The sector definitions need to be given to the ApeEstimator using the parameter Sectors, which is a VPSet. An empty template, not selecting anything but defining all selection parameters, is in ApeEstimator/ApeEstimator/python/SectorBuilder_cfi.py, which explains the possible arguments. Each sector is defined as a PSet. Predefined sectors for the subdetectors are given in:

  • ApeEstimator/ApeEstimator/python/SectorBuilder_Bpix_cff.py
  • ApeEstimator/ApeEstimator/python/SectorBuilder_Fpix_cff.py
  • ApeEstimator/ApeEstimator/python/SectorBuilder_Tib_cff.py
  • ApeEstimator/ApeEstimator/python/SectorBuilder_Tid_cff.py
  • ApeEstimator/ApeEstimator/python/SectorBuilder_Tob_cff.py
  • ApeEstimator/ApeEstimator/python/SectorBuilder_Tec_cff.py

Further subdefinitions should be built in the same way as shown there. It is important to assign each sector a name that clearly reflects its exact definition, because this name appears in all .root-files, in printouts and in histogram names, and shows which sector the results are for. All sector definitions are then gathered in the only file to include, ApeEstimator/ApeEstimator/python/SectorBuilder_cff.py. There, the two important sector definitions (VPSets) which are used at present can be found:

  • ValidationSectors is used for the tool in analyzer mode with the full set of validation plots, and should contain only those sectors one wants to have a closer look at.
  • RecentSectors defines the granularity for the APE calculation, and should span the whole tracker.
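The requirement that the sectors used for the APE calculation span the whole tracker can be checked with a simple bookkeeping sketch. Everything here is hypothetical: the module records, the field names (rawId, subdet, layer) and the dictionary layout only mimic the idea of a sector definition and are not the parameters of SectorBuilder_cfi.py. The additional check for overlapping sectors is an assumption of this sketch, motivated by each module receiving one APE value.

```python
# Hypothetical module records; the real TTree holds many more fields.
modules = [
    {"rawId": 1, "subdet": "BPIX", "layer": 1},
    {"rawId": 2, "subdet": "BPIX", "layer": 2},
    {"rawId": 3, "subdet": "TIB", "layer": 1},
    {"rawId": 4, "subdet": "TOB", "layer": 1},
]

# Hypothetical sector definitions: a descriptive name plus allowed values
# per field. An empty list means that no cut is applied on that field.
sectors = [
    {"name": "BpixLayer1", "subdet": ["BPIX"], "layer": [1]},
    {"name": "BpixLayer2", "subdet": ["BPIX"], "layer": [2]},
    {"name": "Tib", "subdet": ["TIB"], "layer": []},
    {"name": "Tob", "subdet": ["TOB"], "layer": []},
]


def in_sector(module, sector):
    """Return True if the module passes every non-empty cut of the sector."""
    for field, allowed in sector.items():
        if field == "name":
            continue
        if allowed and module[field] not in allowed:
            return False
    return True


def assign(modules, sectors):
    """Map each module rawId to the names of all sectors containing it."""
    return {
        m["rawId"]: [s["name"] for s in sectors if in_sector(m, s)]
        for m in modules
    }


assignment = assign(modules, sectors)
uncovered = [raw_id for raw_id, names in assignment.items() if not names]
overlapping = [raw_id for raw_id, names in assignment.items() if len(names) > 1]

print("modules without sector:", uncovered)
print("modules in several sectors:", overlapping)
```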

Configuration of the Cluster Parameter Estimator (CPE)

The configuration of the CPE to be used in the refit is given in ApeEstimator/ApeEstimator/python/TrackRefitter_38T_cff.py. There it is chosen which PixelCPE and which StripCPE are used. The one currently in use is called TTRHBuilderGeometricAndTemplate, but the parameters can of course also be changed in your specific cfg.py. In that case you need to ensure that it is also included in the refit definition, see below.

Configuration of the Refit

The refit itself is also defined in ApeEstimator/ApeEstimator/python/TrackRefitter_38T_cff.py. There the CPE has to be specified by its ComponentName, which is for the one mentioned above WithGeometricAndTemplate. Very important parameters which might have an influence on the results are the ones steering the hit rejection (outliers and bad pixel template fits). Again, this can be overwritten in your specific configuration.

For the refitter, a sequence is defined which needs to be included in the cfg.py, since the refit also needs the offlineBeamSpot. It also contains the selection of tracks flagged as highPurity, since in many alignment tasks only those are selected, and so it is done here. One could also apply the track selection there instead of within the ApeEstimator, but in the present configuration this is not done; it selects only for highPurity.

Configuration of the Geometry and the GlobalTag

The global tag and the geometry need to be specified in the cfg.py. But never change the APE: it always has to be the design one, with zero APE everywhere. During the iterations of the automated workflow, the correct APE object as created in the previous iteration is taken automatically.

Output

The final output of the ApeEstimator is one file containing the relevant distributions for the second step, the ApeEstimatorSummary, and all validation plots. The output is structured in numbered folders for the defined sectors. Within each folder there is a histogram z_name, which contains only the name given to the sector and allows its identification.

Configuration of ApeEstimatorSummary

The ApeEstimatorSummary module is coded in ApeEstimator/ApeEstimator/plugins/ApeEstimatorSummary.cc, having the configuration template in ApeEstimator/ApeEstimator/python/ApeEstimatorSummary_cfi.py with documentation of the configurable parameters, and the default settings in ApeEstimator/ApeEstimator/python/ApeEstimatorSummary_cff.py. The module needs as input a file (parameter InputFile) produced with the ApeEstimator.

For testing purposes there is one configuration file ApeEstimator/ApeEstimator/test/testApeestimatorSummary_cfg.py, but the general configuration used in the automated workflow can be found elsewhere as explained later.

Choose between Calculation of APEs or Setting the Baseline from Design MC

When the flag setBaseline = True is set, the nominal residual width for each defined sector is estimated. That is, no APE values are calculated; instead, the nominal residual width which returns exactly APE=0 is estimated. This should be done on design MC only, in order to get a reference instead of a fixed assumption for the nominal residual width. A .root-file containing a TTree is created, specified by the parameter BaselineFile.

If the flag is not set, APEs are calculated. If a baseline .root-file was produced and is specified by BaselineFile, the nominal residual width is read from this file for each sector. If no such file is found, a residual width equal to 1 is assumed for each sector. The calculated APE values are also stored in a TTree of a .root-file, specified by IterationFile. However, this file is not created anew when you run the tool again, since it is used for the iterations of the APE. If it is found, the last stored entry is assumed to be the squared APE as estimated in the previous iteration; the estimated squared correction is added to it to give the new squared APE, which is stored as a new entry in the TTree. The APE values for each module contained in a sector are written to an ASCII-file specified by ApeOutputFile, in the format needed by the module Alignment/CommonAlignmentAlgorithm/python/ApeSettingAlgorithm_cfi.py to create a DB object. The configuration of the tool creating the DB object, as used during the automated workflow, can be found in ApeEstimator/ApeEstimator/test/cfgTemplate/apeLocalSetting_cfg.py.
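The bookkeeping described above can be sketched as follows. This is only an illustration in plain Python: a list stands in for the TTree of the IterationFile, a dictionary for the BaselineFile, and the function names are hypothetical. Starting from zero when no previous entry exists is an assumption of the sketch, as is leaving a possibly negative sum untreated.

```python
def nominal_width(baseline, sector_name):
    """Nominal residual width of a sector.

    baseline is a dict {sector name: width} standing in for the baseline
    file, or None if no such file was found; then a width of 1 is assumed.
    """
    if baseline is None:
        return 1.0
    return baseline[sector_name]


def next_squared_ape(history, squared_correction):
    """Append the new squared APE to the iteration history and return it.

    history stands in for the entries stored in the iteration file.
    Assumption: without a previous entry the squared APE starts at zero.
    """
    previous = history[-1] if history else 0.0
    updated = previous + squared_correction
    history.append(updated)
    return updated


# Hypothetical numbers in arbitrary units.
history = []
next_squared_ape(history, 0.25)   # first iteration
next_squared_ape(history, 0.125)  # second iteration adds a smaller correction
print(history)

print(nominal_width(None, "Tob"))          # no baseline file found
print(nominal_width({"Tob": 1.1}, "Tob"))  # width taken from the baseline
```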

Parameters steering the APE Calculation

The parameter apeWeight specifies how the individual intervals in the residual resolution are weighted in the APE calculation within one sector; the variant "entriesOverSigmaX2" works best.

The minimum number of hits for using an interval in the calculation is defined by minHitsPerInterval.
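How the two parameters above could act together is sketched below. The sketch rests on assumptions that are not confirmed by this page: that "entriesOverSigmaX2" means a weight of the number of entries divided by the squared hit resolution of the interval, and that the per-interval results are combined by a weighted mean. The function and key names are hypothetical.

```python
def combine_intervals(intervals, min_hits_per_interval):
    """Weighted mean of per-interval squared corrections.

    Each interval is a dict with the keys 'entries', 'sigma_x' and
    'correction'. Intervals with fewer entries than the minimum are skipped.
    Assumption: the weight of an interval is entries / sigma_x**2.
    Returns None if no interval has enough entries.
    """
    used = [iv for iv in intervals if iv["entries"] >= min_hits_per_interval]
    if not used:
        return None
    weights = [iv["entries"] / iv["sigma_x"] ** 2 for iv in used]
    weighted_sum = sum(w * iv["correction"] for w, iv in zip(weights, used))
    return weighted_sum / sum(weights)


# Hypothetical intervals in arbitrary units.
intervals = [
    {"entries": 100, "sigma_x": 1.0, "correction": 2.0},    # weight 100
    {"entries": 400, "sigma_x": 2.0, "correction": 4.0},    # weight 100
    {"entries": 5, "sigma_x": 1.0, "correction": 100.0},    # too few hits
]

combined = combine_intervals(intervals, min_hits_per_interval=10)
print(combined)
```

In this example the third interval is ignored because of its low number of hits, and the two remaining intervals enter with equal weight.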

The parameter sigmaFactorFit was used earlier for a two-step Gaussian fit procedure, but this caused some instabilities due to obvious non-Gaussian behaviour in several intervals. In the present implementation, a Gaussian is fitted over the full range to the residual distribution in each interval. The second fit, which should fit only the core within plus or minus the specified factor times the width of the first fit around the mean of the first fit, is not used at all in the present implementation of the code, except for additional plots.
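The idea of restricting a second estimate to the core of the distribution can be illustrated with a small sketch. Note that it deliberately swaps the Gaussian fits for the plain mean and standard deviation of the values, so it only shows how the window defined by the factor is built and applied; the function names and the residual values are hypothetical.

```python
import statistics


def core_window(mean, sigma, sigma_factor_fit):
    """Range of the second step: mean +- factor * width of the first step."""
    return (mean - sigma_factor_fit * sigma, mean + sigma_factor_fit * sigma)


def two_step_estimate(residuals, sigma_factor_fit):
    """Return (mean, width) of the full range and of the core only.

    Simplification: mean and standard deviation replace the Gaussian fits.
    """
    mean1 = statistics.mean(residuals)
    sigma1 = statistics.pstdev(residuals)
    low, high = core_window(mean1, sigma1, sigma_factor_fit)
    core = [r for r in residuals if low <= r <= high]
    return (mean1, sigma1), (statistics.mean(core), statistics.pstdev(core))


# Hypothetical residuals with one far outlier.
residuals = [-1.0, 0.0, 0.0, 1.0, 10.0]
first, second = two_step_estimate(residuals, sigma_factor_fit=1.0)
print("full range:", first)
print("core only: ", second)
```

The outlier dominates the first estimate and is removed by the window in the second one, which is why a core-only step is attractive for Gaussian-like distributions and unreliable for clearly non-Gaussian ones.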

The parameter correctionScaling is used as the damping factor to avoid overestimations and thus convergence problems due to the correlations of the APEs of all modules penetrated by a track. The estimated squared correction is scaled with this factor.
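The effect of such a damping factor can be seen in a toy model. The model below is purely hypothetical: it assumes that each iteration overestimates the needed squared correction by a constant factor (gain), which is not how the tool computes corrections. Only the scaling of the squared correction by correction_scaling follows the description above.

```python
def damped_update(squared_ape, squared_correction, correction_scaling):
    """Add the squared correction, scaled by the damping factor."""
    return squared_ape + correction_scaling * squared_correction


def iterate(target, gain, correction_scaling, n_iterations):
    """Toy iteration towards a target squared APE.

    Hypothetical model: the estimated correction is gain times the true
    remaining difference, i.e. an overestimate for gain > 1.
    """
    squared_ape = 0.0
    history = []
    for _ in range(n_iterations):
        squared_correction = gain * (target - squared_ape)
        squared_ape = damped_update(squared_ape, squared_correction,
                                    correction_scaling)
        history.append(squared_ape)
    return history


undamped = iterate(target=1.0, gain=2.5, correction_scaling=1.0, n_iterations=15)
damped = iterate(target=1.0, gain=2.5, correction_scaling=0.3, n_iterations=15)

print("undamped, last value:", undamped[-1])
print("damped, last value:  ", damped[-1])
```

In this toy model the undamped sequence oscillates with growing amplitude, while the damped one approaches the target within the 15 iterations.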

The two parameters smoothIteration and smoothFraction should not be used and should be removed from the code. They belong to another implementation of the damping factor for the iterations, which is mathematically equivalent.

Output

The tool produces the following output files. There is a .root-file containing a set of validation plots important for the APE estimate. In the mode setting the baseline, the baseline .root-file mentioned above is produced. In the mode for calculating the APE, the .root-file for the iterations of the APE value and the ASCII-file with the APE values used for producing a DB object, as mentioned above, are produced.

Scripts for Automated Workflow

The automated workflow is based on four scripts, which need to be run in the given order, starting the next one only after all actions of the previous one have finished successfully. They are run for each iteration, and do the following:

  • createStep1.bash iterationNumber [ lastIteration ]
This produces all relevant scripts defining the ApeEstimator step. It creates scripts for parallel processing, one for each skimmed file considered. The scripts can be found in ApeEstimator/ApeEstimator/test/batch/workingArea/. It is the only script which needs to be adjusted; the other three are identical for each possible operation. The samples which should be read, and the alignment geometry, are specified here. In fact, only parameters are specified here, which are then passed via VarParsing to the config template ApeEstimator/ApeEstimator/test/cfgTemplate/apeEstimator_cfg.py. So what is really selected is coded in the config template.

  • startStep1.bash
The scripts produced in the previous step are sent to the batch farm, i.e. the ApeEstimator is run parallelised, based on the config template ApeEstimator/ApeEstimator/test/cfgTemplate/apeEstimator_cfg.py. In order to gain speed, the optimised skimmed files as explained above are copied to the batch machine and processed there locally. The output is stored in ApeEstimator/ApeEstimator/hists/workingArea/.

  • createStep2.bash iterationNumber [ setBaseline ]
This script does all necessary preparatory actions for the final calculations. The final output directory is created, which is specific to baseline setting or APE calculation mode, and in the latter case also to the iteration number. All .root-files which are the result of the first step are merged using hadd, and the merged file is moved to the final output directory. The script needed for the second step is created, which is ApeEstimator/ApeEstimator/test/batch/workingArea/summary.bash. The script is based on the template ApeEstimator/ApeEstimator/test/cfgTemplate/summaryTemplate.bash. In case of iterations, the .root-file containing the APE value of the previous iteration is copied to the final output directory.

  • startStep2.bash
This script runs the script prepared in the previous step. This means running the ApeEstimatorSummary based on the template ApeEstimator/ApeEstimator/test/cfgTemplate/apeEstimatorSummary_cfg.py, and the output is stored in the final output directory created in the previous step. Furthermore, in case of APE calculation mode, it runs the creation of the DB object, based on the template ApeEstimator/ApeEstimator/test/cfgTemplate/apeLocalSetting_cfg.py. Finally it does a cleanup of all scripts used in this iteration (produced in the two create-steps). In case of the creation of a DB object for the APEs (APE calculation mode), the printout of the cmsRun job shows one error. This is expected, since the DB object creation is based on the AlignmentProducer, which is a generic tool and expects some information which is not given and not necessary for writing the APE object. The DB object should be created correctly.

How to use these scripts is explained in the following, first for setting the baseline and then for calculating APEs.

Set Baseline from Design Simulation

The scripts are in ApeEstimator/ApeEstimator/test/cfgTemplateDesign/. But only the createStep1.bash is also in this folder in CVS, the others are copied from ApeEstimator/ApeEstimator/test/cfgTemplate/, since they are identical for all possible operations.

The scripts need to be run only once, since iterations make no sense for setting the baseline. Do the following steps:

  • cd ApeEstimator/ApeEstimator/test/cfgTemplateDesign/workingArea/
  • bash ../createStep1.bash 0
  • bash ../startStep1.bash
  • bash ../createStep2.bash 0 True
  • bash ../startStep2.bash

The final output directory where all files end up is ApeEstimator/ApeEstimator/hists/Design/baseline/.

Calculate APE Parameters

There is one directory prepared for calculating APEs on MC using misaligned geometries, and one directory for calculating APEs on data. Depending on what you want to analyse, do for MC:

  • cd ApeEstimator/ApeEstimator/test/cfgTemplateMc/workingArea/

or do for real data:

  • cd ApeEstimator/ApeEstimator/test/cfgTemplateData/workingArea/

The relevant scripts are placed in this folder, but only the createStep1.bash is also in this folder in CVS, the others are copied from ApeEstimator/ApeEstimator/test/cfgTemplate/, since they are identical for all possible operations.

For all but the last iteration, do the following steps, where iterationNumber needs to be replaced by the number of the iteration, starting with 0:

  • bash ../createStep1.bash iterationNumber
  • bash ../startStep1.bash
  • bash ../createStep2.bash iterationNumber
  • bash ../startStep2.bash

For the last iteration, do:

  • bash ../createStep1.bash iterationNumber True
  • bash ../startStep1.bash
  • bash ../createStep2.bash iterationNumber
  • bash ../startStep2.bash

In the first iteration (called iteration 0), validation plots of the analyzer mode of ApeEstimator are also created automatically. During the subsequent iterations this is not done; only what is relevant for the iterations is produced. If you specified your last iteration as stated above, a set of validation plots is created again. This allows the comparison and the automated production (see later) of validation plots, comparing the distributions before the iterations (with APE=0) to the distributions after the iterations (final APE as estimated by the tool). The tool is normally used with 15 iterations, i.e. running iterations 0, ..., 14 with the first set of commands, and then running iteration 15 with the second set of commands. This very last run (called iteration 15) is not meant as another iteration, but provides the validation of the APE obtained after the 15 iterations 0, ..., 14. Using another number of iterations is not a problem, but some of the automated scripts producing validation plots, as explained later, then need to be adjusted.

Be aware that the APE DB object of the iteration called iteration 14 is your final result, and not the APE DB object of the one called iteration 15!

All output except the DB object is stored in ApeEstimator/ApeEstimator/hists/workingArea/iter*/, where the * corresponds to the number of the given iteration. The important ones are iter0 (containing the validations with zero APE), iter14 (containing the results concerning the estimated final APE values), and iter15 (containing the validations with the final APE).

The DB objects are stored in ApeEstimator/ApeEstimator/hists/apeObjects/. Always use the one named apeIter14.db as final result, the one named apeIter15.db is only produced due to the automation.
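The sequence of commands for a full run can be written down with a short helper. It only generates the command lines and does not execute them, because each script may only be started after everything initiated by the previous one (including the batch jobs) has finished. The helper names are hypothetical, and the file name returned for a number of iterations other than 15 is an extrapolation of the naming described above.

```python
def workflow_commands(n_iterations=15):
    """Command lines for iterations 0 .. n_iterations-1 plus the final pass.

    The final pass (iteration number n_iterations) only produces the
    validation plots and is marked by the extra argument True.
    """
    commands = []
    for iteration in range(n_iterations + 1):
        step1 = "bash ../createStep1.bash %d" % iteration
        if iteration == n_iterations:
            step1 += " True"
        commands.append([
            step1,
            "bash ../startStep1.bash",
            "bash ../createStep2.bash %d" % iteration,
            "bash ../startStep2.bash",
        ])
    return commands


def final_db_object(n_iterations=15):
    """DB object holding the final result (assumed naming apeIter<N-1>.db)."""
    return "ApeEstimator/ApeEstimator/hists/apeObjects/apeIter%d.db" % (
        n_iterations - 1)


commands = workflow_commands()
for iteration, steps in enumerate(commands):
    print("# iteration %d" % iteration)
    for step in steps:
        print(step)

print("final result:", final_db_object())
```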

Scripts producing Validation Plots

Scripts for producing validation plots based on root-macros can be found in ApeEstimator/ApeEstimator/macros/. There are two different scripts to run. First do:

  • cd ApeEstimator/ApeEstimator/macros/

The first script produces all validation plots of iteration 0, and prints them in .pdf-files. This is done for the design MC (the baseline), and for the geometry under study (data or misaligned MC). They are mainly used for optimising the track and hit selection. These validation plots can only be obtained for the sectors which are defined in the analyzer mode module of the ApeEstimator, but there are also files with specific validation plots which are produced for each sector defined in the APE calculation of the ApeEstimator. To produce them run:

  • bash ./apeOverview.sh

The output is stored in ApeEstimator/ApeEstimator/hists/plots/, the subfolder ideal/ contains those of the design MC, the subfolder data/ those of the geometry under study.

The other script produces single .eps-files containing one histogram each, and overlays them for design geometry, and geometry under study. These are the most important plots, but others could in principle be added in the root-macros. To produce them run:

  • bash ./drawPlotAndIteration.sh
  • bash ./sortPlots.sh

The output is stored in ApeEstimator/ApeEstimator/hists/workingArea/iter*/plots/, where the * stands for 0, 14 or 15, corresponding to the iteration where they are obtained from. They are sorted in subfolders. Important are the following plots in the following subfolders.

  • The calculated APE values can be found in iter14/plots/result/.
  • The important validation plots are in iter15/plots/Sector/ and iter15/plots/Track/.
  • For the modelling of data, look at iter0/plots/Sector/ and iter0/plots/Track/.

The last bullet (iter0) is not that important, since the distributions are also overlaid in the one above (iter15), together with the distributions for the final APE. But there the distributions are not scaled to integral, in order to see the absolute number of events/tracks/hits, and thus the statistics of the modelling.

If one wants to overlay the resulting APE values for different geometries, this can be done with the corresponding root-macro, no explicit script exists. The macro is ApeEstimator/ApeEstimator/macros/commandsDrawComparison.C.

References

The method used for the APE estimation is described in the following two documents:

-- JohannesHauk - 24-Jul-2012

Topic revision: r9 - 2014-05-02 - AjayKumar