TQAF Layer 2
Introduction
TQAF Layer 2 is the main top-specific working layer, built on PAT (TQAF Layer 1) objects. It offers tools for top analyses that aim to resolve the topology of single top events and top anti-top events in all possible decay channels. At the moment, development is driven by the analysis of top anti-top events in the semi-leptonic decay channel. An extension to the full leptonic decay channel is planned to be implemented soon, and an extension to the full hadronic channel, in analogy to the others, should be straightforward. Users should keep in mind that the TQAF subsystem, like the PAT, aims to collect tools suitable for top analyses and to solve technical and programming issues, so that people can get to the physics problems as quickly as possible. It does not do the analysis, nor does it restrict the user's freedom to configure, adapt, change or extend existing implementations as needed. In some cases (such as the choice and implementation of new variables for MVA methods, or of new constraints for the kinematic fit) users are especially encouraged to adapt the code to their needs. Descriptions of how to do this are provided below.
The most important tools of TQAF Layer 2 are described in the following. Their main purpose is to be used, in a fully configurable way, within the full framework, where their outputs are interfaced as EventHypotheses, together with the corresponding meta information, to a flexible container structure such as the TtSemiLeptonicEvent. This structure may be made persistent and used within FWLite in later analysis steps. The tools are fully modularized, so they may also be used standalone within the full framework. If you would like to contribute to the further development of one of the tools, or to add new ones, your help is appreciated. In that case, please contact Roger Wolf.
Kinematic Fit
Contact person
Sebastian Naumann
This section will describe the interface to the kinematic fit. It still has to be added. In case of questions, don't hesitate to ask Sebastian.
Description
Structure
Access
Production
Event Selection (MVA based)
Contact person
Manuel Renz
Description
The TopEventSelection package allows the user to separate "signal" and "background" events using the MVA package developed by Christophe Saout. In the default implementation, TTbar (semileptonic muon channel) events are separated from W+jets events using a likelihood ratio with 10 input variables. Beyond the default, the user can treat any other process as "signal" or "background", implement their own input variables, and switch to neural networks or any other TMVA-based MVA method.
Structure
There are two main classes in TopEventSelection:
Two further classes are responsible for the calculation of the input-variables and their transfer to the MVA-Trainer:
Access
If you want to use the default implementation, run TQAFLayer2 on your input files to produce TQAFLayer2 output. In this output, the likelihood-output branch is named double_findTtSemiLepSignalSelMVA_DiscSel_TQAF and can be accessed in your analyzer. Note: the likelihood output for events that do not pass the event selection criteria is set to -1. All other events have outputs between 0 and 1, where background events accumulate near 0 and signal events near 1.
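The convention for the likelihood output can be illustrated with a short Python sketch. It is not TQAF code: the function name, the example values and the threshold of 0.5 are assumptions made for illustration only, and a sensible threshold depends on your analysis.

```python
def select_signal_like(discriminators, threshold=0.5):
    """Return the indices of events that passed the event selection
    (output >= 0) and whose likelihood output reaches the threshold.

    An output of -1 marks events that failed the event selection.
    The threshold is an assumed, analysis-dependent value.
    """
    selected = []
    for index, disc in enumerate(discriminators):
        if disc < 0.0:
            # -1: event did not pass the event selection criteria
            continue
        if disc >= threshold:
            selected.append(index)
    return selected


# Made-up likelihood outputs, one per event
values = [-1.0, 0.05, 0.92, 0.40, 0.77]
print(select_signal_like(values, threshold=0.5))
```

Checking for the negative value explicitly keeps the selection correct even if the threshold is set to 0 in order to keep all events that passed the selection.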
Production
If you want to do your own MVA signal selection, you should first produce TQAFLayer1 output for your files. It is possible to run TQAFLayer1 and the TraintreeSaver in one step, but you may run into performance problems, so it is better to produce the TQAFLayer1 output first.
to be continued...
Jet Parton Association (MVA based)
Contact person
Sebastian Naumann
Description
For analyses such as differential cross section measurements, the measurement of the top mass and measurements of other top characteristics, there is a particular need for a full or partial event reconstruction with a proper association of reconstructed jets to the quarks of the top (anti-top) decay chain(s). For the semileptonic channel, the classes located in the plugins directory of the TopJetCombination package apply the CMS MVA package for multivariate analysis methods to find this jet-parton association. The CMS MVA package is a centrally provided interface to all kinds of multivariate analysis methods, such as likelihood, neural network and others. It also provides a full interface to the ROOT TMVA package. It takes most of the technical burden off the user, such as histogram and event management and the preprocessing and decorrelation of input variables. It is steered via XML steering files.
Structure
Access
Production
To produce a new .mva file which can be used as input to calculate the MVA discriminator for jet-parton hypotheses, first run the TrainTreeSaver:
cmsRun TopQuarkAnalysis/TopJetCombination/test/ttSemiLepJetCombMVATrainTreeSaver_cfg.py
The resulting tree will be stored in a file called train_save.root. You can then perform the actual training:
mvaTreeTrainer --xslt TopQuarkAnalysis/TopJetCombination/data/TtSemiLepJetCombMVATrainer.xml TopQuarkAnalysis/TopJetCombination/data/TtSemiLepJetComb.mva train_save.root
There are two output files: train_monitoring.root and TtSemiLepJetComb.mva.
To investigate the resulting train_monitoring.root with the ViewMonitoring macro, do:
ln -s $CMSSW_RELEASE_BASE/src/PhysicsTools/MVATrainer/test/ViewMonitoring.C
root -l ViewMonitoring.C
A small window pops up that lets you look at the variables used by the MVA method (just click the button inputVariables->norm). For more information on what you are looking at, see the SWGuideMVAFrameworkTutorial.
If you want to see what happens when the .mva file is made, have a look at the input file TtSemiLepJetCombMVATrainer.xml in data/. To understand it, please also see the corresponding section in the SWGuideMVATrainer.
By default, three processors are used in TQAF: ProcNormalize, ProcMatrix and ProcLikelihood. The first one normalizes the input variables and returns the same number of variables. The second one decorrelates the variables: taking the normalized variables as input, it checks for linear correlations, then calculates and applies a rotation matrix to decorrelate them. Its output has the same number of variables as its input, but the variables should now be decorrelated. Note that the output of ProcMatrix is by default not used as input for ProcLikelihood! ProcLikelihood takes as input the normalized variables provided by ProcNormalize and gives one output variable, the discriminator. In a next step, this discriminator can be used to select the jet-parton association with the highest discriminator value.
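The final selection step can be sketched in a few lines of Python. This is only an illustration, not the TQAF implementation: the function name and the representation of a jet-parton hypothesis as a tuple of jet indices are assumptions, and the discriminator values are made up.

```python
def best_combination(discriminators):
    """Pick the jet-parton hypothesis with the highest discriminator.

    discriminators: dict mapping a hypothesis (here, hypothetically,
    a tuple of jet indices) to its MVA discriminator value.
    Returns None if no hypothesis is available.
    """
    if not discriminators:
        return None
    return max(discriminators, key=discriminators.get)


# Made-up discriminator values for three jet permutations
hypotheses = {
    (0, 1, 2, 3): 0.31,
    (0, 1, 3, 2): 0.87,
    (2, 1, 0, 3): 0.45,
}
print(best_combination(hypotheses))
```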
The variables currently implemented are:
- the angle between the quarks from the hadronically decaying W boson (function: angleHadQQBar() )
- the angle between the reconstructed W boson and b quark jet from the hadronic decay chain (function: angleHadWHadB() )
- the angle between b quark jet and the lepton from the leptonic decay chain (function: angleLeptonLepB() )
- the angle between the reconstructed top and antitop (function: angleTopTop() )
- the mass difference between the reconstructed top and antitop (function: deltaMTopTop() )
- the mass of the reconstructed hadronically decaying W boson (function: massHadW() )
- the mass of the reconstructed leptonically decaying W boson (function: massLepW() )
- ...
The variables can be found in the class TtSemiLepJetComb.
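As an illustration of what the angle variables compute, the following Python sketch returns the opening angle between two three-vectors in degrees, in the spirit of angleHadQQBar(). It is a standalone approximation and not the code of the class: the function name and the use of plain (x, y, z) tuples are assumptions, whereas the class itself works on Lorentz vectors via ROOT::Math::VectorUtil.

```python
import math


def angle_deg(v1, v2):
    """Opening angle in degrees between two 3-vectors given as
    (x, y, z) tuples. Returns 0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = math.sqrt(sum(a * a for a in v1)) * math.sqrt(sum(b * b for b in v2))
    if norm == 0.0:
        return 0.0
    # Clamp against rounding errors before taking the arc cosine
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))


# Two perpendicular directions
print(angle_deg((1.0, 0.0, 0.0), (0.0, 1.0, 0.0)))
```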
Adding new variables
If you want to use further variables, just add them to the TtSemiLepJetComb class, e.g.:
class TtSemiLepJetComb {
  // common calculator class for likelihood
  // variables in semi leptonic ttbar decays
 public:
  TtSemiLepJetComb();
  TtSemiLepJetComb(const std::vector<pat::Jet>&, const std::vector<int>,
                   const math::XYZTLorentzVector&, const math::XYZTLorentzVector&);
  TtSemiLepJetComb(const std::vector<pat::Jet>&, const std::vector<int>, const math::XYZTLorentzVector&);
  ~TtSemiLepJetComb();
  double angleHadQQBar() const { return ROOT::Math::VectorUtil::Angle(hadQJet, hadQBarJet) * TMath::RadToDeg(); }
  ...
  // newly added variable
  double deltaPhiMetLepB() const { return ROOT::Math::VectorUtil::DeltaPhi(neutrino, lepBJet); }
In TtSemiLepJetCombEval.h you have to add the line
values.push_back( PhysicsTools::Variable::Value("deltaPhiMetLepB", jetComb.deltaPhiMetLepB()) );
to the evaluateTtSemiLepJetComb method, so that the variable is calculated and written to the vector values, which is used by the MVATrainer.
Finally, you have to add the new variables to the XML steering file (e.g. TtSemiLepJetCombMVATrainer_Muons.xml). Take care to use the correct order and naming:
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<MVATrainer>
  <general>
    <option name="id">TtSemiLepJetCombMVATrainer</option>
    <option name="trainfiles">train_%1$s%2$s.%3$s</option>
  </general>
  <input id="input">
    <var name="angleHadQQBar" multiple="false" optional="false"/>
  </input>
  <processor id="norm" name="ProcNormalize">
    <input>
      <var source="input" name="angleHadQQBar"/>
    </input>
    <config>
      <pdf/>
      ...
      <pdf/>
    </config>
    <output>
      <var name="var1"/>
      ...
      <var name="var11"/>
    </output>
  </processor>
  <processor id="rot" name="ProcMatrix">
    <input>
      <var source="norm" name="var1"/>
      ...
      <var source="norm" name="var11"/>
    </input>
Finally, if you want to use the result of the training in your analysis, you have to run the TtSemiLepJetCombMVAComputer.
The example config file for cmsRun is TopQuarkAnalysis/TopJetCombination/test/ttSemiLepJetCombMVAComputer_cfg.py.
Make sure to put the TtSemiLepJetCombMVAComputer in your path before the analyzer in which you want to read the result of the MVA, and to use the correct .mva file, i.e. the one you produced in the training performed on your favorite Monte Carlo sample and with your event selection. You might want to replace the path that is given in TtSemiLepJetCombMVAComputer_cff.py, for example by including a line like the following in your config file:
process.TtSemiLepJetCombMVAFileSource.ttSemiLepJetCombMVA = "MySubsystem/MyPackage/data/MyJetComb.mva"
Jet Parton Association (GenEvent based)
Contact person
Sebastian Naumann
Description
This revised version of the jet-parton matching was introduced in a presentation here. It matches the partons of top quark pair production to jets using one of four different algorithms, as detailed below:
Structure
totalMinDist
Main idea: successively use the jet-parton pair with the smallest ΔR in the event
Procedure:
- calculate ΔR for all possible jet-parton pairs in the event and store the values in a vector
- sort the vector with respect to ΔR
- match the jet and parton corresponding to the smallest ΔR in the vector
- remove all entries belonging to the matched jet or the matched parton from the vector
- continue matching and removing until the vector is empty
Comments:
- default procedure in the TQAF (TopQuarkAnalysis/TopTools/src/JetPartonMatching.cc)
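The totalMinDist procedure can be sketched as follows. This is a minimal Python illustration, not the code in JetPartonMatching.cc: the function names, the representation of jets and partons as (eta, phi) tuples, and the choice of ΔR as the distance measure are assumptions. Instead of physically removing entries from the vector, the sketch skips entries whose jet or parton has already been matched, which has the same effect.

```python
import math


def delta_r(a, b):
    """Distance in the eta-phi plane between two objects given as
    (eta, phi) tuples; the phi difference is wrapped into [-pi, pi]."""
    dphi = math.remainder(a[1] - b[1], 2.0 * math.pi)
    return math.hypot(a[0] - b[0], dphi)


def total_min_dist(partons, jets):
    """Greedy matching: repeatedly take the closest remaining pair.
    Returns a dict mapping parton index to jet index."""
    # All jet-parton pairs, sorted by their distance
    pairs = sorted(
        (delta_r(p, j), ip, ij)
        for ip, p in enumerate(partons)
        for ij, j in enumerate(jets)
    )
    matches = {}
    used_jets = set()
    for _dist, ip, ij in pairs:
        # Equivalent to removing entries of matched jets and partons
        if ip in matches or ij in used_jets:
            continue
        matches[ip] = ij
        used_jets.add(ij)
    return matches


partons = [(0.0, 0.0), (1.0, 1.0)]
jets = [(1.1, 1.0), (0.1, 0.1), (3.0, -2.0)]
print(total_min_dist(partons, jets))
```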
minSumDist
Main idea: find the combination of jet-parton pairs with the smallest ΣΔR
Procedure:
- successively find all possible combinations of jet-parton pairs (using a recursive approach)
- along the way, calculate ΣΔR for each combination
- store information about a combination if its ΣΔR is smaller than the smallest found so far
- in the end, match jets and partons as in the combination that was found to have the smallest ΣΔR
Comments:
- minimizing ΣΔR prevents the matching for the whole event from being spoiled by a single jet or parton
- disadvantage: large combinatorics
- this approach is used in the CMSSW JetMCAlgos (PhysicsTools/JetMCAlgos/plugins/CandOneToOneDeltaRMatcher.cc) as the BruteForce algorithm (not the SwitchMode)
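A brute-force version of minSumDist can be sketched in Python as below. It is an illustration under assumptions, not the TQAF or JetMCAlgos code: the function names and the (eta, phi) tuples are hypothetical, ΔR is assumed as the distance, and itertools.permutations is used in place of the recursive approach described above.

```python
import itertools
import math


def delta_r(a, b):
    """Distance in the eta-phi plane between two (eta, phi) tuples."""
    dphi = math.remainder(a[1] - b[1], 2.0 * math.pi)
    return math.hypot(a[0] - b[0], dphi)


def min_sum_dist(partons, jets):
    """Try every assignment of distinct jets to the partons and keep
    the one with the smallest summed distance.
    Returns (jet indices in parton order, summed distance), or
    (None, inf) if there are fewer jets than partons."""
    best_combo, best_sum = None, math.inf
    for combo in itertools.permutations(range(len(jets)), len(partons)):
        total = sum(delta_r(p, jets[ij]) for p, ij in zip(partons, combo))
        if total < best_sum:
            best_combo, best_sum = combo, total
    return best_combo, best_sum


# Here the closest single pair (parton 0, jet 0) is NOT part of the
# combination with the smallest summed distance.
partons = [(0.0, 0.0), (1.0, 0.0)]
jets = [(0.4, 0.0), (-0.5, 0.0)]
print(min_sum_dist(partons, jets))
```

The number of assignments grows factorially with the number of jets, which is the combinatorial cost mentioned in the comments above.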
ptOrderedMinDist
Main idea: the position is supposed to be measured with higher accuracy for harder particles than for softer ones
Procedure:
- sort partons with respect to pT in descending order
- starting with the hardest parton, find the jet with the smallest ΔR to this parton and match it
- consecutively match the other partons, ignoring jets that have already been assigned to a (harder) parton
Comments:
- it was confirmed by earlier studies that better matching is achieved for hard than for soft jets
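The ptOrderedMinDist procedure can be sketched in Python as follows. The function names and the representation of jets and partons as (pt, eta, phi) tuples are assumptions made for this illustration, as is the use of ΔR as the distance measure; it is not the TQAF implementation.

```python
import math


def delta_r(a, b):
    """Distance in the eta-phi plane between two (pt, eta, phi) tuples."""
    dphi = math.remainder(a[2] - b[2], 2.0 * math.pi)
    return math.hypot(a[1] - b[1], dphi)


def pt_ordered_min_dist(partons, jets):
    """Match partons to jets, hardest parton first.
    Returns a dict mapping parton index to jet index."""
    # Parton indices sorted by transverse momentum, descending
    order = sorted(range(len(partons)), key=lambda ip: partons[ip][0], reverse=True)
    free_jets = list(range(len(jets)))
    matches = {}
    for ip in order:
        if not free_jets:
            break  # more partons than jets: the rest stay unmatched
        # Closest jet among those not yet assigned to a harder parton
        ij = min(free_jets, key=lambda k: delta_r(partons[ip], jets[k]))
        matches[ip] = ij
        free_jets.remove(ij)
    return matches


partons = [(30.0, 0.0, 0.0), (80.0, 1.0, 0.0)]
jets = [(50.0, 0.4, 0.0), (40.0, -0.5, 0.0)]
print(pt_ordered_min_dist(partons, jets))
```

In this example the harder parton (index 1) chooses first and takes jet 0, even though jet 0 is closer to the softer parton.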
unambiguousOnly
Main idea: in order to avoid mismatches, do not tolerate any ambiguity within some ΔR
Procedure:
- for each parton, find all jets that lie within this ΔR
- if exactly one jet is found, match it to the parton
- if none or more than one jet is found, dismiss the whole event
Comments:
- only suited for clean events with well separated jet-parton pairs
- in this case, the order of the jets and partons when looping over them is of no importance
- very high purity, the lowest efficiency
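The unambiguousOnly procedure can be sketched in Python as below. The function names, the (eta, phi) tuples and the default maximum distance of 0.3 are assumptions for illustration, not values taken from the TQAF. The sketch follows the three steps listed above literally and does not additionally check whether the same jet is claimed by two partons.

```python
import math


def delta_r(a, b):
    """Distance in the eta-phi plane between two (eta, phi) tuples."""
    dphi = math.remainder(a[1] - b[1], 2.0 * math.pi)
    return math.hypot(a[0] - b[0], dphi)


def unambiguous_only(partons, jets, max_dist=0.3):
    """Match each parton to the single jet within max_dist.
    Returns a dict mapping parton index to jet index, or None if
    any parton has zero or several jets within max_dist."""
    matches = {}
    for ip, parton in enumerate(partons):
        close = [ij for ij, jet in enumerate(jets) if delta_r(parton, jet) < max_dist]
        if len(close) != 1:
            return None  # ambiguous or unmatched: dismiss the whole event
        matches[ip] = close[0]
    return matches


partons = [(0.0, 0.0), (2.0, 1.0)]
clean_jets = [(0.1, 0.0), (2.0, 1.1), (4.0, -1.0)]
crowded_jets = [(0.1, 0.0), (0.0, 0.1), (2.0, 1.1)]
print(unambiguous_only(partons, clean_jets))
print(unambiguous_only(partons, crowded_jets))
```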
The code is located in the TopTools subdirectory of the package. The configuration files to run the jet-parton matching for semi-leptonic and full-hadronic top pair production can be found in the TopEventProducers subdirectory of the package.
Access
Production
Event Hypotheses
Contact person
Roger Wolf
This section will describe the currently implemented event hypotheses for top quark analyses of top anti-top event topologies in the semi-leptonic decay channel. It still has to be added. In case of questions don't hesitate to ask Roger...
Description
Structure
Access
Production
--
RogerWolf - 19 Jun 2008