DaVinci Tutorial 9 -- Multivariate Selection


This tutorial introduces the usage of the MVADictTools package to add multivariate classifiers to your selection. It shows how to add a cut on the response to a TMVA classifier to a CombineParticles selection and how to write the classifier output into an ntuple. This tutorial requires DaVinciTutorial4.

How to add a new classifier backend to the framework is explained in a separate tutorial: DaVinciTutorial9a.

Another application of the framework introduced here is retrieving information from a decay tree that has been refitted with the DecayTreeFitter: DaVinciTutorial9b.


Make sure you understand DaVinciTutorial4. The present tutorial builds on the solution of tutorial 4. DaVinciTutorial6 will be useful to understand how an ntuple is filled. DaVinciTutorial7 is essential to understand the syntax used to access variables via LoKi functors.

To access the material used in this tutorial you need to add the MVADictTools package to your workspace:

getpack Phys/MVADictTools HEAD 
cd Phys/MVADictTools/cmt 
cmt make 

The tutorial assumes that you are aware of the workings of TMVA, how to train and test a multivariate classifier and how to export it to xml in order to apply it to data. For the purpose of this tutorial there is a trained BDT available at Phys/MVADictTools/options/TestPhi2KK.xml. Note that this is only meant for testing purposes - no guarantee for the performance of this BDT is given!

MVADictTools/options contains example scripts including the solutions to this tutorial and MVADictTools/python contains useful helper functions that will be used and explained below.

Adding a Cut on a BDT response to the selection

A typical scenario for the application of multivariate classifiers is the refinement of the selection of composite particles. Here we are going to add a BDT cut to the selection of the Phi decaying into two kaons, taken from DaVinciTutorial4. The Phi selection can be defined as

_phi2kk = CombineParticles("Phi2KK") 
_phi2kk.DecayDescriptor = "phi(1020) -> K+ K-" 
_phi2kk.CombinationCut = "ADAMASS('phi(1020)')<50*MeV"  
_phi2kk.MotherCut = "(VFASPF(VCHI2/VDOF)<100)"

Now let's replace the cut on the vertex chi2 by a simple BDT classifier which instead uses pT and IPCHI2 variables of the composite and the daughters.

We are going to read the BDT response out of a dictionary. The corresponding LoKi functor is straightforward:

_phi2kk.MotherCut = "VALUE('LoKi__Hybrid__DictValue/MVAResponse')> -0.5"

Note that MVAResponse is just an arbitrary name here. The DictValue tool now provides a place where we can plug in our BDT. We are going to use the TMVA backend to implement the classifier. The BDT definition is stored in an xml file:

weightfile = "TestPhi2KK.xml"

We also need to supply the variables that are needed by the classifier. We are using LoKi functors to collect the values into a dictionary. The mechanism is completely analogous to the Hybrid::TupleTool described in DaVinciTutorial7. Note that the names of the variables defined here have to correspond to the naming scheme in the xml file!

# input variables needed by BDT
Vars = {
     "lab1_PT"               : "PT",
     "lab1_IPCHI2_OWNPV"     : "MIPCHI2DV(PRIMARY)",
     "lab2_PT"               : "CHILD(PT,1)",
     "lab3_PT"               : "CHILD(PT,2)",
     "lab2_IPCHI2_OWNPV"     : "CHILD(MIPCHI2DV(PRIMARY),1)",
     "lab3_IPCHI2_OWNPV"     : "CHILD(MIPCHI2DV(PRIMARY),2)",
}
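
A mismatch between the dictionary keys and the variable names stored in the weight file is easy to introduce, so it can help to compare the two before running the job. The following sketch is not part of MVADictTools. It assumes that the weight file follows the usual TMVA layout, in which a Variables block holds Variable elements that carry the name in a Label attribute (with Expression as a fallback). The function names xml_variable_names and compare_names are made up for illustration, and the inline xml is a heavily trimmed toy stand-in for TestPhi2KK.xml.

```python
import xml.etree.ElementTree as ET

# Toy stand-in for a TMVA weight file (assumed layout, heavily trimmed).
EXAMPLE_XML = """
<MethodSetup Method="BDT::BDT">
  <Variables NVar="6">
    <Variable VarIndex="0" Expression="lab1_PT" Label="lab1_PT"/>
    <Variable VarIndex="1" Expression="lab1_IPCHI2_OWNPV" Label="lab1_IPCHI2_OWNPV"/>
    <Variable VarIndex="2" Expression="lab2_PT" Label="lab2_PT"/>
    <Variable VarIndex="3" Expression="lab3_PT" Label="lab3_PT"/>
    <Variable VarIndex="4" Expression="lab2_IPCHI2_OWNPV" Label="lab2_IPCHI2_OWNPV"/>
    <Variable VarIndex="5" Expression="lab3_IPCHI2_OWNPV" Label="lab3_IPCHI2_OWNPV"/>
  </Variables>
</MethodSetup>
"""

# Same keys as the Vars dictionary of this tutorial.
VARS = {
    "lab1_PT": "PT",
    "lab1_IPCHI2_OWNPV": "MIPCHI2DV(PRIMARY)",
    "lab2_PT": "CHILD(PT,1)",
    "lab3_PT": "CHILD(PT,2)",
    "lab2_IPCHI2_OWNPV": "CHILD(MIPCHI2DV(PRIMARY),1)",
    "lab3_IPCHI2_OWNPV": "CHILD(MIPCHI2DV(PRIMARY),2)",
}


def xml_variable_names(xml_text):
    """Collect the input variable names declared in the weight file text."""
    root = ET.fromstring(xml_text)
    return [v.get("Label") or v.get("Expression") for v in root.iter("Variable")]


def compare_names(xml_text, variables):
    """Return (missing, unused): names the classifier expects but the
    dictionary lacks, and dictionary keys the classifier does not know."""
    expected = set(xml_variable_names(xml_text))
    provided = set(variables)
    return sorted(expected - provided), sorted(provided - expected)


missing, unused = compare_names(EXAMPLE_XML, VARS)
print("missing from Vars: %s" % missing)
print("not used by the classifier: %s" % unused)
```

For a real weight file the text would be read from disk instead of the inline string. Both lists being empty means the naming schemes agree.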

After all these preparations we just need one command to plug everything together:

from MVADictHelpers import *
addTMVAclassifierValue(_phi2kk, weightfile, Vars, "MVAResponse")

Writing the BDT output to the NTuple

In order to follow this part make sure you understand DaVinciTutorial6, which shows how to fill an NTuple using the TupleTools.

The TupleTool writing out the BDT response should be applied to the appropriate branch of the DecayTree:

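from Configurables import DecayTreeTuple
from DecayTreeTuple.Configuration import *  # provides addBranches and addTupleTool
tuple = DecayTreeTuple()
tuple.Inputs = ["Phys/SelBs2JpsiPhi/Particles"]
tuple.Decay = "[B_s0 -> (^J/psi(1S) => ^mu+ ^mu-) (^phi(1020) -> ^K+ ^K-)]cc"
tuple.ToolList = []
tuple.UseLabXSyntax = True
tuple.RevertToPositiveID = False
# define a branch for the Phi so that tools can be attached to it as tuple.Phi
tuple.addBranches({
    "Phi" : "B_s0 -> (^phi(1020) -> K+ K-) ? ",
})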

Then the BDT can be added as a TupleTool with a similar helper function as used above:

from MVADictHelpers import *
addTMVAclassifierTuple(tuple.Phi, 'TestPhi2KK.xml', Vars,
                       Name='BDT', Keep=True, Preambulo=[""])

The additional options allow you to customize the tool a bit more. Name sets the name of the response variable. Keep writes all the input variables into the ntuple in addition to the BDT response. Preambulo takes a list of strings containing Python code that will be executed before the LoKi functors are evaluated.

What is happening behind the scenes?

The MVA tools described above are implemented using the modular framework of the LoKi dictionary tools. The implementation of the LoKi dictionary tools lives in the package Phys/LoKiArrayFunctors. The processing of the multivariate operation is organized as a chain of tools as illustrated in the figure below.

[Figure: Typical chain of dictionary tools (IParticleDictArchitectures-crop.png)]

The DictOfFunctors is a tool that builds a dictionary filled with values retrieved through LoKi functors, very much in the same way as the Hybrid::TupleTool works. The main difference is that while the TupleTool writes the variables into the ntuple directly, the DictOfFunctors returns a dictionary instead.

This dictionary can be manipulated by a so-called DictTransform. Strictly speaking, the DictTransform tool only provides the plug-and-play mechanism for whatever algorithm should operate on the variable set: it receives a source dictionary and returns an output dictionary. TMVATransform is the implementation that applies a TMVA classifier to the source dictionary and returns the classifier response in the output dictionary. More information on how to add your own DictTransform can be found in DaVinciTutorial9a.

Dict2Tuple is a TupleTool, which simply writes all content of a source dictionary to an ntuple.

There is also a DictValue tool which takes one entry in the dictionary - identified by its key - and returns it as a double value. This can be used together with the VALUE functor in order to implement the cut mechanics shown above.
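
The division of labour between these tools can be mimicked in a few lines of plain Python. This is a conceptual sketch only, not the LoKi implementation: the functions dict_of_functors, dict_transform, dict_value and dict_to_tuple are hypothetical stand-ins for the tools with similar names, the "functors" are ordinary callables acting on a toy candidate, and toy_classifier is an arbitrary linear formula rather than a TMVA BDT.

```python
def dict_of_functors(functors, candidate):
    """Source: evaluate every functor on the candidate and collect the values."""
    return {name: functor(candidate) for name, functor in functors.items()}


def dict_transform(source, algorithm, name):
    """Transform: turn a source dictionary into a new output dictionary."""
    return {name: algorithm(source)}


def dict_value(dictionary, key):
    """Pick a single entry by key and return it as a floating point number."""
    return float(dictionary[key])


def dict_to_tuple(dictionary, row):
    """Sink: write all entries of the dictionary into one ntuple row."""
    row.update(dictionary)
    return row


# Toy "functors": plain callables standing in for LoKi functors.
functors = {
    "lab1_PT": lambda cand: cand["pt"],
    "lab1_IPCHI2_OWNPV": lambda cand: cand["ipchi2"],
}


def toy_classifier(values):
    """Arbitrary formula standing in for the response of a trained BDT."""
    return 0.001 * values["lab1_PT"] + 0.01 * values["lab1_IPCHI2_OWNPV"] - 1.0


candidate = {"pt": 1500.0, "ipchi2": 20.0}

inputs = dict_of_functors(functors, candidate)           # like DictOfFunctors
output = dict_transform(inputs, toy_classifier, "BDT")   # like a DictTransform

# Cut use case: a single number, as the VALUE functor would see it.
response = dict_value(output, "BDT")
passes_cut = response > -0.5

# Ntuple use case: inputs and response written into the same row.
row = dict_to_tuple(output, dict_to_tuple(inputs, {}))
print(sorted(row))
```

Because every stage only consumes and produces dictionaries, the transform step can be left out or swapped without touching the source or the sink.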

Using the Dictionary Tools to create a Training NTuple

Every multivariate classifier has to be trained before it can be used for a selection. This usually involves creating an ntuple with all the desired variables. The modular design of the dictionary tools allows you to reuse a lot of the code between writing the training data set and writing the selection script. This is especially useful to ensure a consistent naming scheme.

To create an ntuple with the desired variables we can use a tool chain as discussed above but without any DictTransform. Let's use this case to demonstrate how a dictionary tool chain is set up in detail:

tuple.addBranches({
    "Phi" : "B_s0 -> (^phi(1020) -> K+ K-) ? ",
})

from Configurables import LoKi__Hybrid__Dict2Tuple as Dict2Tuple
from Configurables import LoKi__Hybrid__DictOfFunctors as DictOfFunctors

# add and configure the Dict2Tuple tool
tuple.Phi.addTupleTool(Dict2Tuple, "MVAtuple")
tuple.Phi.MVAtuple.Source = "LoKi__Hybrid__DictOfFunctors/MVAdict"

# add the DictOfFunctors as source of the Dict2Tuple tool
tuple.Phi.MVAtuple.addTool(DictOfFunctors, "MVAdict")
tuple.Phi.MVAtuple.MVAdict.Variables = Vars  # the dictionary of LoKi functors

The result of this small script is equivalent to using the Hybrid::TupleTool and can be used to create the training ntuple. On top of this, since now there is a point where you have access to the dictionary before it is written into the ntuple, it is trivial to insert whatever manipulations you want to make on the dictionary. Have a look at the helper functions in Phys/MVADictTools/python/MVADictHelpers.py to see how a DictTransform is inserted between the DictOfFunctors and the Dict2Tuple.

-- SebastianNeubert - 02 Dec 2013

Topic revision: r6 - 2016-04-13 - SebastianNeubert