Neural Network Tools (NNTools) in Brunel

Neural Networks (NNs) can be built on a set of input variables using various software packages. The variables should have separation power on the quantity of interest to the user. The advantage of a NN is that it combines the input variables into an output value with a discrimination power superior to that of any single input variable. NNs were used in the level-two trigger of the H1 experiment to select events with elastic J/Ψ production. In the CDF experiment, searches for electroweak single-top-quark production were performed using NNs.

In the software of the LHCb experiment, NNs are used to identify ghost tracks among the tracks found by the pattern recognition algorithms. Ghost tracks are those tracks that cannot be associated with a reconstructible MC particle. The NN tools presented here combine various track variables and can be found in CVS at Tr/NNTools/v1r1. This version will be available starting with Brunel version v32r2.

The final product of the NNTools in v1r1 is a Ghost Probability, which is stored as additional information for each of the best long tracks (BLTr). In the LHCb software, BLTr denotes the tracks which are stored in the best track container (LHCb::TrackLocation::Default) and which, according to their history, were found by the long track pattern recognition algorithms TrackMatching and PatForward.
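For illustration only, the following minimal sketch shows how the stored Ghost Probability might be read back in a Gaudi/DaVinci algorithm. It is not taken from the NNTools source; in particular the extra-info key is a placeholder, and the actual key has to be looked up in Tr/NNTools/v1r1/src.

#include <iostream>
#include "Event/Track.h"

void printGhostProbabilities( const LHCb::Tracks* tracks ) {
  // Tracks are taken from the best track container,
  // LHCb::TrackLocation::Default.
  const int ghostProbKey = 0; // placeholder: use the key written by NNTools
  for ( LHCb::Tracks::const_iterator it = tracks->begin();
        it != tracks->end(); ++it ) {
    // info(key, default) returns the default value when the key is absent,
    // e.g. for tracks that are not of TrackMatching or PatForward history.
    const double ghostProb = (*it)->info( ghostProbKey, -1. );
    if ( ghostProb >= 0. )
      std::cout << "Ghost probability: " << ghostProb << std::endl;
  }
}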

Users who want to apply the Ghost Probability in an analysis at the hadronic final state level should read the first two sections.

In case a more detailed description is needed, the reader is advised to go through all the sections that follow. With the examples presented here, the reader should be able to build up any type of NN in a relatively short time. Users who have new tools which can improve or extend the current NNTools version and would like to commit them to the LHCb software are kindly asked to contact the maintainer of the NNTools.

The last section of this page hosts talks and LHCb notes about various tests performed with NNs.

Setting the Options Files for NNTools

The NNTools are not yet a default tool in the Brunel reconstruction. Thus the .dst files used in DaVinci have to be re-produced. In order to re-produce the .dst files one needs the corresponding .digi files. Once the .digi files are known, a Brunel job can be run to produce the new .dst files.

In order to include the Ghost Probability of a track in the new .dst file, one needs to call the NNTools algorithm in the RecoTracking.opts file. Thus the following lines need to be added to that options file:

// Tracking sequence
RecoTrSeq.Members  += { "ProcessPhase/Track" };
Track.DetectorList  = { "ForwardPat", "ForwardPreFit", "ForwardFit"
                        , ...,
                        , "NNTools"
                       };
// Forward pattern
TrackForwardPatSeq.Members += { "PatForward" };
#include "$PATALGORITHMSROOT/options/PatFwdTool.opts"
#include "$PATALGORITHMSROOT/options/PatForward.opts"
...
// Match pattern
TrackMatchPatSeq.Members +={ "TrackMatchVeloSeed/TrackMatch" };
#include "$TRACKMATCHINGROOT/options/TrackMatchVeloSeed.opts"
....
// --------------------------------------------------------------------
// NeuralNet ghost probability TMVA package
TrackNNToolsSeq.Members +={"NeuralNetTmva"};
#include "$NNTOOLSROOT/options/NeuralNetTmva.opts"
// --------------------------------------------------------------------
// NeuralNet ghost probability TMultiLayerPercepton package
// TrackNNToolsSeq.Members +={"NeuralNetTmlp"};
// #include "$NNTOOLSROOT/options/NeuralNetTmlp.opts"
An example options file using the NNTools can be downloaded here: RecoTracking.opts.

As one can observe from the previous example, the NNTools also have their own options files, one for each type of NN.

The options file for methods from the TMVA package, NeuralNetTmva.opts, contains the following:

//== Options for NNTools
NeuralNetTmva.pathWeightsFile  = "$NNTOOLSROOT/weights/";
NeuralNetTmva.tmvaMethod       = "CFMlpANN method"; //default: "CFMlpANN";
NeuralNetTmva.minValueNN       = 0.;                //default: 0.;
NeuralNetTmva.maxValueNN       = 1.;                //default: 1.;
NeuralNetTmva.nameWeightsFileM = "MVAnalysis_m_CFMlpANN.weights.txt";
NeuralNetTmva.nameWeightsFileP = "MVAnalysis_pf_CFMlpANN.weights.txt";

//NeuralNetTmva.tmvaMethod       = "BDT method";
//NeuralNetTmva.minValueNN       = -0.840768;
//NeuralNetTmva.maxValueNN       =  0.820418;
//NeuralNetTmva.nameWeightsFileM = "MVAnalysis_m_BDT.weights.txt";
//NeuralNetTmva.nameWeightsFileP = "MVAnalysis_pf_BDT.weights.txt";
and one can modify it to include the needed TMVA method. The user is advised not to use the TMlpANN method, because it requires a /weights/ directory in the location where the job runs, in order to write a temporary file.

If one would like to run the NN built using the TMultiLayerPerceptron package, the following options file, NeuralNetTmlp.opts, is needed:

//== Options for NNTools
NeuralNetTmlp.pathWeightsFile  = "$NNTOOLSROOT/weights/";
NeuralNetTmlp.nameWeightsFileM = "weights_M.mlp";
NeuralNetTmlp.nameWeightsFileP = "weights_P.mlp";

The NNs use internal variables of the pattern recognition algorithms. These variables are calculated inside the algorithms using additional loops. Users who would like to avoid the additional loops, e.g. for trigger studies, have to modify the following options files.

For the PatForward algorithm, the following line has to be set in PatForward.opts:

//== Options to switch on or off the loops for NN variables
PatForward.writeNNVariables = false; //default: true;

In the case of the TrackMatching algorithm, the corresponding line in TrackMatchVeloSeed.opts is:

//== Options to switch on or off the loops for NN variables
TrackMatch.writeNNVariables = false; //default: true;

The value of the option writeNNVariables is set by default to true.

Running NNTools Locally, on a Farm or on the Grid

The Brunel executable can be run from a shell prompt using the command:

>./slc4_ia32_gcc34/Brunel.exe options/myoptions.opts

Even though the command syntax is very simple, users are recommended to run Brunel jobs with NNTools, and Brunel jobs in general, using python scripts in the ganga environment.

The user first has to check whether ganga is installed. If this is not the case, it can be installed from http://ganga.web.cern.ch/ganga/download/ . This should not take longer than 30 seconds, as advertised on the ganga web page.

The Brunel job submission is done using the Brunel_Ganga.py script and the commands:

>GangaEnv 4.4.2
>ganga Brunel_Ganga.py

In order to check the job status one has to type:

>ganga 
In [1]:jobs
and the list of all the user's jobs will be displayed, together with details about their status. More ganga commands can be found at http://ganga.web.cern.ch/ganga/user/html/GangaIntroduction/.

Running under ganga also requires the correct setting of the path where the output files should be written. This differs from the usual running procedure, as one can see in the following example:

Running under ganga:

//---------------------------------------------------------------------------
// Event output
//---------------------------------------------------------------------------
DstWriter.Output = "DATAFILE='$BRUNELOPTS/../myfile.dst' TYP='POOL_ROOTTREE' OPT='REC'";
//////////////////////////////////////////////////////////////////////////////

Usual running:

//---------------------------------------------------------------------------
// Event output
//---------------------------------------------------------------------------
DstWriter.Output = "DATAFILE='PFN:myfile.dst' TYP='POOL_ROOTTREE' OPT='REC'";
//////////////////////////////////////////////////////////////////////////////

The big advantage of using ganga is that the user can change the location where the job will run by modifying only a single job option, as illustrated in the following lines:

#-------------------------------------------------------------------------------
# Define where to run
#-------------------------------------------------------------------------------
# Run interactively
# myBackend    = Interactive()
# Run directly on the local machine, but in the background
# myBackend    = Local()
# Submit to an LSF batch system, using the 8nm queue
# for other queues see http://wwwpdp.web.cern.ch/wwwpdp/lsf/LSF-at-CERN.html
# queue 8 NCU minutes
myBackend    = LSF( queue = '8nm' )
# Submit to the grid. Requires a working grid certificate of course :)
# myBackend    = Dirac()
#-------------------------------------------------------------------------------
With the same script one can run on a local machine, e.g. lxplus, on a farm, e.g. the LSF batch system, or on the grid.

Running the job on the grid requires a valid grid certificate. Details about how a grid certificate can be obtained, and more, can be found at Brunel on Grid.

Neural Network Variables

The input variables for the NNs fall into two classes: general track variables and variables specific to the pattern recognition algorithms.

An overview of the input variables can be seen in the following table:

Type of tracks    | General Variables                      | Specific Variables
Match tracks      | P(χ2); n.d.f.; NTT hits; NVeLo hits; η | NCom TS; χMatch; Δχ
PatForward tracks |                                        | NCom TS; QPatForward; ΔQPatForward

The meaning and the separation power of the input variables can be seen in the note LHCb-2007-xxx.

In the PatForward algorithm, the variable NCom TS represents the number of track candidates which have more than 70% of their Tracking Station (TS) hits in common with the track. For example, a candidate sharing 15 of its 20 TS hits (75%) with the track counts towards NCom TS.

In the TrackMatching algorithm, NCom TS gives the number of track candidates which have the corresponding TS segment in common with the track.

Types of Neural Networks

In order to identify the ghost tracks, two NNs were built using software packages available in ROOT: MultiLayerPerceptron and the Toolkit for Multivariate Data Analysis (TMVA).

Before the steps of building a NN are presented, the user should note that the tracking sample to be used has to be split into two independent NTuples: one for the training procedure and one for the testing of the NN (a splitting sketch is shown below). Every NN has its own internal procedure for testing. In the next sub-sections, testing refers more to the situation where a NN evaluates tracks from a new tracking sample; this will in the end be the main usage of the NN.
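As an illustration of the splitting, the following minimal sketch divides one NTuple into a training and a testing sample using the special TTreeFormula variable Entry$; the file and tree names are hypothetical.

#include "TFile.h"
#include "TTree.h"

void SplitSample() {
  TFile* in  = TFile::Open( "tracks.root" );     // hypothetical input file
  TTree* all = (TTree*)in->Get( "trackTuple" );  // hypothetical tree name

  TFile* out = new TFile( "tracks_split.root", "RECREATE" );
  // Even entries for training, odd entries for testing.
  TTree* train = all->CopyTree( "Entry$%2==0" );
  train->SetName( "trainTuple" );
  TTree* test = all->CopyTree( "Entry$%2==1" );
  test->SetName( "testTuple" );
  out->Write();
  out->Close();
}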

In CVS, at Tr/NNTools/v1r1/src/, NeuralNetNTuple.cpp is the tool that fills the NTuple with all track variables needed for training and testing of the NN. To write out this NTuple, one has to add to the options file BrunelCheck.opts from Rec/Brunel/v32r1/options the lines:

CheckPatSeq.Members += { "PatLHCbID2MCParticle"
                          , ...,
                          , "NeuralNetNTuple"
                         };

The NTuple for training stores only the input variables of the NN. An example ROOT macro for filling this NTuple can be seen in Fill_InputNtuple_BestLong_M.C; this macro fills the input variables for BLTr with a Match track history. The testing NTuple should contain all variables needed for the various tests; e.g. for BLTr of Match type see Fill_TestNtuple_BestLong_M.C. Similarly, one should write ROOT macros for BLTr with a PatForward track history.
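For orientation, the following sketch shows the essence of such a filling macro for Match tracks, using the eight input variables of the training macro in the next section plus the target ans. The file name and the dummy values are illustrative; the sign convention for ans is an assumption.

#include "TFile.h"
#include "TNtuple.h"

void FillInputNtuple() {
  TFile* out = new TFile( "input_BestLong_M.root", "RECREATE" ); // illustrative
  TNtuple* nInput = new TNtuple( "nInput", "NN input variables, Match tracks",
                                 "eta:chi2Prob:ndf:TThits:VeLohits:Dchi2:NComOT:chi2M:ans" );
  // In the real macro the values come from the NTuple written by
  // NeuralNetNTuple; the dummy values below only illustrate the Fill call.
  // The target ans is assumed to be 1 for true tracks and 0 for ghosts.
  nInput->Fill( 3.2, 0.75, 15., 3., 10., 1.4, 2., 5.1, 1. );
  out->Write();
  out->Close();
}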

Because of the specific variables, the training and testing procedures have to be done as follows. The best long tracks are separated according to their history, Match or PatForward tracks, into two NTuples. An individual training is applied to each sample and two weights files are written; these files store the weights for the NN decision.

In the testing procedure of the NN, the weights files are read according to the track history. This approach offers the advantage of an optimal usage of the specific track variables.

The next sections present examples and ROOT macros showing how to train and test a NN. Comments and suggestions are given for points that were found to be problematic.

MultiLayerPerceptron

The first NN is built using the MultiLayerPerceptron package. The advantage of this package is its relatively simple set-up.

Training Neural Network

The training is based on an mlp object, defined together with its variables as follows:

TMultiLayerPerceptron *mlp = new TMultiLayerPerceptron
        ("@eta,@chi2Prob,@ndf,@TThits,@VeLohits,@Dchi2,@NComOT,@chi2M:10:5:ans!",
         nInputPtr,"((Entry$%6)<5)","((Entry$%6)==5)");
where nInputPtr is the NTuple with the track sample for training. More details can be seen in the ROOT macro Train_Tmlp_TrNN.C for Match tracks. To run the training macro, one needs to type:
> root -l Train_Tmlp_TrNN.C
The weights of the NN, which will later be used for evaluation, are written to an ASCII file. Similar ROOT macros have to be run for PatForward tracks.
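For orientation, the core of such a training macro reduces to a few lines. The epoch count and the input file and tree names below are illustrative; the network layout is the one shown above.

#include "TFile.h"
#include "TTree.h"
#include "TMultiLayerPerceptron.h"

void Train_Tmlp_Sketch() {
  TFile* f = TFile::Open( "tracks_split.root" );      // illustrative file name
  TTree* nInputPtr = (TTree*)f->Get( "trainTuple" );  // illustrative tree name
  TMultiLayerPerceptron* mlp = new TMultiLayerPerceptron(
      "@eta,@chi2Prob,@ndf,@TThits,@VeLohits,@Dchi2,@NComOT,@chi2M:10:5:ans!",
      nInputPtr, "((Entry$%6)<5)", "((Entry$%6)==5)" );
  mlp->Train( 200, "text, update=10" );  // epoch count illustrative
  mlp->DumpWeights( "weights_M.mlp" );   // ASCII file, as in NeuralNetTmlp.opts
}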

Testing Neural Network

Once the weights file is written, one can run the testing procedure on a track sample independent of the training one. The testing consists of evaluating a track with respect to the variables used as input for the NN.

Here one has to mention that all the methods available to define an mlp object require an NTuple. This is a bit problematic if one would like to run the evaluation directly on the tracks of an event, without storing their variables in an NTuple. For such a direct approach an empty NTuple has to be defined; otherwise the NTuple with the testing track sample has to be used. One could improve this by writing a method that defines the mlp object without requiring an NTuple as parameter.
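A minimal sketch of this direct approach, with an empty NTuple that only provides the branch structure, could look as follows (not taken from Run_Tmlp_NN.C):

#include "TNtuple.h"
#include "TMultiLayerPerceptron.h"

double EvaluateTrack( double* vars ) { // eta, chi2Prob, ..., chi2M
  // Empty NTuple: zero entries, but with the branch names the mlp
  // constructor expects.
  static TNtuple dummy( "dummy", "empty NTuple",
                        "eta:chi2Prob:ndf:TThits:VeLohits:Dchi2:NComOT:chi2M:ans" );
  static TMultiLayerPerceptron mlp(
      "@eta,@chi2Prob,@ndf,@TThits,@VeLohits,@Dchi2,@NComOT,@chi2M:10:5:ans!",
      &dummy );
  static bool loaded = false;
  if ( !loaded ) { mlp.LoadWeights( "weights_M.mlp" ); loaded = true; }
  return mlp.Evaluate( 0, vars ); // index 0: the single output neuron
}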

A ROOT macro example can be seen in Run_Tmlp_NN.C. For the calculation of the ghost rate, an example is shown in GetGhostRate.C. The track efficiency is calculated with GetTrackEfficiency.C, which requires GetNumRecMCpart.C.

With the examples presented here, everyone should be able to set up a NN using the MultiLayerPerceptron. The implementation of the MultiLayerPerceptron evaluation in Brunel can be found in Tr/NNTools/v1r1/src.

TMVA Neural Network

The NN built using the TMVA package returns more information to the user than the MultiLayerPerceptron one. Histograms such as Background Rejection vs. Signal Efficiency or the Output MVA Variables can be seen by running TMVAGui.C. The files required by the TMVA in ROOT v5.14 can be found in TMVA_files.tar.gz.

Training Neural Network

The training of the NN is done using a TMVA::Factory object:

TMVA::Factory *factory = new TMVA::Factory( "MVAnalysis", outputFile, "");

The input variables, and the number of tracks to be used for training and internal testing of the NN, are added to this object; see Train_Tmva_NN_M.C for Match tracks. The outputs of the training procedure are the weights files of the TMVA methods, which are written to an automatically created directory /weights/. A similar ROOT macro should be run for PatForward tracks.
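A minimal sketch of the main Factory calls is given below. The tree and file names are hypothetical, the booking options are illustrative, and the exact signatures vary between TMVA versions; the real macro is Train_Tmva_NN_M.C.

#include "TFile.h"
#include "TTree.h"
#include "TMVA/Factory.h"
#include "TMVA/Types.h"

void Train_Tmva_Sketch() {
  TFile* outputFile = TFile::Open( "MVAnalysis_m.root", "RECREATE" ); // illustrative
  TMVA::Factory* factory = new TMVA::Factory( "MVAnalysis", outputFile, "" );

  TFile* in = TFile::Open( "tracks_split.root" );       // hypothetical
  TTree* trueTracks  = (TTree*)in->Get( "trainTrue" );  // hypothetical
  TTree* ghostTracks = (TTree*)in->Get( "trainGhost" ); // hypothetical
  factory->AddSignalTree( trueTracks, 1.0 );      // true tracks = signal
  factory->AddBackgroundTree( ghostTracks, 1.0 ); // ghost tracks = background

  // The eight Match-track input variables, as in the TMlp training macro.
  factory->AddVariable( "eta",      'F' );
  factory->AddVariable( "chi2Prob", 'F' );
  factory->AddVariable( "ndf",      'F' );
  factory->AddVariable( "TThits",   'F' );
  factory->AddVariable( "VeLohits", 'F' );
  factory->AddVariable( "Dchi2",    'F' );
  factory->AddVariable( "NComOT",   'F' );
  factory->AddVariable( "chi2M",    'F' );

  // Number of tracks for training and internal testing (illustrative).
  factory->PrepareTrainingAndTestTree( "", "NSigTrain=10000:NBkgTrain=10000" );
  factory->BookMethod( TMVA::Types::kCFMlpANN, "CFMlpANN", "NCycles=500" );

  factory->TrainAllMethods();    // writes the weights files under ./weights/
  factory->TestAllMethods();
  factory->EvaluateAllMethods();
}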

Testing Neural Network

The evaluation of the tracks from an independent sample is done with a TMVA::Reader object:

TMVA::Reader *reader_M = new TMVA::Reader();

as one can see in Run_Tmva_NN.C.
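For orientation, a minimal sketch of the per-track evaluation is given below. The method tag and the weights file name follow NeuralNetTmva.opts; the function itself is illustrative.

#include "TMVA/Reader.h"

double GhostProbSketch( float eta, float chi2Prob, float ndf, float TThits,
                        float VeLohits, float Dchi2, float NComOT, float chi2M ) {
  static float v[8];
  static TMVA::Reader* reader_M = 0;
  if ( !reader_M ) {
    reader_M = new TMVA::Reader();
    const char* names[8] = { "eta", "chi2Prob", "ndf", "TThits",
                             "VeLohits", "Dchi2", "NComOT", "chi2M" };
    // Variables must be registered in the same order as in the training.
    for ( int i = 0; i < 8; ++i ) reader_M->AddVariable( names[i], &v[i] );
    // The method tag must match the tmvaMethod string in the options file.
    reader_M->BookMVA( "CFMlpANN method",
                       "weights/MVAnalysis_m_CFMlpANN.weights.txt" );
  }
  v[0] = eta;      v[1] = chi2Prob; v[2] = ndf;    v[3] = TThits;
  v[4] = VeLohits; v[5] = Dchi2;    v[6] = NComOT; v[7] = chi2M;
  return reader_M->EvaluateMVA( "CFMlpANN method" ); // in [minValueNN, maxValueNN]
}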

In ROOT version v5.14, the TMlpANN method of the TMVA can only run if the weights file is located in a directory named /weights/. If the location of the weights file is changed, the method cannot create its temporary file and the evaluation stops.

This is not the case for the TMVA methods CFMlpANN, MLP and BDT. The user should keep this detail in mind when editing the options files.

The implementation of the TMVA evaluation in Brunel can be found in Tr/NNTools/v1r1/src.

Talks and LHCb Notes about Neural Network

Track ghost reduction using a Neural Net, T-Rec Meeting, Adrian Perieanu

Ghost Reduction, Tracking and Alignment Workshop - Amsterdam, Adrian Perieanu

Ghosts, T-Rec Meeting, Adrian Perieanu

Identification of Ghost Tracks using Neural Networks, LHCb-2007-draft2, Adrian Perieanu

For comments or requests please send an e-mail to Adrian Perieanu.

Topic attachments
Attachment                     Size    Date        Who
Brunel_Ganga.py.txt            5.5 K   2007-11-01  AdrianPerieanu
Fill_InputNtuple_BestLong_M.C  3.9 K   2007-11-02  AdrianPerieanu
Fill_TestNtuple_BestLong_M.C   4.9 K   2007-11-02  AdrianPerieanu
GetGhostRate.C                 1.1 K   2007-11-05  AdrianPerieanu
GetNumRecMCpart.C              1.7 K   2007-11-05  AdrianPerieanu
GetTrackEfficiency.C           1.2 K   2007-11-05  AdrianPerieanu
RecoTracking.opts.txt          5.5 K   2007-11-01  AdrianPerieanu
Run_Tmlp_NN.C                  8.1 K   2007-11-05  AdrianPerieanu
Run_Tmva_NN.C                  13.5 K  2007-11-05  AdrianPerieanu
TMVA_files.tar.gz              30.2 K  2007-11-06  AdrianPerieanu
Train_Tmlp_TrNN.C              3.7 K   2007-11-05  AdrianPerieanu
Train_Tmva_NN_M.C              3.4 K   2007-11-05  AdrianPerieanu