Neural Network Tools in Brunel
Neural Network Tools (NNTools) in Brunel
Neural Networks (NN) can be built on a set of input variables using various
software packages. The variables should have a separation power on the quality
which is of interest to the user. The advantage of a NN is that it combines the input
variables in an output value that has a superior discrimination power to any
of the input variables. NNs were used on level two trigger in H1 experiment
to select events with
elastic J/Ψ 
.
In the
CDF
experiment
searches for Electroweak Single-Top-Quark were done using the NN.
In the software of the LHCb experiment the NN are used for
identification of the ghost tracks out of the tracks found in pattern
recognition algorithms. The ghost tracks are those tracks that could be associated with a reconstructible
MC particle.
The NN tools presented here combine various tracks variables and can be found
in
CVS at
Tr/NNTools/v1r1. This version will be available starting with
Brunel
version
v32r2.
The final product of the
NNTools in
v1r1 is a
Ghost Probability which is stored
as additional information for each of the best long tracks (
BLTr). Under
BLTr one understands
in the LHCb software the tracks which are stored in the best track container
(LHCb::TrackLocation::Default) and, as their history says, were found by the
long track pattern recognition algorithms TrackMatching and PatForward.
Users who want to apply the
Ghost Probability in an analysis at the
hadronic final state level should read
the first two
sections.
In case a more detailed description is needed, the reader is advised to go
through all sections that follow. With the examples presented here, the reader
should be able to build up any type of NN in a relatively short time. The
users that have new tools which can improve or expand the actual
NNTools version and would
like to commit them in the LHCb software are kindly ask to contact the
maintainer of the
NNTools.
The last section of this page hosts talks and LHCb notes about various tests
performed with NN.
Setting the Options Files for NNTools
The
NNTools is not yet a default tool in the Brunel reconstruction. Thus the .dst files used in
DaVinci have to be re-done. In order to re-do the .dst files one needs the corresponding .digi files. Once the .digi files are known a
Brunel job can be run to produce the new .dst files.
In order to include in the new .dst file the
Ghost Probability of a track one needs to call the
NNTools algorithm in the
RecoTracking.opts
file. Thus the following lines need to be added in the mentioned options file:
// Tracking sequence
RecoTrSeq.Members += { "ProcessPhase/Track" };
Track.DetectorList = { "ForwardPat", "ForwardPreFit", "ForwardFit"
, ...,
, "NNTools"
};
// Forward pattern
TrackForwardPatSeq.Members += { "PatForward" };
#include "$PATALGORITHMSROOT/options/PatFwdTool.opts"
#include "$PATALGORITHMSROOT/options/PatForward.opts"
...
// Match pattern
TrackMatchPatSeq.Members +={ "TrackMatchVeloSeed/TrackMatch" };
#include "$TRACKMATCHINGROOT/options/TrackMatchVeloSeed.opts"
....
// --------------------------------------------------------------------
// NeuralNet ghost probability TMVA package
TrackNNToolsSeq.Members +={"NeuralNetTmva"};
#include "$NNTOOLSROOT/options/NeuralNetTmva.opts"
// --------------------------------------------------------------------
// NeuralNet ghost probability TMultiLayerPercepton package
// TrackNNToolsSeq.Members +={"NeuralNetTmlp"};
// #include "$NNTOOLSROOT/options/NeuralNetTmlp.opts"
An example, for downloading, of options file which use the
NNTools is presented here in
RecoTracking.opts.
As one can observe from the previous example,
NNTools has also its own options files. These options files are specific to each
type of NN.
The options file for methods from the TMVA package,
NeuralNetTmva.opts, contains the followings:
//== Options for NNTools
NeuralNetTmva.pathWeightsFile = "$NNTOOLSROOT/weights/";
NeuralNetTmva.tmvaMethod = "CFMlpANN method"; //default: "CFMlpANN";
NeuralNetTmva.minValueNN = 0.; //default: 0.;
NeuralNetTmva.maxValueNN = 1.; //default: 1.;
NeuralNetTmva.nameWeightsFileM = "MVAnalysis_m_CFMlpANN.weights.txt";
NeuralNetTmva.nameWeightsFileP = "MVAnalysis_pf_CFMlpANN.weights.txt";
//NeuralNetTmva.tmvaMethod = "BDT method";
//NeuralNetTmva.minValueNN = -0.840768;
//NeuralNetTmva.maxValueNN = 0.820418;
//NeuralNetTmva.nameWeightsFileM = "MVAnalysis_m_BDT.weights.txt";
//NeuralNetTmva.nameWeightsFileP = "MVAnalysis_pf_BDT.weights.txt";
and one can modify it to include the needed TMVA method. The user is advised not to use the TMlpANN method due to the fact that it requires a directory
/weights/ where the job runs in order to write a temporary file.
If one would like to run the NN built using TMultiLayerPerceptron package the following options file
NeuralNetTmlp.opts is needed:
//== Options for NNTools
NeuralNetTmlp.pathWeightsFile = "$NNTOOLSROOT/weights/";
NeuralNetTmlp.nameWeightsFileM = "weights_M.mlp";
NeuralNetTmlp.nameWeightsFileP = "weights_P.mlp";
The NNs are using internal variables from the pattern recognition algorithms. These variables are calculated inside the algorithms using additional loops.
The users who would like to avoid the additional loops, e.g. for trigger studies, have to modify the following options files.
For PatForward, a
PatForward.opts file has to be used:
//== Options to switch on or off the loops for NN variables
PatForward.writeNNVariables = false; //default: true;
In case of TrackMatching algorithm a
TrackMatchVeloSeed.opts
file is needed:
//== Options to switch on or off the loops for NN variables
TrackMatch.writeNNVariables = false; //default: true;
The value of the option
writeNNVariables is set by default to
true.
Run NNTools Local, on Farm or on Grid
The Brunel executable can be run from the prompter of a shell using the command:
>./slc4_ia32_gcc34/Brunel.exe options/myoptions.opts
Even if the syntax command is very simple, the users are recomanded to run
Brunel jobs with
NNTools, and also in general,
using python scripts in
ganga
environment.
The user has to check first if ganga is installed. If this is not the case, the user needs to install it from
http://ganga.web.cern.ch/ganga/download/
. This should
not take longer than 30 seconds, as it is advertised on their web page.
The
Brunel job submission is done using the
Brunel_Ganga.py script and the command:
>GangaEnv 4.4.2
>ganga Brunel_Ganga.py
In order to check the job status one has to type:
>ganga
In [1]:jobs
and the list of all jobs of the user will be displayed together with details about their status.
More ganga commands can be found at
http://ganga.web.cern.ch/ganga/user/html/GangaIntroduction/
.
Running under ganga requires also the right setting of the path where the output files
should be written. This is different than for the usual running procedure as one can see
in the following example:
under ganga running
//---------------------------------------------------------------------------
// Event output
//---------------------------------------------------------------------------
DstWriter.Output = "DATAFILE='$BRUNELOPTS/../myfile.dst' TYP='POOL_ROOTTREE' OPT='REC'";
//////////////////////////////////////////////////////////////////////////////
usual running
//---------------------------------------------------------------------------
// Event output
//---------------------------------------------------------------------------
DstWriter.Output = "DATAFILE='PFN:myfile.dst' TYP='POOL_ROOTTREE' OPT='REC'";
//////////////////////////////////////////////////////////////////////////////
The big advantage of using ganga is that the user can change the location where the job will be run modifying only
a job option, as illustrated in the following lines:
#-------------------------------------------------------------------------------
# Define where to run
#-------------------------------------------------------------------------------
# Run interactively
# myBackend = Interactive()
# Run directly on the local machine, but in the background
# myBackend = Local()
# Submit to an LSF batch system, using the 8nm queue
# for other queues see http://wwwpdp.web.cern.ch/wwwpdp/lsf/LSF-at-CERN.html
# queue 8 NCU minutes
myBackend = LSF( queue = '8nm' )
# Submit to the grid. Requires a working grid certificate of course :)
# myBackend = Dirac()
#-------------------------------------------------------------------------------
Having the same script one can run on local machine, e.g. lxplus, on farm, e.g. LSF batch system,
or on grid.
Running the job on grid requiers a valid grid certificate. Details about how can a grid certificate
can be obtained and more can be found at
Brunel on Grid.
Neural Network Variables
The input variables for the NNs can be classified in two classes: as general track variables and
specific variables to the pattern recognition algorithms.
An overview of the input variables can be seen in the following table:
The meaning and the separation power of the input variables can be seen in the note
LHCb-2007-xxx
.
The variable N
Com TS in PatForward algorithm represents the number of
the candidates of a track which have more than 70% of the corresponding Tracking Station (TS) hits in common.
In the TrackMatching algorithm the N
Com TS gives the number of the candidates of a track which have the corresponding TS segment in common.
Types of Neural Networks
In order to identify the ghost tracks two NN were built on software packages available in ROOT:
MultiLayerPerceptron and
Toolkit for Multivariate Data Analysis (TMVA).
Before the steps of how a NN is built are presented, the user should notice that the traking sample to be used has to be split
in two independent NTuples: one for the training procedure and one for the testing of the NN. Every NN has its own internal
procedure for testing. In the next sub-sections the testing refers more to the situation when a NN evaluates tracks from a new
tracking sample - this will be in the end the main usage of the NN.
In
CVS at
Tr/NNTools/v1r1/src/ the
NeuralNetNTuple.cpp is the tool that fills the NTuple with all track variables needed for training and testing of the NN. To write out this NTuple one has to add in the options file
BrunelCheck.opts from
Rec/Brunel/v32r1/options the lines:
CheckPatSeq.Members += { "PatLHCbID2MCParticle"
, ...,
, "NeuralNetNTuple"
};
The NTuple for training is storing only the input variables of the NN. An example of ROOT macro for filling this NTuple can be seen in
Fill_InputNtuple_BestLong_M.C. This ROOT macro fills the input variables for
BLTr with a Match track history. The testing NTuple should contain all variables needed for various tests, e.g. for
BLTr of Match type see
Fill_TestNtuple_BestLong_M.C. Similar, one should write ROOT macros for
BLTr with a PatForward track history.
Due to the specific variables the training and testing procedure has to be done as follows. The best long tracks are separated according to their history, Match or PatForward tracks, in two NTuples. An individual training is applied on each sample and two weights files are written. In these files the weights for NN decision are stored.
In the testing procedure of the NN the weights files are read with respect to the track history. This approach offers the advantage of an optimal usage of the specific track variables.
The next sections present examples and ROOT macros how to train and test a NN. Comments and suggestions are given
for points that are found to be problematic.
MultiLayerPerceptron
First NN is built using the package
MultiLayerPerceptron
.
The advantage of this package is its relatively simple set-up.
Training Neural Network
The training is based on a
mlp object and its variables defined as follows:
TMultiLayerPerceptron *mlp = new TMultiLayerPerceptron
("@eta,@chi2Prob,@ndf,@TThits,@VeLohits,@Dchi2,@NComOT,@chi2M:10:5:ans!",
nInputPtr,"((Entry$%6)<5)","((Entry$%6)==5)");
where
nInputPtr is the NTuple with the tracks sample for testing. More details can be seen in the ROOT macro
Train_Tmlp_TrNN.C for Match tracks. To run the
training macro, one needs to type :
> root -l Train_Tmlp_TrNN.C
The weights of the NN which will be later use for evaluation are written in a ASCII file. Similar ROOT macros
have to be run for PatForward tracks.
Testing Neural Network
Once the weights file is written, one can run the testing procedure on a independent tracks sample than the training one.
The testing consists in evaluating a track with respect to the variables used as input for NN.
Here one has to mention that all methods available to define a
mlp object require a NTuple.
This is a bit problematic if one would like to run the evaluation directly on tracks of an event without storing their variables
in an NTuple. Thus an empty NTuple has to be defined for such a direct approach, otherwise the NTuple with the testing tracks sample has to be used. One can improve this, writing a method to define the
mlp object without the need of including an NTuple
as parameter.
A ROOT macro example can be seen in
Run_Tmlp_NN.C. For calculation of the ghost rate an example is shown in
GetGhostRate.C. The track effciency is calculated with
GetTrackEfficiency.C which requires
GetNumRecMCpart.C.
With the examples presented here everyone should be able to set up a NN using the
MultiLayerPerceptron
.
The implementation of the MultiLayerPerceptron evaluation in
Brunel can be found in
Tr/NNTools/v1r1/src.
TMVA Neural Network
The NN built using the TMVA package returns to user more informations than the MultiLayerPerceptron one. Histograms as
Background Rejection vs.
Signal Efficiency or
Ouput MVA Variables can be seen running
TMVAGui.C. The files required by the TMVA in
ROOT v5.14 can be found in
TMVA_files.tar.gz.
Training Neural Network
The training of the NN is done using a TMVA::Factory object:
TMVA::Factory *factory = new TMVA::Factory( "MVAnalysis", outputFile, "");
The input variables are added to this object and the number of tracks to be used for training and internal testing of
the NN, see
Train_Tmva_NN_M.C for Match tracks. The outputs of the training procedure are the weights files of the TMVA methods which are written in a directory, automaticaly built,
/weights/ . A similar ROOT macro should be run for PatForward tracks.
Testing Neural Network
The evaluation of the tracks from a independent sample is done with a TMVA::Reader object:
TMVA::Reader *reader_M = new TMVA::Reader();
as one can see in
Run_Tmva_NN.C.
In
ROOT version
v5.14 the TMlpANN method of the TMVA is limited to run having the weights file
only
in a directory named
/weights/. If the location of the weights file is changed the method cannot create its temporary
file and the evaluation stops.
For the TMVA methods CFMlpANN, MLP and BDT this is not the case. The user should take care about
this detail in the option files.
The implementation of the TMVA evaluation in
Brunel can be found in
Tr/NNTools/v1r1/src.
Talks and LHCb Notes about Neural Network
Track ghost reduction using a Neural Net
, T-Rec Meeting, Adrian Perieanu
Ghost Reduction
, Tracking and Alignment Workshop - Amsterdam, Adrian Perieanu
Ghosts
, T-Rec Meeting, Adrian Perieanu
Identification of Ghost Tracks using Neural Networks
, LHCb-2007-draft2, Adrian Perieanu
For comments or requests please send an e-mail to
Adrian Perieanu.