SVN at CERN or Glasgow

Introduction

First install the package as shown on the previous page: AdrianBuzatuJERSVN. Below is how to run it.

Setup

Since we will oscillate between working on my Mac and on the Glasgow Linux machine (and we want to keep the code generic for people on other machines, such as at CERN or their local institutions), let's fix a naming convention. We call "$jer" the location of the code you are running, "$jerroot" the location of the event trees and train trees that we will create out of the original files, and "$initialroot" the location of the initial D3PD files, which are large, so on the Mac we keep only one for testing and at Glasgow we keep all of them. There are also the usual SVN variables: "$SVNUSR", the shortcut that allows you to check out the code, and "$SVN_EDITOR", which lets you write comments in the SVN commits. I keep these lines in my .profile file on the Mac and in the .bashrc on the Glasgow Linux machine.

On Mac:

export SVNUSR=svn+ssh://svn.cern.ch/reps/atlasusr
export SVN_EDITOR="emacs -nw"
export jer="/Users/abuzatu/Work/ATLAS/Analyses/JER/ResolutionNN/trunk"
export jerroot="/Users/abuzatu/Work/ATLAS/Analyses/JER/root_output"
export initialroot="/Users/abuzatu/Work/ATLAS/root/MC12"

On Linux at Glasgow:

export SVNUSR="svn+ssh://svn.cern.ch/reps/atlasusr"
export SVN_EDITOR="emacs -nw"
export jer="/afs/phas.gla.ac.uk/user/a/abuzatu/public_ppe/JER/ResolutionNN/trunk"
export jerroot="/afs/phas.gla.ac.uk/user/a/abuzatu/public_ppe/JER/root_output"
export initialroot="/nfs/atlas/vhbb01/MC12"

Then go to the working directory

cd $jer

Then you need to set up ROOT and the "LD_LIBRARY_PATH". On the Mac there is nothing to do, but on Linux at Glasgow you need to do

source setup/setup17Glasgow.sh

If you are at CERN, I have a script that I have not validated yet.

source setup/setup17CERN.sh

Compile the code

make

Create event trees

This step is specific to the input data format; all the steps above are identical regardless of it. The input files are for now ATLAS D3PDs, but in principle they could be ATLAS flat ntuples, CDF flat ntuples, etc.

./doReadD3PD.sh 

Running it without arguments shows you the options. You can run on a small number of events for testing (200) or on all events (-1), and you can run on all the D3PDs stored at Glasgow (/nfs) or locally on your laptop (/Users). For testing you run the command below; the path shown is the Glasgow one, but for a single file (the one in the test file list) you can also run locally on the Mac.

./doReadD3PD.sh $initialroot $jerroot list/list_process_test.txt 200

For an actual full run you use this

./doReadD3PD.sh $initialroot $jerroot list/list_process.txt -1

Here, however, we will do a quick test over all processes, with just 200 events from each D3PD.

./doReadD3PD.sh $initialroot $jerroot list/list_process.txt 200

This may take about 15 minutes to run.

Now check that the event-tree files were produced

cd $jerroot
ls 

As you can see, there are several files per process (with an underscore-number suffix), as well as the merged file for that process (with an "_all" suffix). We need to do a bit of bookkeeping now.

mkdir test
mv *.root test
cd test
mkdir merged
mv *_all.root merged
cd merged
ls

Now we will use these files as a base to merge them again in the desired combinations: 1) all mass points per process (llbb, lnubb, nunubb); 2) all these three processes merged per mass point (100 through 140 GeV with 5 GeV increments). For this we have another script.

cd $jer
ls
./doMergeAll.sh ${jerroot}/test/merged $jerroot bjets_event_trees 110+115+120+125+130+135+140
ls $jerroot

This should take less than a minute. With that, the stage specific to your input data is done; everything downstream is the same for any initial data format.
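
For reference, this merging is just a standard ROOT file merge. Below is a minimal sketch of the equivalent operation using ROOT's TFileMerger class; it is not the content of doMergeAll.sh (which may simply call the hadd utility), and the input file names are illustrative.

#include "TFileMerger.h"

// Sketch: merge the per-process event-tree files for one mass point into a
// single "_all" file. Input names are illustrative, not the real file names.
void mergeSketch() {
  TFileMerger merger;
  merger.OutputFile("bjets_event_trees_125_all.root");
  merger.AddFile("bjets_event_trees_llbb_125.root");
  merger.AddFile("bjets_event_trees_lnubb_125.root");
  merger.AddFile("bjets_event_trees_nunubb_125.root");
  merger.Merge();
}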

Plot possible input variables

We can now look at possible input variables on a jet-by-jet basis.

./bin/readTree.exe

It will tell you the options. This executable reads the event tree and can either create a training tree or produce plots and studies for the training or testing samples, on an event-by-event or jet-by-jet basis. Here we use it to plot possible input variables on a jet-by-jet basis, so we choose option 2.

./bin/readTree.exe 2

We see the instructions and run for the file with the 125 GeV mass point and all processes combined. The first "none" says that we do not want to correct the four-vector of the reconstructed jet for muons in jets; if we wanted to, we would put the dR cut value we want, for example "0.4". The second "none" says that we do not want to correct further with the NN trained with Pt as its only input variable; if we did, we would write the structure of that NN, for example "@rPt:4:trPt". The saved canvases have dimensions 600x400 and are written as .eps, .pdf and .gif ("1 1 1") in the folder "./plots/". The plot names get the prefix "input_" and the suffix "_all" (because we use all the jets in the sample, both leading and subleading, from events later used either for training or for testing the NN). The next "1" tells the code to normalize the plots to unit area and the final "0" not to print debug statements. So we run.

./bin/bjets_readTree.exe 2 $jerroot/bjets_event_trees_125_all.root event_tree event none none 600 400 1 1 1 ./plots/ input_ _all 1 0

You should see a statement like this

***** Start bjets_readTree.exe ***** 
TTree event_tree has 1264 entries.
We put the odd-number events in the training tree and the even-number events in the testing tree, and also put all events in the allreordered tree in such a way that all the leading jets are filled first and all the subleading jets second. That is because the MLP creates its own training tree by taking all odd entries and its own testing tree by taking all even entries, and we want ewual numbers of leading and subleading jets in both the training and testing trees.
Our training-tree-filling algorithm fails miserably for an odd (not even) number of events, so we ignore the last event if an odd number of events passed our selection and matching.
Even number of events saved, so we do not remove the last event.
We will use 1264 entries (events, each with two jets) to create the trees.
Info in <TCanvas::Print>: eps file ./plots/input_Pt_all.eps has been created
Info in <TCanvas::Print>: pdf file ./plots/input_Pt_all.pdf has been created
Info in <TCanvas::Print>: eps file ./plots/input_Eta_all.eps has been created
Info in <TCanvas::Print>: pdf file ./plots/input_Eta_all.pdf has been created
Info in <TCanvas::Print>: eps file ./plots/input_Phi_all.eps has been created
Info in <TCanvas::Print>: pdf file ./plots/input_Phi_all.pdf has been created
Info in <TCanvas::Print>: eps file ./plots/input_E_all.eps has been created
Info in <TCanvas::Print>: pdf file ./plots/input_E_all.pdf has been created
Info in <TCanvas::Print>: eps file ./plots/input_M_all.eps has been created
Info in <TCanvas::Print>: pdf file ./plots/input_M_all.pdf has been created
Info in <TCanvas::Print>: eps file ./plots/input_Jvtxf_all.eps has been created
Info in <TCanvas::Print>: pdf file ./plots/input_Jvtxf_all.pdf has been created
Info in <TCanvas::Print>: eps file ./plots/input_BWeight_all.eps has been created
Info in <TCanvas::Print>: pdf file ./plots/input_BWeight_all.pdf has been created
Info in <TCanvas::Print>: eps file ./plots/input_NTrk_all.eps has been created
Info in <TCanvas::Print>: pdf file ./plots/input_NTrk_all.pdf has been created
Info in <TCanvas::Print>: eps file ./plots/input_SumPtTrk_all.eps has been created
Info in <TCanvas::Print>: pdf file ./plots/input_SumPtTrk_all.pdf has been created
Info in <TCanvas::Print>: eps file ./plots/input_SvpNTrk_all.eps has been created
Info in <TCanvas::Print>: pdf file ./plots/input_SvpNTrk_all.pdf has been created
Info in <TCanvas::Print>: eps file ./plots/input_SvpM_all.eps has been created
Info in <TCanvas::Print>: pdf file ./plots/input_SvpM_all.pdf has been created
Info in <TCanvas::Print>: eps file ./plots/input_SvpLxy_all.eps has been created
Info in <TCanvas::Print>: pdf file ./plots/input_SvpLxy_all.pdf has been created
Info in <TCanvas::Print>: eps file ./plots/input_SvpErrLxy_all.eps has been created
Info in <TCanvas::Print>: pdf file ./plots/input_SvpErrLxy_all.pdf has been created

One key aspect: if the number of events in the file is odd, we remove the last event. We need an even number of events for the bookkeeping of training versus testing jets in the jet-by-jet NN. Now you can take a look at the plots. On the Mac that is simple

open plots/*.pdf 

These plots show reconstructed quantities, filled into the black histogram if the truth Pt is larger than 60 GeV and into the blue histogram otherwise. This lets us see which variables are most strongly correlated with the truth Pt, and it also validates that we filled our variables correctly from the D3PD (no empty histograms or unphysical shapes). Now we are ready to train a NN using all these variables.
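
As a rough sketch of what these plots encode (this is not the actual plotting code, and the histogram and variable names are assumptions), each reconstructed quantity is filled into one of two histograms depending on the truth Pt of the matched jet:

#include "TH1F.h"

// Sketch: split a reconstructed quantity by the truth Pt of the matched jet.
void fillInputPlot(TH1F& hHighTruthPt, TH1F& hLowTruthPt,
                   double recoValue, double truthPt) {
  if (truthPt > 60.0) hHighTruthPt.Fill(recoValue);  // black histogram: truth Pt > 60 GeV
  else                hLowTruthPt.Fill(recoValue);   // blue histogram: truth Pt <= 60 GeV
}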

Create training trees

For the desired event trees (typically all) we create training trees to be used by the NN. For this we use the executable ./bin/readTree.exe

./bin/readTree.exe

It will tell you the options. This executable reads the event tree and can either create a training tree or produce plots and studies for the training or testing samples, on an event-by-event or jet-by-jet basis. Here we use it to create a NN training tree, so we choose option 1.

./bin/readTree.exe 1

This shows us the exact usage example. We can do it for one file at a time, or we can run a dedicated script to do it for all desired files.

./doTrainingTrees.sh

Again, running it without arguments shows the usage example; then we run it (it should take less than a minute):

./doTrainingTrees.sh $jerroot event_trees train_trees none none 110+115+120+125+130+135+140+lnubb+llbb+nunubb
ls $jerroot

Do you see how the training-tree files appeared?

Train a NN

We are now ready to train a NN.

./bin/trainNN.exe 

This gives us the usage example. We will use the 125 GeV mass point with all processes combined, i.e. the file "bjets_train_trees_125_all.root". We also choose to train on a jet-by-jet basis ("per_jet"); for an event-by-event basis it would be "per_event". We choose 100 epochs for the training, and we choose the reconstructed Pt of the jet ("rPt") as the input, one hidden layer with 4 neurons, and the ratio of truth Pt to reconstructed Pt ("trPt") as the output ("@rPt:4:trPt"). After the training we get a .cxx and a .h file for the NN (which is in fact a function), with "NN" as the stem for their names, saved in the current folder ("./"). The code will also give us plots of the contributions of the different variables to the NN, which we can save as .eps, .pdf and .gif, in this order ("1 1 1"). So we run:

./bin/trainNN.exe ${jerroot}/train_trees_125_all.root per_jet 100 @rPt:4:trPt NN ./ 1 1 1

You should see an output like this

output_name_cxx=NN 
output_name=NN 
TTree per_jet has 126396 entries. We will use these for the training.
Info in <TMultiLayerPerceptron::Train>: Using 63198 train and 63198 test entries.
Training the Neural Network
Epoch: 0 learn=0.486451 test=0.491689
Epoch: 10 learn=0.216541 test=0.22559
Epoch: 20 learn=0.212316 test=0.221369
Epoch: 30 learn=0.211771 test=0.220891
Epoch: 40 learn=0.211672 test=0.22079
Epoch: 50 learn=0.211648 test=0.220769
Epoch: 60 learn=0.211644 test=0.22077
Epoch: 70 learn=0.211636 test=0.220763
Epoch: 80 learn=0.211632 test=0.220757
Epoch: 90 learn=0.21163 test=0.220758
Epoch: 99 learn=0.211629 test=0.220756
Training done.
NN.h and NN.cxx created.
Network with structure: @rPt:4:trPt
inputs with low values in the differences plot may not be needed
@rPt -> 0.0443376 +/- 0.0446
Info in <TCanvas::Print>: eps file ./NN.eps has been created
Info in <TCanvas::Print>: pdf file ./NN.pdf has been created
Info in <TCanvas::Print>: gif file ./NN.gif has been created

The error "Error in <TMultiLayerPerceptron::TMultiLayerPerceptron::Train()>: Line search fail" may appear sometimes when we have few events to train on, for example when we use only 200 events from each file. As the number of epochs increases, the learn value goes down and the test value goes down. But after some number of epochs the test value starts to increase. It is then that you start to overtrain. So you need to optimise this way what is the number of epochs to train on.

Lines like "@rPt -> 0.0443376 +/- 0.0446" tell you how much a variable contributes to the NN. When you have several variables you can rank them by this, though it is not a very precise method.
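
For reference, the training and the per-variable contribution lines come from ROOT's TMultiLayerPerceptron and TMLPAnalyzer classes. The sketch below shows roughly how the structure string and the epoch count map onto that API; it is not the actual trainNN.exe source, and the use of the branch name rPt inside the per_jet tree is an assumption.

#include "TFile.h"
#include "TTree.h"
#include "TMultiLayerPerceptron.h"
#include "TMLPAnalyzer.h"

// Sketch of a training equivalent to "@rPt:4:trPt" with 100 epochs.
void trainSketch() {
  TFile* f = TFile::Open("bjets_train_trees_125_all.root");
  TTree* t = (TTree*)f->Get("per_jet");
  // "@" asks TMLP to normalise the input; ":4:" is one hidden layer of 4 neurons.
  // TMLP splits the tree internally into alternating training and testing entries,
  // which is why readTree.exe reorders the jets as described above.
  TMultiLayerPerceptron* mlp = new TMultiLayerPerceptron("@rPt:4:trPt", t);
  mlp->Train(100, "text, graph, update=10");  // prints the learn/test errors per epoch
  mlp->Export("NN", "C++");                   // writes NN.h and NN.cxx
  TMLPAnalyzer ana(mlp);                      // source of the "@rPt -> ..." lines
  ana.GatherInformations();
  ana.CheckNetwork();
  ana.DrawDInputs();                          // the "differences" plot mentioned above
}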

When you run an ls, you see that several files were created.

ls -lrt

We move the plots to the plots folder.

mv NN.eps plots/.
mv NN.pdf plots/.
mv NN.gif plots/.

Now we move the actual NN function files to the "NN" folder, where they overwrite the old files (the old files had to be there, otherwise our code would not have compiled in the first place).

mv NN.h NN/.
mv NN.cxx NN/.

Now we need to recompile this class. To avoid some runtime errors I encountered on Linux (though it may be redundant), we make clean first and then make the shared library that corresponds to the NN.

make clean -C NN; make -C NN

Likewise, to avoid other errors I encountered on Linux (though it may be redundant), we make clean and then make two other shared libraries and the executable that use the new NN.

make clean -C NNUsage; make -C NNUsage
make clean -C Sample; make -C Sample
make clean -C readTree; make -C readTree

Now we are ready to use the new NN to study its impact on the jet energy correction and the mbb resolution.

Study the NN effect

./bin/readTree.exe

It will tell you the options. This executable reads the event tree and can either create a training tree or produce plots and studies for the training or testing samples, on an event-by-event or jet-by-jet basis. Here we use it to study the effect of the NN, so we choose option 3.

./bin/readTree.exe 3

This shows us the exact usage example. We want to use the same file and the NN that we trained. We will do two sets of plots and studies, one for the events that go into the closure test (train: events numbered 1, 3, 5, etc.) and one for the events that go into the overtraining test (test: events numbered 2, 4, 6, etc.). What we do here is apply the NN correction to get a new collection of events and jets, then plot the histograms on a jet-by-jet basis (Pt, Eta, Phi, E, M) for the leading and subleading jet, and then the dijet invariant mass, Mbb. For each we show the ratio (truth to reconstructed in black; truth to truth, i.e. 1, in blue; truth to reconstructed-with-NN-applied in red) and the absolute value (reconstructed times the ratios from before). What the NN does is estimate a new ratio for Pt and then multiply Pt and E by that number; that way the entire Lorentz vector is scaled by that number, so M as well, but Eta and Phi do not change. The code is flexible enough that you can train on two different NN outputs at a time, or put the inputs and outputs in a different order; the strings are parsed correctly and the vectors are filled with the appropriate values in the appropriate order.
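
As a minimal sketch of how such a correction acts on a single jet (illustrative only, assuming the jet is stored as a TLorentzVector and r is the Pt ratio estimated by the NN):

#include "TLorentzVector.h"

// Sketch: scale Pt and E by the NN ratio r; Eta and Phi are unchanged,
// so the whole four-vector, and therefore M, scales by r as well.
TLorentzVector applyNNRatio(const TLorentzVector& jet, double r) {
  TLorentzVector corrected;
  corrected.SetPtEtaPhiE(r * jet.Pt(), jet.Eta(), jet.Phi(), r * jet.E());
  return corrected;
}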

./bin/readTree.exe 3 ${jerroot}/event_trees_125_all.root event_tree event none none @rPt:4:trPt 100 train+test 600 400 1 1 1 ./plots/ corrected_ _ 1 0

When it runs, you will see a warning like this.

Warning in <TROOT::Append>: Replacing existing TH1: Pt_recon (Potential memory leak).

I avoided pointers specifically to prevent this, but it still happens. My code is not perfect, but it works.

Let's look at the plots and other files, all put in the folder "./plots".

cd plots

You first see files with chi2 statements in them. They show the chi2 agreement between the ratio histograms of the truth and of the reconstructed-plus-NN. This gives a quick numerical account of the agreement. You can hack the code to get the chi2 per number of degrees of freedom, before or after normalization, etc. At the moment I am not using these numbers directly in my study.

less chi2_corrected_E_test_ratio.txt

Then you see plots for train versus test and, for each, for ratio versus value, as described above; these show the ratio of the resolution of the reconstructed jets alone to that of the reconstructed jets with the NN applied. The smaller the number, the better the effect of the NN. We create these files so that in the end we can easily compare this final figure of merit for several NNs. The values come for train versus test, and for each both from the histogram itself (h) and from a Gaussian fit to the histogram (g); the histograms have a tail on the low side, the Gaussian fit does not take the tail into account, and it is the Gaussian fit that gives the official definition of the resolution.

diff corrected_ratio_g_Resolution__train_value.txt corrected_ratio_g_Resolution__test_value.txt

If these numbers are similar for the train and the test samples, it means we did not overtrain.
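
As a minimal sketch of the resolution quoted in these text files (a single Gaussian fit here stands in for whatever iterative fitting the code actually performs, and the histogram is assumed to be one of the ratio histograms described above):

#include "TH1.h"
#include "TF1.h"

// Sketch: Gaussian-fit resolution of a ratio histogram, defined as sigma/mean.
// The Gaussian core is much less sensitive to the low-side tail than the raw RMS.
double gaussianResolution(TH1* h) {
  h->Fit("gaus", "Q");
  TF1* fit = h->GetFunction("gaus");
  return fit->GetParameter(2) / fit->GetParameter(1);  // sigma / mean
}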

Create PDF presentation with Beamer/LaTeX

cd $jer/presentation
ls
mv presentation_per_jet.tex presentation.tex
pdflatex presentation.tex

Then open the presentation to see all the plots in one place. On the Mac you do

open presentation.pdf

On Linux you do

acroread presentation.pdf

That is it! If you thought this was a bit too long, you are right. That is why I made a script that runs it all for you for one neural-network configuration and input file. Moreover, it lets you run sequentially (or even in parallel) over different NNs, which we can then compare easily. So let's do that now. But first, let's clean up all the files we have made.

First let's make clean

make clean
rm -f presentation/presentation.*
rm -f presentation/*.aux
rm -f plots/*

Run all NNs in one script

We run the script that does all these steps for us

./doNN.sh

We see the instructions

./doNN.sh $jerroot list/list_file.txt  list/list_NN_per_jet_test.txt
ls

For each pair of file and NN, a new folder is created, the code is copied there and compiled, and then the steps from above are carried out, producing the plots, the .tex and .txt files and finally the .pdf presentation. Then we can compare and rank the NNs by their resolution improvement in the Gaussian fit.

./doCompare.sh
./doCompare.sh ./ ratio_g_Resolution__train_value

And we see something like this

0.797178 file_name @Pt,@SumPtTrk:4:RatioPt 10
0.875119 file_name @Pt:4:RatioPt 10

We see that using only Pt as input improves the resolution by about 12% (ratio 0.875), whereas adding SumPtTrk improves it by about 20% (ratio 0.797). Of course, for a more reliable result we would need the full dataset (larger statistics) and more training epochs (100-300 instead of 10); this was just a quick test so that you can run the code.

That is all. Now you can add new input variables on a jet-by-jet basis.

Now step-by-step study NN_Pt effect

What we do first is train a NN with rPt as its only input, name it "NN_Pt" instead of just "NN", and put it in its dedicated folder, so that later on we can apply this NN by itself, together with the muon-in-jet correction. We then want to see what a NN using other variables gives on top of these two corrections.

cd $jer
./bin/trainNN.exe ${jerroot}/train_trees_125_all.root per_jet 150 @rPt:4:trPt NN_Pt ./ 1 1 1
mv NN_Pt.h NN_Pt/.
mv NN_Pt.cxx NN_Pt/.
make clean -C NN_Pt; make -C NN_Pt
make clean -C NNUsage; make -C NNUsage
make clean -C Sample; make -C Sample
make clean -C readTree; make -C readTree

Now let's make input plots with no muon-in-jet correction but with the "NN_Pt" correction applied.

./bin/readTree.exe 2 ${jerroot}/event_trees_125_all.root event_tree event none @rPt:4:trPt 600 400 1 1 1 ./plots/ input_ _all 1 0

Now let's make a training tree with the same conditions, i.e. no muon-in-jet correction but with the "NN_Pt" correction applied.

./bin/readTree.exe 1 ${jerroot}/event_trees_125_all.root event_tree event none @rPt:4:trPt ${jerroot}/train_trees_125_all.root

We check the mean value of rPt and see a 6% increase with respect to the case without NN_Pt applied, which is consistent with what we expected.
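
A minimal sketch of how to check this mean yourself with a ROOT macro, assuming the training tree is called per_jet; the branch name rPt and the binning are assumptions:

#include <cstdio>
#include "TFile.h"
#include "TTree.h"
#include "TH1F.h"

// Sketch: print the mean of rPt in a training-tree file.
void checkMeanPt(const char* filename = "train_trees_125_all.root") {
  TFile* f = TFile::Open(filename);
  TTree* t = (TTree*)f->Get("per_jet");
  TH1F h("h", "rPt", 100, 0., 300.);
  t->Draw("rPt>>h", "", "goff");       // fill the histogram without drawing it
  printf("mean rPt = %f\n", h.GetMean());
}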

Now we train a new NN, named just "NN", again with "rPt" only, and then apply it. Its impact on the resolution should be almost zero, since its input already contains the correction from the first NN.

./bin/trainNN.exe ${jerroot}/train_trees_125_all.root per_jet 150 @rPt:4:trPt NN ./ 1 1 1

The output now looks different: the training converges more quickly and even produces the error mentioned above. You should see something like this

output_name_cxx=NN 
output_name=NN 
TTree per_jet has 126396 entries. We will use these for the training.
Info in <TMultiLayerPerceptron::Train>: Using 63198 train and 63198 test entries.
Training the Neural Network
Epoch: 0 learn=0.404515 test=0.408708
Epoch: 10 learn=0.179479 test=0.187594
Epoch: 20 learn=0.179479 test=0.187596
Epoch: 30 learn=0.179477 test=0.187596
Epoch: 40 learn=0.179476 test=0.187595
Epoch: 50 learn=0.179475 test=0.187595
Error in <TMultiLayerPerceptron::TMultiLayerPerceptron::Train()>: Line search fail
Epoch: 150 learn=0.179475 test=0.187595
Training done.
NN.h and NN.cxx created.
Network with structure: @rPt:4:trPt
inputs with low values in the differences plot may not be needed
@rPt -> 0.000327329 +/- 0.00111224
Info in <TCanvas::Print>: eps file ./NN.eps has been created
Info in <TCanvas::Print>: pdf file ./NN.pdf has been created

The NN.pdf plots look very different, which is what we expect: the new NN should not learn anything new. Also, above we see that it converges quickly to a constant value.

Then move the NN to its folder and recompile, as usual, to use it.

cd $jer
mv NN.h NN/.
mv NN.cxx NN/.
make clean -C NN; make -C NN
make clean -C NNUsage; make -C NNUsage
make clean -C Sample; make -C Sample
make clean -C readTree; make -C readTree

Now we apply the NN to see how much improvement we get.

./bin/readTree.exe 3 ${jerroot}/event_trees_125_all.root event_tree event none @rPt:4:trPt @rPt:4:trPt  100 train+test 600 400 1 1 1 ./plots/ corrected_ _ 1 0

And indeed, there is no improvement in the resolution; furthermore, the resolution number is the same as before. Good.

Now repeat the exercise with the muon-in-jet correction applied but without "NN_Pt", and then with both corrections applied. After that, we will keep both corrections applied and run the regular way with one variable at a time (except rPt), to see the new ranking of variables and how much we can gain on top of these two corrections alone.

Now step-by-step study the muon-in-jet correction

Input plots

./bin/readTree.exe 2 ${jerroot}/event_trees_125_all.root event_tree event 0.4 none 600 400 1 1 1 ./plots/ input_ _all 1 0

Training tree

./bin/readTree.exe 1 ${jerroot}/event_trees_125_all.root event_tree event 0.4 none ${jerroot}/train_trees_125_all.root

Open the training tree and see that the mean Pt is 3% higher than with no correction applied. Now we train the Pt-only NN.

./bin/trainNN.exe ${jerroot}/train_trees_125_all.root per_jet 150 @rPt:4:trPt NN ./ 1 1 1

You should see an output like this

output_name_cxx=NN 
output_name=NN 
TTree per_jet has 126396 entries. We will use these for the training.
Info in <TMultiLayerPerceptron::Train>: Using 63198 train and 63198 test entries.
Training the Neural Network
Epoch: 0 learn=0.445851 test=0.450528
Epoch: 10 learn=0.167461 test=0.177747
Epoch: 20 learn=0.165385 test=0.175738
Epoch: 30 learn=0.16525 test=0.175595
Epoch: 40 learn=0.165232 test=0.175584
Epoch: 50 learn=0.165229 test=0.17558
Epoch: 60 learn=0.165229 test=0.175579
Epoch: 70 learn=0.165228 test=0.175579
Epoch: 80 learn=0.165227 test=0.175578
Epoch: 90 learn=0.165227 test=0.175577
Epoch: 100 learn=0.165227 test=0.175577
Epoch: 110 learn=0.165227 test=0.175577
Epoch: 120 learn=0.165227 test=0.175577
Epoch: 130 learn=0.165227 test=0.175577
Epoch: 140 learn=0.165227 test=0.175577
Epoch: 149 learn=0.165226 test=0.175577
Training done.
NN.h and NN.cxx created.
Network with structure: @rPt:4:trPt
inputs with low values in the differences plot may not be needed
@rPt -> 0.0271166 +/- 0.0233612
Info in <TCanvas::Print>: eps file ./NN.eps has been created
Info in <TCanvas::Print>: pdf file ./NN.pdf has been created
Info in <TCanvas::Print>: gif file ./NN.gif has been created

The NN.pdf plots look good, just as before, consistent with the fact that no NN was applied before this one.

Then move the NN to its folder and recompile, as usual, to use it.

cd $jer
mv NN.h NN/.
mv NN.cxx NN/.
make clean -C NN; make -C NN
make clean -C NNUsage; make -C NNUsage
make clean -C Sample; make -C Sample
make clean -C readTree; make -C readTree

Now we apply the NN to see how much improvement we get.

./bin/readTree.exe 3 ${jerroot}/event_trees_125_all.root event_tree event 0.4 none @rPt:4:trPt 100 train+test 600 400 1 1 1 ./plots/ corrected_ _ 1 0

And indeed, the resolution is different; interestingly, it is better than with the Pt-only NN.

Now step-by-step study of muon-in-jet correction and NN_Pt effect (in this order applied)

We just copy and paste from above, but use 0.4 instead of none for the muon part.

What we do first is train a NN with rPt as its only input, name it "NN_Pt" instead of just "NN", and put it in its dedicated folder. We train this NN after the muon-in-jet correction is applied, so we can use the training tree already made above.

cd $jer
./bin/trainNN.exe ${jerroot}/train_trees_125_all.root per_jet 150 @rPt:4:trPt NN_Pt ./ 1 1 1
mv NN_Pt.h NN_Pt/.
mv NN_Pt.cxx NN_Pt/.
make clean -C NN_Pt; make -C NN_Pt
make clean -C NNUsage; make -C NNUsage
make clean -C Sample; make -C Sample
make clean -C readTree; make -C readTree

Now we have a NN_Pt that was trained after the muon-in-jet correction was applied. When we use it in the code, we first apply the muon-in-jet correction and then apply "NN_Pt" on top of that.

Now let's make input plots with the muon-in-jet correction and the "NN_Pt" correction applied.

./bin/readTree.exe 2 ${jerroot}/event_trees_125_all.root event_tree event 0.4 @rPt:4:trPt 600 400 1 1 1 ./plots/ input_ _all 1 0

Now let's make a training tree with the same conditions, i.e. the muon-in-jet correction and, in addition, the "NN_Pt" correction.

./bin/readTree.exe 1 ${jerroot}/event_trees_125_all.root event_tree event 0.4 @rPt:4:trPt ${jerroot}/train_trees_125_all.root

We check the mean value of rPt and see a 6% increase with respect to the case without NN_Pt applied, the same as we got with just NN_Pt applied. The muon-in-jet correction seems not to have changed the mean value.

Now we train a new NN, named just "NN", again with "rPt" only, and then apply it. Its impact on the resolution should be almost zero, since its input already contains the correction from the first NN.

./bin/trainNN.exe ${jerroot}/train_trees_125_all.root per_jet 150 @rPt:4:trPt NN ./ 1 1 1

The output now looks different: the training converges more quickly and even produces the error mentioned above. You should see something like this

output_name_cxx=NN 
output_name=NN 
TTree per_jet has 126396 entries. We will use these for the training.
Info in <TMultiLayerPerceptron::Train>: Using 63198 train and 63198 test entries.
Training the Neural Network
Epoch: 0 learn=0.395274 test=0.399417
Epoch: 10 learn=0.150421 test=0.159965
Epoch: 20 learn=0.150421 test=0.159965
Epoch: 30 learn=0.15042 test=0.159965
Epoch: 40 learn=0.150419 test=0.159963
Epoch: 50 learn=0.150418 test=0.159967
Error in <TMultiLayerPerceptron::TMultiLayerPerceptron::Train()>: Line search fail
Epoch: 150 learn=0.150418 test=0.159967
Training done.
NN.h and NN.cxx created.
Network with structure: @rPt:4:trPt
inputs with low values in the differences plot may not be needed
@rPt -> 9.23169e-05 +/- 0.000632627
Info in <TCanvas::Print>: eps file ./NN.eps has been created
Info in <TCanvas::Print>: pdf file ./NN.pdf has been created

The NN.pdf plots look very different, which is what we expect: the new NN should not learn anything new. Also, above we see that it converges quickly to a constant value.

Then move the NN to its folder and recompile, as usual, to use it.

cd $jer
mv NN.h NN/.
mv NN.cxx NN/.
make clean -C NN; make -C NN
make clean -C NNUsage; make -C NNUsage
make clean -C Sample; make -C Sample
make clean -C readTree; make -C readTree

Now we apply the NN to see how much improvement we get.

./bin/readTree.exe 3 ${jerroot}/event_trees_125_all.root event_tree event 0.4 @rPt:4:trPt @rPt:4:trPt  100 train+test 600 400 1 1 1 ./plots/ corrected_ _ 1 0

And we get the same result as when the muon-in-jet and NN corrections were applied before. Things are consistent!

Now we can finally keep the "NN_Pt" NN and start using the other NN variables.

The event-by-event basis will be validated soon; 99% of the code is there.

Scripts to run everything at once

We want the mbb resolution for all combinations of applying and not applying the muon-in-jet correction (M) and the Pt-only NN (N), for llbb, nunubb, lnubb and all combined, at 125 GeV. For each case we want a training tree as well as all the plots and text files that give the resolution. For this we run the script "doStudyBasic.sh". To run over all processes at the same time (in parallel, to finish quicker), we run "doRunStudyBasic.sh". For each process you will get a folder, e.g. "atlas_ZH125_llbb_all". Go there, then into plots.

cd atlas_ZH125_llbb_all
cd plots

Here you will see folders for each of the four cases: nMnN, nMwN, wMnN, wMwN, where "n" means that correction is not applied and "w" means it is applied. We will go through each folder.

cd nMnN
ls
ls *.tex

What we want is the mbb resolution for the train and test samples.

less corrected_Mbb_train_value.tex

In each file, the first row is for the truth (generated). The second row is for our corrections only, with no extra NN applied. The third row is for the extra NN applied on top of our corrections. In this script no extra NN is applied, to save time, so that row contains "nan". In each row there are columns, and in each column there are two numbers separated by "/": the left value refers to the histogram as it is, whereas the right value comes from eleven Gaussian fits applied successively to the histogram (physically, a Gaussian fit ignores the tails of the histogram). In each row, from left to right, the pairs are the mean, the RMS and the resolution, defined as the RMS divided by the mean. We are mainly interested in the resolution from the Gaussian fit (the right-hand number of the last column). For example:

\begin{tabular}{llll}
\hline
 & Mean & RMS & resolution=RMS/Mean \\
\hline
Generated & 113/118 & 20.1/7.05 & 0.178/0.0596 \\
Reco & 106/108 & 24/20.3 & 0.226/0.188 \\
Reco+NN & nan/nan & nan/nan & nan/nan \\
\hline
(Reco+NN)/(Reco) & nan/nan & nan/nan & nan/nan \\
\hline
\end{tabular}

So from here we get that resolution for truth is 0.060 and for nMnN is 0.188. This is for the training tree. We do the same for the testing tree.

less corrected_Mbb_test_value.tex

We get this text

\begin{tabular}{llll}
\hline
 & Mean & RMS & resolution=RMS/Mean \\
\hline
Generated & 113/118 & 20.2/7.2 & 0.179/0.0609 \\
Reco & 106/108 & 24/20.8 & 0.227/0.193 \\
Reco+NN & nan/nan & nan/nan & nan/nan \\
\hline
(Reco+NN)/(Reco) & nan/nan & nan/nan & nan/nan \\
\hline
\end{tabular}

So from here we get that the resolution for truth is 0.061 and for nMnN is 0.193, this time for the testing tree. Notice how these values differ from the training ones; their difference gives us a feeling for the "uncertainty" on our corrections and training, and it also reflects the statistical error.

Then we do the same for the other three folders, nMwN, wMnN, wMwN. Notice that for all folders the truth numbers remain the same, so we only need to update the corrected ones. We put everything together in a tex file like this, to be included in a talk.

\begin{tabular}{llll}
\hline
Muon in jet & Pt-only NN & Extra & Resolution Train/Test\\
\hline
No & No & No & 0.188/0.193 \\
No & Yes & No & 0.157/0.161 \\
Yes & No & No & 0.165/0.168 \\
Yes & Yes & No & 0.142/0.145 \\
Truth & Truth & Truth & 0.060/0.061 \\
\hline
\end{tabular}

Back to the main framework page: AdrianBuzatuJER.

-- AdrianBuzatu - 18-Oct-2012
