The aim of this study is to explore our ability to discriminate background photons (from pi0s, etas, etc.) from direct photons, using a maximum log-likelihood (MLL) fit to find the linear combination of MC signal and background samples that best matches a given data sample.
For that we have used the following datasets as MC 'reference samples'
PandaIdJob | Process | pT | Dataset | AOD |
88 (Xabier) | single gamma | 20 GeV | 7040 | trig1_misal1_mc12.007040.singlepart_gamma_Et20.recon.AOD.v13003001 |
91 (Xabier) | single gamma | 60 GeV | 7042 | trig1_misal1_mc12.007042.singlepart_gamma_Et60.recon.AOD.v13003001 |
84 (Xabier) | single pi0 | 20 GeV | 7140 | trig1_misal1_mc12.007140.singlepart_pi0_Et20.recon.AOD.v13003001 |
85 (Xabier) | single pi0 | 60 GeV | 7142 | trig1_misal1_mc12.007142.singlepart_pi0_Et60.recon.AOD.v13003001 |
The analysis was run over the output AANs from EventView rel 13.0.40.
Although all the results presented from here on are for the 20 GeV samples, the performance is similar for the 60 GeV datasets and the same conclusions hold.
The variable used so far is fracs1, the shower shape in the shower core: [E(+-3) - E(+-1)]/E(+-1), where E(+-n) is the energy in +-n strips around the strip with the highest energy. The MC distributions for both signal and background photon candidates (20 GeV, after the IsEM==0 selection) are shown next.
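As an illustration of the fracs1 definition, here is a minimal Python sketch. The strip energies are made-up numbers, and truncating the window at the edge of the strip list is an assumption of this example, not something taken from the reconstruction software.

```python
def fracs1(strip_energies):
    """Return [E(+-3) - E(+-1)] / E(+-1) around the highest-energy strip."""
    # Index of the strip with the highest energy.
    i_max = max(range(len(strip_energies)), key=lambda i: strip_energies[i])

    def window_sum(n):
        # Energy in +-n strips around the hottest strip (2n+1 strips),
        # truncated at the edges of the list (assumption of this sketch).
        lo = max(0, i_max - n)
        hi = min(len(strip_energies), i_max + n + 1)
        return sum(strip_energies[lo:hi])

    e1 = window_sum(1)
    e3 = window_sum(3)
    return (e3 - e1) / e1


# Hypothetical strip energies: a narrow shower and a broader one.
narrow = [0.0, 0.1, 0.2, 2.0, 10.0, 2.0, 0.2, 0.1, 0.0]
broad = [0.0, 0.5, 1.5, 4.0, 6.0, 4.0, 1.5, 0.5, 0.0]
print(fracs1(narrow), fracs1(broad))
```

A broader shower puts more energy outside the three central strips and therefore gives a larger fracs1.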
The method of analysis is based on the so-called binned maximum log likelihood.
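We have implemented a ROOT algorithm which uses TMinuit to minimize -2*ln L, where L is the likelihood function, defined in our case as

$$L = \prod_i \frac{e^{-\mu_i}\,\mu_i^{n_i}}{n_i!}$$

where $n_i$ is the number of entries observed in bin $i$ of the data histogram and $\mu_i$ is the number of entries expected in that bin for a given normalization N and purity.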
The method then looks for the two parameters (N and purity) that minimize -2*ln L.
The factor -2 allows MINUIT to obtain errors using the same recipe as for least squares, i.e. going up from the minimum by 1.
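The fit can be sketched in a few lines of Python. This is not the ROOT/TMinuit code used for the results: a plain grid scan over the purity replaces MIGRAD, and the expectation mu_i = N * [purity * s_i + (1 - purity) * b_i], with s_i and b_i the MC signal and background bin contents, is an assumed parametrization. The template and data values are hypothetical.

```python
import math


def minus_two_log_l(purity, norm, sig, bkg, data):
    """-2 ln L for Poisson-distributed bin contents."""
    total = 0.0
    for s, b, n in zip(sig, bkg, data):
        # Assumed model for the expected content of this bin.
        mu = norm * (purity * s + (1.0 - purity) * b)
        if mu <= 0.0:
            return float("inf")
        total += mu - n * math.log(mu) + math.lgamma(n + 1.0)
    return 2.0 * total


def fit_purity(sig, bkg, data, steps=1000):
    """Grid scan over purity in [0, 1]; N is profiled analytically."""
    scan = []
    for k in range(steps + 1):
        p = k / steps
        expected = sum(p * s + (1.0 - p) * b for s, b in zip(sig, bkg))
        # For a Poisson likelihood the best N at fixed purity matches the totals.
        norm = sum(data) / expected
        scan.append((minus_two_log_l(p, norm, sig, bkg, data), p, norm))
    f_min, p_best, n_best = min(scan)
    # Error recipe: purity interval where -2 ln L rises by at most 1.
    inside = [p for f, p, _ in scan if f <= f_min + 1.0]
    p_err = 0.5 * (max(inside) - min(inside))
    return p_best, n_best, p_err


# Hypothetical templates; the data were built with purity 0.4 and N = 1.
sig = [50, 30, 15, 5]
bkg = [20, 25, 30, 25]
data = [32, 27, 24, 17]
p_best, n_best, p_err = fit_purity(sig, bkg, data)
print(p_best, n_best, p_err)
```

When both templates have the same number of entries, the profiled N is simply the ratio of data entries to template entries whatever the purity, which is one way to understand why N is insensitive to the binning.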
To study the capability of the method to extract the purity two different samples have been tried out:
In this approach, one half of the sample has been kept as 'truth' reference and the other used to make a new 'data' sample with a given purity. In this way we can study the robustness of the method for different signal & background ratios.
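A possible way to build such a 'data' sample is sketched below. The recipe (keep all 922 available entries of the majority component and add the number of minority entries needed to reach the requested purity) is inferred from the 'Mixed Entries' values quoted in the results table, not taken from the original code; the function names and the random sampling are assumptions of this example.

```python
import math
import random


def mixed_counts(purity, n_major=922):
    """Signal and background entries for a mix of the given purity."""
    if purity >= 0.5:
        n_sig = n_major
        n_bkg = math.floor(n_major * (1.0 - purity) / purity + 1e-9)
    else:
        n_bkg = n_major
        n_sig = math.floor(n_major * purity / (1.0 - purity) + 1e-9)
    return n_sig, n_bkg


def make_mixed_sample(signal_values, background_values, purity, seed=0):
    """Draw a mixed 'data' sample from lists of signal and background values."""
    n_major = min(len(signal_values), len(background_values))
    n_sig, n_bkg = mixed_counts(purity, n_major)
    rng = random.Random(seed)
    return rng.sample(signal_values, n_sig) + rng.sample(background_values, n_bkg)


for purity in (0.0, 0.1, 0.2, 0.3, 0.4, 0.5):
    n_sig, n_bkg = mixed_counts(purity)
    # Total entries and the normalization relative to 1712 reference entries.
    print(purity, n_sig + n_bkg, (n_sig + n_bkg) / 1712)
```

With 1712 entries in the reference histograms, the ratio printed in the last column corresponds to the 'True N' values quoted in the table (for example 1536/1712 = 0.897 at purity 0.4).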
Several configurations have been set up (initialization parameters, number of bins, fracs1 range, etc.). First, an example output is given together with the corresponding distributions (for 10 and 50 bins).
*ET20* *True Purity = 0.4* True Normalization = 8.97E-001 Signal Entries = 1712 , Background Entries = 1712 , Mixed Entries = 1536

    PARAMETER DEFINITIONS:
    NO. NAME VALUE STEP SIZE LIMITS
    1 purity 0.00000e+00 1.00000e-02 0.00000e+00 1.00000e+00
    2 N 5.00000e-01 1.00000e-02 no limits
    **********
    ** 1 **MIGRAD
    **********
    MIGRAD MINIMIZATION HAS CONVERGED.
    MIGRAD WILL VERIFY CONVERGENCE AND ERROR MATRIX.
    COVARIANCE MATRIX CALCULATED SUCCESSFULLY
    FCN=136.874 FROM MIGRAD STATUS=CONVERGED 53 CALLS 54 TOTAL
    EDM=2.13794e-08 STRATEGY= 1 ERROR MATRIX ACCURATE
    EXT PARAMETER STEP FIRST
    NO. NAME VALUE ERROR SIZE DERIVATIVE
    1 purity 3.53865e-01 9.98784e-02 1.20858e-03 -2.65085e-05
    2 N 8.97194e-01 2.28924e-02 1.31226e-04 -9.02950e-03
    EXTERNAL ERROR MATRIX. NDIM= 25 NPAR= 2 ERR DEF=1
    1.012e-02 2.287e-11
    2.287e-11 5.241e-04
    PARAMETER CORRELATION COEFFICIENTS
    NO. GLOBAL 1 2
    1 0.00000 1.000 0.000
    2 0.00000 0.000 1.000
10 bins
50 bins
This analysis was then extended to several true input purities. The results are summarized in the following table.
Mixed Entries | True Purity | True N | Purity MLL Value (10 bins) | N MLL Value (10 bins) | Purity MLL Value (30 bins) | N MLL Value (30 bins) | Purity MLL Value (50 bins) | N MLL Value (50 bins) |
922 | 0. | 5.38E-1 | 1.35E-18 +- 5.94E-2 | 5.38E-1 +- 1.77E-2 | 1.33E-18 +- 5.76E-2 | 5.38E-1 +- 1.77E-2 | 1.13E-18 +- 8.02E-2 | 5.38E-1 +- 1.77E-2 |
1024 | 0.1 | 5.98E-1 | 1.17E-9 +- 7.33E-1 | 5.98E-1 +- 1.87E-2 | 1.18E-9 +- 7.88E-1 | 5.98E-1 +- 1.87E-2 | 4.29E-2 +- 1.10E-1 | 5.98E-1 +- 1.87E-2 |
1152 | 0.2 | 6.73E-1 | 4.21E-2 +- 1.23E-1 | 6.73E-1 +- 1.98E-2 | 7.28E-2 +- 1.11E-1 | 6.73E-1 +- 1.98E-2 | 1.35E-1 +- 1.08E-1 | 6.73E-1 +- 1.98E-2 |
1317 | 0.3 | 7.69E-1 | 1.32E-1 +- 1.23E-1 | 7.69E-1 +- 2.12E-2 | 1.57E-1 +- 1.07E-1 | 7.69E-1 +- 2.12E-2 | 2.24E-1 +- 1.05E-1 | 7.69E-1 +- 2.12E-2 |
1536 | 0.4 | 8.97E-1 | 2.73E-1 +- 1.19E-1 | 8.97E-1 +- 2.29E-2 | 2.91E-1 +- 1.2E-1 | 8.97E-1 +- 2.29E-2 | 3.54E-1 +- 1.00E-1 | 8.97E-1 +- 2.29E-2 |
1844 | 0.5 | 1.07E0 | 4.41E-1 +- 1.11E-1 | 1.07E0 +- 2.50E-2 | 4.48E-1 +- 9.68E-2 | 1.07E0 +- 2.50E-2 | 4.84E-1 +- 9.27E-2 | 1.07E0 +- 2.50E-2 |
1536 | 0.6 | 8.97E-1 | 5.47E-1 +- 1.23E-1 | 8.97E-1 +- 2.29E-2 | 5.55E-1 +- 1.06E-1 | 8.97E-1 +- 2.29E-2 | 6.07E-1 +- 1.03E-1 | 8.97E-1 +- 2.29E-2 |
1317 | 0.7 | 7.69E-1 | 6.92E-1 +- 1.35E-1 | 7.69E-1 +- 2.12E-2 | 6.55E-1 +- 1.14E-1 | 7.69E-1 +- 2.12E-2 | 7.03E-1 +- 1.09E-1 | 7.69E-1 +- 2.12E-2 |
1152 | 0.8 | 6.73E-1 | 8.09E-1 +- 1.47E-1 | 6.73E-1 +- 1.98E-2 | 7.29E-1 +- 1.22E-1 | 6.73E-1 +- 1.98E-2 | 8.00E-1 +- 1.16E-1 | 6.73E-1 +- 1.98E-2 |
1024 | 0.9 | 5.98E-1 | 9.78E-1 +- 6.96E-1 | 5.98E-1 +- 1.87E-2 | 8.69E-1 +- 1.27E-1 | 5.98E-1 +- 1.87E-2 | 9.09E-1 +- 1.15E-1 | 5.98E-1 +- 1.87E-2 |
922 | 1.0 | 5.38E-1 | 1.00E0 +- 9.88E-2 | 5.38E-1 +- 1.77E-2 | 9.99E-1 +- 8.93E-1 | 5.38E-1 +- 1.77E-2 | 1.00E0 +- 7.13E-1 | 5.38E-1 +- 1.77E-2 |
As can be seen from this table, the purity estimate is, as expected, worst for a small number of bins, where the shape is not well resolved.
The normalization factor does not depend on the number of bins, as it should be.
These results can also be summarized in the following plot.
A large error is seen in a couple of cases, near both ends of the purity range. Further studies are under way to find out the reason.
Taking into account the independence of the normalization with respect to the binning and the good agreement over the whole purity range, we have also tried to perform the minimization fixing N to its 'true' known value. No improvement was achieved by doing so, though.
Different input parameters have also been tried, without any observable effect on the final estimate, which is a desirable feature of any method. The extreme case in which both parameters are allowed to vary freely is shown in this table (as said before, N does not depend on this, so it is not shown here).
True Purity | MLL purity (50 bins) | |
0. | -8.81E-2 +- 1.17E-1 | |
0.1 | 4.30E-2 +- 1.16E-1 | |
0.2 | 1.35E-1 +- 1.10E-1 | |
0.3 | 2.25E-1 +- 1.06E-1 | |
0.4 | 3.54E-1 +- 1.01E-1 | |
0.5 | 4.84E-1 +- 9.32E-2 | |
0.6 | 6.07E-1 +- 1.04E-1 | |
0.7 | 7.03E-1 +- 1.10E-1 | |
0.8 | 8.00E-1 +- 1.18E-1 | |
0.9 | 9.09E-1 +- 1.18E-1 | |
1.0 | 1.00E-1 +- 1.15E-1 |
Apart from the negative estimate at zero purity (compatible with zero within the error), the method is rather stable. Moreover, in this way we avoid the boundary effect seen before (at least at the upper end).
Similar performance has been found for 10 and 30 bins over the whole purity range, except for the two lowest values, where the method returns negative purities (compatible with positive values within the error, though).
This MC analysis can be summarised in a few points to keep in mind:
Having studied the reliability of the method on ad-hoc MC mixes, the next step is to use a more 'real' background sample. This second approach uses as evaluation sample those reconstructed photons remaining in the JF17 sample after offline selection (/space1/data/J17NTUP-13.0.30.1-tid015310/NTUP* --> MixedSample.JF17.root). As we have to compare this 'data' against MC distributions at 20 GeV, a cut on the highest-pt photon has been applied (15 GeV < pt < 25 GeV).
In order to compute its 'true' purity, each good offline photon (|eta| < 2.5, 15 GeV < pt < 25 GeV, IsEM==0) was matched with a truth object (closest match in a cone of R=0.1) and classified by mother id when the associated truth object is a photon.
The final fracs1 spectrum is shown in this figure, for all the reco photons found (left) and split by mother id (right). No GEANT particles have been considered.
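The closest-match criterion can be sketched as follows. The dictionary layout of the photon and truth objects is hypothetical; only the R=0.1 cone and the closest-match rule come from the text.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance, with the phi difference wrapped into [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def match_truth(photon, truth_particles, cone=0.1):
    """Return the truth particle closest to the photon within the cone, or None."""
    best, best_dr = None, cone
    for particle in truth_particles:
        dr = delta_r(photon["eta"], photon["phi"], particle["eta"], particle["phi"])
        if dr < best_dr:
            best, best_dr = particle, dr
    return best


# Hypothetical example: the first truth photon sits across the phi wrap-around.
photon = {"eta": 0.50, "phi": 3.10}
truth = [
    {"eta": 0.52, "phi": -3.13, "pdg_id": 22, "mother_pdg_id": 111},
    {"eta": 0.50, "phi": 2.90, "pdg_id": 22, "mother_pdg_id": 21},
]
matched = match_truth(photon, truth)
print(matched)
```

Wrapping the phi difference matters for candidates near phi = +-pi, which would otherwise never be matched.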
True Particle Matched | from... | #entries |
gamma | pi0,eta | 345 |
gamma | DP | 53 |
gamma | q/g line | 98 |
fakes | | 20 |
The photon-fake subsample has the following composition:
ph-Fakes composition | #entries |
e | 3 |
p | 2 |
pi | 11 |
K | 1 |
K^0_s | 1 |
K^0_L | 1 |
total | 19 |
The converted photons (looking now at GEANT particle level, where the best match is one of the final electrons) come mainly from pi0 decays, as can be seen in this table:
Conversion Mothers | #entries | detailed |
pi0,eta | 50 | pi0(42), eta(8) |
DP | 8 | |
q/g line | 13 | u(4), d(4), s(1), c(2), g(2) |
e | 9 | |
others | 4 | p(2), w(2) |
total | 84 |
Thus, how we treat the conversion recovery will mainly affect the background distribution.
Running the MLL algorithm on this sample in blind mode (i.e. with the same configuration as before), we obtained:
*50 bins, free varying configuration space* Signal Entries = 1712 Background Entries = 1712 Mixed Entries = 518

    PARAMETER DEFINITIONS:
    NO. NAME VALUE STEP SIZE LIMITS
    1 purity 0.00000e+00 1.00000e-02 no limits
    2 N 5.00000e-01 1.00000e-02 no limits
    **********
    ** 1 **MIGRAD
    **********
    MIGRAD MINIMIZATION HAS CONVERGED.
    MIGRAD WILL VERIFY CONVERGENCE AND ERROR MATRIX.
    COVARIANCE MATRIX CALCULATED SUCCESSFULLY
    FCN=109.642 FROM MIGRAD STATUS=CONVERGED 44 CALLS 45 TOTAL
    EDM=4.99867e-09 STRATEGY= 1 ERROR MATRIX ACCURATE
    EXT PARAMETER STEP FIRST
    NO. NAME VALUE ERROR SIZE DERIVATIVE
    1 purity 9.91902e-02 1.71884e-01 8.82752e-04 -5.61509e-04
    2 N 3.01402e-01 1.32685e-02 6.81398e-05 1.96871e-03
    EXTERNAL ERROR MATRIX. NDIM= 25 NPAR= 2 ERR DEF=1
    2.954e-02 -2.458e-12
    -2.458e-12 1.761e-04
    PARAMETER CORRELATION COEFFICIENTS
    NO. GLOBAL 1 2
    1 0.00000 1.000 -0.000
    2 0.00000 -0.000 1.000
As we do not have to mix the MC photons here, we have used the whole MC sample as reference.
So the MLL parameters we would have to compare with the true ones are
On the one hand, the true normalization factor in our case was truthN = 3.02E-01, in very good agreement with the MLL value.
On the other hand, the puzzling thing is how to define our true purity. We could think of at least two options:
It can be seen that the agreement is rather good in the last case, as we could have expected. Since our MC signal consists of monochromatic photons, the method takes as 'signal' only those photons that closely resemble them (i.e. the DP photons in the sample, as they are quite isolated).
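As a rough numerical cross-check, the true normalization and a DP-only purity can be computed from the numbers quoted on this page. Taking the purity as the number of DP entries over all mixed entries is an assumption of this sketch; note also that the truth-matching table sums to 516 entries while the fit log quotes 518 mixed entries, so the denominator is itself uncertain at that level.

```python
# Entries from the truth-matching table (sum = 516).
counts = {"pi0,eta": 345, "DP": 53, "q/g line": 98, "fakes": 20}

mixed_entries = 518       # 'Mixed Entries' in the MINUIT log
reference_entries = 1712  # entries in each MC reference histogram

# True normalization: data entries relative to the reference histograms.
true_n = mixed_entries / reference_entries

# Assumed purity definition: direct photons over all selected candidates.
dp_fraction = counts["DP"] / mixed_entries

# Fitted purity and its error, copied from the MINUIT output.
fitted_purity, fitted_error = 9.91902e-02, 1.71884e-01
pull = (fitted_purity - dp_fraction) / fitted_error

print(true_n, dp_fraction, pull)
```

Under this definition the fitted purity and the DP fraction differ by much less than the quoted fit error.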
To check this, the effect of isolation at calocluster level for photons from different sources has been studied in detail here.
Finally, assuming we only want the amount of DP photons in our 'data', the MLL can extract it with an error of ~10-15%. We foresee a further reduction of this error, since the method depends on the available MC statistics.
MC performance
50 bins, free-varying configuration space:
Mixed Entries | True Purity | True N | Purity MLL Value | N MLL Value |
368 | 0. | 5.38E-1 | -1.41E-1 +- 1.3E-1 | 5.40E-1 +- 2.82E-2 |
408 | 0.1 | 5.98E-1 | 3.39E-2 +- 1.14E-2 | 5.99E-1 +- 2.97E-2 |
460 | 0.2 | 6.73E-1 | 9.63E-2 +- 1.13E-1 | 6.75E-1 +- 3.15E-2 |
525 | 0.3 | 7.69E-1 | 1.59E-1 +- 1.08E-1 | 7.70E-1 +- 3.36E-2 |
613 | 0.4 | 8.97E-1 | 2.83E-1 +- 9.84E-2 | 9.0E-1 +- 3.63E-2 |
736 | 0.5 | 1.07E0 | 3.65E-1 +- 8.82E-2 | 1.08E0 +- 3.98E-2 |
613 | 0.6 | 8.97E-1 | 4.8E-1 +- 9.56E-2 | 9.0E-1 +- 9.56E-2 |
525 | 0.7 | 7.69E-1 | 5.28E-1 +- 1.02E-1 | 7.71E-1 +- 3.36E-2 |
460 | 0.8 | 6.73E-1 | 6.86E-1 +- 1.04E-1 | 6.73E-1 +- 3.14E-2 |
408 | 0.9 | 5.98E-1 | 7.34E-1 +- 1.08E-1 | 5.99E-1 +- 2.96E-2 |
368 | 1.0 | 5.38E-1 | 8.26E-1 +- 9.99E-2 | 5.40E-1 +- 2.82E-2 |
The lack of statistics is clearly an issue here; however, the estimation of N remains insensitive to it.
The "weird" points (in red below) are now being looked at more carefully to understand their odd behaviour.
#sigma_(purity) | |||
N=500 | N=700 | N=900 | N=1317 |
0.06 | 0.08 | 0.05 | 0.06 |
0.14 | 0.66 | 0.65 | 0.08 |
0.18 | 0.15 | 0.12 | 0.09 |
0.15 | 0.13 | 0.13 | 0.1 |
0.16 | 0.15 | 0.13 | 0.1 |
0.17 | 0.15 | 0.14 | 0.11 |
0.19 | 0.16 | 0.14 | 0.12 |
0.19 | 0.16 | 0.14 | 0.12 |
0.18 | 0.16 | 0.14 | 0.12 |
0.18 | 0.69 | 0.18 | 0.17 |
0.16 | 0.07 | 0.05 | 0.04 |
How is the error affected at the low and high ends? Is the limited parameter range somehow playing a role here (i.e. the error is smaller at purity values 0 and 1, which are the limits of the allowed parameter space)?
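One simple reference for reading the sigma(purity) table is the pure statistical scaling, where the error falls as 1/sqrt(N). The sketch below assumes that the column labels N denote the number of entries in the mixed sample, and it takes one of the middle rows of the table (0.16, 0.15, 0.13, 0.1) as an example; it says nothing about the boundary behaviour itself.

```python
import math


def scaled_error(sigma_ref, n_ref, n_new):
    """Error expected at n_new entries if it scales as 1/sqrt(N)."""
    return sigma_ref * math.sqrt(n_ref / n_new)


# One row of the sigma(purity) table, keyed by the column label N.
observed = {500: 0.16, 700: 0.15, 900: 0.13, 1317: 0.10}

for n, sigma in observed.items():
    predicted = scaled_error(observed[500], 500, n)
    print(n, sigma, round(predicted, 3))
```

Rows that depart strongly from this scaling, such as those containing values around 0.65, are the ones that need a separate explanation.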
-- MartinTripiana - 28 Aug 2008
I | Attachment | History | Action | Size | Date | Who | Comment |
---|---|---|---|---|---|---|---|
gif | Et20.fracs1.IsEM.gif | r1 | manage | 9.3 K | 2008-08-28 - 11:45 | MartinTripiana | |
gif | JF17.fracs1.IsEM.convrecov.gif | r1 | manage | 11.1 K | 2008-09-03 - 13:33 | MartinTripiana | fracs1 distribution from selected JF17 photons. all(left), by mother(right). Conversion recovered. |
gif | JF17.fracs1.IsEM.gif | r1 | manage | 10.8 K | 2008-08-29 - 11:03 | MartinTripiana | fracs1 distribution from selected JF17 photons. all(left), by mother(right) |
gif | JF17.fracs1.IsEM.nogeant.gif | r1 | manage | 10.8 K | 2008-09-03 - 15:27 | MartinTripiana | fracs1 distribution from selected JF17 photons. all(left), by mother(right) . No geant particles considered |
gif | MLL.JF17.50b.gif | r1 | manage | 9.8 K | 2008-08-29 - 12:24 | MartinTripiana | |
gif | MLL.JF17.50b.nogeant.gif | r1 | manage | 9.8 K | 2008-09-03 - 15:28 | MartinTripiana | output ProfileMethod.MLL.cxx on JF17 dijets. No geant particles. |
gif | MLL.MC.all2gether.gif | r2 r1 | manage | 8.7 K | 2008-09-02 - 21:37 | MartinTripiana | |
gif | MLL.MC.p04.10b.gif | r1 | manage | 9.2 K | 2008-08-28 - 17:47 | MartinTripiana | output ProfileMethod.MLL.cxx |
gif | MLL.MC.p04.50b.gif | r1 | manage | 9.9 K | 2008-08-28 - 17:48 | MartinTripiana | output ProfileMethod.MLL.cxx |
gif | MLL.errorEvolution.gif | r1 | manage | 9.2 K | 2008-09-15 - 20:20 | MartinTripiana |