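In the case that there is no background, the error on the efficiency $\varepsilon = k/N$,
with $k$ accepted events out of $N$ events in total, can be expressed as
$$\sigma_\varepsilon = \sqrt{\frac{\varepsilon\,(1-\varepsilon)}{N}}.$$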
However, this formula assumes that the true efficiency is known. Consequently, when one wants to quote a true confidence interval, this formula does not hold in the limit of small event counts or of efficiencies close to 0% or 100%. For instance, if you have 10 events and you accept 10 events, the efficiency is 100% and the error will be 0%. The frequentist solution to this problem is to construct a Neyman confidence belt and read off the confidence interval for the measured efficiency. The confidence belt can be constructed, for example, from central confidence intervals (Clopper-Pearson) or using a likelihood-ratio ordering (Feldman-Cousins). In ROOT the error on the efficiency can be evaluated using the TEfficiency class. Example of usage:
    #include "TEfficiency.h"

    TEfficiency* pEff = new TEfficiency();
    pEff->SetStatisticOption(TEfficiency::kFFC);  // Feldman-Cousins intervals
    double CL = 0.6827;  // confidence level
    double ntot = 10;    // total number of events
    double atot = 10;    // number of accepted events
    double lowerLimit = pEff->FeldmanCousins(ntot, atot, CL, false);
    double upperLimit = pEff->FeldmanCousins(ntot, atot, CL, true);

The result is an upper limit of 100% and a lower limit of 88.45%. On the right side the Feldman-Cousins confidence belt for 10 events is shown.
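For comparison, the central (Clopper-Pearson) construction mentioned above can be sketched without ROOT. The following is a minimal illustration, not the TEfficiency implementation: the function names `naiveBinomialError`, `binomialPmf`, `binomialTail` and `clopperPearson` are hypothetical, and the limits are found by simple bisection on the binomial tail probabilities.

```cpp
#include <cmath>

// Naive binomial error sqrt(eff*(1-eff)/N); it is exactly 0 for k == 0 or k == N.
double naiveBinomialError(int nTotal, int nPassed) {
  const double eff = static_cast<double>(nPassed) / nTotal;
  return std::sqrt(eff * (1.0 - eff) / nTotal);
}

// Binomial probability P(X = i | n, p) for 0 < p < 1, evaluated in log space.
double binomialPmf(int i, int n, double p) {
  const double logC =
      std::lgamma(n + 1.0) - std::lgamma(i + 1.0) - std::lgamma(n - i + 1.0);
  return std::exp(logC + i * std::log(p) + (n - i) * std::log1p(-p));
}

// P(X >= k | n, p) if upperTail is true, otherwise P(X <= k | n, p).
double binomialTail(int k, int n, double p, bool upperTail) {
  const int first = upperTail ? k : 0;
  const int last = upperTail ? n : k;
  double sum = 0.0;
  for (int i = first; i <= last; ++i) sum += binomialPmf(i, n, p);
  return sum;
}

// Central (Clopper-Pearson) limit on the efficiency at confidence level `level`.
// bUpper selects the upper (true) or the lower (false) edge of the interval.
double clopperPearson(int nTotal, int nPassed, double level, bool bUpper) {
  const double alphaHalf = 0.5 * (1.0 - level);
  if (!bUpper && nPassed == 0) return 0.0;      // lower limit at the boundary
  if (bUpper && nPassed == nTotal) return 1.0;  // upper limit at the boundary
  double lo = 0.0, hi = 1.0;
  for (int iter = 0; iter < 50; ++iter) {
    const double mid = 0.5 * (lo + hi);
    // Upper limit: P(X <= k | p) falls with p. Lower limit: P(X >= k | p) rises with p.
    const double tail = binomialTail(nPassed, nTotal, mid, !bUpper);
    const bool moveUp = bUpper ? (tail > alphaHalf) : (tail < alphaHalf);
    if (moveUp) lo = mid; else hi = mid;
  }
  return 0.5 * (lo + hi);
}
```

For 10 accepted events out of 10, `naiveBinomialError(10, 10)` returns 0, while `clopperPearson(10, 10, 0.6827, false)` gives a lower limit of $(0.15865)^{1/10}$, roughly 0.83. The central interval is therefore wider than the Feldman-Cousins interval quoted above.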
If you want to determine your efficiency on data using a tag-and-probe technique on a specific resonance, you typically face some amount of background under the mass peak; see for example the picture on the right. The background under the mass peak can be estimated from the number of events in the (signal-free) sidebands. This background estimate can then be subtracted from the number of events observed in the peak, both for the total number of events and for the number of accepted events. With $N$ and $k$ the total and accepted numbers of events in the peak, and $N_{\mathrm{bkg}}$ and $k_{\mathrm{bkg}}$ the corresponding background estimates, the efficiency is thus calculated as
$$\varepsilon = \frac{k - k_{\mathrm{bkg}}}{N - N_{\mathrm{bkg}}}.$$
The problem is now that the numerator and the denominator of your efficiency both have an additional fluctuation (note that the denominator does not fluctuate at all in the background-free case). In fact, the measured efficiency can become smaller than zero or larger than one, just because of a fluctuation of the background. Consequently, the error on the efficiency is increased as well.
Analogously to the background-free case, one could again construct a confidence belt using a likelihood-ordering principle. However, one now has to deal with the nuisance parameters that describe the background. In principle, this can be done by constructing a 6-dimensional confidence belt, but this is very time-consuming. The PDF for this would look like
,
where one needs to integrate over the unknown numbers of background events below the peak. Alternatively, one could treat the nuisance parameters in a Bayesian way and integrate them out. A simplified approach is to assume that the number of background events estimated from the sidebands equals the true value of the background contribution under the peak. Then the total PDF for this problem can be constructed as
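$$\mathrm{PDF} = B(\varepsilon) \times P(N_{\mathrm{bkg}}) \times B(\varepsilon_{\mathrm{bkg}}),$$
where $B$ is a binomial PDF and $P$ is a Poissonian PDF. In other words, it consists of a binomial distribution for the signal efficiency $\varepsilon$, a Poissonian distribution for the number of background events $N_{\mathrm{bkg}}$ and again a binomial distribution for the background efficiency $\varepsilon_{\mathrm{bkg}}$, which is given by the fraction of background events that are accepted, $k_{\mathrm{bkg}}$ out of $N_{\mathrm{bkg}}$:
$$\varepsilon_{\mathrm{bkg}} = \frac{k_{\mathrm{bkg}}}{N_{\mathrm{bkg}}}.$$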
In other words, the nuisance parameters fluctuate around their measured values. This approach will give some undercoverage if the number of background events is small or if the background efficiency is close to 0% or 100%.
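To get a feeling for how such a product of a binomial, a Poissonian and a second binomial PDF behaves, one can draw toy experiments from it. The sketch below is only an illustration under stated assumptions: the function `runToys`, the struct `ToySummary` and the inputs are hypothetical, the true signal yield is kept fixed, and the subtracted background is set to its true expectation (the simplification described above).

```cpp
#include <random>

// Summary of a set of toy experiments.
struct ToySummary {
  double meanEff;       // mean of the measured (background-subtracted) efficiency
  double fracAboveOne;  // fraction of toys with a measured efficiency above 1
};

// Hypothetical toy generator: binomial signal acceptance, Poissonian background
// count and binomial background acceptance. Assumes that the subtracted
// background equals its true expectation (meanBkg events, effBkg * meanBkg accepted).
ToySummary runToys(int nSig, double effSig, double meanBkg, double effBkg,
                   int nToys, unsigned seed) {
  std::mt19937 rng(seed);
  std::poisson_distribution<int> bkgCount(meanBkg);
  std::binomial_distribution<int> sigAccepted(nSig, effSig);
  double sumEff = 0.0;
  int nUsed = 0, nAbove = 0;
  for (int i = 0; i < nToys; ++i) {
    const int nBkg = bkgCount(rng);  // background events under the peak
    std::binomial_distribution<int> bkgAccepted(nBkg, effBkg);
    const int kBkg = bkgAccepted(rng);  // accepted background events
    const int kSig = sigAccepted(rng);  // accepted signal events
    const double denom = (nSig + nBkg) - meanBkg;
    if (denom <= 0.0) continue;  // skip toys with nothing left after subtraction
    const double eff = ((kSig + kBkg) - effBkg * meanBkg) / denom;
    sumEff += eff;
    ++nUsed;
    if (eff > 1.0) ++nAbove;
  }
  if (nUsed == 0) return {0.0, 0.0};
  return {sumEff / nUsed, static_cast<double>(nAbove) / nUsed};
}
```

For example, `runToys(100, 0.95, 100.0, 0.90, 20000, 12345u)` describes a signal of 100 events with 95% efficiency on top of an expected 100 background events with 90% efficiency. A fraction of these toys yields a measured efficiency above one, purely from background fluctuations.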
Another approach is to treat the variations of the signal efficiency, the number of background events and the background efficiency as independent and to add their contributions to the error in quadrature. For large enough event counts, and for background and signal efficiencies far away from the boundaries, one can approximate the measured values as the true values and use the normal binomial and Poissonian errors. If this requirement is not fulfilled, one can use the Feldman-Cousins confidence intervals for the individual parameters. The problem with this method is that, due to background fluctuations, the individual PDF for the signal efficiency can become non-physical. For instance, if the background-subtracted counts are 101 accepted events out of 100, the measured efficiency is 101% and a binomial efficiency of 101/100 is unphysical. The pragmatic solution to this problem is to use the closest physical result as the input to the binomial PDF. Similarly, due to the quadratic sum the upper and lower error bars can extend above 100% and below 0%. Therefore, the confidence limits are bounded by these physical limits. The main advantage of this method is that it is fast and gives proper coverage when the parameters are far away from the physical bounds. The method is implemented in a ROOT script, which can be found at
/afs/cern.ch/user/j/jvantilb/public/efficiency/effFast.C
To run this script in ROOT:
    > .L effFast.C++
    > double pMeas, totErrUpp, totErrLow;
    > effFast(pMeas, totErrUpp, totErrLow, 200, 185, 100, 90)
    The efficiency is 0.95 +0.0378642 -0.0414426 (68% CL)

This is for the case of 200 events in total, of which 100 are background, and 185 accepted events, of which 90 are background. The expected efficiency is therefore (185-90)/(200-100) = 95%. One can change the CL to 90% (by default the CL is set to 68.27%) and switch off the printed output with:

    > effFast(pMeas, totErrUpp, totErrLow, 200, 185, 100, 90, 0.90, false)
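The quadrature approach can be sketched in its simplest form, using the normal binomial and Poissonian errors. This is not the effFast.C script, whose source is not reproduced here: the function `effQuadrature`, the struct `EffResult` and the linear error propagation of the two background terms are assumptions made for illustration, valid only for large counts and efficiencies away from 0% and 100%.

```cpp
#include <algorithm>
#include <cmath>

// Result of the simplified quadrature estimate.
struct EffResult {
  double eff;     // background-subtracted efficiency (may be unphysical)
  double errTot;  // symmetric total error from the quadratic sum
  double lower;   // eff - errTot, bounded below by 0
  double upper;   // eff + errTot, bounded above by 1
};

// Hypothetical sketch; assumes nBkg > 0 and nTotal > nBkg.
// Three contributions, treated as independent and added in quadrature:
//  1. binomial error on the signal efficiency,
//  2. Poissonian error sqrt(nBkg) on the number of background events,
//  3. binomial error on the background efficiency.
EffResult effQuadrature(double nTotal, double nPassed,
                        double nBkg, double nPassedBkg) {
  const double nSig = nTotal - nBkg;
  const double eff = (nPassed - nPassedBkg) / nSig;
  const double effBkg = nPassedBkg / nBkg;
  // Use the closest physical value as input to the binomial error.
  const double effPhys = std::min(1.0, std::max(0.0, eff));
  const double errSig = std::sqrt(effPhys * (1.0 - effPhys) / nSig);
  // Linear propagation: d(eff)/d(nBkg) = (eff - effBkg) / nSig.
  const double errNBkg = std::fabs(eff - effBkg) / nSig * std::sqrt(nBkg);
  // Linear propagation: d(eff)/d(effBkg) = -nBkg / nSig.
  const double errEffBkg =
      nBkg / nSig * std::sqrt(effBkg * (1.0 - effBkg) / nBkg);
  const double errTot = std::sqrt(errSig * errSig + errNBkg * errNBkg +
                                  errEffBkg * errEffBkg);
  return {eff, errTot, std::max(0.0, eff - errTot), std::min(1.0, eff + errTot)};
}
```

For the numbers used above (200, 185, 100, 90) this sketch gives 0.95 with a symmetric error of about 0.037, similar in size to the effFast output. Exact agreement is not expected, since the script may treat the individual intervals differently.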
-- JeroenVanTilburg - 16-Jun-2011