
Low Mass Higgs Combination - Binned Log-Likelihood Ratio Technique

Introduction

It is standard practice to estimate significance using s/sqrt(b); however, it is more accurate to define the sensitivity of an analysis by estimating the Gaussian probability that a signal-like observation results from a fluctuation of the background distribution. A conventional 5 sigma observation corresponds to a one-sided Gaussian probability of 2.9E-7.
The Binned Log-Likelihood Ratio (LLR) is a method for extracting the discovery and exclusion sensitivity of an analysis in this way.
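The correspondence between a number of Gaussian sigma and a tail probability can be checked directly. The sketch below assumes the one-sided convention used for discovery; the function names are illustrative only and are not part of the code described later on this page.

```python
import math
from statistics import NormalDist

def sigma_to_pvalue(z: float) -> float:
    """One-sided Gaussian tail probability P(X >= z) for a unit Gaussian."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def pvalue_to_sigma(p: float) -> float:
    """Inverse of sigma_to_pvalue: one-sided significance for a tail probability p.

    Uses -inv_cdf(p) rather than inv_cdf(1 - p) to keep precision for tiny p.
    """
    return -NormalDist().inv_cdf(p)

print(f"{sigma_to_pvalue(5.0):.3g}")  # 2.87e-07
```

Converting a measured 1-CLb back to "sigma" with pvalue_to_sigma gives the discovery significance in the units quoted on this page.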

Aim

The aim of this method is to estimate the discovery and exclusion significance of analyses carried out at ATLAS.
Additionally, since it is straightforward in this framework, we take one further step and combine the results of the individual analyses to determine the combined sensitivity of the ATLAS experiment to the discovery or exclusion of expected new physics at the LHC.

The Method (The 'Non-Parameterised' or 'Binned' LLR)

This technique was developed at LEP and is used at the Tevatron, both for individual CDF results and for the combination of the CDF and D0 Higgs results.


The procedure carries out a number of pseudo-experiments (Npe), each producing an independent set of pseudo-data. Pseudo-data are generated under the assumption of the null hypothesis and of the test hypothesis, and a value of the LLR test statistic is calculated for each set:

  • Null Hypothesis, H0 - this is the background only hypothesis (i.e. no Higgs at a given mass).
  • Test Hypothesis, H1 - this is the signal+background hypothesis (i.e. Higgs at a given mass).

The test statistic is based on the log-likelihood ratio:
  -2\ln Q = -2\ln \biggl( \frac{ L( data|s+b ) }{ L( data|b ) } \biggr) (1)


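Writing L(data|s+b) and L(data|b) as products of Poisson likelihoods over the N bins, with s_i, b_i and n_i the expected signal, expected background and observed (or pseudo-data) counts in bin i, the test statistic takes the form: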

  -2\ln Q = 2 \sum_{i=1}^{N} \biggl( s_{i} - n_{i} \ln \biggl[ 1 + \frac{ s_{i} }{ b_{i} } \biggr] \biggr) (2)
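The step from (1) to (2) can be written out explicitly. Assuming independent Poisson bins, the n_i! factors cancel in the ratio and the background exponentials partly cancel:

```latex
\ln Q = \sum_{i=1}^{N} \ln \frac{ (s_i+b_i)^{n_i}\, e^{-(s_i+b_i)} / n_i! }{ b_i^{n_i}\, e^{-b_i} / n_i! }
      = \sum_{i=1}^{N} \left( n_i \ln\left[ 1 + \frac{s_i}{b_i} \right] - s_i \right)
\quad\Longrightarrow\quad
-2\ln Q = 2 \sum_{i=1}^{N} \left( s_i - n_i \ln\left[ 1 + \frac{s_i}{b_i} \right] \right)
```

Large n_i (signal-like data) therefore push -2lnQ negative, while background-like data give positive values.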




The distributions of the test statistic are referred to as tPDF (test statistic Probability Density Function) distributions, and look like this:
example_pdf.jpeg
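A minimal sketch of how the two tPDFs can be filled with toy pseudo-experiments, assuming per-bin expectations s and b are already known. It uses a simple stdlib Poisson sampler (Knuth's method) in place of a full toy MC, and the 4-bin inputs are hypothetical; this is not T. Junk's code.

```python
import math
import random

def poisson(rng: random.Random, lam: float) -> int:
    """Knuth's multiplicative Poisson sampler (fine for small means; exp(-lam) underflows near lam ~ 745)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def llr(s, b, n):
    """-2lnQ = 2 * sum_i (s_i - n_i ln(1 + s_i/b_i)) for binned Poisson counts."""
    return 2.0 * sum(si - ni * math.log1p(si / bi) for si, bi, ni in zip(s, b, n))

def generate_tpdfs(s, b, n_pe, seed=1):
    """Return (llr_h0, llr_h1): -2lnQ for n_pe pseudo-experiments under H0 (b only) and H1 (s+b)."""
    rng = random.Random(seed)
    llr_h0 = [llr(s, b, [poisson(rng, bi) for bi in b]) for _ in range(n_pe)]
    llr_h1 = [llr(s, b, [poisson(rng, si + bi) for si, bi in zip(s, b)]) for _ in range(n_pe)]
    return llr_h0, llr_h1

# Hypothetical 4-bin signal and background expectations.
s = [1.0, 3.0, 4.0, 1.5]
b = [20.0, 15.0, 10.0, 18.0]
h0, h1 = generate_tpdfs(s, b, n_pe=5000)
print(sum(h0) / len(h0) > sum(h1) / len(h1))  # True: H1 pseudo-data are more signal-like
```

Histogramming h0 and h1 gives the two tPDF distributions; the H0 distribution sits at positive -2lnQ and the H1 distribution at negative -2lnQ.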

From these tPDF distributions we can then estimate the sensitivity, following the (approximately) frequentist methods set out by the LEP Higgs Working Group.

Sensitivity and Limits

Following the methods defined at LEP, tPDF distributions such as the one in example_pdf.jpeg can be used to define the sensitivity of the analysis from which they were made.
There are several variables to define. A short description of each is shown below.

Discovery Confidence and 1-CLb


With high statistics, it has been shown that the LLR is equivalent to a \Delta\chi^{2} distribution. As such, the p-value for the background-only hypothesis can be written as:
  1-CL_{b} = P_{ H_{0} }( \Delta \chi^{2} \leq \Delta \chi^{2}_{obs} )  (3)


Using the equivalence with the LLR, this can be written more familiarly as:

  1-CL_{b} = P_{ H_{0} }( Q \geq Q_{obs} )  (4)
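Given -2lnQ values from background-only pseudo-experiments, equation (4) becomes a simple counting fraction. Note that Q >= Q_obs corresponds to -2lnQ <= -2lnQ_obs. The function names here are hypothetical.

```python
from statistics import NormalDist

def one_minus_clb(llr_h0, llr_obs):
    """Fraction of background-only toys at least as signal-like as the data.

    Q >= Q_obs is equivalent to -2lnQ <= -2lnQ_obs, so toys with llr <= llr_obs are counted.
    """
    return sum(1 for x in llr_h0 if x <= llr_obs) / len(llr_h0)

def significance(p):
    """One-sided Gaussian significance of a p-value; +/-inf at the boundaries."""
    if p <= 0.0:
        return float("inf")   # no toy passed: only a lower bound on the significance
    if p >= 1.0:
        return float("-inf")
    return -NormalDist().inv_cdf(p)

# Hypothetical toy values and observation.
p = one_minus_clb([3.0, 1.0, 0.5, -2.0, 4.0], llr_obs=0.5)
print(p, significance(p))
```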


For discovery, a value of 1-CLb of 2.9E-7 corresponds to a 5 sigma observation in the counting experiment. Measuring 1-CLb this small accurately without fitting the tPDF(H0) distribution therefore requires at least 1E8 pseudo-experiments, generated with toy MC based on the expected background distributions. Alternatively, if the tPDF has been shown with a large number of pseudo-experiments to be Gaussian, it can be fitted, and 1-CLb calculated by integrating the appropriate region of the fit.
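The required number of pseudo-experiments follows from binomial statistics on the tail fraction. The sketch below also shows the Gaussian alternative, assuming a simple moment-based fit (sample mean and standard deviation) rather than a proper likelihood fit of the tPDF.

```python
import math
from statistics import NormalDist, fmean, stdev

def relative_error(p, n_pe):
    """Binomial relative uncertainty on a tail fraction p estimated from n_pe toys."""
    return math.sqrt((1.0 - p) / (n_pe * p))

def gaussian_tail_one_minus_clb(llr_h0, llr_obs):
    """1-CLb from a moment-based Gaussian fit to the H0 tPDF,
    integrating the signal-like tail -2lnQ <= llr_obs."""
    fit = NormalDist(fmean(llr_h0), stdev(llr_h0))
    return fit.cdf(llr_obs)

p5 = 2.9e-7  # 5 sigma, one-sided
for n_pe in (1e6, 1e7, 1e8):
    print(f"{n_pe:.0e} PEs: ~{n_pe * p5:.1f} toys in tail, "
          f"relative error {relative_error(p5, n_pe):.0%}")
```

At 1E8 pseudo-experiments only about 29 toys are expected beyond the 5 sigma point, a relative statistical uncertainty of roughly 19%, which is why fewer toys are not enough without a fit.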

Exclusion Confidence and CLs


For exclusion of the SM Higgs at a given mass and following the definition of the discovery sensitivity above, we can write the confidence in the signal-plus-background hypothesis as:
  CL_{s+b} = P_{ H_{1} }( \Delta \chi^{2} \geq \Delta \chi^{2}_{obs} ).  (5)


Similarly, in the LLR framework:

  CL_{s+b} = P_{ H_{1} }( Q \leq Q_{obs} ). (6)


This can be interpreted as a standard frequentist confidence level: if CLs+b is less than or equal to 0.05, the signal-plus-background hypothesis is excluded at 95% CL. However, this interpretation allows the exclusion of signal hypotheses to which the experiment has no sensitivity, for example after a downward fluctuation of the background. To avoid this, we follow the example set by LEP and used by CDF, and define CLs, the modified frequentist confidence level, which prevents exclusion where there is no sensitivity:

  CL_{s} = \frac{ P_{ H_{1} }( \Delta \chi^{2} \geq \Delta \chi^{2}_{obs} ) }{ P_{ H_{0} }( \Delta \chi^{2} \geq \Delta \chi^{2}_{obs} )}, (7)


Or simply:

  CL_{s} = \frac{ CL_{s+b} }{ CL_{b} }. (8)
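A minimal sketch of equations (6) to (8) evaluated on toy -2lnQ distributions, assuming the H0 and H1 toys have already been generated. Background-like outcomes, Q <= Q_obs, correspond to -2lnQ >= -2lnQ_obs. The function name is hypothetical.

```python
def cl_values(llr_h0, llr_h1, llr_obs):
    """Return (CLs+b, CLb, CLs) from -2lnQ pseudo-experiment distributions.

    Q <= Q_obs (background-like) corresponds to -2lnQ >= -2lnQ_obs.
    """
    cl_sb = sum(1 for x in llr_h1 if x >= llr_obs) / len(llr_h1)
    cl_b = sum(1 for x in llr_h0 if x >= llr_obs) / len(llr_h0)
    cl_s = cl_sb / cl_b if cl_b > 0 else float("nan")  # undefined if no H0 toy is as background-like
    return cl_sb, cl_b, cl_s

# Hypothetical toys and observation.
cl_sb, cl_b, cl_s = cl_values([0.0, 1.0, 2.0, 3.0], [-4.0, -3.0, -2.0, 1.5], llr_obs=1.0)
excluded = cl_s <= 0.05  # 95% CL exclusion criterion on CLs
```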


Strictly, CLs is not a confidence level, since it is a ratio of confidence levels. The simplest interpretation regards CLb as the exclusion potential of the experiment (for a median background-only experiment, for example, it is 50%); requiring CLs ≤ 0.05 then means that the false exclusion probability, CLs+b, cannot be greater than 5% of the exclusion potential of the experiment.
Note that although exclusion without sensitivity is avoided, the method over-covers, i.e. it is conservative.
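As a worked illustration with purely hypothetical numbers: (a) a sensitive experiment with a median background-like outcome, and (b) an insensitive experiment, where the H0 and H1 tPDFs nearly overlap and the data fluctuate downwards:

```latex
\begin{aligned}
\text{(a)}\quad CL_{s} &= \frac{0.02}{0.50} = 0.04 \le 0.05 &&\Rightarrow \text{excluded at 95\% CL} \\
\text{(b)}\quad CL_{s} &= \frac{0.04}{0.05} = 0.80 > 0.05 &&\Rightarrow \text{not excluded}
\end{aligned}
```

In case (b), CLs+b alone (0.04) would have excluded the signal hypothesis, although the experiment has essentially no sensitivity to it.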

For a simple rundown of how the log-likelihood ratio and CLs method work, see the LLR and CLs Method Presentation.

The Combination



The major attraction of this method is that it extends straightforwardly to the combination of channels.
All the definitions of significance and exclusion limits still hold.
The only change is the number of bins over which the test statistic is calculated.

For a single channel, the LLR test statistic is measured in each bin, and then summed over all bins to give a final value of the test statistic for the distribution.
Extending to the multi-channel case, the sum now runs over all the bins in each distribution of the discriminating variable, for each channel.

As such, a single value of the test statistic is produced for the combination of N channels by summing over m x N bins, where m is the number of bins in the discriminating variable distribution of each channel.
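A minimal sketch of the combination, assuming the channels are statistically independent (no shared events) so that their Poisson likelihoods multiply. Concatenating the bins of all channels and summing the per-bin LLR is then the same as adding the per-channel LLRs; the code does not require every channel to have the same number of bins. The channel contents are hypothetical.

```python
import math

def llr_binned(s, b, n):
    """-2lnQ = 2 * sum_i (s_i - n_i ln(1 + s_i/b_i)) over the supplied bins."""
    return 2.0 * sum(si - ni * math.log1p(si / bi) for si, bi, ni in zip(s, b, n))

def combined_llr(channels):
    """channels: list of (s, b, n) lists, one tuple per channel.

    All bins are concatenated and the per-bin LLR is summed over the full set.
    """
    s_all, b_all, n_all = [], [], []
    for s, b, n in channels:
        s_all.extend(s)
        b_all.extend(b)
        n_all.extend(n)
    return llr_binned(s_all, b_all, n_all)

# Hypothetical two-channel example with different numbers of bins.
chan_a = ([1.0, 2.0], [10.0, 8.0], [12, 9])
chan_b = ([0.5, 3.0, 1.0], [4.0, 6.0, 5.0], [4, 9, 5])
total = combined_llr([chan_a, chan_b])
print(math.isclose(total, llr_binned(*chan_a) + llr_binned(*chan_b)))  # True
```

For the combined tPDFs, each pseudo-experiment must generate pseudo-data in every channel before the combined value is computed.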

Required Inputs

This is a binned log-likelihood analysis, i.e. it assumes that the inputs are provided in the form of histograms.
The test statistic is calculated per bin of the distribution of the discriminating variable. The discriminating variable is commonly the invariant mass of the Higgs candidate, e.g. m_bb, but can be any other robust variable.
Here is an example of a suitable discriminating variable input to the analysis:


example_mass.jpeg

In detail, the inputs needed are:
  • Unnormalised histograms of the signal and of EACH background relevant for the channel. The histograms must be UNNORMALISED so that the statistical uncertainty of the Monte Carlo is correctly taken into account in the sensitivity calculation.
  • Scale factors for a pre-defined integrated luminosity, e.g. 10 fb-1. The scale factors are calculated from the standard equation for the integrated luminosity of a generated sample:
  L = N/ (\epsilon \sigma) (9)

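where N is the number of events generated, $\epsilon$ is the efficiency of any GENERATOR-level cuts, and $\sigma$ is the cross section of the process, expressed in the inverse of the units chosen for the integrated luminosity (for ATLAS, fb, so that L is in fb-1). The scale factor for each sample is the target integrated luminosity divided by L.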
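The arithmetic can be sketched as follows, assuming cross sections in fb and luminosities in fb-1; the sample numbers are hypothetical. The last helper shows why raw counts matter: a bin with k generated events keeps its Monte Carlo statistical uncertainty sf*sqrt(k) after scaling, which cannot be recovered from a normalised histogram.

```python
import math

def equivalent_luminosity(n_generated, gen_efficiency, cross_section_fb):
    """L = N / (epsilon * sigma); with sigma in fb the result is in fb^-1."""
    return n_generated / (gen_efficiency * cross_section_fb)

def scale_factor(target_lumi_fb_inv, n_generated, gen_efficiency, cross_section_fb):
    """Weight per unnormalised MC event that normalises the sample to the target luminosity."""
    return target_lumi_fb_inv / equivalent_luminosity(n_generated, gen_efficiency, cross_section_fb)

def scaled_bin(raw_count, sf):
    """Scaled bin content and its MC statistical uncertainty: sf*k +/- sf*sqrt(k)."""
    return sf * raw_count, sf * math.sqrt(raw_count)

# Hypothetical sample: 100000 events generated, generator-level cut efficiency 0.5, sigma = 50 fb.
lumi = equivalent_luminosity(100_000, 0.5, 50.0)  # 4000 fb^-1
sf = scale_factor(10.0, 100_000, 0.5, 50.0)       # 0.0025 for 10 fb^-1
```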

Using the Code

If you want to use this LEP-like statistical tool to compute the sensitivity or power of an analysis, or to set a limit on a model, the code written by T. Junk, which runs under ROOT, can be found on the PhyStat Code Repository. To download a tar file of the code and the up-to-date documentation, see P-values and Bayesian limits and CLs limits and fits for new physics searches using ROOT.


Additionally, an example from the Higgs to two photons decay mode is available.
To download the necessary files to run the example, see this page.
There is a README document in the tar file which provides details about each of the files and how to run the example.

Documentation

There are several very useful presentations and papers detailing the use of the log-likelihood ratio, its relationship with chi^2, and the so-called 'CLs' method of exclusion. I have listed a few of the more important ones here.

For an example of this technique used with ATLAS Higgs data, see the talk I gave at the UK ATLAS Physics Meeting, 2008:

For an example of results using this method, see:

Validation

When the final combination result is produced from the toy inputs provided by each Higgs Working Group (and any subsequent updates), the combined result and each individual channel result will be shown here.

Soon, there will be a toy example here.

Inputs Provided for Higgs Combination Example

Currently, the Statistics Forum has requested inputs in the form of either the likelihood PDFs, which can then be used in the Profile Likelihood Method, or the histograms of your optimal, robust discriminating variable.
So far, normalised histograms have been provided to the group from the H->gammagamma, H->4l, H->WW and H->tautau channels, at a mass of 130 GeV.
The current inputs are not sufficient for the binned log-likelihood approach to determine an accurate limit on the Higgs at this mass, or to comment on sensitivity; however, taken as a toy example, the results for the channels are posted below. The inputs provided can be seen on the CombinationInputs page of the StatisticsTools TWiki.

Credits and Contacts

This code was originally developed for use at CDF by Tom Junk, and has been tested for use within ATLAS for Low Mass Higgs discovery and combination by S. Ferrag and C. Wright.


Major updates:
-- CatherineWright - 03 Mar 2008

Responsible: CatherineWright
Review: Never reviewed

  • Example tPDF (JPEG format):
    example_pdf.jpeg

  • Example of discriminating variable distribution, m_bb for the ttH channel:
    example_mass.jpeg

Topic revision: r9 - 2011-12-01 - SotirisVlachos
 