This page documents the tutorial for the "LPC Stats II Hands On Tutorial Event", focusing on the usage of the
CMS combination tool to extract the signal strength, statistical significance and upper limits in a physics model.
General Introduction
In this tutorial, we use the H->WW->lnln analysis as an example to illustrate how to build a physics analysis based on both the cut-and-count and the shape method, and how to calculate upper limits and significances. For simplicity, we use an ideal MC sample without reconstruction and include only one background (continuum WW) in the search.
Disclaimer: Numbers in this tutorial are motivated by the HWW analysis, but do not have a one-to-one correspondence to the HWW results. Please do not be alarmed if your result is not the same as in the official note or AN.
Setting up the environment for executable combine
Note: as with the rest of the CMSSW software, this requires Scientific Linux 5, so you should log into lxplus5.cern.ch and not lxplus.cern.ch.
setenv SCRAM_ARCH slc5_amd64_gcc472 # use export SCRAM_ARCH=slc5_amd64_gcc472 for bash
cmsrel CMSSW_6_1_1
cd CMSSW_6_1_1/src
cmsenv
addpkg HiggsAnalysis/CombinedLimit V03-01-12
scramv1 b
After this, you have an executable called "combine" that can be used for a wide range of statistical calculations. Use
combine --help
to see the full list of options.
Building cut-based analysis
In a cut-and-count analysis, we need to know the expected numbers of signal and background events, the number of events observed in data, and the associated systematic uncertainties. Each source of systematic uncertainty is labelled as a nuisance. Note that even the MC statistical error on a given background is treated as a nuisance.
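As a rough illustration of how nuisances enter a counting experiment, the sketch below writes the negative log-likelihood for one channel with one signal and one background, each carrying a log-normal uncertainty. The function name `counting_nll` and its arguments are hypothetical and are not part of combine. The model assumed here (a Poisson term multiplied by unit-Gaussian constraints on the nuisance parameters) is the usual treatment of log-normal uncertainties, but it is only a sketch of what the tool builds internally. The yields are the ones used in the datacard of this tutorial.

```python
import math

def counting_nll(n_obs, r, s, b, kappa_s, kappa_b, theta_s=0.0, theta_b=0.0):
    """Negative log-likelihood of a one-bin counting experiment (illustrative).

    r        : signal strength
    s, b     : nominal signal and background yields
    kappa_*  : log-normal uncertainty factors (e.g. 1.10 for 10%)
    theta_*  : nuisance parameters, each constrained by a unit Gaussian
    """
    # Expected yield: each nominal rate is scaled by kappa**theta.
    mu = r * s * kappa_s ** theta_s + b * kappa_b ** theta_b
    # Poisson term: -log(mu^n * exp(-mu) / n!)
    poisson = mu - n_obs * math.log(mu) + math.lgamma(n_obs + 1.0)
    # Gaussian constraint terms (additive constants dropped).
    constraint = 0.5 * (theta_s ** 2 + theta_b ** 2)
    return poisson + constraint

# Yields from this tutorial: 505 observed, 90 signal and 430 background expected.
nll_r1 = counting_nll(505.0, 1.0, 90.0, 430.0, 1.2, 1.1)
nll_r0 = counting_nll(505.0, 0.0, 90.0, 430.0, 1.2, 1.1)
print("nll(r=1) = {0:.3f}, nll(r=0) = {1:.3f}".format(nll_r1, nll_r0))
```

With the nuisances fixed at zero, the function is smallest when the expected yield equals the observed count. A full fit would also minimize over `theta_s` and `theta_b`.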
Datacard
The datacard is the input to the statistical tools used in the Higgs analysis in CMS. It can be adapted for other search programs as well. Two different software packages in CMS perform this computation: LandS and HiggsLimit/combination. The common datacard for a cut-based analysis is given below.
Here is an example card called hww_20fb_cut.txt. In this example we assume a 20% systematic uncertainty for the signal and 10% for the background.
imax 1 number of channels
jmax * number of background
kmax * number of nuisance parameters
Observation 505
bin 1 1
process HWW qqWW
process 0 1
rate 90.000 430.000
uncert_HWW lnN 1.200 1.000
uncert_qqWW lnN 1.000 1.100
This card needs to be reformatted to run within HiggsAnalysis/CombinedLimit, using the following command.
combineCards.py of0j=hww_20fb_cut.txt > hww_20fb_cut_comb.txt
Note that if you have other channels (such as hww_20fb_1j_cut.txt), the same command can combine them by appending them to the argument list, for example
combineCards.py of0j=hww_20fb_cut.txt of1j=hww_20fb_1j_cut.txt > hww_20fb_cut_comb.txt
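To make the lnN entries of the example card more concrete, the following sketch converts them into yields for a shift of one standard deviation in each direction. It assumes the usual log-normal convention, in which a factor kappa scales the nominal rate by kappa raised to the power of the nuisance parameter. The helper `lnn_yield` is a hypothetical name introduced here for illustration.

```python
def lnn_yield(nominal, kappa, theta):
    """Yield after shifting a log-normal nuisance by theta standard deviations."""
    return nominal * kappa ** theta

# Rates and lnN factors copied from hww_20fb_cut.txt.
processes = [("HWW", 90.0, 1.200), ("qqWW", 430.0, 1.100)]

for name, rate, kappa in processes:
    down = lnn_yield(rate, kappa, -1.0)
    up = lnn_yield(rate, kappa, +1.0)
    print("{0}: {1:.1f} / {2:.1f} / {3:.1f}".format(name, down, rate, up))
```

Because the shift is multiplicative, the downward and upward variations are not symmetric around the nominal rate.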
Upper limit on the signal strength
combine -d hww_20fb_cut_comb.txt -M Asymptotic
Output:
-- Asymptotic --
Observed Limit: r < 1.8221
Expected 2.5%: r < 0.6019
Expected 16.0%: r < 0.8016
Expected 50.0%: r < 1.1289
Expected 84.0%: r < 1.6194
Expected 97.5%: r < 2.2782
This indicates that the observed 95% upper limit is 1.8, while the median expected limit is 1.1, with a 1 sigma band of [0.8, 1.6] and a 2 sigma band of [0.6, 2.3]. The observed limit lies between the 84% and 97.5% expected quantiles, which tells us that we observed an excess in data at the level of 1-2 sigma.
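The comparison between the observed limit and the expected bands can be sketched in a few lines. The numbers are copied from the Asymptotic output above, and the function `locate` is a hypothetical helper written for this illustration, not something provided by combine.

```python
# Values copied from the Asymptotic output above.
observed = 1.8221
expected = {2.5: 0.6019, 16.0: 0.8016, 50.0: 1.1289, 84.0: 1.6194, 97.5: 2.2782}

def locate(obs, exp):
    """Describe where the observed limit sits relative to the expected bands."""
    if exp[16.0] <= obs <= exp[84.0]:
        return "within the 1 sigma band"
    if exp[2.5] <= obs <= exp[97.5]:
        return "outside the 1 sigma band, within the 2 sigma band"
    return "outside the 2 sigma band"

# A limit weaker (larger) than the median points to an excess in data.
side = "weaker" if observed > expected[50.0] else "stronger"
print("Observed limit is {0} than the median and {1}".format(side, locate(observed, expected)))
```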
Expected significance
combine -d hww_20fb_cut_comb.txt -M ProfileLikelihood -v 1 --significance --expectSignal=1 -t -1 -m 125 -n Expected
Output:
-- Profile Likelihood --
Significance: 1.80964
(p-value = 0.035176)
Observed significance
combine -d hww_20fb_cut_comb.txt -M ProfileLikelihood -v 1 --significance -m 125
Output:
-- Profile Likelihood --
Significance: 1.52713
(p-value = 0.063364)
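The significance and the p-value quoted in the ProfileLikelihood outputs are two ways of stating the same number. A minimal sketch of the conversion, assuming the usual one-sided Gaussian tail convention, is given below. The function name `p_value` is introduced here for illustration only.

```python
import math

def p_value(z):
    """One-sided Gaussian tail probability for a significance of z sigma."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Significances copied from the ProfileLikelihood outputs above.
for label, z in [("expected", 1.80964), ("observed", 1.52713)]:
    print("{0}: Z = {1:.5f}, p = {2:.6f}".format(label, z, p_value(z)))
```

The printed p-values should agree closely with the ones reported by combine.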
Best fit signal strength with uncertainty
A maximum likelihood scan is performed to get the +/- 1 sigma error.
combine -d hww_20fb_cut_comb.txt -M MaxLikelihoodFit
Output:
--- MaxLikelihoodFit ---
Best fit r: 0.832828 -0.535222/+0.559076 (68% CL)
nll S+B -> -6.19048 nll B -> -5.02441
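Two quick cross-checks of this output can be done by hand, sketched below under stated assumptions. The first is a naive estimate of the signal strength as the excess over background divided by the expected signal, ignoring the nuisances. The second assumes that the two nll values are the minimized negative log-likelihoods of the signal-plus-background and background-only fits on the same scale, so that the asymptotic significance is the square root of twice their difference. The variable names are illustrative.

```python
import math

# Inputs copied from the datacard and from the MaxLikelihoodFit output.
n_obs, s, b = 505.0, 90.0, 430.0
nll_sb, nll_b = -6.19048, -5.02441

# Naive estimate: the excess over background in units of the expected signal.
r_naive = (n_obs - b) / s

# Likelihood-ratio significance from the difference of the two minima.
z_from_nll = math.sqrt(2.0 * (nll_b - nll_sb))

print("naive r = {0:.3f}".format(r_naive))
print("Z from nll difference = {0:.4f}".format(z_from_nll))
```

The naive value can be compared with the best fit r above, and the significance with the observed significance from the ProfileLikelihood method.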
Building Shape analysis
Datacard
Upper limit on the signal strength
Expected significance
Observed significance
Best fit signal strength with uncertainty
Useful Links