L1 Trigger Check for L=1e31 HLT trigger menu

| Path Name | L1 Algorithm | Installed |
| HLT_L1Jet15 | L1_SingleJet15 | yes |
| HLT_Jet30 | L1_SingleJet15 | yes |
| HLT_Jet80 | L1_SingleJet50 | yes |
| HLT_Jet110 | L1_SingleJet70 | yes |
| HLT_Jet180 | L1_SingleJet70 | yes |
| HLT_FwdJet20 | L1_IsoEG10_Jet15_ForJet10 | yes |
| HLT_DiJetAve15 | L1_SingleJet15 | yes |
| HLT_DiJetAve50 | L1_SingleJet50 | yes |
| HLT_DiJetAve70 | L1_SingleJet70 | yes |
| HLT_DiJetAve130 | L1_SingleJet70 | yes |
| HLT_QuadJet30 | L1_QuadJet15 | no |
| HLT_SumET120 | L1_ETT60 | yes |
| HLT_L1MET20 | L1_ETM20 | yes |
| HLT_MET25 | L1_ETM20 | yes |
| HLT_MET50 | L1_ETM40 | yes |
| HLT_MET65 | L1_ETM50 | yes |
| HLT_L1MuOpen | † | yes |
| HLT_L1Mu | L1_SingleMu7 or L1_DoubleMu3 | yes |
| HLT_Mu5 | L1_SingleMu5 | yes |
| HLT_Mu9 | L1_SingleMu7 | yes |
| HLT_Mu11 | L1_SingleMu10 | yes |
| HLT_DoubleMu3 | L1_DoubleMu3 | yes |
| HLT_Ele15_LW_L1R | L1_SingleEG10 | yes |
| HLT_LooseIsoEle15_LW_L1R | L1_SingleEG12 | yes |
| HLT_IsoEle18_L1R | L1_SingleEG15 | yes |
| HLT_DoubleEle10_LW_OnlyPixelM_L1R | L1_DoubleEG5 | yes |
| HLT_Photon15_L1R | L1_SingleEG12 | yes |
| HLT_Photon25_L1R | L1_SingleEG15 | yes |
| HLT_IsoPhoton15_L1R | L1_SingleEG12 | yes |
| HLT_IsoPhoton20_L1R | L1_SingleEG15 | yes |
| HLT_DoubleIsoPhoton20_L1R | L1_DoubleEG10 | yes |
| HLT_BTagMu_Jet20_Calib | L1_Mu5_Jet15 | yes |
| HLT_LooseIsoTau_MET30 | L1_SingleTauJet80 | yes |
| HLT_IsoTau_MET65_Trk20 | L1_SingleTauJet80 | yes |
| HLT_LooseIsoTau_MET30_L1MET | L1_TauJet30_ETM30 | yes |
| HLT_IsoTau_MET35_Trk15_L1MET | L1_TauJet30_ETM30 | yes |
| HLT_DoubleLooseIsoTau | L1_DoubleTauJet40 | yes |
| HLT_DoubleIsoTau_Trk3 | L1_DoubleTauJet40 | yes |
| HLT_ZeroBias | L1_ZeroBias | no |
| HLT_MinBiasHcal | ‡ | no |
| HLT_MinBiasEcal | L1_SingleEG2 or L1_DoubleEG1 | yes |
| HLT_MinBiasPixel | L1_ZeroBias | no |
| HLT_MinBiasPixel_Trk5 | L1_ZeroBias | no |
| HLT_CSCBeamHalo | L1_SingleMuBeamHalo | yes |
| HLT_CSCBeamHaloOverlapRing1 | L1_SingleMuBeamHalo | yes |
| HLT_CSCBeamHaloOverlapRing2 | L1_SingleMuBeamHalo | yes |
| HLT_CSCBeamHaloRing2or3 | L1_SingleMuBeamHalo | yes |
| HLT_BackwardBSC | 38 or 39 | no |
| HLT_ForwardBSC | 36 or 37 | no |
| HLT_TrackerCosmics | 24 or 25 or 26 or 27 or 28 | no |

† - L1_SingleMuOpen or L1_SingleMu3 or L1_SingleMu5
‡ - L1_SingleJetCountsHFTow or L1_DoubleJetCountsHFTow or L1_SingleJetCountsHFRing0Sum3 or L1_DoubleJetCountsHFRing0Sum3 or L1_SingleJetCountsHFRing0Sum6 or L1_DoubleJetCountsHFRing0Sum6

Higgs -> bb Long Exercise - CMSDAS 2014


Introduction

The observation in 2012 of a Higgs boson-like particle with a mass of about 126 GeV has been an extraordinary event for CMS and for particle physics. The next goal of the experiment is to characterize the properties of this new boson and to confirm or exclude that it behaves as the Standard Model (SM) Higgs. One fundamental property to study is the new boson's coupling to fermions. We will focus on the H->bb decay, which has the largest branching ratio (~60% at 125 GeV), in the channel where the Higgs is produced in association with a vector boson (V = W or Z). This is commonly referred to as the VH(bb) analysis.

In this Exercise we will start with an overview of the differences between signal and background processes and the corresponding discriminating variables in each channel. We will try to define our selection cuts in order to maximize the probability of observing a signal in our data. We will then perform a fit using the dijet invariant mass as the discriminant to set exclusion limits on the Higgs production cross section (or observe an excess compatible with the SM Higgs). If time permits, we will also perform an analysis of the diboson process VZ(bb), which provides a crucial calibration for the H->bb search. This Exercise is based on the LHCP 2013 m(jj) analysis in the highest boost regions, with some simplifications.

Here is a link to some slides giving a general introduction to the analysis.

The course area for this Exercise can be found at elog.

Prerequisites

We assume the participants to have basic knowledge of C++, ROOT and the CMSSW framework. We also assume the participants to have completed the CMSDAS Pre-Exercises and have an LPC computer account.

Suggested short exercises (not all of them, but as many as possible!): Jets, Btag & Vertexing, Particle Flow, RooStats, Electrons, Muons.

Facilitators

For CMSDAS @FNAL January 9-11, 2014:

Souvik Das (sdas@cernNOSPAMPLEASE.ch)
Caterina Vernieri (caterina.vernieri@cernNOSPAMPLEASE.ch)
Leonard Apanasevich (apana@fnalNOSPAMPLEASE.gov)
Jia Fu Low (jia.fu.low@cernNOSPAMPLEASE.ch)

If you have questions regarding the Exercise, please don't hesitate to ask the facilitators.

Exercise Outline

The Exercise is divided into a number of steps that should be completed in order. It is planned for the length of two days. On the first day, we will study the signal and background characteristics using MC samples, and try to optimize the selection to achieve the best sensitivity to the H->bb signal. On the second day, we will study how to use control regions in data to correct the Monte Carlo (MC) predictions and finally look at the real data in our signal regions. There are several questions to be answered in order (each one builds, to a large extent, on the studies performed for the previous questions). You will use the answers to the questions to prepare your presentation, so try to address them carefully.

Depending on the number of students, we will decide if we can cover all 5 VH(bb) channels. If there are 5 students, each student can take an individual channel. More experienced students should take the more difficult channels (Z->νν, then W->eν, W->μν, Z->ee, Z->μμ).

The 2013 version of this Exercise focused on the WH channel and was better developed. It was an effort mainly carried out by David Lopes-Pegna and Michele de Gruttola. We decided to update it in 2014 to include all five channels, but it is still less well developed at the moment. Sorry about that!

Getting Started

First of all, connect to an LPC machine. You might need to initialize your Kerberos credentials by doing (Note: replace yourusername with your CMS username):

kinit -f -A yourusername@FNAL.GOV

(In recent Mac OS X versions, you will use kinit -A)

This will also allow you to log in without using your cryptocard (if you have one). Now use SSH to connect to an LPC server:

ssh -Y yourusername@cmslpc-sl5.fnal.gov

Once connected, create the working directory and set up the CMSSW release:

mkdir VHbbAnalysis 
cd VHbbAnalysis

source /uscmst1/prod/sw/cms/cshrc prod
source /uscmst1/prod/grid/gLite_SL5.csh
source /uscmst1/prod/grid/CRAB/crab.csh

setenv SCRAM_ARCH slc5_amd64_gcc472
 
cmsrel CMSSW_6_1_2
cd CMSSW_6_1_2/src
cmsenv
cd ../..

Note about the Workflow

This analysis requires several samples (signal and various backgrounds) as well as data. They are typically very large and require extensive use of grid computing to process. To avoid re-processing over the grid every time a minor selection criterion is changed (in the course of optimization), a multi-step workflow has been developed, as discussed here. First, we run a PF2PAT sequence to produce EDM output ntuples (called "Step 1"). Then, we run on the Step 1 outputs to produce "Step 2" files. The purpose of Step 2 is to reduce the output rate (i.e. to skim) and to save a fairly large set of variables used for the final analysis in a simple ROOT ntuple, so that the analysis can be run on a laptop.

The skimming part of the Step 2 does the following selection:

  • build a Higgs candidate by combining a pair of anti-kt R=0.5 particle-flow jets with pT > 20 GeV. If there are more than two jets in the event, the pair with the highest pT of the vectorial sum is selected as the candidate (see the sketch after this list).
  • build a vector boson candidate (Z->ll, W->lν, Z->νν) using well-identified isolated leptons and/or missing transverse energy (MET). Z->νν is obtained by simply requiring a large MET in the event when there are no isolated leptons.
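To make the dijet-candidate choice concrete, here is a minimal standalone sketch (in ROOT/C++) of the logic described above; the Jet struct and function name are illustrative, not the actual Step 2 code:

#include <utility>
#include <vector>
#include "TLorentzVector.h"

struct Jet { TLorentzVector p4; };   // stand-in for a particle-flow jet

// Return the indices of the jet pair with the highest pT of the vectorial sum,
// considering only jets with pT > 20 GeV; returns (-1,-1) if no valid pair exists.
std::pair<int,int> selectHiggsCandidate(const std::vector<Jet>& jets) {
  std::pair<int,int> best(-1, -1);
  double bestPt = -1.;
  for (size_t i = 0; i < jets.size(); ++i) {
    if (jets[i].p4.Pt() < 20.) continue;
    for (size_t j = i + 1; j < jets.size(); ++j) {
      if (jets[j].p4.Pt() < 20.) continue;
      double sumPt = (jets[i].p4 + jets[j].p4).Pt();   // pT of the vectorial sum
      if (sumPt > bestPt) { bestPt = sumPt; best = std::make_pair(int(i), int(j)); }
    }
  }
  return best;
}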

For this Exercise, we apply additional cuts to further reduce the file size (requiring pT(V) > 120 GeV for Z->ll, pT(V) > 130 GeV for W->lν, and pT(V) > 150 GeV for Z->νν). These reduced Step 2 files can be found at the following location:

/eos/uscms/store/user/cmsdas/2014/Hbb/Step2/

(Note: the "ZJets" name in the ntuples refers to Z->νν events, while "DYJets" refers to Z->ll events)

Task 1 (Day 1): Find discriminating variables

We will take a look at MC signal events (with mH=125 GeV) and get familiar with the most important variables. Open one of the following files:

# Zmm, Zee
/eos/uscms/store/user/cmsdas/2014/Hbb/Step2/Step2_ZllH125.root
# Wmn, Wen
/eos/uscms/store/user/cmsdas/2014/Hbb/Step2/Step2_WlnH125.root
# Znn
/eos/uscms/store/user/cmsdas/2014/Hbb/Step2/Step2_ZnnH125.root

You can open them in ROOT with root -l file.root. Once a file is opened, you can browse it by calling (in ROOT): new TBrowser. The definitions of the variables in the ROOT files are available in Ntupler.cc (but you don't have to go through it now).

Question 1: Which variable in the ntuple controls which vector boson mode we are reconstructing?
(Hint: its name has a "V".)

We have prepared macros (GitHub repository) to help you make plots. To get them:

cp -r /eos/uscms/store/user/cmsdas/2014/Hbb/macros/ .
cd macros

For the Zmm channel, use plotHistos_Zmm.C, which has some dependencies (Note: for other channels, replace "Zmm" with the channel name):

  • plotHistos_Zmm.h: this is where all the functions are implemented.
  • tdrstyle.C: this is to make plots beautiful.
  • XSec_8TeV19invfb.h: this is to get cross sections of MC samples (will be relevant later).

Take a look at plotHistos_Zmm.C. You should quickly find where it says "Task 1 (a)". Try to understand what it is doing, and run:

root -l plotHistos_Zmm.C+

It should make a plot of the dijet invariant mass distribution from signal events and save it in png and pdf format (in the plots directory). To view the plot, do display plots/Zmm_Hmass.png. (To quit, just press CTRL+C.)

You can change the basic event selection (hence change the loading time) by changing this line:

TCut cutmc_all   = "Vtype==0";

You can change the variable to plot by changing this line:

TString var      = "H.mass";

and you can also apply an additional cut per plot by changing this line:

TCut cut         = "V.pt>0";

The implementation of MakePlot(...) is in plotHistos_Zmm.h. Try more interesting variables and vary the cut. If you think of some variable but can't figure out its name in the ntuple, please feel free to ask the facilitators.
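For example (illustrative values), to plot the dijet pT for events with a boosted vector boson, the three lines above could read:

TCut cutmc_all   = "Vtype==0";
TString var      = "H.pt";
TCut cut         = "V.pt>100";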

After that, you can comment out the block for "Task 1 (a)" and uncomment the block for "Task 1 (b)". Here, you can plot a variable for two different processes and compare their distributions. Try to make comparison plots of several variables for signal vs. V+jets (or ttbar).

Question 2: Can you find at least two variables for which the signal distribution is very different from the one for background events?
(Hint: think about what the final state observables are and their correlations.)

Task 2 (Day 1): Optimize signal selection

Now, we will start to build the set of selection criteria to isolate our signal events. The figure of merit (FOM) that we will use is a modified Punzi significance, given by:

S / (sqrt(B) + a/2 + 0.2*B)

where a is the statistical significance we aim for (3 sigmas here). The linear term proportional to the background (0.2, i.e. 20%) accounts for the systematic uncertainty on the background knowledge. If you are interested in knowing more about this FOM, please read this.
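As a minimal numerical sketch (in ROOT/C++), the FOM above can be written as a small function; the function name and the example numbers are just for illustration:

#include <cmath>

// Modified Punzi FOM: S / (sqrt(B) + a/2 + 0.2*B), with a = 3 by default
double punziFOM(double S, double B, double a = 3.0, double sysB = 0.20) {
  return S / (std::sqrt(B) + a / 2.0 + sysB * B);
}
// Example: punziFOM(10., 25.) = 10 / (5 + 1.5 + 5) ≈ 0.87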

Comment out the block for "Task 1 (b)", and uncomment the block for "Task 2". You will optimize for the best significance of the signal vs. all backgrounds added in the correct proportions. We recommend that you optimize the following variables:

  • pT(V): V.pt (variable name in the ntuple)
  • pT(H): H.pt
  • maxCSV: max(hJet_csv_nominal[0],hJet_csv_nominal[1]) (value goes from 0 to 1)
  • minCSV: min(hJet_csv_nominal[0],hJet_csv_nominal[1]) (value goes from 0 to 1)
  • Δφ(V,H): abs(HVdPhi) (value goes from 0 to π)
  • m(jj) window: H.mass

Let's spend a few minutes trying to understand what makePlots(...) does. Can you explain the difference between cutmc_all and cutmc? Since this uses all the backgrounds, it will take longer to run. We will run in batch mode to speed things up:

root -l -b -q plotHistos_Zmm.C+
(Remember the "+" sign at the end!)

The MC samples are re-weighted by two scale factors: the pileup (PU) weight, in order to match the PU distribution in data, and the trigger weight, which includes the effects of the trigger/lepton identification/reconstruction efficiencies. The leptonic triggers are not applied in the MC, so we re-weight the events based on the trigger efficiency in data, which is determined using Tag-and-Probe techniques for all triggers involving leptons. Can you find out where these two weights are applied? (Hint: look for weightTrig2012 and PUweight)
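As an illustration of how these per-event weights enter a plot, here is a minimal ROOT sketch using TCut arithmetic; the tree name "tree" and the cut values are assumptions, and in the macro the weights are actually applied inside MakePlots:

{
  TFile* f    = TFile::Open("Step2_ZllH125.root");
  TTree* tree = (TTree*) f->Get("tree");            // assumed tree name
  TCut  sel   = "Vtype==0 && V.pt>100";             // illustrative selection
  TCut  wgt   = "PUweight * weightTrig2012";        // PU and trigger weights from the ntuple
  tree->Draw("H.mass>>hmass(25,0,250)", sel * wgt, "goff");   // weighted projection
}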

For the MET triggers, we apply the MC-simulated trigger decisions and then correct the efficiency by a data/MC scale factor. In addition, we also apply a number of MET filters to reject events with fake MET. Look for triggerFlags and triggercorrMET(...).

As you can see, the code prints the relevant variables for the FOM calculation. The task ahead of us is to determine the set of cuts that maximizes the FOM. As you have learned in the introduction, the topological signature that maximizes the FOM will be (using WH as an example):

  • A boosted dijet system with both jets b-tagged (the Higgs candidate),
  • A boosted vector boson recoiling back-to-back w.r.t. the Higgs candidate
  • Little or no additional activity in the event (this cut is controlled by the string Sum$(aJet_pt>xx && abs(aJet_eta)<2.5)==0, in which you can choose the pT threshold for the additional jets, whether or not to restrict to central jets, and whether to veto or allow extra jets passing the listed selection criteria),
  • No additional isolated leptons in the event (this cut is controlled by the string Sum$(aLepton_pt>xx && abs(aLepton_eta)<2.5 && (aLepton_pfCombRelIso<yy))==0; it is currently not applied),
  • A restriction to an interval around the Higgs mass not narrower than 30 GeV (twice the typical dijet mass resolution).

Question 3: Find the set of cuts that optimizes the FOM
(Hint: use a "for" loop)
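A hypothetical skeleton of such a scan (here over the pT(V) threshold only) is sketched below; sigTree and bkgTree stand for the signal and background trees, and "weight" for a per-event weight that already includes the luminosity normalization — all names are illustrative, and in practice you would adapt the existing MakePlots machinery instead:

// Scan one threshold and print the FOM for each value (ROOT macro sketch)
void scanVpt(TTree* sigTree, TTree* bkgTree) {
  for (double vpt = 100.; vpt <= 200.; vpt += 10.) {
    TCut cut = Form("V.pt>%.0f && H.pt>100", vpt);
    sigTree->Draw("H.mass>>hs(1,0,1e6)", cut * TCut("weight"), "goff");
    bkgTree->Draw("H.mass>>hb(1,0,1e6)", cut * TCut("weight"), "goff");
    double S = ((TH1F*) gDirectory->Get("hs"))->Integral();
    double B = ((TH1F*) gDirectory->Get("hb"))->Integral();
    double fom = S / (sqrt(B) + 1.5 + 0.2 * B);   // modified Punzi FOM with a = 3
    printf("pT(V) > %3.0f : S = %7.2f  B = %7.2f  FOM = %.4f\n", vpt, S, B, fom);
  }
}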

Question 4: Which are the most important (i.e. the largest) backgrounds? Why? Prepare a couple of sentences explaining if/why you were expecting these backgrounds to be the most important ones.
(Hints:

  • Scan pT(V) and pT(H) at the same time, as they are strongly correlated, and possibly use asymmetric cuts.
  • Take into consideration the b-tag working points: CSV > 0.898 / 0.679 / 0.244 for Tight, Medium, Loose.
  • Optimize against V+jets and ttbar as they are the dominant backgrounds.)

Once you are done, compare your optimized selections across channels. Do different channels end up with different selections? How come? Also, compare your results with those in the 2013 VH(bb) analysis (Table 9 in AN-2013/069, which can be found here).

Note about b-Jet Energy Regression

In the VH(bb) analysis, a multivariate technique using boosted-decision-tree (BDT) regression is applied to calibrate the b-jet energy closer to its true value. You have probably realized by now that the dijet invariant mass is the most discriminating variable, so a better jet energy measurement will improve the mass resolution. Due to some delay in preparation, the regression technique is currently not included in this Exercise. It was covered in the CMSDAS 2013 exercise (see here), which uses an older version of the ntuples, but those contain the regression-trained b-jet energy. Interested students can refer to this material or ask the facilitators for further information.

Task 3 (Day 1): Evaluate expected sensitivity using Higgs combination tool

A reasonable question to ask is whether the FOM we chose really provides the best results (i.e. the best sensitivity). In the past (2011) we were interested in getting the best exclusion limits on Higgs production, as we didn't have enough statistics to be sensitive to a Higgs excess. With the current dataset at hand (and knowing that there is a new boson at 126 GeV!), we can try to optimize directly for the best significance of a possible Higgs excess.

In order to do this, we need to introduce the Higgs combination machinery (that you may be familiar with if you have done the RooStats exercise).

cd $CMSSW_BASE/src
git clone https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit.git HiggsAnalysis/CombinedLimit/
cd HiggsAnalysis/CombinedLimit/
git checkout V03-05-00
scramv1 b -j 5
rehash
cd ../../../..

Note that the HiggsAnalysis/CombinedLimit installation only needs to be done once.

First, we need to ask ourselves what the systematic uncertainties associated with this search are. So far, we have only looked at the MC simulation, and we do not know whether it matches reality (real data). Can you name a few variables that could be different in MC vs. data? Which ones have the largest effects on the search?

Now uncomment the block for Task 3 (keeping the Task 2 block). MakeDatacard(...) writes out a "datacard" (e.g. vhbb_Zmm_8TeV.txt for Zmm). This is a simple cut-and-count datacard with the numbers of expected signal and background events, the observed number of events, and the systematics. In the course of optimization, we want to remain "blind" to the real data to avoid (human) bias; hence, the macro currently enters the total MC expectation for signal plus all backgrounds as your "observation". Try to understand what the systematics are and how they are correlated. The best resource to help you is the Higgs combination twiki. Note that the systematics should be channel-dependent and selection-dependent, and should be measured carefully; for this Exercise, we are using some guesses, not realistic numbers.
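To get a feel for the format before looking at the one the macro writes, here is a minimal cut-and-count datacard sketch with made-up yields, only one background, and two nuisance parameters (the real datacard has one column per background process):

imax 1  number of channels
jmax 1  number of backgrounds
kmax 2  number of nuisance parameters
------------
bin          Zmm
observation  12
------------
bin          Zmm    Zmm
process      VH     TT
process      0      1
rate         1.5    10.5
------------
lumi     lnN   1.026   1.026
TT_norm  lnN   -       1.30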

To calculate the significance you will run:

combine -M ProfileLikelihood --signif datacard.txt

Question 5: Repeat the optimization using the significance from the combine tool as FOM.
(Hint: code a loop that produces many datacards and then feeds them to combine. You should not start the optimization from zero, but from the selection you got with the Punzi FOM.)

Once you are happy, run the following to get the expected exclusion limit and report what you find:

combine -M Asymptotic datacard.txt

If you have enough time, try to repeat it for other Higgs mass points as well. And, if you are feeling adventurous, you can try a shape analysis using the m(jj) distribution. This is done in the actual analysis (see the note below).

OPTIONAL: Suppose you found in the optimization that the best sensitivity is at pT(V) > 130 GeV for WH. It was found that splitting the events into two or more categories based on pT(V) helps to improve the sensitivity. Can you guess why? Create two categories in pT(V) and then use the combineCards.py script to create the final datacard (for example, you may try pT(V)>130 && pT(V)<180 and pT(V)>180). The syntax of combineCards.py is:

combineCards.py datacard1.txt datacard2.txt > datacard_combined.txt

You may also try to optimize the significance by varying pT(V) cuts used to define the categories.

Note about Shape Analysis

Instead of optimizing for the best m(jj) window, one may consider including the full m(jj) spectrum in a "shape" analysis. This allows the final fit to use the m(jj) sideband to estimate the number of background events in the m(jj) signal window. You can think of a binned shape analysis as doing cut-and-count in more than one m(jj) bin, taking into account correlations between bins. In the VH(bb) analysis, this generally improves the sensitivity by 30-50% over cut-and-count.
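In a shape datacard the rates come from histograms stored in a ROOT file, declared with a "shapes" line; a hypothetical example (the file name and naming conventions are assumptions):

shapes  *  Zmm  vhbb_Zmm_input.root  $PROCESS  $PROCESS_$SYSTEMATIC
# With shapes, "observation -1" and "rate -1" tell combine to take the yields
# from the data_obs and process histograms in the input file.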

Note about Multivariate Analysis

While doing the optimization studies, you may quickly realize that, as the number of variables grows, the scan becomes much more time consuming. You may wonder if there is a more clever way. Indeed, the problem is so common in high-energy physics that there is a package in ROOT called TMVA that does multivariate data analysis. It allows one to use machine-learning algorithms to find patterns among various input variables that can discriminate signal from background. In the VH(bb) analysis, we use the boosted decision tree (BDT) algorithm, which consists of a series of binary decision nodes, each optimized to give the best signal/background separation. In the end, all the nodes are combined to give a final (continuous) discriminant. An MVA can improve the sensitivity of an already optimized cut-and-count analysis by 10-30%.
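As a flavor of the TMVA Factory interface (as available in the ROOT version of this release), here is a minimal, hypothetical training sketch; the background file name, tree name, variable list, and BDT options are assumptions, not the settings used in the analysis:

#include "TFile.h"
#include "TTree.h"
#include "TMVA/Factory.h"
#include "TMVA/Types.h"

void trainBDT() {
  TFile* fsig = TFile::Open("Step2_ZllH125.root");
  TFile* fbkg = TFile::Open("Step2_TT.root");            // assumed background file
  TTree* tsig = (TTree*) fsig->Get("tree");              // assumed tree name
  TTree* tbkg = (TTree*) fbkg->Get("tree");

  TFile* fout = TFile::Open("tmva_vhbb.root", "RECREATE");
  TMVA::Factory factory("VHbb", fout, "!V:AnalysisType=Classification");
  factory.AddVariable("Hmass := H.mass", 'F');
  factory.AddVariable("Hpt   := H.pt",   'F');
  factory.AddVariable("Vpt   := V.pt",   'F');
  factory.AddSignalTree(tsig, 1.0);
  factory.AddBackgroundTree(tbkg, 1.0);
  factory.PrepareTrainingAndTestTree("", "SplitMode=Random:NormMode=NumEvents");
  factory.BookMethod(TMVA::Types::kBDT, "BDT", "NTrees=400:MaxDepth=3");
  factory.TrainAllMethods();     // train the BDT
  factory.TestAllMethods();      // evaluate on the test sample
  factory.EvaluateAllMethods();  // write performance summaries to the output file
  fout->Close();
}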

Task 4 (Day 2): Define control regions

Today, we are ready to start looking at data! When running on data, we need to apply additional cuts:

  • the events should fire the trigger bits corresponding to the trigger chosen for the analysis
  • the events should be flagged as good by the data quality group (EVENT.json==1)
This has already been done for you. The names of the triggers and the cut strings are listed in XSec_8TeV19invfb.h (e.g. trigZmm for Zmm).

We would like to use data to estimate the actual yields of the most important backgrounds. In order to achieve this, we need to build background-enriched control regions, not overlapping with the signal region defined in the previous steps of the Exercise, to isolate these important backgrounds. The idea is:

  • For the W->lν channel, we can build 3 control regions to estimate, from data, the yields of the dominant backgrounds: W + light-flavor jets (LF), W + heavy-flavor jets (HF), and ttbar + jets, with the following requirements:
    • W + LF: Same kinematic cuts for V & H, no b-tagging, no angular cuts, without additional jet activity.
    • W + HF: Same kinematic cuts for V & H but in the m(jj) sideband, at least one jet is b-tagged, no angular cuts, without additional jet activity.
    • TT: Same kinematic cuts for V & H, at least one jet is b-tagged, no angular cuts, with additional jet activity.

  • For the Z->ll channel, we can build 3 control regions: Z + light-flavor jets (LF), Z + heavy-flavor jets (HF), and ttbar + jets, with the following requirements:
    • Z + LF: Same kinematic cuts for V & H, no b-tagging.
    • Z + HF: Same kinematic cuts for V & H but in the m(jj) sideband, at least one jet is b-tagged.
    • TT: Same kinematic cuts for V & H but in the m(jj) and m(Z->ll) sidebands, only one b-tagged jet, with additional jet activity.

  • For the Z->νν channel, we can build 5 control regions: W + light-flavor jets (LF), W + heavy-flavor jets (HF), Z + light-flavor jets (LF), Z + heavy-flavor jets (HF), and ttbar + jets, with the following requirements:
    • Z + LF: Same kinematic cuts for V & H, no b-tagging, no angular cuts, tightened jet ID.
    • Z + HF: Same kinematic cuts for V & H but in the m(jj) sideband, at least one jet is b-tagged, no angular cuts.
    • W + LF: One isolated lepton, same kinematic cuts for V & H, no b-tagging, no angular cuts, without additional jet activity.
    • W + HF: One isolated lepton, same kinematic cuts for V & H but in the m(jj) sideband, at least one jet is b-tagged, no angular cuts, without additional jet activity.
    • TT: One isolated lepton, same kinematic cuts for V & H but in the m(jj) sideband, at least one jet is b-tagged, no angular cuts, with additional jet activity.

Let's start by building the V+LF control region for each channel and plotting a variable that we know is different in data vs. MC: the number of primary vertices. In plotHistos_Zmm.C, remove the exclamation mark in front of plotData in

# To plot data, change !plotData to plotData
TString options  = "printStat:plotSig:!plotData:!plotLog";

Question 6: Find the variable in the ntuples for the primary vertex multiplicity. Plot data over MC for this variable with and without applying the PU reweighting variable PUweight. How does the MC distribution change compared to data? Send the two plots with and without PU reweighting to the facilitators. If you have extra time, you can also make pT(V) plots with and without applying the trigger weight.

Then, go ahead and make some of the most relevant distributions, and see if MC and data are in good agreement. You can make more than one plot in a single call by adding more MakePlots(...) commands. After that, start working on the other control regions. As you will notice while playing with the cuts, it is very difficult to build control regions that are completely pure in one of these three background components. If you end up with too little statistics, you should loosen some of the cuts to gain more statistics.

Question 7: Prepare TCut's for the definition of the control regions based on Section 10 in AN-2013/069.

Question 8: Compare the data and MC yields you obtain with the TCut's you have just defined. Send to the facilitators the yields for data and all MC components that you have found.

Task 5 (Day 2): Evaluate and apply data/MC scale factors

The goal here is to provide data/MC scale factors (SFs) to correct the yields of the V+LF, V+HF, and ttbar processes. Because the purity of these control regions is not 100%, we do not apply the raw data/MC ratios directly to reweight these backgrounds. Instead, we use the MC templates for every background in every control region and fit all the scale factors simultaneously. For this Exercise, we will do something simpler: focus on a single control region at a time and find a scale factor for the dominant background there; then move to the next control region, applying the scale factor found previously. We should start with the largest background, then the next largest, and so on. A simple way to find the scale factor is to solve for SF:

Ndata = Nbkg_i * SF_i + N(all other bkgs)

As a good practice, you should use a variable other than m(jj), e.g. pT(V), maxCSV, or minCSV. Admittedly, it does not matter much for this simple scale-factor method. In principle, you also want to know the uncertainty on your scale factors. Think about how to estimate the uncertainty.
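A tiny numerical sketch of this single-region extraction, with made-up yields and with only the Poisson uncertainty on the data propagated (MC statistical and systematic uncertainties are ignored here):

#include <cmath>
#include <cstdio>

int main() {
  double Ndata  = 5200.;   // observed events in the control region (made up)
  double Nbkg   = 4500.;   // MC prediction for the background being scaled (made up)
  double Nother = 600.;    // MC prediction for all other processes (made up)

  double sf     = (Ndata - Nother) / Nbkg;     // solve Ndata = Nbkg*SF + Nother
  double sf_err = std::sqrt(Ndata) / Nbkg;     // Poisson error on the data yield only
  std::printf("SF = %.3f +/- %.3f\n", sf, sf_err);
  return 0;
}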

Once you have your scale factors, you may enter them in plotHistos_Zmm.C by looking for scalefactors, and check whether you have improved (or degraded) the data vs. MC agreement. Try to identify where they are applied.

They are applied when a plot is being "projected/drawn", see e.g.:
// The following enters each entry into the histogram with a weight of "cutmc" * PUweight * lumi_TT * sf_TT
// where lumi_TT = lumi_data * xsec_TT / nMCEvents produced
ev->TT->Project("TT", var, cutmc * Form("%5f * PUweight", ev->lumi_TT) * Form("%f", ev->sf_TT));

Note that the optimization procedure depends on the relative amounts of signal and background; now that you have reweighted some of the background contributions based on the data control regions, you may wonder if your optimization is affected. You could try to rerun the optimization with the scale factors applied (although you don't have to).

Question 9: Prepare data over MC plots for each control region for 3 variables of your choice, before and after applying your scale factors. Also report the scale factors that you apply. Example plots can be found on pp. 56-112 of AN-2013/069.

Task 6 (Day 2): Extract the observed limits from final data distributions

We are now ready to produce the final m(jj) plots in the signal region. We will repeat Task 2 and Task 3, but now with your own scale factors (and using your optimized selection)!

Note: for W->lν and Z->νν channels, if the macro is running too slowly, consider moving all the cuts in cutmc into cutmc_all, and do the same for cutdata and cutdata_all.

Question 10: Prepare data over MC plots for m(jj) using the signal region TCuts (except the m(jj) window cut).

Let's now run the limit extraction to get the observed limits. For each channel, you need to enter in the datacard the observed yield in data and the MC expectations for the signal and the different backgrounds. After these modifications, just run again to get the observed limits:

combine -M Asymptotic datacard.txt

How well do they match your expected limits? To characterize the significance of any observed excess:

combine -M ProfileLikelihood --signif datacard.txt
# Use the following to get expected significance without changing anything in datacards:
#combine -M ProfileLikelihood --signif -t -1 --expectSignal=1 datacard.txt

Question 11: Enter the MC expectations and observed yield on data in the signal region in your datacard and send it to the facilitators.

If every channel has managed to produce a datacard, please upload it by doing:

cp vhbb_Zmm_8TeV.txt /eos/uscms/store/user/cmsdas/2014/Hbb/datacards

When all the datacards are there, download them all by:

cp -r /eos/uscms/store/user/cmsdas/2014/Hbb/datacards .

To combine them, we use the script combineCards.py:

cd datacards
combineCards.py vhbb_Zmm_8TeV.txt vhbb_Zee_8TeV.txt vhbb_Wmn_8TeV.txt vhbb_Wen_8TeV.txt vhbb_Znn_8TeV.txt > vhbb_8TeV.txt

Isn't that easy? Please open vhbb_8TeV.txt and take a look. Now try to get the exclusion limit and statistical significance again. Does the combination have better sensitivity than the most sensitive single channel? By how much? How do your limits compare to the current official CMS limits?

If you were able to get to the shape analysis, you can repeat the limit extraction again. How much better is the shape-analysis sensitivity compared to that of the cut-and-count? If you have uploaded your shape datacard (and its input ROOT file) to the same location and downloaded the other shape datacards, then you can also make a combined m(jj) plot by doing:

# First, go back to your "macros" directory
# Then, do:
root -l plot_combined_mjj.C+

Task 7 (Extra): Perform search with VZ(bb) as signal

As you know, backgrounds from diboson production such as WZ(bb) and ZZ(bb) have the same signature as our signal events (they are "irreducible"). For this reason, the VV background is challenging, but it also provides an excellent sample to calibrate the VH(bb) analysis. Get an additional macro:

cp /eos/uscms/store/user/cmsdas/2014/Hbb/VV_Bkg.C $PWD

and look at it. Modify it appropriately and produce a background-subtracted plot to show whether you see any evidence of diboson events with Z->bb. You can also try to extract the VZ(bb) signal by suitably modifying the datacard you wrote for the VH signal to get the observed limit in this case.

Preparation of the presentation

Please review the slides given under Introduction. Plots can be shared by uploading them to the elog. Make some nice tables from your results. If you need any material, please let us know. Good luck!

Additional information

-- LeonardApanasevich - 22 Jan 2009
