QCD Background Estimation with Jet Smearing

A SUSY Discovery Analysis

This is the Twiki home page for the Jet Smearing analysis.


Currently, the code is located at UserCode/CRiedel/QCDSmearing/. It is split into four pieces as follows:

  • This EDAnalyzer takes a sample of QCD and derives a jet-smearing function which depends on eta and pt. The function for each eta-pt bin is saved as a histogram in the output ROOT file. Currently, it can use two methods: MC truth (jetEt_truth/jetEt_measured) and photon+jets (gammaE_truth/jetEt_measured).
  • This is a simple EDFilter which selects well-measured ("virginQCD") events from an unfiltered Gumbo QCD sample ("bulkQCD") so that they can be smeared ("smearedQCD").
  • This EDProducer takes a smearing function and a sample of virgin QCD. It produces two new collections: a reco::CaloJetCollection for the smeared jets and a reco::CaloMETCollection for the smeared MET (which, of course, has only one member). Currently, these two products are not written to a file, so to use them you must run AnalyzeSmearedQCD in the same path of your configuration file. The reasoning is that smearing the jets probably takes little time compared to calculating all the variables that are saved to histograms, so there is no point in writing the smeared products to disk when they can be regenerated quickly.
  • This EDAnalyzer takes a sample and produces many plots which may be relevant to a SUSY analysis. It can be run on bulk, virgin, or smeared QCD, or on a benchmark SUSY sample.
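As a rough illustration of what the smearing-function analyzer stores, here is a minimal standalone sketch (plain C++, no ROOT or CMSSW) of one response histogram per (|eta|, pt) bin, filled with E_reco/E_true from matched jets, following the convention in the terminology section below. The struct name, the bin edges, and the 40-bin response axis over [0, 2) are hypothetical choices for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <map>
#include <utility>
#include <vector>

// Index of the bin containing x, or -1 if x is outside [edges.front(), edges.back()).
int findBin(const std::vector<double>& edges, double x) {
    assert(edges.size() >= 2);
    if (x < edges.front() || x >= edges.back()) return -1;
    // upper_bound returns the first edge strictly greater than x.
    auto it = std::upper_bound(edges.begin(), edges.end(), x);
    return static_cast<int>(it - edges.begin()) - 1;
}

// Hypothetical container: one response histogram per (|eta| bin, pt bin).
struct ResponseHistograms {
    std::vector<double> etaEdges;  // bin edges in |eta| (illustrative)
    std::vector<double> ptEdges;   // bin edges in true pt, GeV (illustrative)
    int nRespBins = 40;            // response axis: nRespBins bins over [0, respMax)
    double respMax = 2.0;
    std::map<std::pair<int, int>, std::vector<double>> hist;

    // Fill one matched (reco, true) jet pair with response E_reco / E_true.
    // Returns false if the jet lies outside the (eta, pt) binning.
    bool fill(double trueEta, double truePt, double eReco, double eTrue,
              double weight = 1.0) {
        const int ie = findBin(etaEdges, std::fabs(trueEta));
        const int ip = findBin(ptEdges, truePt);
        if (ie < 0 || ip < 0 || eTrue <= 0.0) return false;
        const double x = eReco / eTrue / respMax * nRespBins;
        // Underflow and overflow are collected in the edge bins.
        const int ir = x < 0.0 ? 0 : (x >= nRespBins ? nRespBins - 1 : static_cast<int>(x));
        std::vector<double>& h = hist[std::make_pair(ie, ip)];
        if (h.empty()) h.assign(nRespBins, 0.0);
        h[ir] += weight;
        return true;
    }
};
```

Keying the map by bin indices means only populated bins allocate a histogram; a ROOT version would more likely pre-book one TH1 per bin.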

Extracting the Smearing Function

We have used MC truth information to extract smearing functions from the MC, but we need to develop a way to get a data-driven smearing function. We have tried two ways to get it from data: photon+jet events and a simple 3-jet balancing method.

Photon+jet balancing
In events with only one high-pt photon and one high-pt jet, the photon and the jet should balance. Since the energy resolution for photons is much better than for jets, we simply define the smear as E_CaloJet/E_Photon.
Simple 3-jet balancing
In events with exactly 3 jets above a certain pt threshold, look for events in which the MET is aligned along one of the jets. Attribute the MET entirely to mismeasurement of that jet, and define the smear as E_CaloJet/(E_CaloJet+MET).
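The two smear definitions above can be written as small functions. This is a sketch only: the Delta-phi window used to decide that the MET is "aligned" with a jet (0.3 rad here) is a hypothetical parameter, since the text does not fix one, and the functions take plain numbers rather than CMSSW objects.

```cpp
#include <cassert>
#include <cmath>
#include <optional>

constexpr double kPi = 3.14159265358979323846;

// Azimuthal separation folded into [0, pi].
double deltaPhi(double phi1, double phi2) {
    const double d = std::fmod(std::fabs(phi1 - phi2), 2.0 * kPi);
    return d > kPi ? 2.0 * kPi - d : d;
}

// Photon+jet: the well-measured photon energy stands in for the true jet energy.
double photonJetSmear(double eJet, double ePhoton) {
    assert(ePhoton > 0.0);
    return eJet / ePhoton;
}

// 3-jet balancing: if the MET points along this jet (within maxDPhi, a
// hypothetical window), attribute all of the MET to this jet's mismeasurement.
// Returns nullopt when the MET is not aligned with the jet.
std::optional<double> threeJetSmear(double eJet, double jetPhi, double met,
                                    double metPhi, double maxDPhi = 0.3) {
    assert(eJet + met > 0.0);
    if (deltaPhi(jetPhi, metPhi) > maxDPhi) return std::nullopt;
    return eJet / (eJet + met);
}
```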
The problem so far is that, for both methods, the statistics appear to be much too low to construct an adequate smearing function. This may well be due to an error in the way the methods are implemented, so it needs to be investigated.

Ultimately, it seems like extracting a smearing function ourselves will be a large undertaking. It would probably be more efficient to construct a smearing function using the jet resolution and jet energy corrections which are produced by the JetMET group. Here are some quick specifics on the terminology, which I didn't know before I talked to the JetMET guys:

  • Jet Energy Corrections (JEC) refer specifically to correcting the offset between the true jet energy and the average energy of the reconstructed jet. The "factorized" strategy used by the JetMET group means that there are several individual JEC levels, depending on how far back you want to correct the energy: to the particle (generator) level, the parton level, etc.
  • Jet Resolution refers specifically to the broadness of the resolution function i.e. the width of the smearing function.
  • I have been using the term smearing function to refer to the function which incorporates both of the above. That is, smearing function = E_{RecoJet} / E_{TrueJet}, where E_{TrueJet} usually is E_{parton}.
In principle, we should be able to just apply the appropriate JECs (hopefully all the way back to the parton level, where the true MET of a QCD event is necessarily zero) and be left with no absolute offset in our smearing function. The smearing function is then just the resolution function.

So, we would like to confirm that the JECs adequately eliminate the offset, and then take the jet resolution functions (binned by eta, pt, a flavor-correlated variable, and possibly by EM energy fraction) to be our smearing function. From what I can tell, the JetMET group only measures the RMS of the resolution function. This is a problem since we don't think the resolution function is actually Gaussian; rather, it has significant tails (large kurtosis). So, we could try to get these tails ourselves through the methods discussed above, or we could see whether the JetMET group's code is easily modifiable to extract the full resolution function. The latter makes more sense, I think.
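To see why an RMS alone is not enough, the sketch below models the (offset-corrected) response as a Gaussian core plus a wider Gaussian tail and measures the excess kurtosis of samples drawn from it. The two-Gaussian form and every parameter value are assumptions chosen for illustration, not JetMET results; with tailFraction = 0 the model reduces to a pure Gaussian, whose excess kurtosis is zero.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Hypothetical two-component response model (all numbers illustrative).
struct CoreTailResponse {
    double mean = 1.0;          // ~1 once JECs have removed the offset
    double coreSigma = 0.10;    // width of the Gaussian core
    double tailSigma = 0.35;    // width of the tail component
    double tailFraction = 0.05; // probability of drawing from the tail

    double sample(std::mt19937& rng) const {
        assert(tailFraction >= 0.0 && tailFraction <= 1.0);
        std::bernoulli_distribution inTail(tailFraction);
        std::normal_distribution<double> g(mean, inTail(rng) ? tailSigma : coreSigma);
        return g(rng);
    }
};

// Excess kurtosis m4 / m2^2 - 3: zero for a Gaussian, positive for heavy tails.
double excessKurtosis(const std::vector<double>& x) {
    assert(x.size() > 1);
    const double n = static_cast<double>(x.size());
    double mean = 0.0;
    for (double v : x) mean += v;
    mean /= n;
    double m2 = 0.0, m4 = 0.0;
    for (double v : x) {
        const double d = v - mean;
        m2 += d * d;
        m4 += d * d * d * d;
    }
    m2 /= n;
    m4 /= n;
    return m4 / (m2 * m2) - 3.0;
}
```

Two models with the same RMS can have very different kurtosis, which is why knowing only the RMS of the resolution function leaves the tails unconstrained.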

Filtering for Virgin QCD

We tried cutting on MET, MET significance, MET/sqrt(sumEt), and MET/sumEt, using the Gumbo sample with L2L3L4-corrected jets. Cuts on MET significance and MET/sqrt(sumEt) were found to have identical effects. This makes sense because the two variables are equal up to small correction terms in MET significance.
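A minimal sketch of the MET/sqrt(sumEt) proxy and of a virgin-QCD selection that keeps events below a cut. The function names are hypothetical and the cut value is left to the caller; this is not the package's EDFilter.

```cpp
#include <cassert>
#include <cmath>

// MET / sqrt(sumEt), in sqrt(GeV): the simple proxy for MET significance.
double metOverSqrtSumEt(double met, double sumEt) {
    assert(met >= 0.0 && sumEt > 0.0);
    return met / std::sqrt(sumEt);
}

// Hypothetical virgin-QCD selection: keep only events below the significance cut.
bool passesVirginFilter(double metSig, double cut) {
    assert(cut > 0.0);
    return metSig < cut;
}
```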

As expected, the best variable for identifying poorly measured events, without biasing the virgin sample relative to the bulk QCD, was MET significance. The number of events (weighted to an integrated luminosity of 100 pb^-1) and entries (raw number of Gumbo events, unweighted) left after each cut is shown in the chart below:

METsig cut    events (100 pb^-1)    entries (unweighted)
(no cut)      24680000              137696
< 4.0         24640000              112112
< 3.0         23770000              94009
< 2.0         19890000              65912
< 1.5         15560000              47341
< 1.0         9458000               27476
< 0.7         5854000               15757
< 0.5         3230000               8828
< 0.3         1330000               3430
< 0.2         485800                1561
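The retention fractions implied by the chart can be computed directly from it. This sketch only transcribes the numbers above. It shows that at loose cuts the unweighted entries fall much faster than the weighted events (at METsig < 4.0, about 81% of entries but over 99.8% of weighted events survive), i.e. the removed entries carry below-average weights.

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

struct CutRow {
    const char* label;  // METsig cut
    double events;      // weighted to 100 pb^-1
    double entries;     // raw Gumbo events, unweighted
};

// Values transcribed from the chart above.
const std::vector<CutRow> kRows = {
    {"none", 24680000, 137696}, {"< 4.0", 24640000, 112112},
    {"< 3.0", 23770000, 94009}, {"< 2.0", 19890000, 65912},
    {"< 1.5", 15560000, 47341}, {"< 1.0", 9458000, 27476},
    {"< 0.7", 5854000, 15757},  {"< 0.5", 3230000, 8828},
    {"< 0.3", 1330000, 3430},   {"< 0.2", 485800, 1561},
};

// Print the fraction of the uncut sample surviving each cut, weighted and unweighted.
void printRetention(const std::vector<CutRow>& rows) {
    assert(!rows.empty() && rows.front().events > 0 && rows.front().entries > 0);
    const CutRow& all = rows.front();
    for (const CutRow& r : rows) {
        std::printf("%-6s events %.4f  entries %.4f\n", r.label,
                    r.events / all.events, r.entries / all.entries);
    }
}
```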

METsig was the best variable, but cutting on it still biased our sample somewhat. Cutting on METsig had the following effects (plots showing the relative effect on each variable are attached to this topic as *_bias.png):

  • The fraction of events with the leading jet Et below 200 GeV increased.
  • The fraction of events with sumEt below 500 GeV increased.
  • The events were biased toward jets with a larger fraction of EM energy: the fraction of events with EmFrac of the leading jet below 0.5 decreased and the fraction of events with EmFrac of the leading jet above 0.5 increased.
  • The events were biased toward central values of eventEta.
Little bias was observed in the jet multiplicity of the event. All the plots were made with the attached C++ script cut_test.C, which makes it easy to see the effect on one variable of various cuts on another variable.
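The kind of comparison behind these plots can be sketched as the weighted fraction of events satisfying some condition (here, leading-jet Et below 200 GeV) among events passing a given METsig cut. This is not the attached cut_test.C; the Event struct and the function name are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical flat per-event summary.
struct Event {
    double metSig;
    double leadEt;  // GeV
    double weight;  // cross-section weight
};

// Weighted fraction of events satisfying `pred` among those with metSig < cut.
double fractionAfterCut(const std::vector<Event>& evts, double cut,
                        const std::function<bool(const Event&)>& pred) {
    double num = 0.0, den = 0.0;
    for (const Event& e : evts) {
        assert(e.weight >= 0.0);
        if (e.metSig >= cut) continue;
        den += e.weight;
        if (pred(e)) num += e.weight;
    }
    return den > 0.0 ? num / den : 0.0;
}
```

Scanning `cut` over the values in the chart above and comparing with the uncut fraction gives the relative bias for one variable.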

Also troubling was that the smear of the leading jet (the ratio of its reconstructed energy to its MC-truth energy) improved only slightly after cutting on METsig. Rather than tightening the smearing function around unity, the cuts on METsig seemed to make it peak around 0.7. However, the cuts did substantially reduce the number of drastically mismeasured jets (smearing factor > 1.7 or < 0.3).
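Counting the drastically mismeasured jets uses the thresholds quoted above (smear > 1.7 or < 0.3). A minimal sketch, assuming the leading-jet smears have already been computed from MC truth; the function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fraction of jets whose smear (E_reco / E_true) lies outside [lo, hi].
double drasticFraction(const std::vector<double>& smears,
                       double lo = 0.3, double hi = 1.7) {
    assert(lo < hi);
    if (smears.empty()) return 0.0;
    std::size_t n = 0;
    for (double s : smears) {
        if (s < lo || s > hi) ++n;
    }
    return static_cast<double>(n) / static_cast<double>(smears.size());
}
```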

Double Smearing

When producing the smeared QCD sample, we face the problem of double smearing. The original (unsmeared) QCD sample has to be taken from actual data. We try to select well-measured virgin events with the filter described above, but it seems that this filter can only slightly improve how well measured the events are. Then, when we smear the virgin sample with the smearing function, the original parton-level jets will have been smeared twice: first by detector mismeasurement, and then manually by us. For an individual event, double smearing can reduce the MET. On average, however, the two smearings should be uncorrelated, so their widths add (in quadrature, for Gaussian resolutions) and the smeared QCD should have more MET than the actual QCD background. Double smearing should therefore give a conservative overestimate of the QCD background MET spectrum.
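A toy Monte Carlo illustrates the widening. Assuming, for illustration only, that each smear is an independent Gaussian response with mean 1 and width sigma, the product of two such responses has variance (1 + sigma^2)^2 - 1 = 2 sigma^2 + sigma^4, so it is roughly sqrt(2) times wider than a single smear. The Gaussian form is an assumption; the real smearing function has tails.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// RMS of the total response after applying `nTimes` independent Gaussian
// smears (mean 1, width sigma), estimated from nToys toy jets.
double rmsAfterSmearing(int nTimes, double sigma, int nToys, unsigned seed) {
    assert(nTimes >= 0 && nToys > 0 && sigma >= 0.0);
    std::mt19937 rng(seed);
    std::normal_distribution<double> g(1.0, sigma);
    double sum = 0.0, sum2 = 0.0;
    for (int i = 0; i < nToys; ++i) {
        double r = 1.0;
        for (int k = 0; k < nTimes; ++k) r *= g(rng);  // compound the smears
        sum += r;
        sum2 += r * r;
    }
    const double mean = sum / nToys;
    return std::sqrt(sum2 / nToys - mean * mean);
}
```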

Since we are really only worried about the high-MET background, our major concern is accurately estimating the QCD in this regime. High MET corresponds either to a few very poorly measured jets (the tails of the smearing function) or to many jets which are only somewhat poorly measured (the bulk of the smearing function). Since the virgin filtering seems able to reduce the number of events with very poorly measured jets, we probably don't have to worry about events where the jets happen to have been smeared by the tails of the smearing function twice. However, we will still have to deal with events which have been smeared by the bulk once and the tail once.

Then again, these distinctions may not be that important. After all, no matter how the jets in an event were smeared by the reconstruction, it will only pass our filter if it has low MET significance. So even if the original parton jets are smeared twice, the first smearing had little effect on the MET significance.


Once we have the virgin QCD and the smearing function, the actual smearing seems to be pretty straightforward. A minor issue is what to do with jets which fall outside our smearing-function binning (pt < 20 GeV or |eta| > 5). Right now they are simply left alone, with no smearing.
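A sketch of the smearing step, assuming a per-jet response is drawn from some callable (for example, sampled from the binned response histograms) and that the MET is updated by the change in jet transverse momentum. The structs, the callable interface, and this way of propagating to MET are illustrative assumptions, not the package's EDProducer; the pt < 20 GeV and |eta| > 5 skips follow the text.

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <vector>

struct Jet { double pt, eta, phi; };  // hypothetical minimal jet
struct Met { double px, py; };        // hypothetical MET vector

// Smear each jet inside the binning by a response from drawResponse and
// propagate the change to the MET. Jets outside the binning are left alone.
void smearEvent(std::vector<Jet>& jets, Met& met,
                const std::function<double(const Jet&)>& drawResponse) {
    for (Jet& j : jets) {
        if (j.pt < 20.0 || std::fabs(j.eta) > 5.0) continue;
        const double r = drawResponse(j);
        assert(r >= 0.0);
        const double dPt = r * j.pt - j.pt;
        // MET balances the visible momentum: extra jet pt lowers MET along the jet.
        met.px -= dPt * std::cos(j.phi);
        met.py -= dPt * std::sin(j.phi);
        j.pt += dPt;
    }
}
```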

Variables for distinguishing between SUSY and QCD


Some quick terminology, which is true to the best of my knowledge:

  • ET and MET refer to all calo tower energy, whether or not it is associated with a jet.
  • HT and MHT refer only to energy identified with jets. Typically, I think this means calo tower energy.
  • "MET significance" is very closely, but not exactly, equal to MET / sqrt( sum ET ). The sum is over the same energy sources as are used to calculate the MET.
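For reference, HT and MHT as described above can be computed from a jet list as follows. This assumes the jets have already passed whatever jet selection the analysis uses; the struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Jet { double pt, phi; };  // hypothetical: selected jets only

// HT: scalar sum of jet pt.
double computeHT(const std::vector<Jet>& jets) {
    double ht = 0.0;
    for (const Jet& j : jets) {
        assert(j.pt >= 0.0);
        ht += j.pt;
    }
    return ht;
}

// MHT: magnitude of the negative vector sum of jet pt.
double computeMHT(const std::vector<Jet>& jets) {
    double px = 0.0, py = 0.0;
    for (const Jet& j : jets) {
        px -= j.pt * std::cos(j.phi);
        py -= j.pt * std::sin(j.phi);
    }
    return std::hypot(px, py);
}
```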


The Hamburg ABCD method changed from using MET to using MHT as their primary discriminating variable. This is probably a more natural choice for our analysis too, since we are concentrating only on smearing energy due to identified jets.

MET vs. MET significance

More importantly, it seems to make sense to use MET significance (or MHT significance?) rather than MET (or MHT). After all, we are looking for events with MET which is unlikely to be due to jet mis-measurement.

Additionally, our discriminating variable really should be the same variable we used to filter for the virgin QCD. That way, when we smear the virgin QCD, we will be extrapolating into the signal region (high MET significance) from strictly outside the signal region (low MET significance). If we were to filter on MET significance but use MET as our discriminating variable instead, our virgin QCD would already have some events in the signal region (high MET) before we even smear.

And remember, we have to cut on MET significance (or maybe MHT significance?). Otherwise, we will bias our virgin sample.

Angular Information in the Transverse Plane

The problem with just using MET or MET significance is that they don't capture the direction of the MET relative to the jets in the event. Two events with equal MET and MET significance might have different likelihoods of being SUSY because the MET is aligned along the jets in one event but not in the other. MET aligned along one or more jets is more likely to be caused by mismeasurement of those jets. MET not aligned along any jets is more likely to be due to an LSP.

The TDR defines two similar variables, R1 and R2:

  • R1 = sqrt(DPhi2^2 + (Pi-DPhi1)^2)
  • R2 = sqrt(DPhi1^2 + (Pi-DPhi2)^2)
where DPhi1 (DPhi2) is the angle in the transverse plane between the MET and the highest (second-highest) pt jet in the event. Both the TDR and the Hamburg ABCD analysis cut on these variables: R1, R2 > 0.5. Upon examination, it is clear that these cuts are designed to eliminate events with a pair of back-to-back jets aligned along the MET. Plots in the TDR show that these cuts are very effective at eliminating QCD fakes.
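The R1 and R2 definitions above translate directly into code. In this sketch the Delta-phi inputs are assumed to already lie in [0, pi]; the default cut of 0.5 is the TDR/Hamburg value quoted above, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

constexpr double kPi = 3.14159265358979323846;

// R1 = sqrt(DPhi2^2 + (pi - DPhi1)^2)
double r1(double dphi1, double dphi2) { return std::hypot(dphi2, kPi - dphi1); }

// R2 = sqrt(DPhi1^2 + (pi - DPhi2)^2)
double r2(double dphi1, double dphi2) { return std::hypot(dphi1, kPi - dphi2); }

// Require R1 > cut and R2 > cut.
bool passesRCuts(double dphi1, double dphi2, double cut = 0.5) {
    assert(dphi1 >= 0.0 && dphi1 <= kPi && dphi2 >= 0.0 && dphi2 <= kPi);
    return r1(dphi1, dphi2) > cut && r2(dphi1, dphi2) > cut;
}
```

A back-to-back dijet with the MET along the second jet (DPhi1 = pi, DPhi2 = 0) gives R1 = 0 and is rejected.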

However, these two variables clearly do not capture all the angular information. The TDR further defines DPhiMin, the angle in the transverse plane between the MET and the closest jet, and requires DPhiMin > 0.3. Hamburg, on the other hand, uses DPhiMin as one of the two variables for the ABCD method (the other being MHT).
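DPhiMin can be sketched as a loop over jets. Which jets enter the minimum (all jets, or only the leading few above some pt threshold) is not specified here, so the jet list is left to the caller; the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

constexpr double kPi = 3.14159265358979323846;

// Azimuthal separation folded into [0, pi].
double deltaPhi(double phi1, double phi2) {
    const double d = std::fmod(std::fabs(phi1 - phi2), 2.0 * kPi);
    return d > kPi ? 2.0 * kPi - d : d;
}

// Smallest transverse angle between the MET and any jet in jetPhis.
double dPhiMin(double metPhi, const std::vector<double>& jetPhis) {
    assert(!jetPhis.empty());
    double best = kPi;
    for (double phi : jetPhis) best = std::min(best, deltaPhi(metPhi, phi));
    return best;
}
```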

Alternate Variables

To me, the above choices of angular variables seem a bit crude. I have an idea for a variable I call DirectionalMETSignificance which is slightly more complicated but could potentially incorporate all the information in the event (both MET and all the angles).

However, it is likely that, crude or not, the above angular variables capture all the significant angular information in the event, so that further cutting won't help. Studies are needed to determine this.

To Do

  • Add things to this list!
  • Settle on a filter for virgin QCD and demonstrate that this filter (a) doesn't affect physics variables like sumEt, and (b) improves the resolution of jets in the sample as calculated with MC truth.
  • Do a study to see if we can extract a robust smearing function from a 3-jet QCD sample with the method below. Specifically, we want to see whether MET aligned along a jet is more likely due mostly to (a) mismeasurement of that jet or (b) mismeasurement of several jets.
    • Find events where MET is aligned along one jet
    • Assume that mis-measurement of that jet is source of MET
  • In several of the modules in the QCDSmearing package, the reco jets are matched by hand to the genJets, using a simple Delta R cutoff. When time permits, we should instead use PAT collections which contain both types of jets together, already matched. This will simplify the code.
  • Settle on the ultimate discriminating variables.
  • See if DirectionalMETSignificance is a useful variable.
  • Investigate whether we can use the eta of the event (forward vs. central) to separate QCD from SUSY, a la David Stuart.
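For the jet-matching item above, here is a sketch of simple Delta R matching of the kind currently done by hand: each reco jet is matched to the closest genJet within a cutoff. The cutoff value and all names are hypothetical, and in this nearest-neighbour scheme one genJet can end up matched to more than one reco jet.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

constexpr double kPi = 3.14159265358979323846;

struct JetEtaPhi { double eta, phi; };  // hypothetical minimal jet

// Delta R = sqrt(dEta^2 + dPhi^2), assuming phi values in (-pi, pi].
double deltaR(const JetEtaPhi& a, const JetEtaPhi& b) {
    double dphi = std::fabs(a.phi - b.phi);
    if (dphi > kPi) dphi = 2.0 * kPi - dphi;
    return std::hypot(a.eta - b.eta, dphi);
}

// For each reco jet, the index of the closest genJet within maxDR, or -1.
std::vector<int> matchToGen(const std::vector<JetEtaPhi>& reco,
                            const std::vector<JetEtaPhi>& gen, double maxDR) {
    assert(maxDR > 0.0);
    std::vector<int> match(reco.size(), -1);
    for (std::size_t i = 0; i < reco.size(); ++i) {
        double best = maxDR;
        for (std::size_t k = 0; k < gen.size(); ++k) {
            const double dr = deltaR(reco[i], gen[k]);
            if (dr < best) {
                best = dr;
                match[i] = static_cast<int>(k);
            }
        }
    }
    return match;
}
```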

-- CharlesRiedel - 17 Sept 2008

Topic attachments
  • cut_test.C (C source code, 2.6 K, 2008-09-15 12:32, CharlesRiedel)
  • eventEta_bias.png (PNG, 11.7 K, 2008-09-11 13:25, CharlesRiedel)
  • jetMult_bias.png (PNG, 12.5 K, 2008-09-11 13:26, CharlesRiedel)
  • leadEmFrac_bias.png (PNG, 14.2 K, 2008-09-11 13:26, CharlesRiedel)
  • leadEt_bias.png (PNG, 16.7 K, 2008-09-11 13:26, CharlesRiedel)
  • leadSmear_bias.png (PNG, 14.4 K, 2008-09-11 13:26, CharlesRiedel)
  • sumEt_bias.png (PNG, 14.0 K, 2008-09-11 13:26, CharlesRiedel)
Topic revision: r14 - 2008-09-16 - CharlesRiedel