ABCD Background Estimation
 With TRooABCD

Rate This Page
Score: 0, My vote: 0, Total votes: 0
WillButtinger

This twiki page describes how to perform a likelihood-based version of the ABCD background estimation technique, using the TRooABCD application of the TRooFit toolkit (an extension for the RooFit model-building framework).

trooabcd example.png
Example output of TRooABCD tool. The top four panels show the post-fit distributions in the four regions (C, A, D, and B). The bottom left panel shows the fitted transfer factor, and the bottom right panel has summary information

Quick Start Instructions

To set up for use of TRooABCD, first set up an analysis release (only needed to get ROOT), then clone the TRooFit project and compile it:

mkdir source build
cd build
acmSetup AthAnalysis,21.2.13
acm clone_project TRooFit will/TRooFit TRooFit-ABCD-0.2.0
acm find_packages
acm compile

If all works, you should be able to open the ROOT prompt and create a TRooABCD object like this:

root [0] TRooABCD abcd("abcd","abcd");

Basic usage of TRooABCD

To use TRooABCD you must create histograms of your input data. Then the general idea is that you:

  1. Create an instance of TRooABCD
  2. Load histograms of the data in each region into TRooABCD using the AddData(region, histogram) method. To blind the signal region, simply do not provide the tool with a histogram for that region (if you are unblinded and have zero data in the signal region, then give the tool an empty histogram).
    • Note that if your signal region is blinded you might want to use the asimov (expected) data for this region. See this section for more info
  3. Change some of the properties of the model (e.g. allow the "m1" parameter to float if you want a linear model) (use abcd.PrintParameters() to list the available parameters)
  4. Call the Fit method.

What is TRooABCD doing? It is a simple tool that will take histograms and other numbers (scale factors or uncertainties) and construct a TRooFit likelihood model that can be fit to the data. The fit can be performed by the tool and the pre and post-fit distributions will be displayed, along with diagnostic information to help assess the goodness of the fit and give a final background estimate in the signal region.

The most basic ABCD example

The minimum input required for the TRooABCD tool is a set of histograms corresponding to data present in the control regions (region B, C, and D). In the most basic of example, these would all be single-bin histograms, i.e. simply the event count in each region. When doing a single-bin ABCD background-estimation, you can only use the uniform transfer factor model (modelType=0), since there are not enough measurements to use the linear model. To use the linear model you need at least two bins for the region C and D histograms.

//Create histograms for region B, C, and D, and fill with data ...
TH1D* h_b = new TH1D("h_b","region B data",1,0,1);
TH1D* h_c = new TH1D("h_c","region C data",1,0,1);
TH1D* h_d = new TH1D("h_d","region D data",1,0,1);

h_b->Fill(0.5, 55);
h_c->Fill(0.5, 7);
h_d->Fill(0.5, 103);
//with these numbers, the signal region prediction should come out as: 55*7/103 = 3.7

TRooABCD abcd("abcd","my abcd");
abcd.AddData(1,h_b); //region 1 = region B
abcd.AddData(2,h_c);
abcd.AddData(3,h_d);

TRooFitResult* result = abcd.Fit(); 

//result object can be used, for example, to inspect correlation matrix of fit:
TCanvas newCanvas;
result->Draw("cor"); //draw the correlation matrix

The above example should perform a fit in region B, C, and D, using the basic ABCD uniform model. In this model, the assumption is that:

  • $N_A = \tilde{m}_0\tilde{N}_B$
  • $N_C = \tilde{m}_0\tilde{N}_D$

where $\tilde{m}_0$ is the transfer factor. The three degrees of freedom are $\tilde{N}_B,\tilde{N}_D,\tilde{m}_0$, which are fit to the data.


The following examples use the histograms found in the demo file inside the package. This file contains histograms for the data and the signal shape in the four regions illustrated in the figure below:
figure1.png
Shape of data and signal used to build the example histograms in the package. The example histograms are 100-bin histograms constructed for each region.

Using multiple bins

You can give histograms to TRooABCD that have multiple bins in them, and each bin of region B and D will be given its own degree of freedom. You should ensure the binning is consistent between regions A (C) and B (D). The example below will perform the standard ABCD method on 5-bin histograms:

///Load the histograms
  TFile f1("$SourceArea/TRooFit/demo/ABCD/abcdInputHistograms.root");
  TH1* h_data[4];
  const char* regions[4] = {"A","B","C","D"};
  for(int i=0;i<4;i++) {
    h_data[i] = (TH1*)f1.Get(Form("data_%s",regions[i]));
    h_data[i]->Rebin(20); //rebin to 5 bins (100 bins initial, group 20 bins into one)
  }

  TRooABCD abcd("abcd","my abcd");
  ///add regions B, C, and D only
  for(int i=1;i<4;i++) {
    abcd.AddData(i,h_data[i]);
  }
  
  TRooFitResult* result = abcd.Fit(); 
  

The output of this fit is shown in the image below:

tutorial1.png
Output of multi-bin fit with the uniform ABCD model

As can be seen from the fit, the standard ABCD model appears to be inappropriate for this data (the transfer factor is not uniform in the x variable).

Using a linear ABCD model

From the post-fit results, the example appears to favour a transfer factor with an approximately linear dependence on the x variable. This linear ABCD model is given by:

  • $N_A(x) = (\tilde{m}_0 + \tilde{m}_1x)\tilde{N}_B(x)$
  • $N_C(x) = (\tilde{m}_0 + \tilde{m}_1x)\tilde{N}_D(x)$

Note that the additional degree of freedom ($\tilde{m}_1$) means that this model cannot be used in a single-bin fit where the signal region is blinded - in this single-bin blinded scenario there would only be three data points ($N_B,N_C,N_D$), but four degrees of freedom ($\tilde{N}_B,\tilde{N}_D,\tilde{m}_0,\tilde{m}_1$).

With TRooABCD, you can activate this model by changing the const status of the "m1" parameter before running the fit:

abcd.GetParameter("m1")->setConstant(false); //effectively activates the linear model
  TRooFitResult* result = abcd.Fit();

The output of this fit is shown below:

tutorial2.png
Output of multi-bin fit with the linear ABCD model

This model appears to be much more appropriate for this data, and the background prediction in the signal region is now much larger than the one obtained from the (inappropriate) uniform model.

Adding a signal of unknown strength

The likelihood-based ABCD method can account for signal contamination, even when the strength of the signal is unknown (which is often the case in exotic physics searches). Signal histograms can be added to the tool using the AddSignal method. For this example we will also only add the signal contamination for regions A and C (i.e. assume no signal contamination in region B and D), because it leads to interesting results in this example.

//load signal
  TH1* h_signal[4];
  for(int i=0;i<4;i++) {
    h_signal[i] = (TH1*)f1.Get(Form("sig_%s",regions[i]));
    h_signal[i]->Rebin(20); //rebin to 5 bins (100 bins initial, group 20 bins into one)
  }
  //add signal for only region A and C
  for(int i=0;i<4;i++) {
    if(i==1||i==3) continue; //skip region B and D for this demo
    abcd.AddSignal(i,h_signal[i]);
  }

The result of the fit is shown in the image below:

tutorial3.png
Output of multi-bin fit with the linear ABCD model, when signal is included. The fit is run twice in order to determine the uncertainty due to signal strength uncertainty

When you provide any signal histograms to TRooABCD, signal terms will be added to the model, with a free parameter $\tilde{\mu}$ representing the signal strength:

  • $N_A(x) = (\tilde{m}_0 + \tilde{m}_1x)\tilde{N}_B(x) + \tilde{\mu}N_{A}^{sig}(x)$
  • $N_C(x) = (\tilde{m}_0 + \tilde{m}_1x)\tilde{N}_D(x) + \tilde{\mu}N_{C}^{sig}(x)$

The nominal background prediction is obtained after running a fit with the signal strength parameter floating. TRooABCD chooses the initial value for the signal strength such that the total signal yield across the unblinded regions is equal to the data in those regions - in this way, the initial assumption is that all of the data is in fact signal rather than background. The post-fit result shows that the fit favours a small amount of signal contamination in region B, leading to a large signal prediction in region A (the red histogram).

Uncertainty due to signal strength
Since the signal strength is actually an unknown, TRooABCD will also rerun the fit with the signal strength fixed at 0 (i.e. neglecting all signal contributions). The difference between the resulting background prediction and the nominal prediction is taken as a systematic uncertainty, and is labelled "$\mu$ syst." on the post-fit results page.

By performing this fit with and without signal (where signal is allowed to be sufficiently large to cover the data), any issue with signal having the same shape as the background (data) will manifest itself in a large uncertainty on the background prediction.

Adding a systematic uncertainty (e.g. non-closure uncertainty).

It is important that you perform a validation of your ABCD background prediction by rerunning your setup with a validation region in place of your signal region. When you do this, you might find non-closure, in which case an additional systematic uncertainty should be added to the background prediction. To add an uncertainty to the background prediction in region A, you can use the following method:

abcd.AddBkgScaleFactor(0, 1.0, 0.04); //0 = region A, 1.0 = sf value, 0.04 = sf uncert 

The above will effectively add a 4% uncertainty onto the background prediction in region A.

As the name suggests, you can also use this method to add scale factors too. This method can therefore effectively be used to incorporate the $\tilde{\rho}$ parameter of the roostats four-bin instructional. Effectively just add it as a scale factor to either region A or B.

All scale factors will be constrained in the likelihood by a gaussian with appropriate mean (= sf value) and standard deviation (=sf uncert).

Validation and Asimov data

The 'signal region' (region A, or region 0) has two special features in the tool. Data can be added to this region in three ways:

  • AddData(0, hist) - data is added as unblinded data, and will be used in the fit
  • AddValidationData(0, hist) - data is added as validation data. It will appear in blue in the fit panel, but this data will not be used in the fit.
  • AddAsimovData(0, signalStrength=0) - will add the data expected under the background-only (or signalStrength=whatever) hypothesis and will use that expected data in the final fit. When the fit is run, first a fit will be perform with given signal strength and the prediction in the signal region will be used as expected data. Then the main fit is run that uses this expected data as if it was actual data. It appears in the fit panel as a hollow circle. Adding asimov data in this way will artifically reduce your statistical uncertainty, since it is effectively giving you extra stats for nothing (it's assuming that you will see some events you haven't yet seen).

Inspecting and saving the TRooFit model

The model can be inspected and even saved by importing it into a RooWorkspace. Just use the GetModel method of TRooABCD to obtain the RooSimultaneous object that represents the complete model, and import it into a workspace:

RooWorkspace w("w","w");
  w.import( *abcd.GetModel() );
  w.Print(); //inspect the workspace

The type of objects you will find in here are (only listing important objects):

pdfs:

  • TRooH1D: For each term in the stacks, you will find one of these objects (e.g. called bkg0, bkg1, signal0, etc). These can be inspected with w.pdf("bkg0")->Draw() for example.
  • TRooHStack: These represent a stack of the TRooH1D. There should be four of these. Can be drawn also.

functions:

  • TRooHF1D::transfer1: this will be the transfer factor function. Draw it, with errors, like this: ((TRooHF1D*)w.function("transfer1"))->Draw("e3005",result) (you must pass the result object, which contains the covariance matrix required to calculate the errors correctly).

-- WillButtinger - 2017-12-23

Topic attachments
I Attachment History Action Size Date Who Comment
PDFpdf ABCDGuide_draft18Oct18.pdf r1 manage 530.2 K 2018-10-17 - 14:58 WillButtinger  
PNGpng abcd_logo.png r1 manage 5.4 K 2017-12-24 - 01:17 WillButtinger  
PNGpng figure1.png r1 manage 18.0 K 2018-01-02 - 21:17 WillButtinger  
PNGpng trooabcd_example.png r1 manage 19.8 K 2017-12-23 - 20:45 WillButtinger  
PNGpng tutorial1.png r1 manage 17.9 K 2018-01-02 - 21:18 WillButtinger  
PNGpng tutorial2.png r1 manage 18.5 K 2018-01-02 - 21:51 WillButtinger  
PNGpng tutorial3.png r1 manage 19.9 K 2018-01-02 - 22:19 WillButtinger  
Edit | Attach | Watch | Print version | History: r9 < r8 < r7 < r6 < r5 | Backlinks | Raw View | WYSIWYG | More topic actions
Topic revision: r9 - 2018-10-17 - WillButtinger
 
    • Cern Search Icon Cern Search
    • TWiki Search Icon TWiki Search
    • Google Search Icon Google Search

    Main All webs login

This site is powered by the TWiki collaboration platform Powered by PerlCopyright & 2008-2019 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback