 TWiki> RooStats Web>RooStatsTutorialsJune2013 (revision 16)EditAttachPDF

# RooFit/RooStats tutorials for INFN School of Statistics 2013

Set of RooFit/RooStats tutorials given at INFN school of statistics (3-7 June 2013) , see the school agenda .

Below is given the required materials used during the RooFit/RooStats tutorials. The materials consists of exercises that can be solved from what has been presented in the lecture. Two levels of help will be provided: a hint and a full solution.

Here the hint is shown.
Here the solution is shown.
The hint allows to create the ROOT macro required to complete the exercise step by step, while the solution provides the full running code. You can find the single code attached (links at the end of the page) or you can use the provided tar file. Some points linked to the exercises are optional and marked with a icon: they are useful to scrutinise in more detail some aspects. Try to solve or look at them if you have the time.

Additional material can be found on:

## Introduction

### Getting started with the software

A few advice words before starting. The tutorials are based on the ROOT version 5.34, possibly the latest one, 5.34.07 which is also installed in the Virtual Machine available for the school.

```           root>  .x myMacro.C        // interpreted
root>  .x myMacro.C+      // compiled
```
• It is recommended to add the following at the beginning of your ROOT macro, that will load automatically the Roofit and Roostats libraries:
```           using namespace RooFit;
using namespace RooStats;
```
• Avoid running a macro multiple times in the same ROOT CINT sessions (it might unfortunately crash sometimes if you do it, because some objects might not be properly deleted in the macro)
• The Roostats calculator are quite verbose, you can suppress the output by doing:
```RooMsgService::instance().setGlobalKillBelow(RooFit::WARNING) ;
```

### Terminology and Conventions

Here we give pragmatic definitions for a few basic concepts that we will use.
• observable - something you measure in an experiment, for example a particle's momentum. Often, a function of measured quantities, for example an invariant mass of several particles
• global observable or auxiliary observable - an observable from another measurement, for example, the integrated luminosity
• model - a set of probability density functions (PDFs), which describe distributions of the observables or functions of observables
• model parameter - any variable in your PDF expression, which is not an observable
• parameter of interest (POI) - a model parameter that you study, for example a cross section of your signal process
• nuisance parameter - every other model parameter, which is not your parameter of interest
• data or dataset - a set of values of observables, either measured in an experiment, or simulated
• likelihood - a model computed for a particular dataset
• hypothesis - a particular model, with specified observables, POI, nuisance parameters, and prior PDFs (in case of Bayesian inference)
• prior PDF - a probability density for an observable or a model parameter, which is known a priori, i.e. before a measurement is considered. This is a Bayesian concept exclusively. Prior has no meaning or place in a frequentist type of inference
• Bayesian - a type of statistical inference that usually produces probability of the hypothesis given the data. Requires a prior.
• frequentist - a type of statistical inference that usually produces probability of the data given the hypothesis.

# RooFit Exercises: Create and fitting a model

The aim of the RooFit exercises is to learn how to build a RooFit model, how to fit the model on a set of data that we will generate and how to plot the result. We will also learn how to save the full RooFit workspace containing the model and data in a file which we can read back later, for example when we will do the RooStats exercises. We will learn also how to generate the data from the model. The same generated data we will then use to fit the model

During the exercises we will give some code snippets and hint on how to write a full ROOT macro which can solve the exercises. If you are not able to write the macro yourself, because you find too difficult or because the time is not sufficient, you can use directly the solution macro which will contain the running code and (hopefully) you will be able to run the tutorial exercise.

Using the snippets of code, one can build a full ROOT macro which can be run interactively in ROOT or compiled using ACLIC.

## Exercise 1: Gaussian model and fit it to random generated data

We will start with a very simple exercise. If you already know a bit of RooFit you might skip this exercise and go directly to the next one. The aim of this exercise is to show and make you practise the basic functionality of RooFit.

We will create a Gaussian model from which we will generate a pseudo-data set and then we will fit this data set. After fitting, we will plot the result. You can find the instruction for each steps in the lecture slides.

• Start creating the Gaussian model using the workspace factory, with the syntax introduced in the lecture slides.
• Use the `generate(...)` method of the `RooAbsPdf` class to generate a data set with 1000 events.
• Plot the data set using `RooPlot` in a ROOT Canvas.
• Fit the generate data using the `fitTo(...)` method of the `RooAbsPdf` class
• Plot the resulting fit function on top of the data

• Creating the model
• Use the syntax of the `RooWorkspace` factory to create first the Gaussian p.d.f. function with the variables (observables and parameters).
• Every variable in RooFIt, when created needs to be created with a value a min and a max allowed range. Use very large value if you don't know the range. If you provided only a value, the variable is considered constant. If you provide only the minimum and the maximum, the initial value will be taken as half the range. It is important, to avoid undesired side effect the value given should be defined between the min and max of the given range.
• Note one can create like this any existing RooFit pdf class by using the Pdf class name removed by the `Roo` suffix. The required variables to be passed depends on the pdf, and they are defined according to the class constructor.
• You need to define the [value, min, max] of a variable only the first time you create in the factory. Afterwards you can reference the variable by its name.
```// create Gaussian Pdf
RooWorkspace w("w");
w.factory("Gaussian:pdf(x[-10,10],mu[1,-1000,1000],s[2,0,1000])");   ```
• Retrieving objects from the workspace
• After you have created the variable and p.d.f in the workspace, you can access their pointers, by using `RooWorkspace::var` for variables or `=RooWorkspace::pdf` for p.d.f. Example:
```// retrieving model components
RooAbsPdf * pdf = w.pdf("pdf");   // access object from workspace
RooRealVar * x = w.var("x");   // access object from workspace
```
• Generating a dataset:
• Note that you can generate also binned data sets. In that case you have two options:
• use a `RooDataHist` and `generateBinned`: `RooDataHist * binData = pdf->generateBinned(*x, 1000);`.
• use the option `AllBinned()` which will generate a weighted data set: `RooDataSet * data = pdf->generate(*x, NumEvents(1000), AllBinned());`.
```// generate an unbinned dataset of 1000 events
RooDataSet * data = pdf->generate( *x, 1000); ```
• Plot the data:
```// create a RooPlot object and plot the data
RooPlot * plot = x->frame();
data->plotOn(plot); ```
• Fit the data: you need to call the `RooAbsPdf::fitTo`.
```// fit the data and save its result. Use the optional =Minuit2= minimiser
RooFitResult * res = pdf->fitTo(*data, Minimizer("Minuit2","Migrad"), Save(true) );```
• Plot the resulting fit function in the same plot
```pdf->plotOn(plot);
plot->Draw();   // to show the RooPlot in the current ROOT Canvas
```

```// make a simple Gaussian model

#include "RooWorkspace.h"
#include "RooRealVar.h"
#include "RooAbsPdf.h"
#include "RooDataSet.h"
#include "RooPlot.h"

#include "TCanvas.h"

using namespace RooFit;

void GaussianModel(int n = 1000) {

RooWorkspace w("w");
// define a Gaussian pdf
w.factory("Gaussian:pdf(x[-10,10],mu[1,-1000,1000],sigma[2,0,1000])");

RooAbsPdf * pdf = w.pdf("pdf");   // access object from workspace
RooRealVar * x = w.var("x");   // access object from workspace

// generate n gaussian measurement for a Gaussian(x, 1, 1);
RooDataSet * data = pdf->generate( *x, n);

data->SetName("data");

// RooFit plotting capabilities
RooPlot * pl = x->frame(Title("Gaussian Fit"));
data->plotOn(pl);
pl->Draw();

// now fit the data set

// plot the pdf on the same RooPlot object we have plotted the data
pdf->plotOn(pl);
pdf->paramOn(pl, Layout(0.6,0.9,0.85));

pl->Draw();

// import data in workspace (IMPORTANT for saving it )
w.import(*data);

w.Print();

// write workspace in the file (recreate file if already existing)
w.writeToFile("GaussianModel.root", true);

cout << "model written to file " << endl;

}
```

• Gaussian Fit Result: The fit has only two shape parameters. How can you also add the a normalisation fit parameter, the number of events ?

You need to make an Extended likelihood fit. For doing this in RooFit you need to create a `RooExtendPdf` class. This can be done as following:
```w.factory("Gaussian:gaus(x[-10,10],mu[1,-1000,1000],sigma[2,0,1000])");
w.factory("ExtendPdf:pdf(gaus, nevt[100,0,100000])");``` The fit is an unbinned likelihood fit. How can you make a binned likelihood fit ?

Generate a binned data set using `generateBinned` which will return a `RooDataHist` class or by using `generate` with the option `AllBinned`. In this last case a `RooDataSet` class is returned.

## Exercise 2: Fit of a Signal over an Exponential Background

In this exercise we will create a composite model formed by a Gaussian signal over a falling exponential background. We will have 100 Gaussian-distributed signal event over ~ 1000 exponential-distributed background. We assume that the location and width of the signal are known. The "mass" observable is distributed in range [0 - 10] and the signal has mean ~ 2 and sigma 0.3. The exponential decay of the background is 2.

We will follow the same step as in the previous exercise:

• Start creating the model using the workspace factory. Create the separate signal and background pdf and then combine together using the `RooAddPdf` class.
• Use the `generate(...)` method of the `RooAbsPdf` class to generate a data set with 1000 events.
• Plot the data set using `RooPlot` in a ROOT Canvas.
• Fit the generate data using the `fitTo(...)` method of the `RooAbsPdf` class
• Plot the resulting fit function on top of the data
• Save the workspace in a file

• Creating the model
• Use the syntax of the `RooWorkspace` factory to create first the Gaussian p.d.f. function with the variables (observables and parameters) and the Exponential
• Create then a `RooAddPdf` using the Gaussian and the Exponential and the relative number of events. Note that we could instead of the number of events a relative fraction. In this last case we would fit only for the shape and not the normalisation.
• The `RooAddPdf` is created using the `SUM` operator of the factory syntax.
```// create model
RooWorkspace w("w");
w.factory("Exponential:bkg_pdf(x[0,10], a[-0.5,-2,-0.2])");
w.factory("Gaussian:sig_pdf(x, mass, sigma[0.3])");

w.factory("SUM:model(nsig[100,0,10000]*sig_pdf, nbkg[1000,0,10000]*bkg_pdf)");  // for extended model
```
• Retrieving objects from the workspace
• After you have created the variable and p.d.f in the workspace, you can access their pointers, by using `RooWorkspace::var` for variables or `=RooWorkspace::pdf` for p.d.f. Example:
```// retrieving model components
RooAbsPdf * pdf = w.pdf("pdf");   // access object from workspace
RooRealVar * x = w.var("x");   // access object from workspace
```
• Generating a dataset:
• Note that you can generate also binned data sets. In that case you have two options:
• use a `RooDataHist` and `generateBinned`: `RooDataHist * binData = pdf->generateBinned(*x, 1000);`.
• use the option `AllBinned()` which will generate a weighted data set: `RooDataSet * data = pdf->generate(*x, NumEvents(1000), AllBinned());`.
```// generate an unbinned dataset of 1000 events
RooDataSet * data = pdf->generate( *x, 1000); ```
• Plot the data:
```// create a RooPlot object and plot the data
RooPlot * plot = x->frame();
data->plotOn(plot); ```
• Fit the data: you need to call the `RooAbsPdf::fitTo`.
```// fit the data and save its result. Use the optional =Minuit2= minimiser
RooFitResult * res = pdf->fitTo(*data, Minimizer("Minuit2","Migrad"), Save(true) );```
• Plot the resulting fit function in the same plot.
• Plot also the signal and background fit components with different colour and style
```pdf->plotOn(plot);
//draw the two separate pdf's
pdf->plotOn(plot, RooFit::Components("bkg_pdf"), RooFit::LineStyle(kDashed) );
pdf->plotOn(plot, RooFit::Components("sig_pdf"), RooFit::LineColor(kRed), RooFit::LineStyle(kDashed) );
plot->Draw();   // to show the RooPlot in the current ROOT Canvas
```
• Save the workspace in a file for re-using it without the need to generate the data
```w.writeToFile("GausExpModel.root",true);  // true means the file is re-created
```

Note that in the solution we also create a `ModelConfig` object that we save in the workspace. This will be useful for the RooStats exercises (see later)/
```#include "RooWorkspace.h"
#include "RooAbsPdf.h"
#include "RooDataSet.h"
#include "RooFitResult.h"
#include "RooPlot.h"
#include "RooRealVar.h"
#include "RooRandom.h"

#include "RooStats/ModelConfig.h"

using namespace RooFit;

void GausExpModel(int nsig = 100,    // number of signal events
int nbkg = 1000 )  // number of background events
{

RooWorkspace w("w");
w.factory("Exponential:bkg_pdf(x[0,10], a[-0.5,-2,-0.2])");
w.factory("Gaussian:sig_pdf(x, mass, sigma[0.3])");

w.factory("SUM:model(nsig[0,10000]*sig_pdf, nbkg[0,10000]*bkg_pdf)");  // for extended model

RooAbsPdf * pdf = w.pdf("model");
RooRealVar * x = w.var("x");  // the observable

// set the desired value of signal and background events
w.var("nsig")->setVal(nsig);
w.var("nbkg")->setVal(nbkg);

// generate the data

// use fixed random numbers for reproducibility (use 0 for changing every time)
RooRandom::randomGenerator()->SetSeed(111);

// fix number of bins to 50 to plot or to generate data (default is 100 bins)
x->setBins(50);

RooDataSet * data = pdf->generate( *x);  // will generate accordint to total S+B events
//RooDataSet * data = pdf->generate( *x, AllBinned());  // will generate accordint to total S+B events
data->SetName("data");
w.import(*data);

data->Print();

RooPlot * plot = x->frame(Title("Gaussian Signal over Exponential Background"));
data->plotOn(plot);
plot->Draw();

RooFitResult * r = pdf->fitTo(*data, RooFit::Save(true), RooFit::Minimizer("Minuit2","Migrad"));
r->Print();

pdf->plotOn(plot);
//draw the two separate pdf's
pdf->plotOn(plot, RooFit::Components("bkg_pdf"), RooFit::LineStyle(kDashed) );
pdf->plotOn(plot, RooFit::Components("sig_pdf"), RooFit::LineColor(kRed), RooFit::LineStyle(kDashed) );

pdf->paramOn(plot,Layout(0.5,0.9,0.85));

plot->Draw();

RooStats::ModelConfig mc("ModelConfig",&w);
mc.SetPdf(*pdf);
mc.SetParametersOfInterest(*w.var("nsig"));
mc.SetObservables(*w.var("x"));
// define set of nuisance parameters
w.defineSet("nuisParams","a,nbkg");

mc.SetNuisanceParameters(*w.set("nuisParams"));

// import model in the workspace
w.import(mc);

// write the workspace in the file
TString fileName = "GausExpModel.root";
w.writeToFile(fileName,true);
cout << "model written to file " << fileName << endl;
}
```

• Gaussian plus Exponential fit result:

## Exercise 3: Fit to a Combined model

In this exercise we will create a combined model based on two channels and we will perform a simultaneous fit on both channels.

• The first channel will be as the one of the previous exercise. A gaussian signal over an exponential decay background. We assume as before that the location and width of the signal are known. The "mass" observable is distributed in range [0 - 10] and the signal has mean ~ 2 and sigma 0.3. The exponential decay of the background is 2. We assume we have 1000 background events.
• The second channel will be similar to the first one, we will have the same signal model, but a different background model. We assume we have less and a flatter background. We have an exponential decay of 4 and 100 background events in total.
• We assume the number of signal events in the first channel is 50 and in the second channel is 30. We introduce a common rate parameter `mu` expressing the signal strength.

We will so the same steps as before:

• Create the models for the two channels
• Create a combined model using simultaneous pdf (class `RooSimultaneous`).
• Generate a combined data set, fit and plot the result

• Create first the two separate background models and a common Gaussian signal model
• Express the number of signal events in term of the signal strength `mu`: the number of signal events is the product of `mu` with a fixed number (50 for the first channel and 30 for the second).
```RooWorkspace w("w");
w.factory("Exponential:bkg1_pdf(x[0,10], a1[-0.5,-2,-0.2])");
w.factory("Exponential:bkg2_pdf(x, a2[-0.25,-2,-0.2])");
w.factory("Gaussian:sig_pdf(x, mass, sigma[0.3])");
// express number of events as the
w.factory("prod:nsig1(mu[1,0,5],xsec1)");
w.factory("prod:nsig2(mu,xsec2)");

w.factory("SUM:model1(nsig1*sig_pdf, nbkg1[1000,0,10000]*bkg1_pdf)");  // for extended model
w.factory("SUM:model2(nsig2*sig_pdf, nbkg2[100,0,10000]*bkg2_pdf)");  // for extended model
```
• Make a simultaneous pdf * create first the category label to identify the channels (`RooCategory`) * use the factory operator `SIMUL` for creating the `RooSimultaneous` pdf.

```// Create discrete observable to label channels
w.factory("index[channel1,channel2]");
// Create joint pdf (RooSimultaneous)
w.factory("SIMUL:jointModel(index,channel1=model1,channel2=model2)");```
• Generate the events. We can generate directly a combined data sets from the model, with the number of events fluctuating around the number of expected events in each channel.
• Note that for creating a combined data set we need to add the Category variable in the data set as an additional observable
```// generate combined data set
RooDataSet * data = pdf->generate( RooArgSet(*x,*index)); ```
• Plot the data. We make two plots representing the two channels. We need then to tell to the combined data set to select the channel we want
```RooPlot * plot1 = x->frame();
RooPlot * plot2 = x->frame();
data->plotOn(plot1,RooFit::Cut("index==index::channel1"));
data->plotOn(plot2,RooFit::Cut("index==index::channel2")); ```
• Fit the data is as before using the `fitTo` method.
• Plot the resulting model on the two data sets. We need to project the simultaneous pdf on the corresponding channel. This is done as following:
```// fit
RooFitResult * r = pdf->fitTo(*data, RooFit::Save(true), RooFit::Minimizer("Minuit2","Migrad"));
r->Print();

// plotting (draw the two separate pdf's )
pdf->plotOn(plot1,RooFit::ProjWData(*data),RooFit::Slice(*w.cat("index"),"channel1"));
pdf->plotOn(plot2,RooFit::ProjWData(*data),RooFit::Slice(*w.cat("index"),"channel2"));

// draw the two plots in two separate pads:
c1 = new TCanvas();
c1->Divide(1,2);
c1->cd(1); plot1->Draw();
c1->cd(2); plot2->Draw();```

Code solution:

```#include "RooWorkspace.h"
#include "RooAbsPdf.h"
#include "RooArgSet.h"
#include "RooCategory.h"
#include "RooDataSet.h"
#include "RooFitResult.h"
#include "RooPlot.h"
#include "RooRealVar.h"
#include "RooRandom.h"
#include "TCanvas.h"

#include "RooStats/ModelConfig.h"

using namespace RooFit;

void CombinedModel() {

RooWorkspace w("w");

w.factory("Exponential:bkg1_pdf(x[0,10], a1[-0.5,-2,-0.2])");
w.factory("Gaussian:sig_pdf(x, mass, sigma[0.3])");

w.factory("prod:nsig1(mu[1,0,5],xsec1)");

w.factory("SUM:model1(nsig1*sig_pdf, nbkg1[1000,0,10000]*bkg1_pdf)");  // for extended model

w.factory("Exponential:bkg2_pdf(x, a2[-0.25,-2,-0.2])");

w.factory("prod:nsig2(mu,xsec2)");
w.factory("SUM:model2(nsig2*sig_pdf, nbkg2[100,0,10000]*bkg2_pdf)");  // for extended model

// Create discrete observable to label channels
w.factory("index[channel1,channel2]");

// Create joint pdf (RooSimultaneous)
w.factory("SIMUL:jointModel(index,channel1=model1,channel2=model2)");

RooAbsPdf * pdf = w.pdf("jointModel");
RooRealVar * x = w.var("x");  // the observable
RooCategory * index = w.cat("index");  // the category

// generate the data (since it is exetended we can generate the global model

// use fixed random numbers for reproducibility
// use 0 for changing every time
RooRandom::randomGenerator()->SetSeed(111);
// generate binned
// plot the generate data in 50 bins (default is 100)
x->setBins(50);

// generate events of joint model
// NB need to add also category as observable
RooDataSet * data = pdf->generate( RooArgSet(*x,*index));  // will generate accordint to total S+B events
data->SetName("data");
w.import(*data);

data->Print();

RooPlot * plot1 = x->frame(Title("Channel 1"));
RooPlot * plot2 = x->frame(Title("Channel 2"));
data->plotOn(plot1,RooFit::Cut("index==index::channel1"));
data->plotOn(plot2,RooFit::Cut("index==index::channel2"));
// plot->Draw();

RooFitResult * r = pdf->fitTo(*data, RooFit::Save(true), RooFit::Minimizer("Minuit2","Migrad"));
r->Print();

pdf->plotOn(plot1,RooFit::ProjWData(*data),RooFit::Slice(*w.cat("index"),"channel1"));
pdf->plotOn(plot2,RooFit::ProjWData(*data),RooFit::Slice(*w.cat("index"),"channel2"));
//draw the two separate pdf's
pdf->paramOn(plot1,Layout(0.65,0.85,0.85),Parameters(RooArgSet(*w.var("a1"),*w.var("nbkg1"))));
pdf->paramOn(plot2,Layout(0.65,0.85,0.85),Parameters(RooArgSet(*w.var("a2"),*w.var("nbkg2"))));
pdf->paramOn(plot2,Layout(0.65,0.85,0.7),Parameters(RooArgSet(*w.var("mu"))));
TCanvas * c1 = new TCanvas();
c1->Divide(1,2);
c1->cd(1); plot1->Draw();
c1->cd(2); plot2->Draw();

RooStats::ModelConfig mc("ModelConfig",&w);
mc.SetPdf(*pdf);
mc.SetParametersOfInterest(*w.var("mu"));
mc.SetObservables(RooArgSet(*w.var("x"),*w.cat("index")));

// define set of nuisance parameters
w.defineSet("nuisParams","a1,nbkg1,a2,nbkg2");

mc.SetNuisanceParameters(*w.set("nuisParams"));

// set also the snapshot
mc.SetSnapshot(*w.var("mu"));

// import model in the workspace
w.import(mc);

// write the workspace in the file
TString fileName = "CombinedModel.root";
w.writeToFile(fileName,true);
cout << "model written to file " << fileName << endl;
}

```

• Combined Model fit result:

# RooStats Exercises

We are going to perform a statistical analysis on two types of models:

• Observable-based analysis. This is the model we have used in the RooFit exercise 2. It is composed of 50 Gaussian-distributed signal event over ~ 1000 exponential-distributed background. The "mass" observable is distributed in range [0 - 10] and the signal has mean ~ 2 and sigma 0.5. We assume that the location and width of the signal are known. We can re-use the macro from the RooFit exercise to create the model.
• Poisson model: let's assume you observe n events in data while expecting b background event. We consider some systematic uncertainties in the background model value, b. We start expressing this uncertainty as Gaussian. We will see also how we can use a log-normal distribution for expressing this uncertainty or extend the model adding other systematic uncertainties.

For these model we compute:

• 68% CL 2-sided confidence interval from the (profiled-) likelihood ratio. As option one could compute also the significance using the profiled likelihood.
• 68% credible interval using the MCMCCalculator or the Bayesian Calculator (based on numerical integration). Compute as well the 95% Bayesian upper limit.

For running the RooStats calculators we read the workspace from the ROOT file. All the Roostats calculators take as input a data set and a ModelConfig object. It is therefore convenient to save the workspace (with model and data) in a ROOT file and then read it afterwards for running the calculators.

If you have problem creating the code for running the calculators, you can run the provided solution or just the standard Roostats tutorials (in \$ROOTSYS/tutorials/roostats)

Tutorials you could use are:

• `StandardProfileLikelihoodDemo.C`
• `StandardBayesianNumericalDemo.C`
• `StandardBayesianMCMCDemo.C`

They can be run as following from the Root prompt by passing the workspace file name (e.g. "SPlusBExpoModel.root" ), workspace name, model config name and data set name:

```
root  .L \$ROOTSYS/tutorials/roostats/StandardProfileLikelihoodDemo.C
root  StandardProfileLikelihoodDemo("SPlusBExpoModel.root", "w", "ModelConfig", "data" )
```

Other tutorials go into more details to create various models and run the calculators. See them all at http://root.cern.ch/root/html/tutorials/roostats/index.html .

## Exercise 1: Create a `ModelConfig` from the Gaussian plus Exponential Model used in the RooFit Exercise 2.

We show first how to create the `ModelConfig` object which we need to import into the workspace in order to be able to run later the RooStats calculators. We need to specify in the `ModelConfig`:

• the p.d.f of the model
• the parameters of interest (in this case the number of signal events)
• the nuisance parameters (the number of background events and the exponential shape parameter )
• the observables (in this case x, the reconstructed mass)

We will see later, that if we have a model where some external knowledge on a parameter is introduced, we would need to define the model global observables.

Below is how we create a `ModelConfig` for the Gaussian signal plus Exponential background model:

```RooStats::ModelConfig mc("ModelConfig",&w);
mc.SetPdf(*pdf);
mc.SetParametersOfInterest(*w.var("nsig"));
mc.SetObservables(*w.var("x"));
// define set of nuisance parameters
w.defineSet("nuisParams","a,nbkg");

mc.SetNuisanceParameters(*w.set("nuisParams"));

// import the ModelConfig in the workspace
w.import(mc);

// write workspace in the file
w.writeToFile("GausExpModel.root"); ```
These lines must be included at the end of the macro we used for exercise 2 (they are already present in the solution).

## Exercise 2: Profile Likelihood Calculator

In this exercise we will see how to run the RooStats Profile Likelihood calculator on the Gaussian signal plus Exponential background model. We will create a ROOT macro, doing the following:

• Read the workspace from the file and get a pointer to the `ModelConfig` and data set objects.
• Create the Profile likelihood calculator class
• Run the calculator by getting the interval at the corresponding desired confidence level
• Plot the interval and the profile likelihood function

Here is the code to read the workspace from the file. Remember to add at the beginning of the macro `using namespace RooStats;`.
```using namespace RooStats;

void SimplePL(
const char* infile =  "GausExpModel.root",
const char* workspaceName = "w",
const char* modelConfigName = "ModelConfig",
const char* dataName = "data" )
{

// open input file
TFile *file = TFile::Open(infile);
if (!file) return;

// get the workspace out of the file
RooWorkspace* w = (RooWorkspace*) file->Get(workspaceName);

// get the modelConfig out of the file
RooStats::ModelConfig* mc = (RooStats::ModelConfig*) w->obj(modelConfigName);

// get the modelConfig out of the file
RooAbsData* data = w->data(dataName);
```
We create now the `ProfileLikelihoodCalculator class passing the =ModelConfig` object and a data set object (`RooAbsData`):
`ProfileLikelihoodCalculator pl(*data,*mc);`
Now the calculator has all information and can compute the interval. We need to set first the desired confidence level and then call `GetInterval()`:
```pl.SetConfidenceLevel(0.683); // 68% interval
LikelihoodInterval* interval = pl.GetInterval();```
We can then retrieve the actual interval lower/upper value for the desired parameter (e.g. the parameter of interest):
```// find the iterval on the first Parameter of Interest
RooRealVar* firstPOI = (RooRealVar*) mc->GetParametersOfInterest()->first();

double lowerLimit = interval->LowerLimit(*firstPOI);
double upperLimit = interval->UpperLimit(*firstPOI);

cout << "\n68% interval on " <<firstPOI->GetName()<<" is : ["<<
lowerLimit << ", "<<
upperLimit <<"] "<<endl;```
At the end we can plot the interval and the profiled likelihood function in a ROOT Canvas using the `LikelihoodIntervalPlot` class:
```LikelihoodIntervalPlot * plot = new LikelihoodIntervalPlot(interval);
plot->SetRange(0,150);  // change range to have interval visible (default range is variable range)
plot->Draw(); ```
Note that for some complex model, which require time to evaluate, it can be convenient to use less points and also use the _"tf1"_option
```plot->SetPoints(50);
plot->Draw("tf1"); ```

```using namespace RooStats;

void SimplePL(
const char* infile =  "GausExpModel.root",
const char* workspaceName = "w",
const char* modelConfigName = "ModelConfig",
const char* dataName = "data" )
{
/////////////////////////////////////////////////////////////
// First part is just to access the workspace file
////////////////////////////////////////////////////////////

// open input file
TFile *file = TFile::Open(infile);
if (!file) return;

// get the workspace out of the file
RooWorkspace* w = (RooWorkspace*) file->Get(workspaceName);

// get the modelConfig out of the file
RooStats::ModelConfig* mc = (RooStats::ModelConfig*) w->obj(modelConfigName);

// get the modelConfig out of the file
RooAbsData* data = w->data(dataName);

ProfileLikelihoodCalculator pl(*data,*mc);
pl.SetConfidenceLevel(0.683); // 68% interval
LikelihoodInterval* interval = pl.GetInterval();

// find the iterval on the first Parameter of Interest
RooRealVar* firstPOI = (RooRealVar*) mc->GetParametersOfInterest()->first();

double lowerLimit = interval->LowerLimit(*firstPOI);
double upperLimit = interval->UpperLimit(*firstPOI);

cout << "\n68% interval on " <<firstPOI->GetName()<<" is : ["<<     lowerLimit << ", "<<     upperLimit <<"] "<<endl;   LikelihoodIntervalPlot * plot = new LikelihoodIntervalPlot(interval);   plot->SetRange(0,150);  // possible eventually to change ranges
//plot->SetNPoints(50);  // do not use too many points, it could become very slow for some models
plot->Draw("");  // use option TF1 if too slow (plot.Draw("tf1")

}
```

• Profile Likelihood result on Gaussian signal plus Exponential background model:

```68% interval on nsig is : [60.3382, 97.0782]
```

## Exercise 3a: Bayesian MCMC Calculator

On the same model used before for the Profile Likelihood calculator we can also run a Bayesian analysis. We have the choice between using the MCMC calculator or the numerical Bayesian calculator. We will start using the `MCMCCalculator`. What we need to do is very similar to the previous exercise:

• Read the model from the file as in the previous exercise
• Create the `MCMCCalculator` class
• Configure the calculator setting the proposal function, number of iterations, etc..
• Call the `GetInterval` function to compute the desired interval. The returned `MCMCInterval` class will contained the resulting Markov-Chain.
• Plot the result using the `MCMCIntervalPlot` class

Assuming we have already the code for reading a model from the previous exercise, we create the `MCMCCalculator class passing the =ModelConfig` object and a data set object (`RooAbsData`) exactly as we did for the `ProfileLikelihoodCalculator`:
```MCMCCalculator mcmc(*data,*mc);
mcmc.SetConfidenceLevel(0.683); // 683% interval
```
The next step is to configure the calculator. We set the proposal function and the number of iterations. As proposal we use the `SequentialProposal` function. This function samples each parameter independently using a gaussian pdf with a sigma given by a fraction of the parameter range. We use in this case a value of 0.1 for the fraction. It is important to check that the acceptance of the proposal is not too low. In that case we are wasting iterations.
```// Define proposal function. This proposal function seems fairly robust
SequentialProposal sp(0.1);
mcmc.SetProposalFunction(sp);```
We set then the number of iterations. We use a not too large number of iterations to avoid waiting too long in this tutorial example.
```// set number of iterations and initial burning steps
mcmc.SetNumIters(100000);         // Metropolis-Hastings algorithm iterations
mcmc.SetNumBurnInSteps(1000);       // first N steps to be ignored as burn-in
```
We configure then the interval we want to compute. We can compute a shortest interval (default and the only one defined in multi-simension) central or one sided (lower or upper limits)
```mcmc.SetLeftSideTailFraction(0.5); // 0.5 for central Bayesian interval  and 0 for upper limit
```
We can then run the calculator (obtain the Markov chain and retrieve the actual interval lower/upper value for the desired parameter (e.g. the parameter of interest) by integrating the posterior function:
```MCMCInterval* interval = mcmc.GetInterval();

// print out the iterval on the first Parameter of Interest
RooRealVar* firstPOI = (RooRealVar*) mc->GetParametersOfInterest()->first();
cout << "\n68% interval on " <<firstPOI->GetName()<<" is : ["<<
interval->LowerLimit(*firstPOI) << ", "<<
interval->UpperLimit(*firstPOI) <<"] "<< endl;```
At the end we can plot the interval and the posterior function in a ROOT Canvas using the `MCMCIntervalPlot` class:
```// make a plot of posterior function
new TCanvas("IntervalPlot");
MCMCIntervalPlot plot(*interval);
plot.Draw();```

```using namespace RooFit;
using namespace RooStats;

void SimpleMCMC( const char* infile =  "GausExpModel.root",
const char* workspaceName = "w",
const char* modelConfigName = "ModelConfig",
const char* dataName = "data" )
{
/////////////////////////////////////////////////////////////
// First part is just to access the workspace file
////////////////////////////////////////////////////////////

// open input file
TFile *file = TFile::Open(infile);
if (!file) return;

// get the workspace out of the file
RooWorkspace* w = (RooWorkspace*) file->Get(workspaceName);

// get the modelConfig out of the file
RooStats::ModelConfig* mc = (RooStats::ModelConfig*) w->obj(modelConfigName);

// get the modelConfig out of the file
RooAbsData* data = w->data(dataName);

RooRealVar * nsig = w->var("nsig");
if (nsig) nsig->setRange(0,200);

MCMCCalculator mcmc(*data,*mc);
mcmc.SetConfidenceLevel(0.683); // 68.3% interval

// Define proposal function. This proposal function seems fairly robust
SequentialProposal sp(0.1);
mcmc.SetProposalFunction(sp);

// set number of iterations and initial burning steps
mcmc.SetNumIters(100000);         // Metropolis-Hastings algorithm iterations
mcmc.SetNumBurnInSteps(1000);       // first N steps to be ignored as burn-in

// default is the shortest interval
// here we use central interval
mcmc.SetLeftSideTailFraction(0.5); // for central Bayesian interval
//mcmc.SetLeftSideTailFraction(0); // for one-sided Bayesian interval

// run the calculator
MCMCInterval* interval = mcmc.GetInterval();

// print out the iterval on the first Parameter of Interest
RooRealVar* firstPOI = (RooRealVar*) mc->GetParametersOfInterest()->first();
cout << "\n68% interval on " <<firstPOI->GetName()<<" is : ["<<     interval->LowerLimit(*firstPOI) << ", "<<     interval->UpperLimit(*firstPOI) <<"] "<<endl;

// make a plot of posterior function
new TCanvas("IntervalPlot");
MCMCIntervalPlot plot(*interval);
plot.Draw();

}
```

• MCMC Bayesian result on Gaussian signal plus Exponential background model:

```Metropolis-Hastings progress: ....................................................................................................
[#1] INFO:Eval -- Proposal acceptance rate: 10.469%
[#1] INFO:Eval -- Number of steps in chain: 10469
68% interval on nsig is : [61.245, 98.2044]
``` Posterior function for the number of signal events obtained using the `MCMCCalculator`

## Exercise 3b: Bayesian Numerical Calculator

We can try now to use also the numerical Bayesian calculator on the same model. As before we create the class and configure it setting for example the integration type or the number of iterations. Since the numerical integration can be tricky, it is better in this case to set before a reasonable range from the nuisance parameters which will be integrated to obtain the marginalised posterior function. A sensible choice is to use for example an interval using 10-sigma around the best fit values for the nuisance parameters.

Note that we have not defined a prior distribution for the parameter of interest (the number of signal events). If a prior is not defined in the model, it is assumed that the prior distribution for the parameter of interest is the uniform distribution in the parameter range.

Again we create the `BayesianCalculator class passing the =ModelConfig` object and a data set object, `RooAbsData`:
`BayesianCalculator bayesianCalc(*data,*mc);`

```// define reasanable range for nuisance parameters ( +/- 10 sigma around fitted values) to facilitate the integral
RooArgList nuisPar(*mc->GetNuisanceParameters());
for (int i = 0; i < nuisPar.getSize(); ++i) {
RooRealVar & par = (RooRealVar&) nuisPar[i];
par.setRange(par.getVal()-10*par.getError(), par.getVal()+10*par.getError());
}```

We can then configure the calculator:

```// set the type of interval (not really needed for central which is the default)
bayesianCalc.SetLeftSideTailFraction(0.5); // for central interval
//bayesianCalc.SetShortestInterval(); // for shortest interval
bayesianCalc.SetConfidenceLevel(0.683); // 68% interval

// set the integration type (default is "ADAPTIVE")
// possible alternative values are  "VEGAS" , "MISER", or "PLAIN"  but they require  libMathMore
// or  "TOYMC" (toy MC integration, work when nuisances exist and they have a constraints pdf)
bayesianCalc.SetIntegrationType("TOYMC");

// limit the number of iterations to speed-up the integral
bayesianCalc.SetNumIters(1000);

// compute interval by scanning the posterior function
// it is done by default when computing shortest intervals
bayesianCalc.SetScanOfPosterior(50);```
We can then run the calculator and retrieve the result. The plot of the posterior can be retrieved directly from the class as a `RooPlot` object:
```SimpleInterval* interval = bayesianCalc.GetInterval();
if (!interval) {
cout << "Error computing Bayesian interval - exit " << endl;
return;
}

double lowerLimit = interval->LowerLimit();
double upperLimit = interval->UpperLimit();

RooRealVar* firstPOI = (RooRealVar*) mc->GetParametersOfInterest()->first();
cout << "\n68% interval on " <<firstPOI->GetName()<<" is : ["<<
lowerLimit << ", "<<
upperLimit <<"] "<<endl;

// draw plot of posterior function
RooPlot * plot = bayesianCalc.GetPosteriorPlot();
if (plot) plot->Draw();  ```

```using namespace RooStats;

void SimpleBayes(    const char* infile =  "GausExpModel.root",
const char* workspaceName = "w",
const char* modelConfigName = "ModelConfig",
const char* dataName = "data" )
{
/////////////////////////////////////////////////////////////
// First part is just to access the workspace file
////////////////////////////////////////////////////////////

// open input file
TFile *file = TFile::Open(infile);
if (!file) return;

// get the workspace out of the file
RooWorkspace* w = (RooWorkspace*) file->Get(workspaceName);

// get the modelConfig out of the file
RooStats::ModelConfig* mc = (RooStats::ModelConfig*) w->obj(modelConfigName);

// get the modelConfig out of the file
RooAbsData* data = w->data(dataName);

RooRealVar * nsig = w->var("nsig");
if (nsig) nsig->setRange(0,200);

// define reasanable range for nuisance parameters ( +/- 10 sigma around fitted values)
// to facilitate the integral
RooArgList nuisPar(*mc->GetNuisanceParameters());
for (int i = 0; i < nuisPar.getSize(); ++i) {       RooRealVar & par = (RooRealVar&) nuisPar[i];      par.setRange(par.getVal()-10*par.getError(), par.getVal()+10*par.getError());   }   BayesianCalculator bayesianCalc(*data,*mc);   bayesianCalc.SetConfidenceLevel(0.683); // 68% interval   // set the type of interval (not really needed for central which is the default)   bayesianCalc.SetLeftSideTailFraction(0.5); // for central interval   //bayesianCalc.SetLeftSideTailFraction(0.); // for upper limit   //bayesianCalc.SetShortestInterval(); // for shortest interval   // set the integration type (not really needed for the default ADAPTIVE)   // possible alternative values are  "VEGAS" , "MISER", or "PLAIN"  (MC integration from libMathMore)    // "TOYMC" (toy MC integration, work when nuisances exist and they have a constraints pdf)   TString integrationType = "";   // this is needed if using TOYMC   if (integrationType.Contains("TOYMC") ) {      RooAbsPdf * nuisPdf = RooStats::MakeNuisancePdf(*mc, "nuisance_pdf");     if (nuisPdf) bayesianCalc.ForceNuisancePdf(*nuisPdf);   }   // limit the number of iterations to speed-up the integral    bayesianCalc.SetNumIters(1000);   bayesianCalc.SetIntegrationType(integrationType);    // compute interval by scanning the posterior function   // it is done by default when computing shortest intervals   bayesianCalc.SetScanOfPosterior(50);   RooRealVar* firstPOI = (RooRealVar*) mc->GetParametersOfInterest()->first();

SimpleInterval* interval = bayesianCalc.GetInterval();
if (!interval) {
cout << "Error computing Bayesian interval - exit " << endl;      return;   }   double lowerLimit = interval->LowerLimit();
double upperLimit = interval->UpperLimit();

cout << "\n68% interval on " <<firstPOI->GetName()<<" is : ["<<     lowerLimit << ", "<<     upperLimit <<"] "<<endl;   // draw plot of posterior function   RooPlot * plot = bayesianCalc.GetPosteriorPlot();   if (plot) plot->Draw();

}
```

• Numerical Bayesian result on Gaussian signal plus Exponential background model:

```[#1] INFO:Eval -- BayesianCalculator::GetInterval - found a valid interval : [61.01 , 97.9247 ]

68% interval on nsig is : [61.01, 97.9247]
``` Posterior function for the number of signal events obtained using the `BayesianCalculator`

## Exercise 4: Create a Poisson Counting Model

We create now a counting model based on Poisson Statistics. We observe `nobs` events when we expect `nexp`, where `nexp = s+b` (`s` is the number of signal events and `b` is the number of background events. The expected distribution for `nexp` is a Poisson : `Poisson( nobs | s+b)`. `s` is the parameter we want to estimate or set a limit (the parameter of interest), while `b` is the nuisance parameter. `b` is not known exactly, but has uncertainty `sigmab`. The nominal value of `b` is = b0=. To express this uncertainty we add in the model a Gaussian constraint. We can interpret this term as having an additional measurement `b0` with an uncertainty `sigmab`: i.e. we have a likelihood for that measurement `=Gaussian( b0 | b, sigma)`. In case of Bayesian statistics we can interpret the constraint as a prior knowledge on the parameter b.

• Make first the RooFit workspace using its factory syntax. Let's assume `nobs=3`, `b0=1`, `sigmab=0.2`.
• Create the `ModelConfig` object and import in the workspace. We need to add in the `ModelConfig` also `b0` as a "global observable=. The reason is that `b0` needs to be treated as an auxiliary observable in case of frequentist statistics and varied when tossing pseudo-experiments.

We make first the RooFit workspace using its factory syntax. We have `nobs=3`, `b0=1`, `sigmab=0.2`.
```int nobs = 3;   // number of observed events
double b0 = 1; // number of background events
double sigmab = 0.2;   // relative uncertainty in b

RooWorkspace w("w");

// make Poisson model * Gaussian constraint
w.factory("sum:nexp(s[3,0,15],b[1,0,10])");
// Poisson of (n | s+b)
w.factory("Poisson:pdf(nobs[0,50],nexp)");
// Gaussian constraint to express uncertainty in b
w.factory("Gaussian:constraint(b0[1,0,10],b,sigmab[0.2])");
// the total model p.d.f will be the product of the two
w.factory("PROD:model(pdf,constraint)");```
We need also to define `b0`, the global observable, as a constant variable (this is needed when we fit the model). However, `b0` often needs to have a range. w.var("b0")->setConstant(true);

After creating the model p.d.f we create the `ModelConfig` object and we set `b0` as the "global observable=

```ModelConfig mc("ModelConfig",&w);
mc.SetPdf(*w.pdf("model"));
mc.SetParametersOfInterest(*w.var("s"));
mc.SetObservables(*w.var("nobs"));
mc.SetNuisanceParameters(*w.var("b"));
// need now to set the global observable
mc.SetGlobalObservables(*w.var("b0"));
// this is needed for the hypothesis tests
mc.SetSnapshot(*w.var("s"));

// import model in the workspace
w.import(mc);```

We generate then an hypothetical observed data set. Since we have a counting model, the data set will contain only one event and the observable `nobs` will have our desired value (i.e. 3). Remember to import the data set in the workspace and then save the workspace in a ROOT file.

```// make data set with the namber of observed events
RooDataSet data("data","", *w.var("nobs"));
w.var("nobs")->setVal(3);
// import data set in workspace and save it in a file
w.import(data);
w.writeToFile("ComtingModel.root", true);```

```using namespace RooFit;
using namespace RooStats;

void CountingModel(  int nobs = 3,           // number of observed events
double b = 1,           // number of background events
double sigmab = 0.2 )   // relative uncertainty in b
{
RooWorkspace w("w");

// make Poisson model * Gaussian constraint
w.factory("sum:nexp(s[3,0,15],b[1,0,10])");
// Poisson of (n | s+b)
w.factory("Poisson:pdf(nobs[0,50],nexp)");
w.factory("Gaussian:constraint(b0[0,10],b,sigmab)");
w.factory("PROD:model(pdf,constraint)");

w.var("b0")->setVal(b);
w.var("b0")->setConstant(true); // needed for being treated as global observables
w.var("sigmab")->setVal(sigmab*b);

ModelConfig mc("ModelConfig",&w);
mc.SetPdf(*w.pdf("model"));
mc.SetParametersOfInterest(*w.var("s"));
mc.SetObservables(*w.var("nobs"));
mc.SetNuisanceParameters(*w.var("b"));

// these are needed for the hypothesis tests
mc.SetSnapshot(*w.var("s"));
mc.SetGlobalObservables(*w.var("b0"));

mc.Print();
// import model in the workspace
w.import(mc);

// make data set with the namber of observed events
RooDataSet data("data","", *w.var("nobs"));
w.var("nobs")->setVal(3);
// import data set in workspace and save it in a file
w.import(data);

w.Print();

TString fileName = "CountingModel.root";

// write workspace in the file (recreate file if already existing)
w.writeToFile(fileName, true);

}
```

## Exercise 5: Compute a 95% Upper Limit with the Counting Model

Using the previously defined Counting model we can compute a 95% upper limit on the signal parameter `s`. We could use :

• The Bayesian Calculator or the MCMC Calculator
• The Profile Likelihood Calculator

Suppose we use the Bayesian calculator:

In this case we can re-use the macro `SimpleBayes.C`. We just need to change these lines, setting the tip elf interval and give as input the right ROOT file containing the workspace.
```//bayesianCalc.SetConfidenceLevel(0.68); // 68% interval
bayesianCalc.SetConfidenceLevel(0.95); // 95% interval

// set the type of interval
//bayesianCalc.SetLeftSideTailFraction(0.5); // for central interval
bayesianCalc.SetLeftSideTailFraction(0.); // for upper limit
```

This is the result we will obtain on the Bayesian calculator:

• Numerical Bayesian 95% Upper limit result on Counting Poisson model

```[#1] INFO:Eval -- BayesianCalculator::GetInterval - found a valid interval : [0 , 6.80149 ]

95% interval on s is : [0, 6.80149]
``` Posterior function for the counting model (nobs = 3, b = 1) obtained using the `BayesianCalculator` Try to use the Profile Likelihood Calculator. Which result do you obtain ? We can try to use a different prior on the signal instead of the default uniform in the case of the Bayesian calculator. For example we can try the Jeffrey reference prior, `P(s) = 1./sqrt(s)`.

For using a different signal prior than the uniform, we need to define the prior pdf in the workspace and add it also in the `ModelConfig` so the Bayesian RooStats are aware of its existence:
```// define as a pdf based on an expression
w.factory("EXPR::prior_s('1./sqrt(s)',s)");
// define prior in ModelConfig
mc.SetPriorPdf(*w.pdf("prior_s"));```

## Exercise 6: Compute the significance (Hypothesis Test)

We can compute the significance of observing a signal for the Gaussian plus Exponential model. It is better to re-generate the model using a lower value for the number of signal events (e.g. 50), in order not to have a too high significance, which will be difficult to estimate when using pseudo-experiments.

When doing an hypothesis test in RooStats, we need to have two models (two `ModelConfig` object). The one defining the null hypothesis and the one defining the alternate hypothesis. In our example, the null hypothesis is the background only model, and the alternate model is signal plus background model. We have defined in the workspace saved in the ROOT file only the signal plus background model. We can then create the background model as a signal plus background model that has a value of `nsig=0`. We do this by cloning the S+B model and by defining a snapshot in the `ModelConfig` for `nsig` with a different value:

```ModelConfig*  sbModel = (RooStats::ModelConfig*) w->obj(modelConfigName);
RooRealVar* poi = (RooRealVar*) sbModel->GetParametersOfInterest()->first();
// define the S+B snapshot (this is used for computing the expected significance)
poi->setVal(50);
sbModel->SetSnapshot(*poi);
// create B only model
ModelConfig * bModel = (ModelConfig*) sbModel->Clone();
bModel->SetName("B_only_model");
poi->setVal(0);
bModel->SetSnapshot( *poi  );```

Using the B model (null model) and the S+B model (alt model) plus the observed data set we can create one of the RooStats hypothesis test calculator to compute the significance. We have the choice between

• Frequentist calculator
• Hybrid Calculator
• Asymptotic calculator

The first two require generate pseudo-experiment, while the asymptotic one, requires only a fit to the two models. For the Frequentist and the Hybrid calculator we can also choose the test statistics. We have various choices in RooStats. Since it is faster, we will use for the exercise the Asymptotic calculator. The asymptotic calculator it is based on the Profile Likelihood test statistics, the one used at LHC (see slides for its definition). For testing the significance, we need to configure it to use the one-side test statistic. We need to create the `AsymptoticCalculator` class using the two models and the data set and then run the test of hypothesis using the `GetHypoTest` function.

```AsymptoticCalculator   ac(*data, *sbModel, *bModel);
ac.SetOneSidedDiscovery(true);  // for one-side discovery test

// this run the hypothesis test and print the result
HypoTestResult * result = ac.GetHypoTest();
result->Print();```

```using namespace RooStats;
using namespace RooFit;

void SimpleHypoTest( const char* infile =  "GausExpModel.root",
const char* workspaceName = "w",
const char* modelConfigName = "ModelConfig",
const char* dataName = "data" )
{

/////////////////////////////////////////////////////////////
// First part is just to access the workspace file
////////////////////////////////////////////////////////////

// open input file
TFile *file = TFile::Open(infile);
if (!file) return;

// get the workspace out of the file
RooWorkspace* w = (RooWorkspace*) file->Get(workspaceName);

// get the data  out of the file
RooAbsData* data = w->data(dataName);

// get the modelConfig (S+B) out of the file
// and create the B model from the S+B model
ModelConfig*  sbModel = (RooStats::ModelConfig*) w->obj(modelConfigName);
sbModel->SetName("S+B Model");
RooRealVar* poi = (RooRealVar*) sbModel->GetParametersOfInterest()->first();
poi->setVal(50);  // set POI snapshot in S+B model for expected significance
sbModel->SetSnapshot(*poi);
ModelConfig * bModel = (ModelConfig*) sbModel->Clone();
bModel->SetName("B Model");
poi->setVal(0);
bModel->SetSnapshot( *poi  );

// create the AsymptoticCalculator from data,alt model, null model
AsymptoticCalculator  ac(*data, *sbModel, *bModel);
ac.SetOneSidedDiscovery(true);  // for one-side discovery test
//ac.SetPrintLevel(-1);  // to suppress print level

// run the calculator
HypoTestResult * asResult = ac.GetHypoTest();
asResult->Print();

return;  // comment this line if you want to run the FrequentistCalculator
std::cout << "\n\nRun now FrequentistCalculator.....\n";   FrequentistCalculator   fc(*data, *sbModel, *bModel);   fc.SetToys(2000,500);    // 2000 for null (B) and 500 for alt (S+B)    // create the test statistics   ProfileLikelihoodTestStat profll(*sbModel->GetPdf());
// use one-sided profile likelihood
profll.SetOneSidedDiscovery(true);

// configure  ToyMCSampler and set the test statistics
ToyMCSampler *toymcs = (ToyMCSampler*)fc.GetTestStatSampler();
toymcs->SetTestStatistic(&profll);

if (!sbModel->GetPdf()->canBeExtended())
toymcs->SetNEventsPerToy(1);

// run the test
HypoTestResult * fqResult = fc.GetHypoTest();
fqResult->Print();

// plot test statistic distributions
HypoTestPlot * plot = new HypoTestPlot(*fqResult);
plot->SetLogYaxis(true);
plot->Draw();

}
```

• Result of the significance test on the Gaussian Signal plus Background model (nsig=50, nbckg=1000):

```Results HypoTestAsymptotic_result:
- Null p-value = 0.00118838
- Significance = 3.0386
- CL_b: 0.00118838
- CL_s+b: 0.502432
- CL_s: 422.787
``` Run the Frequentist calculator

For the Frequentist calculator we need to :
• set the test statistic we want to use
• set the number of toys for the null and alternate model
```FrequentistCalculator   fc(*data, *sbModel, *bModel);
fc.SetToys(2000,500);    // 2000 for null (B) and 500 for alt (S+B)

// create the test statistics
ProfileLikelihoodTestStat profll(*sbModel->GetPdf());
// use one-sided profile likelihood
profll.SetOneSidedDiscovery(true);

// configure  ToyMCSampler and set the test statistics
ToyMCSampler *toymcs = (ToyMCSampler*)fc.GetTestStatSampler();
toymcs->SetTestStatistic(&profll);

if (!sbModel->GetPdf()->canBeExtended())
toymcs->SetNEventsPerToy(1);
```
We run now the hypothesis test
```HypoTestResult * fqResult = fc.GetHypoTest();
fqResult->Print();```
And we can also plot the test statistic distributions obtained from the pseudo-experiments for the two models. We use the `HypTestPlot` class for this.
```// plot test statistic distributions
HypoTestPlot * plot = new HypoTestPlot(*fqResult);
plot->SetLogYaxis();
plot->Draw();```

Use the macro SimpleHypoTest.C remove the `return` statement in the middle and run it again

• Result of the significance test on the Gaussian Signal plus Background model (nsig=50, nbckg=1000) using the Frequentist calculator:

```Results HypoTestCalculator_result:
- Null p-value = 0.001 +/- 0.000706753
- Significance = 3.09023 +/- 0.2099 sigma
- Number of Alt toys: 500
- Number of Null toys: 2000
- Test statistic evaluated on data: 4.61654
- CL_b: 0.001 +/- 0.000706753
- CL_s+b: 0.496 +/- 0.02236
- CL_s: 496 +/- 351.262
``` Test statistics distribution for the B model (red) and S+B model obtained with the Frequentist calculator

## Exercise 6b: Compute significance (p-value) as function of the signal mass

As an optional exercise, we can also study the significance (or equivalent the p-value) we obtain in our Gaussian signal plus exponential background model as function of the signal mass hypothesis. For doing this we perform the test of hypothesis, using the AsymptoticCalculator, for several mass points. We can then plot the obtained p-value for the null hypothesis (`p0`).

```using namespace RooStats;
using namespace RooFit;

void p0Plot() {

const char* infile =  "GausExpModel.root";
const char* workspaceName = "w";
const char* modelConfigName = "ModelConfig";
const char* dataName = "data";

/////////////////////////////////////////////////////////////
// First part is just to access the workspace file
////////////////////////////////////////////////////////////

// Check if example input file exists
TFile *file = TFile::Open(infile);

// get the workspace out of the file
RooWorkspace* w = (RooWorkspace*) file->Get(workspaceName);

// get the modelConfig out of the file
RooStats::ModelConfig* mc = (RooStats::ModelConfig*) w->obj(modelConfigName);

// get the modelConfig out of the file
RooAbsData* data = w->data(dataName);

// get the modelConfig (S+B) out of the file
// and create the B model from the S+B model
ModelConfig*  sbModel = (RooStats::ModelConfig*) w->obj(modelConfigName);
sbModel->SetName("S+B Model");
RooRealVar* poi = (RooRealVar*) sbModel->GetParametersOfInterest()->first();
poi->setVal(50);  // set POI snapshot in S+B model for expected significance
sbModel->SetSnapshot(*poi);
ModelConfig * bModel = (ModelConfig*) sbModel->Clone();
bModel->SetName("B Model");
poi->setVal(0);
bModel->SetSnapshot( *poi  );

vector<double> masses;
vector<double> p0values;
vector<double> p0valuesExpected;

double massMin = 0.5;
double massMax = 8.5;

// loop on the mass values

for( double mass=massMin; mass<=massMax; mass += (massMax-massMin)/40.0 )   {                    cout << endl << endl << "Running for mass: " << mass << endl << endl;      w->var("mass")->setVal( mass );

AsymptoticCalculator *  ac  = new AsymptoticCalculator(*data, *sbModel, *bModel);
ac->SetOneSidedDiscovery(true);  // for one-side discovery test
AsymptoticCalculator::SetPrintLevel(-1);

HypoTestResult* asymCalcResult = ac->GetHypoTest();

asymCalcResult->Print();

masses.push_back( mass );
p0values.push_back( asymCalcResult->NullPValue() );

double expectedP0 = AsymptoticCalculator::GetExpectedPValues(  asymCalcResult->NullPValue(),  asymCalcResult->AlternatePValue(), 0, false);
p0valuesExpected.push_back( expectedP0 );
std::cout << "expected p0 = " << expectedP0 << std::endl;   }   TGraph * graph1  = new TGraph(masses.size(),&masses,&p0values);   TGraph * graph2  = new TGraph(masses.size(),&masses,&p0valuesExpected);   graph1->SetMarkerStyle(20);
graph1->Draw("APC");
graph2->SetLineStyle(2);
graph2->Draw("C");
graph1->GetXaxis()->SetTitle("mass");
graph1->GetYaxis()->SetTitle("p0 value");
graph1->SetTitle("Significance vs Mass");
graph1->SetMinimum(graph2->GetMinimum());
graph1->SetLineColor(kBlue);
graph2->SetLineColor(kRed);

}
``` p value obtained as function of the signal mass in the Gaussian plus exponential background model

## Exercise 7: Compute Limits using Hypothesis Test Inversion (`CLs` Limits).

As last exercise we will compute an upper limit using the hypothesis test inversion. We need to perform the hypothesis test for different parameter of interest points and compute the corresponding p-value. Since we are interesting in computing a limit, the test null hypothesis is the S+B model, while the alternate hypothesis is the B only model. It is important to remember this, when we construct the hypothesis test calculators. For doing the inversion `RooStats` provides an helper class, `HypoTestInverter`, which is constructed using one of the HypoTest calculator (Frequentist, Hybrid or Asymptotic). To compute the limit we need to do:

• Create/read a workspace with S+B and B only model. If the B model does not exist we can create as in the previous exercise (setting the poi to zero).
• Create an HypoTest calculator but using as null the S+B model and as alt the B only model
• Configure the calculator if needed (set test statistics, number of toys, etc..)
• Create the `HypoTestInverter` class and configure it (e.g. to set the number and range of points to scan, to use `CLs` for computing the limit, etc)
• Run the Inverter using `GetInterval` which returns an `HypoTestInverterResult` object.
• Query the result (e.g. limit, expected limits and bands, .) , plot the result of the scan and the test statistic distribution obtained at every point.

Suppose we use the `AsymptotiCalculator`. If we use the Frequentist is the same, just we need to set also the test statistics and number of toys
```// asymptotic calculator
AsymptoticCalculator ac(*data, *bModel, *sbModel);
ac.SetOneSided(true);  // for one-side tests (limits)
AsymptoticCalculator::SetPrintLevel(-1);  // to avoid to much verbosity
```
We create now the Inverter class and configure it for CLs
```// create hypotest inverter
// passing the desired calculator
HypoTestInverter calc(ac);

// set confidence level (e.g. 95% upper limits)
calc.SetConfidenceLevel(0.95);

// for CLS
calc.UseCLs(true);```

We run now the Inverter:

```int npoints = 10;  // number of points to scan
// min and max (better to choose smaller intervals)
double poimin = poi->getMin();
double poimax = poi->getMax();
//poimin = 0; poimax=10;

std::cout << "Doing a fixed scan  in interval : " << poimin << " , " << poimax << std::endl;
calc.SetFixedScan(npoints,poimin,poimax);

// run inverter
HypoTestInverterResult * r = calc.GetInterval();```

We can then query the obtained result for the limit, expected limits (median +/- 1 sigma bands):

```double upperLimit = r->UpperLimit();
// estimate of the error on the upper limit
double ulError = r->UpperLimitEstimatedError();

std::cout << "The computed upper limit is: " << upperLimit << " +/- " << ulError << std::endl;

// compute expected limit and bands
std::cout << "Expected upper limits, using the B (alternate) model : " << std::endl;
std::cout << " expected limit (median) " << r->GetExpectedUpperLimit(0) << std::endl;
std::cout << " expected limit (-1 sig) " << r->GetExpectedUpperLimit(-1) << std::endl;
std::cout << " expected limit (+1 sig) " << r->GetExpectedUpperLimit(1) << std::endl;```

Plot now the result of the scan using the `HypoTestInverterPlot` class:

```HypoTestInverterPlot *plot = new HypoTestInverterPlot("HTI_Result_Plot","Feldman-Cousins Interval",r);

// plot in a new canvas
TCanvas * c1 = new TCanvas("HypoTestInverter Scan");
c1->SetLogy(false);

plot->Draw("CLb 2CL");  // plot also CLb and CLs+b
//plot->Draw("OBS");  // plot only observed p-value
```

If we were using toys (e.g. FrequentistCalculator) we could plot in a different Canvas the test statistics distributions.

```// plot test statistics distributions for the two hypothesis
// when distribution is generated (case of FrequentistCalculators)
const int n = r->ArraySize();
if (n> 0 &&  r->GetResult(0)->GetNullDistribution() ) {
TCanvas * c2 = new TCanvas("Test Statistic Distributions","",2);
if (n > 1) {
int ny = TMath::CeilNint( sqrt(n) );
int nx = TMath::CeilNint(double(n)/ny);
c2->Divide( nx,ny);
}
for (int i=0; i<n; i++) {
if (n > 1) c2->cd(i+1);
SamplingDistPlot * pl = plot->MakeTestStatPlot(i);
pl->SetLogYaxis(true);
pl->Draw();
}
}```

```using namespace RooStats;
using namespace RooFit;

void SimpleHypoTestInv( const char* infile =  "CountingModel.root",
const char* workspaceName = "w",
const char* modelConfigName = "ModelConfig",
const char* dataName = "data" )
{
/////////////////////////////////////////////////////////////
// First part is just to access the workspace file
////////////////////////////////////////////////////////////

// open input file
TFile *file = TFile::Open(infile);
if (!file) return;

// get the workspace out of the file
RooWorkspace* w = (RooWorkspace*) file->Get(workspaceName);

// get the modelConfig out of the file
RooStats::ModelConfig* mc = (RooStats::ModelConfig*) w->obj(modelConfigName);

// get the modelConfig out of the file
RooAbsData* data = w->data(dataName);

ModelConfig*  sbModel = (RooStats::ModelConfig*) w->obj(modelConfigName);
RooRealVar* poi = (RooRealVar*) sbModel->GetParametersOfInterest()->first();
ModelConfig * bModel = (ModelConfig*) sbModel->Clone();
bModel->SetName(TString(sbModel->GetName())+TString("_with_poi_0"));
poi->setVal(0);
bModel->SetSnapshot( *poi  );

FrequentistCalculator  fc(*data, *bModel, *sbModel);
fc.SetToys(1000,500);

// asymptotic calculator
AsymptoticCalculator  ac(*data, *bModel, *sbModel);
ac.SetOneSided(true);  // for one-side tests (limits)
//  ac->SetQTilde(true);
AsymptoticCalculator::SetPrintLevel(-1);

// create hypotest inverter
// passing the desired calculator
HypoTestInverter calc(ac);    // for asymptotic
//HypoTestInverter calc(fc);  // for frequentist

// set confidence level (e.g. 95% upper limits)
calc.SetConfidenceLevel(0.95);

// for CLS
bool useCLs = true;
calc.UseCLs(useCLs);
calc.SetVerbose(false);

// configure ToyMC Samler (needed only for frequentit calculator)
ToyMCSampler *toymcs = (ToyMCSampler*)calc.GetHypoTestCalculator()->GetTestStatSampler();

// profile likelihood test statistics
ProfileLikelihoodTestStat profll(*sbModel->GetPdf());
// for CLs (bounded intervals) use one-sided profile likelihood
if (useCLs) profll.SetOneSided(true);

// ratio of profile likelihood - need to pass snapshot for the alt
// RatioOfProfiledLikelihoodsTestStat ropl(*sbModel->GetPdf(), *bModel->GetPdf(), bModel->GetSnapshot());

// set the test statistic to use
toymcs->SetTestStatistic(&profll);

// if the pdf is not extended (e.g. in the Poisson model)
// we need to set the number of events
if (!sbModel->GetPdf()->canBeExtended())
toymcs->SetNEventsPerToy(1);

int npoints = 10;  // number of points to scan
// min and max (better to choose smaller intervals)
double poimin = poi->getMin();
double poimax = poi->getMax();
//poimin = 0; poimax=10;

std::cout << "Doing a fixed scan  in interval : " << poimin << " , " << poimax << std::endl;   calc.SetFixedScan(npoints,poimin,poimax);      HypoTestInverterResult * r = calc.GetInterval();   double upperLimit = r->UpperLimit();

std::cout << "The computed upper limit is: " << upperLimit << std::endl;      // compute expected limit   std::cout << "Expected upper limits, using the B (alternate) model : " << std::endl;   std::cout << " expected limit (median) " << r->GetExpectedUpperLimit(0) << std::endl;   std::cout << " expected limit (-1 sig) " << r->GetExpectedUpperLimit(-1) << std::endl;   std::cout << " expected limit (+1 sig) " << r->GetExpectedUpperLimit(1) << std::endl;      // plot now the result of the scan    HypoTestInverterPlot *plot = new HypoTestInverterPlot("HTI_Result_Plot","HypoTest Scan Result",r);   // plot in a new canvas with style   TCanvas * c1 = new TCanvas("HypoTestInverter Scan");    c1->SetLogy(false);

plot->Draw("CLb 2CL");  // plot also CLb and CLs+b
//plot->Draw("OBS");  // plot only observed p-value

// plot also in a new canvas the test statistics distributions

// plot test statistics distributions for the two hypothesis
// when distribution is generated (case of FrequentistCalculators)
const int n = r->ArraySize();
if (n> 0 &&  r->GetResult(0)->GetNullDistribution() ) {
TCanvas * c2 = new TCanvas("Test Statistic Distributions","",2);
if (n > 1) {
int ny = TMath::CeilNint( sqrt(n) );
int nx = TMath::CeilNint(double(n)/ny);
c2->Divide( nx,ny);
}
for (int i=0; i<n; i++) {         if (n > 1) c2->cd(i+1);
SamplingDistPlot * pl = plot->MakeTestStatPlot(i);
pl->SetLogYaxis(true);
pl->Draw();
}
}

}
```

## Exercise 7b: Using the `StandardHypoTestInvDemo.C` tutorial

This is a macro in \$ROOTSYS/tutorials/roostats which can run the hypothesis test inversion with several possible options. We can find the tutorial code also here . Suppose we want to compute the 95% `CLs` limit on the parametric unbinned model (Gaussian plus Exponential background), that we have used in the significance exercise:

```
root  .L \$ROOTSYS/tutorials/roostats/StandardHypoTestInvDemo.C+
root  StandardHypoTestInvDemo("GausExpModel.root", "w", "ModelConfig", "", "data", 2, 3, true, 20, 0, 150 )
```
You can look at the printed out observed and expected limits, and the CLs scan plot. Next we can run the Frequentist calculation (it will take considerably more time, but we use only 500 toys):
```
root  StandardHypoTestInvDemo("GausExpModel.root", "w", "ModelConfig", "", "data", 0, 3, true, 10, 0, 100, 500, false )
```
We can also look at the sampling distribution plots obtained.

If running on the number counting model you need to set to true the last option

```
root  StandardHypoTestInvDemo("CountingModel.root", "w", "ModelConfig", "", "data", 0, 3, true, 10, 0, 10, 500, true )
```
Here is the list of input arguments to the macro:
```StandardHypoTestInvDemo(const char * infile = 0,
const char * wsName = "combined",
const char * modelSBName = "ModelConfig",
const char * modelBName = "",
const char * dataName = "obsData",
int calculatorType = 0,
int testStatType = 0,
bool useCLs = true ,
int npoints = 6,
double poimin = 0,
double poimax = 5,
int ntoys=1000,
bool useNumberCounting = false,
const char * nuisPriorName = 0){
```
Where we have for the calculatorType:
• type = 0 Freq calculator
• type = 1 Hybrid calculator
• type = 2 Asymptotic calculator

• testStatType = 0 LEP
• = 1 Tevatron
• = 2 Profile Likelihood
• = 3 Profile Likelihood one sided (i.e. = 0 if mu < mu_hat)

• useCLs : scan for CLs (otherwise for CLs+b)

• npoints: number of points to scan , for autoscan set npoints = -1

• poimin,poimax: min/max value to scan in case of fixed scans (if min > max, try to find automatically)

• ntoys: number of toys to use

• useNumberCounting: set to true when using number counting events

• nuisPriorName: name of prior for the nnuisance. This is often expressed as constraint term in the global model

Topic attachments
I Attachment History Action Size Date Who Comment png AsymptoticCls.png r1 manage 19.6 K 2013-06-04 - 23:16 LorenzoMoneta Asymptotic CLs result on Counting model png BayesInterval.png r1 manage 10.2 K 2013-06-03 - 23:05 LorenzoMoneta Bayesian Interval result png BayesUpperLimit.png r1 manage 9.4 K 2013-06-04 - 10:53 LorenzoMoneta Bayesian 95% Upper limit result c CombinedModel.C r1 manage 3.0 K 2013-06-02 - 12:35 LorenzoMoneta Combined Model ROOT macro png CombinedModelFit.png r1 manage 22.6 K 2013-06-02 - 12:35 LorenzoMoneta Combined model fit result c CountingModel.C r1 manage 1.5 K 2013-06-04 - 10:52 LorenzoMoneta ROOT macro for creating the counting model png FrequentistHTResult.png r1 manage 11.3 K 2013-06-04 - 15:00 LorenzoMoneta FrequentistCalculator result on a test of significance c GausExpModel.C r1 manage 2.2 K 2013-06-02 - 23:20 LorenzoMoneta Gaussian plus Exponential ROOT macro png GausExpModelFit.png r1 manage 17.0 K 2013-06-01 - 17:54 LorenzoMoneta Gaussian plus Exponential fit result c GaussianModel.C r1 manage 1.2 K 2013-06-02 - 23:20 LorenzoMoneta Gaussian Model ROOT Macro png GaussianModelFit.png r1 manage 16.4 K 2013-06-01 - 16:54 LorenzoMoneta Gaussian Fit Result png MCMCInterval.png r1 manage 7.1 K 2013-06-03 - 23:04 LorenzoMoneta MCMC interval result png PLInterval.png r1 manage 8.2 K 2013-06-02 - 23:08 LorenzoMoneta Profile Likelihood interval result c SimpleBayes.C r1 manage 3.0 K 2013-06-03 - 23:04 LorenzoMoneta Bayesian Calculator macro c SimpleHypoTest.C r1 manage 2.5 K 2013-06-04 - 15:01 LorenzoMoneta Hypothesis Test (Significance) ROOT macro c SimpleHypoTestInv.C r1 manage 4.5 K 2013-06-04 - 23:26 LorenzoMoneta Hypothesis Test Inversion macro c SimpleMCMC.C r1 manage 2.0 K 2013-06-03 - 23:03 LorenzoMoneta MCMC Calculator macro c SimplePL.C r1 manage 1.6 K 2013-06-02 - 23:07 LorenzoMoneta Profile Likelihood Interval macro c p0Plot.C r1 manage 2.9 K 2013-06-04 - 17:05 LorenzoMoneta ROOT macro for scanning p-values png p0Plot.png r1 manage 10.1 K 2013-06-04 - 17:05 LorenzoMoneta Result of p value scan css tutorial.css r1 manage 0.2 K 2013-06-02 - 11:58 LorenzoMoneta style for code background colour
Edit | Attach | Watch | Print version |  | Backlinks | Raw View | Raw edit | More topic actions...
Topic revision: r16 - 2013-06-05 - LorenzoMoneta Log In  Cern Search TWiki Search Google Search RooStats All webs Copyright &© 2008-2020 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback