# RooFit Exercises: Create and fitting a model

The aim of the RooFit exercises is to learn how to build a RooFit model, how to fit the model on a set of data that we will generate and how to plot the result. We will also learn how to save the full RooFit workspace containing the model and data in a file which we can read back later, for example when we will do the RooStats exercises. We will learn also how to generate the data from the model. The same generated data we will then use to fit the model

Using the snippets of code, one can build a full ROOT macro which can be run interactively in ROOT. If you want to run in compiled mode using ACLIC you might need to add all the needed header files.

## Exercise 1: Fitting using ROOT and RooFit

We will start with a very simple exercise. The aim of this exercise is to show the basic fitting functionality in ROOT and RooFit.

Create an histogram with 100 bins between -5 and 5 and fill with it 500 gaussian distributed numbers with mean 0 and sigma 1. Then fit the histogram with a gaussian function (possibly using its normalized form) using ROOT to estimate the number of events, the mean and the sigma of the underlined gaussian distribution.

• fit using the Neyman chi-square
• fit using the Pearson chi-square
• fit using a binned maximum likelihood
• examine the results obtained and plot the three different functions obtained
• fit the histogram using RooFit. You need to create a Gaussian model, which include the number of events (i.e. an extended model) and a RooFit binned data object (RooDataHist).
• plot the resulting fit function on top of the data.

If you are already familiar with ROOT and RooFit you can also proceed to the optional second part

• If you have problems on how to create and fill an histogram you can look at the solution or use the attached file, `GaussianHistogram.root`.
• Then you need to create a `TF1` representing a normalised gaussian. You can use the `"gausn"` pre-defined function or `ROOT::Math::normal_pdf` or `TMath::Gauss(x,mu,sigma, true)`.
• Use `TH1::Fit` with the default options for the Neyman chi-square
• Use the option `"P"` for the Pearson chi-square
• Use the option `"L"` for the binned likelihood fit

• For using RooFit you need first to create the `RooDataHist` class from an histogram. See the lecture slides on how to do it.
• For creating the RooFit model you need to first to create a `RooGaussian` pdf which has the mean and the sigma as parameters. For adding the number of events you need to create the `RooExtendPdf` class. You can create the classes directly or by using the factory functionality of the `RooWorkspace`.
• How to create a RooFit model
• Every variable in RooFit, needs to be created with a defined range [xmin,xmax]. Use a very large value if you don't know the range. If you provided only a value, the variable is considered constant. If you provide only the minimum and the maximum, the initial value will be taken in the middle of the range. The initial value must be within the given range.
• Using the workspace factory syntax any existing RooFit pdf class by using the class name stripped by the `Roo` suffix. The exact signature required (variables to be passed) is defined in the class constructor (You can browse the reference documentation here.
• You need to define the [value, min, max] of a variable only the first time you create in the factory. Afterwards you can reference the variable by its name.
```// create a Gaussian Pdf and an extended pdf
RooWorkspace w("w");
w.factory("Gaussian:pdf(x[-10,10],mu[1,-1000,1000],s[2,0,1000])");
w.factory("ExtendPdf:model(g,n[100,0,10000])");   ```
• How to use the model from the workspace
• After you have created the variable and p.d.f in the workspace, you can access their pointers, by using `RooWorkspace::var` for variables or `RooWorkspace::pdf` for p.d.f. Example:
```// retrieving model components
RooAbsPdf * pdf = w.pdf("model");   // access object from workspace
RooRealVar * x = w.var("x");   // access object from workspace
```
• Plotting the data:
```// create a RooPlot object and plot the data
RooPlot * plot = x->frame();
data->plotOn(plot); ```
• Fitting the data using `RooAbsPdf::fitTo`.
```// fit the data and save its result. Use the optional =Minuit2= minimiser
RooFitResult * res = pdf->fitTo(*data, Minimizer("Minuit2","Migrad"), Save(true) );```
• Plotting the fit function resulting from the fit:
```pdf->plotOn(plot);
plot->Draw();   // to show the RooPlot in the current ROOT Canvas
```

```// make a simple Gaussian Fit in ROOT and RooFit

using namespace RooFit;

void GaussianFit(int n = 500) {

TH1D * h1 = new TH1D("h1","Gaussian Fit",100,-5,5);
h1->FillRandom("gaus", n);

TF1 * f1  = new TF1("f1","gausn",-5,5);
f1->SetLineColor(kBlue);
f1->SetParameters(100,0,1);

// Neyman chi2 fit
TFitResultPtr r1 = h1->Fit(f1,"S");

TF1 * f2  = new TF1("f2","gausn",-5,5);
f2->SetLineColor(kBlue-3);
f2->SetParameters(100,0,1);

// Pearson chi2 fit
TFitResultPtr r2 = h1->Fit(f2,"P + S");

TF1 * f3  = new TF1("f3","gausn",-5,5);
f3->SetLineColor(kRed);
f3->SetParameters(100,0,1);

// Binnel likelihood fit
TFitResultPtr r3 = h1->Fit(f3,"L + S");

TLegend * l = new TLegend(0.65,0.55,0.88,0.7);
l->Draw();

// do now the fit with RooFit

RooWorkspace w("w");
// define a Gaussian pdf
w.factory("Gaussian:g(x[-5,5],mu[0,-10,10],sigma[1,0,1000])");
// create extend pdf with number of events
w.factory("ExtendPdf:model(g,nevt[100,0,100000])");

RooAbsPdf * pdf = w.pdf("model");   // access object from workspace
RooRealVar * x = w.var("x");   // access object from workspace

// generate n gaussian measurement for a Gaussian(x, 0, 1);
//RooDataSet * data = pdf->generate( *x);
//data->SetName("unbinData");

// create dataset
RooDataHist * data = new RooDataHist("data","data",*x,h1);

new TCanvas();
// RooFit plotting capabilities
RooPlot * pl = x->frame(Title("Gaussian Fit"));
data->plotOn(pl);
pl->Draw();

w.var("sigma")->setVal(1);
w.var("mu")->setVal(0);

// now fit the data set
RooFitResult * r4 = pdf->fitTo(*data, Minimizer("Minuit2","Migrad"), Save(1) );

// plot the pdf on the same RooPlot object we have plotted the data
pdf->plotOn(pl);
pdf->paramOn(pl, Layout(0.6,0.9,0.85));

pl->Draw();

// import data in workspace (IMPORTANT for saving it )
w.import(*data);

w.Print();

// write workspace in the file (recreate file if already existing)
w.writeToFile("GaussianModel.root", true);

cout << "model written to file " << endl;    std::cout << "\nResult of Neyman chi2 \n";    r1->Print();
std::cout << "\nResult of Pearson chi2 \n";    r2->Print();
std::cout << "\nResult of Binned Likelihood \n";    r3->Print();
std::cout << "\nResult of Binned Likelihood fit with ROOTFIT\n";    r4->Print();

}
```

• Gaussian Fit Result using ROOT

Fit of the Gaussian model using chi-square and likelihood methods in ROOT

• Gaussian Fit Result using RooFit

Fit of the Gaussian model using RooFit

We have performed a binned fit of an histogram. If you have all the data used to generate the histogram, how can you make an un-binned likelihood fit (in ROOT or RooFit) ?

Here is the suggestion on how to generate a data set with RooFit using the `RooAbsPdf::generate` function.
• We need to pass the observable we want to generate the data on (e.g. `x`) and the number of events
• Note that we can generate also binned data sets. In that case we have two options:
• use a `RooDataHist` and `generateBinned`: `RooDataHist * binData = pdf->generateBinned(*x, 1000);`.
• use the option `AllBinned()` which will generate a weighted data set: `RooDataSet * data = pdf->generate(*x, NumEvents(1000), AllBinned());`.
```// generate an unbinned dataset of 1000 events
RooDataSet * data = pdf->generate( *x, 1000); ```
Doing the un-binned fit
• using the gaussian pdf (`w.pdf("gaus")->fitTo(*data)`) to fit only the shape of the distribution (not-extend fit);
• using the extended pdf, (`w.pdf("model")->fitTo(*data)`)which includes the number of events, to perform an extended unbinned fit.
If we are using ROOT, we can use `TTree::UnbinnedFit` or the `ROOT::Fit::Fitter` class directly. See the user documentation or the attached macro on how to perform the unbinned (extended or not-extended) fit in ROOT. It is important to note that in un-binned fit (extended or not extended) the fit model function needs to be normalised in its fitting range. RooFit performs this automatically, while in ROOT the user needs to provide as input a normalised function.

Study the difference of performing in ROOT a fit to an histogram using a Neyman, Pearson and binned likelihood. Study the bias obtained in the fit parameters for the gaussian case, by generating 1000 pseudo-histograms and fitting them with the three methods. Study the obtained pull distributions.

## Exercise 2: Fit of a Higgs Signal over an Exponential Background

The aim of this exercise is to learn how to build a composite model in RooFit made of two p.d.f, one representing a signal and one a background distributions. We want to determine the number of signal events. For this we need to perform an extended maximum likelihood fit, where the signal events is one of the fit parameter.

Create first the composite model formed by a Gaussian signal over a falling exponential background. Then read the data (in text format) from the file and create a `RooDataSet` class with all the data. Perform then an extended unbinned fit to the data to extract the Higgs signal strength. Plot the resulting fit function from the fit with separate signal and background components.

• Read first the attached file (Hgg.txt in a `RooDataSet` using `RooDataSet::read` or in a ROOT `TTree` using `TTree::ReadFile`
• Create the model using the `RooWorkspace` factory
• create the Exponential pdf and the Gaussian signal p.d.f. function. We create the exponential as two different components to reproduce better the data.
• create then a `RooAddPdf` using the Gaussian and the Exponential and the relative number of events. Note that we could instead of the number of events a relative fraction. In this last case we would fit only for the shape and not the normalisation.
• The `RooAddPdf` is created using the `SUM` operator of the factory syntax.
```// create model
RooWorkspace w("w");
w.factory("x[110,160]");  // invariant mass

// create exponential model as two components
w.factory("a1[ 7.5, -500, 500]");
w.factory("a2[-1.5, -500, 500]");
w.factory("expr::z('-(a1*x/100 + a2*(x/100)^2)', a1, a2, x)");
w.factory("Exponential::bmodel(z, 1)");

// signal model
w.factory("mass[130, 110, 150]");
w.factory("width[1, 0.5, 5]");
w.factory("Gaussian::smodel(x, mass, width)");

// create RooAddPdf in extended mode
w.factory("nbackground[10000, 0, 1000000]");
w.factory("nsignal[100, 0.0, 1000.0]");
w.factory("SUM::model(nbackground*bmodel, nsignal*smodel)");
w.factory("SUM:model(nsig[100,0,10000]*sig_pdf, nbkg[1000,0,10000]*bkg_pdf)");  // for extended model
```

• Fit the data: you need to call the `RooAbsPdf::fitTo`.
```// fit the data and save its result. Use eventually the optional =Minuit2= minimiser
RooFitResult * res = pdf->fitTo(*data, Minimizer("Minuit2","Migrad"), Save(true) );```
• Plot the resulting fit function in the same plot.
• Plot also the signal and background fit components with different colour and style
```pdf->plotOn(plot);
//draw the two separate pdf's
pdf->plotOn(plot, RooFit::Components("bkg_pdf"), RooFit::LineStyle(kDashed) );
pdf->plotOn(plot, RooFit::Components("sig_pdf"), RooFit::LineColor(kRed), RooFit::LineStyle(kDashed) );
plot->Draw();   // to show the RooPlot in the current ROOT Canvas
```

```// macro to fit Higgs to gg spectrum

using namespace RooFit;

void fitHgg() {

// read from the file and create a ROOT tree

TTree tree("tree","tree");
if (nevt <= 0) {       Error("fitHgg","Error reading data from input file ");       return;    }    std::cout << "Read " << nevt << " from the file " << std::endl;    // make the RooFit model     RooWorkspace w("w");    w.factory("x[110,160]");  // invariant mass        w.factory("nbackground[10000, 0, 10000]");    //w.factory("Exponential::z1(x, a1[-1,-10,0])");    w.var("nbackground")->setVal(nevt);
w.var("nbackground")->setMin(0.1*nevt);
w.var("nbackground")->setMax(10*nevt);

// create exponential model as two components
w.factory("a1[ 7.5, -500, 500]");
w.factory("a2[-1.5, -500, 500]");
w.factory("expr::z('-(a1*x/100 + a2*(x/100)^2)', a1, a2, x)");
w.factory("Exponential::bmodel(z, 1)");

// signal model
w.factory("nsignal[100, 0.0, 1000.0]");
//w.factory("mass[%f, %f, %f]' % (massguess, massmin, massmax))
w.factory("mass[130, 110, 150]");
w.factory("width[1, 0.5, 5]");
w.factory("Gaussian::smodel(x, mass, width)");
RooAbsPdf * smodel = w.pdf("smodel");

w.factory("SUM::model(nbackground*bmodel, nsignal*smodel)");
RooAbsPdf * model = w.pdf("model");

// create RooDataSet
RooDataSet data("data","data",*w.var("x"),Import(tree) );

RooFitResult * r = model->fitTo(data, Minimizer("Minuit2"),Save(true), Offset(true));

// plot data and function

RooPlot * plot = w.var("x")->frame();
data.plotOn(plot);
model->plotOn(plot);
model->plotOn(plot, Components("bmodel"),LineStyle(kDashed));
model->plotOn(plot, Components("smodel"),LineColor(kRed));

plot->Draw();

}
```

• Hgg Fit Result

Fit of the H->gg CMS data

Do a scan of the likelihood (or profile likelihood) as function of the Higgs mass parameter.

## Exercise 3: Fit to a Combined model

In this exercise we will create a combined model based on two channels and we will perform a simultaneous fit on both channels. We will learn how to create a `RooSimultaneous` p.d.f. and a combined data set, which can be fit for one (or more) common parameters.

• The first channel is a Gaussian signal over an Exponential decay background. We assume that the location and width of the signal are known. The "mass" observable is distributed in range [0 - 10] and the signal has mean ~ 2 and sigma 0.3. The exponential decay of the background is 2. We assume we have 1000 background events.
• The second channel will be similar to the first one, we will have the same signal model, but a different background model. We assume we have less and a flatter background. We have an exponential decay of 4 and 100 background events in total.
• We assume the number of signal events in the first channel is 50 and in the second channel is 30. We introduce a common rate parameter `mu` expressing the signal strength.

We will so the following steps:

• Create the models for the two channels
• Create a combined model using simultaneous pdf (class `RooSimultaneous`).
• Generate a combined data set, fit and plot the result

• Create first the two separate background models and a common Gaussian signal model
• Express the number of signal events in term of the signal strength `mu`: the number of signal events is the product of `mu` with a fixed number (50 for the first channel and 30 for the second).
```RooWorkspace w("w");
w.factory("Exponential:bkg1_pdf(x[0,10], a1[-0.5,-2,-0.2])");
w.factory("Exponential:bkg2_pdf(x, a2[-0.25,-2,-0.2])");
w.factory("Gaussian:sig_pdf(x, mass[2], sigma[0.3])");
// express number of events as the
w.factory("prod:nsig1(mu[1,0,5],xsec1[50])");
w.factory("prod:nsig2(mu,xsec2[30])");

w.factory("SUM:model1(nsig1*sig_pdf, nbkg1[1000,0,10000]*bkg1_pdf)");  // for extended model
w.factory("SUM:model2(nsig2*sig_pdf, nbkg2[100,0,10000]*bkg2_pdf)");  // for extended model
```
• Make a simultaneous pdf * create first the category label to identify the channels (`RooCategory`) * use the factory operator `SIMUL` for creating the `RooSimultaneous` pdf.

```// Create discrete observable to label channels
w.factory("index[channel1,channel2]");
// Create joint pdf (RooSimultaneous)
w.factory("SIMUL:jointModel(index,channel1=model1,channel2=model2)");```
• Generate the events. We can generate directly a combined data sets from the model, with the number of events fluctuating around the number of expected events in each channel.
• Note that for creating a combined data set we need to add the Category variable in the data set as an additional observable
```// generate combined data set
RooDataSet * data = pdf->generate( RooArgSet(*x,*index)); ```
• Plot the data. We make two plots representing the two channels. We need then to tell to the combined data set to select the channel we want
```RooPlot * plot1 = x->frame();
RooPlot * plot2 = x->frame();
data->plotOn(plot1,RooFit::Cut("index==index::channel1"));
data->plotOn(plot2,RooFit::Cut("index==index::channel2")); ```
• Fit the data is as before using the `fitTo` method.
• Plot the resulting model on the two data sets. We need to project the simultaneous pdf on the corresponding channel. This is done as following:
```// fit
RooFitResult * r = pdf->fitTo(*data, RooFit::Save(true), RooFit::Minimizer("Minuit2","Migrad"));
r->Print();

// plotting (draw the two separate pdf's )
pdf->plotOn(plot1,RooFit::ProjWData(*data),RooFit::Slice(*w.cat("index"),"channel1"));
pdf->plotOn(plot2,RooFit::ProjWData(*data),RooFit::Slice(*w.cat("index"),"channel2"));

// draw the two plots in two separate pads:
c1 = new TCanvas();
c1->Divide(1,2);
c1->cd(1); plot1->Draw();
c1->cd(2); plot2->Draw();```

Code solution:

```#include "RooWorkspace.h"
#include "RooAbsPdf.h"
#include "RooArgSet.h"
#include "RooCategory.h"
#include "RooDataSet.h"
#include "RooFitResult.h"
#include "RooPlot.h"
#include "RooRealVar.h"
#include "RooRandom.h"
#include "TCanvas.h"

#include "RooStats/ModelConfig.h"

using namespace RooFit;

void CombinedModel() {

RooWorkspace w("w");

w.factory("Exponential:bkg1_pdf(x[0,10], a1[-0.5,-2,-0.2])");
w.factory("Gaussian:sig_pdf(x, mass[2], sigma[0.3])");

w.factory("prod:nsig1(mu[1,0,5],xsec1[50])");

w.factory("SUM:model1(nsig1*sig_pdf, nbkg1[1000,0,10000]*bkg1_pdf)");  // for extended model

w.factory("Exponential:bkg2_pdf(x, a2[-0.25,-2,-0.2])");

w.factory("prod:nsig2(mu,xsec2[30])");
w.factory("SUM:model2(nsig2*sig_pdf, nbkg2[100,0,10000]*bkg2_pdf)");  // for extended model

// Create discrete observable to label channels
w.factory("index[channel1,channel2]");

// Create joint pdf (RooSimultaneous)
w.factory("SIMUL:jointModel(index,channel1=model1,channel2=model2)");

RooAbsPdf * pdf = w.pdf("jointModel");
RooRealVar * x = w.var("x");  // the observable
RooCategory * index = w.cat("index");  // the category

// generate the data (since it is exetended we can generate the global model

// use fixed random numbers for reproducibility
// use 0 for changing every time
RooRandom::randomGenerator()->SetSeed(111);
// generate binned
// plot the generate data in 50 bins (default is 100)
x->setBins(50);

// generate events of joint model
// NB need to add also category as observable
RooDataSet * data = pdf->generate( RooArgSet(*x,*index));  // will generate accordint to total S+B events
data->SetName("data");
w.import(*data);

data->Print();

RooPlot * plot1 = x->frame(Title("Channel 1"));
RooPlot * plot2 = x->frame(Title("Channel 2"));
data->plotOn(plot1,RooFit::Cut("index==index::channel1"));
data->plotOn(plot2,RooFit::Cut("index==index::channel2"));
// plot->Draw();

RooFitResult * r = pdf->fitTo(*data, RooFit::Save(true), RooFit::Minimizer("Minuit2","Migrad"));
r->Print();

pdf->plotOn(plot1,RooFit::ProjWData(*data),RooFit::Slice(*w.cat("index"),"channel1"));
pdf->plotOn(plot2,RooFit::ProjWData(*data),RooFit::Slice(*w.cat("index"),"channel2"));
//draw the two separate pdf's
pdf->paramOn(plot1,Layout(0.65,0.85,0.85),Parameters(RooArgSet(*w.var("a1"),*w.var("nbkg1"))));
pdf->paramOn(plot2,Layout(0.65,0.85,0.85),Parameters(RooArgSet(*w.var("a2"),*w.var("nbkg2"))));
pdf->paramOn(plot2,Layout(0.65,0.85,0.7),Parameters(RooArgSet(*w.var("mu"))));
TCanvas * c1 = new TCanvas();
c1->Divide(1,2);
c1->cd(1); plot1->Draw();
c1->cd(2); plot2->Draw();

RooStats::ModelConfig mc("ModelConfig",&w);
mc.SetPdf(*pdf);
mc.SetParametersOfInterest(*w.var("mu"));
mc.SetObservables(RooArgSet(*w.var("x"),*w.cat("index")));

// define set of nuisance parameters
w.defineSet("nuisParams","a1,nbkg1,a2,nbkg2");

mc.SetNuisanceParameters(*w.set("nuisParams"));

// set also the snapshot
mc.SetSnapshot(*w.var("mu"));

// import model in the workspace
w.import(mc);

// write the workspace in the file
TString fileName = "CombinedModel.root";
w.writeToFile(fileName,true);
cout << "model written to file " << fileName << endl;
}

```

• Combined Model fit result:

Fit of the Combined model of two channels using RooFit

-- LorenzoMoneta - 2015-03-23

Topic attachments
I Attachment History Action Size Date Who Comment
c CombinedModel.C r1 manage 3.3 K 2015-03-24 - 12:26 LorenzoMoneta Files for exercises 3 (CombinedModel)
pdf CombinedModel.pdf r1 manage 23.7 K 2015-03-24 - 12:26 LorenzoMoneta Files for exercises 3 (CombinedModel)
c GaussianFit.C r1 manage 2.5 K 2015-03-24 - 09:05 LorenzoMoneta Files for exercises 1 (Gaussian fit)
pdf GaussianFit_ROOT.pdf r1 manage 16.2 K 2015-03-24 - 09:05 LorenzoMoneta Files for exercises 1 (Gaussian fit)
pdf GaussianRooFit.pdf r1 manage 22.5 K 2015-03-24 - 09:05 LorenzoMoneta Files for exercises 1 (Gaussian fit)
c GaussianToyStudy.C r1 manage 1.8 K 2015-03-24 - 16:38 LorenzoMoneta Toy study of the Gaussian fit (Exercise 1)
pdf GaussianToyStudy.pdf r1 manage 29.8 K 2015-03-24 - 16:42 LorenzoMoneta Gaussian Toy study
txt Hgg.txt r1 manage 330.5 K 2015-03-24 - 06:36 LorenzoMoneta Higgs data (text file)
c fitHgg.C r1 manage 1.7 K 2015-03-24 - 09:16 LorenzoMoneta
pdf fitHgg.pdf r1 manage 25.2 K 2015-03-24 - 09:18 LorenzoMoneta
Topic revision: r6 - 2015-03-26 - LorenzoMoneta

 Cern Search TWiki Search Google Search RooStats All webs
Copyright &© 2008-2022 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
or Ideas, requests, problems regarding TWiki? use Discourse or Send feedback