RooFit Exercises: Create and fitting a model

The aim of the RooFit exercises is to learn how to build a RooFit model, how to fit the model on a set of data that we will generate and how to plot the result. We will also learn how to save the full RooFit workspace containing the model and data in a file which we can read back later, for example when we will do the RooStats exercises. We will learn also how to generate the data from the model. The same generated data we will then use to fit the model

Using the snippets of code, one can build a full ROOT macro which can be run interactively in ROOT. If you want to run in compiled mode using ACLIC you might need to add all the needed header files.

Exercise 1: Fitting using ROOT and RooFit

We will start with a very simple exercise. The aim of this exercise is to show the basic fitting functionality in ROOT and RooFit.

Create an histogram with 100 bins between -5 and 5 and fill with it 500 gaussian distributed numbers with mean 0 and sigma 1. Then fit the histogram with a gaussian function (possibly using its normalized form) using ROOT to estimate the number of events, the mean and the sigma of the underlined gaussian distribution.

  • fit using the Neyman chi-square
  • fit using the Pearson chi-square
  • fit using a binned maximum likelihood
  • examine the results obtained and plot the three different functions obtained
  • fit the histogram using RooFit. You need to create a Gaussian model, which include the number of events (i.e. an extended model) and a RooFit binned data object (RooDataHist).
  • plot the resulting fit function on top of the data.

If you are already familiar with ROOT and RooFit you can also proceed to the optional second part

  • If you have problems on how to create and fill an histogram you can look at the solution or use the attached file, GaussianHistogram.root.
  • Then you need to create a TF1 representing a normalised gaussian. You can use the "gausn" pre-defined function or ROOT::Math::normal_pdf or TMath::Gauss(x,mu,sigma, true).
  • Use TH1::Fit with the default options for the Neyman chi-square
  • Use the option "P" for the Pearson chi-square
  • Use the option "L" for the binned likelihood fit

  • For using RooFit you need first to create the RooDataHist class from an histogram. See the lecture slides on how to do it.
  • For creating the RooFit model you need to first to create a RooGaussian pdf which has the mean and the sigma as parameters. For adding the number of events you need to create the RooExtendPdf class. You can create the classes directly or by using the factory functionality of the RooWorkspace.
  • How to create a RooFit model
    • Every variable in RooFit, needs to be created with a defined range [xmin,xmax]. Use a very large value if you don't know the range. If you provided only a value, the variable is considered constant. If you provide only the minimum and the maximum, the initial value will be taken in the middle of the range. The initial value must be within the given range.
    • Using the workspace factory syntax any existing RooFit pdf class by using the class name stripped by the Roo suffix. The exact signature required (variables to be passed) is defined in the class constructor (You can browse the reference documentation here.
    • You need to define the [value, min, max] of a variable only the first time you create in the factory. Afterwards you can reference the variable by its name.
// create a Gaussian Pdf and an extended pdf
RooWorkspace w("w"); 
w.factory("Gaussian:pdf(x[-10,10],mu[1,-1000,1000],s[2,0,1000])");   
w.factory("ExtendPdf:model(g,n[100,0,10000])");   
  • How to use the model from the workspace
    • After you have created the variable and p.d.f in the workspace, you can access their pointers, by using RooWorkspace::var for variables or RooWorkspace::pdf for p.d.f. Example:
// retrieving model components
RooAbsPdf * pdf = w.pdf("model");   // access object from workspace
RooRealVar * x = w.var("x");   // access object from workspace
  • Plotting the data:
// create a RooPlot object and plot the data
RooPlot * plot = x->frame(); 
data->plotOn(plot); 
  • Fitting the data using RooAbsPdf::fitTo.
// fit the data and save its result. Use the optional =Minuit2= minimiser
RooFitResult * res = pdf->fitTo(*data, Minimizer("Minuit2","Migrad"), Save(true) );
  • Plotting the fit function resulting from the fit:
pdf->plotOn(plot); 
plot->Draw();   // to show the RooPlot in the current ROOT Canvas

// make a simple Gaussian Fit in ROOT and RooFit


using namespace RooFit; 

void GaussianFit(int n = 500) { 

   TH1D * h1 = new TH1D("h1","Gaussian Fit",100,-5,5);
   h1->FillRandom("gaus", n);

   TF1 * f1  = new TF1("f1","gausn",-5,5);
   f1->SetLineColor(kBlue);
   f1->SetParameters(100,0,1);

   // Neyman chi2 fit
   TFitResultPtr r1 = h1->Fit(f1,"S");

   TF1 * f2  = new TF1("f2","gausn",-5,5);
   f2->SetLineColor(kBlue-3);
   f2->SetParameters(100,0,1);

   // Pearson chi2 fit
   TFitResultPtr r2 = h1->Fit(f2,"P + S");

   TF1 * f3  = new TF1("f3","gausn",-5,5);
   f3->SetLineColor(kRed);
   f3->SetParameters(100,0,1);

   // Binnel likelihood fit
   TFitResultPtr r3 = h1->Fit(f3,"L + S");


   TLegend * l = new TLegend(0.65,0.55,0.88,0.7);
   l->AddEntry(f1,"Neyman chi2","l");
   l->AddEntry(f2,"Person chi2","l");
   l->AddEntry(f3,"Binned ML  ","l");
   l->Draw();

   // do now the fit with RooFit

   RooWorkspace w("w");
   // define a Gaussian pdf
   w.factory("Gaussian:g(x[-5,5],mu[0,-10,10],sigma[1,0,1000])");
   // create extend pdf with number of events
   w.factory("ExtendPdf:model(g,nevt[100,0,100000])");   

   RooAbsPdf * pdf = w.pdf("model");   // access object from workspace
   RooRealVar * x = w.var("x");   // access object from workspace
   

   // generate n gaussian measurement for a Gaussian(x, 0, 1);
   //RooDataSet * data = pdf->generate( *x);
   //data->SetName("unbinData");  

   // create dataset
   RooDataHist * data = new RooDataHist("data","data",*x,h1); 


   new TCanvas(); 
   // RooFit plotting capabilities 
   RooPlot * pl = x->frame(Title("Gaussian Fit")); 
   data->plotOn(pl); 
   pl->Draw(); 

   w.var("sigma")->setVal(1);
   w.var("mu")->setVal(0);

   // now fit the data set 
   RooFitResult * r4 = pdf->fitTo(*data, Minimizer("Minuit2","Migrad"), Save(1) ); 
 

   // plot the pdf on the same RooPlot object we have plotted the data 
   pdf->plotOn(pl);
   pdf->paramOn(pl, Layout(0.6,0.9,0.85));

   pl->Draw();

   // import data in workspace (IMPORTANT for saving it )
   w.import(*data); 

   w.Print();

   // write workspace in the file (recreate file if already existing)
   w.writeToFile("GaussianModel.root", true);

   cout << "model written to file " << endl;    std::cout << "\nResult of Neyman chi2 \n";    r1->Print();
   std::cout << "\nResult of Pearson chi2 \n";    r2->Print();
   std::cout << "\nResult of Binned Likelihood \n";    r3->Print();
   std::cout << "\nResult of Binned Likelihood fit with ROOTFIT\n";    r4->Print();


}

  • Gaussian Fit Result using ROOT

GaussianFit ROOT.pdf
Fit of the Gaussian model using chi-square and likelihood methods in ROOT

  • Gaussian Fit Result using RooFit

GaussianRooFit.pdf
Fit of the Gaussian model using RooFit

Pointing hand We have performed a binned fit of an histogram. If you have all the data used to generate the histogram, how can you make an un-binned likelihood fit (in ROOT or RooFit) ?

Here is the suggestion on how to generate a data set with RooFit using the RooAbsPdf::generate function.
  • We need to pass the observable we want to generate the data on (e.g. x) and the number of events
  • Note that we can generate also binned data sets. In that case we have two options:
    • use a RooDataHist and generateBinned: RooDataHist * binData = pdf->generateBinned(*x, 1000);.
    • use the option AllBinned() which will generate a weighted data set: RooDataSet * data = pdf->generate(*x, NumEvents(1000), AllBinned());.
// generate an unbinned dataset of 1000 events
RooDataSet * data = pdf->generate( *x, 1000); 
Doing the un-binned fit
  • using the gaussian pdf (w.pdf("gaus")->fitTo(*data)) to fit only the shape of the distribution (not-extend fit);
  • using the extended pdf, (w.pdf("model")->fitTo(*data))which includes the number of events, to perform an extended unbinned fit.
If we are using ROOT, we can use TTree::UnbinnedFit or the ROOT::Fit::Fitter class directly. See the user documentation or the attached macro on how to perform the unbinned (extended or not-extended) fit in ROOT. It is important to note that in un-binned fit (extended or not extended) the fit model function needs to be normalised in its fitting range. RooFit performs this automatically, while in ROOT the user needs to provide as input a normalised function.

Pointing handStudy the difference of performing in ROOT a fit to an histogram using a Neyman, Pearson and binned likelihood. Study the bias obtained in the fit parameters for the gaussian case, by generating 1000 pseudo-histograms and fitting them with the three methods. Study the obtained pull distributions.

Exercise 2: Fit of a Higgs Signal over an Exponential Background

The aim of this exercise is to learn how to build a composite model in RooFit made of two p.d.f, one representing a signal and one a background distributions. We want to determine the number of signal events. For this we need to perform an extended maximum likelihood fit, where the signal events is one of the fit parameter.

Create first the composite model formed by a Gaussian signal over a falling exponential background. Then read the data (in text format) from the file and create a RooDataSet class with all the data. Perform then an extended unbinned fit to the data to extract the Higgs signal strength. Plot the resulting fit function from the fit with separate signal and background components.

  • Read first the attached file (Hgg.txt in a RooDataSet using RooDataSet::read or in a ROOT TTree using TTree::ReadFile
  • Create the model using the RooWorkspace factory
    • create the Exponential pdf and the Gaussian signal p.d.f. function. We create the exponential as two different components to reproduce better the data.
    • create then a RooAddPdf using the Gaussian and the Exponential and the relative number of events. Note that we could instead of the number of events a relative fraction. In this last case we would fit only for the shape and not the normalisation.
    • The RooAddPdf is created using the SUM operator of the factory syntax.
// create model
   RooWorkspace w("w");
   w.factory("x[110,160]");  // invariant mass
   
   // create exponential model as two components
   w.factory("a1[ 7.5, -500, 500]");
   w.factory("a2[-1.5, -500, 500]");
   w.factory("expr::z('-(a1*x/100 + a2*(x/100)^2)', a1, a2, x)");
   w.factory("Exponential::bmodel(z, 1)");
 
    // signal model   
   w.factory("mass[130, 110, 150]");
   w.factory("width[1, 0.5, 5]");
   w.factory("Gaussian::smodel(x, mass, width)");

   // create RooAddPdf in extended mode
   w.factory("nbackground[10000, 0, 1000000]");
   w.factory("nsignal[100, 0.0, 1000.0]");
   w.factory("SUM::model(nbackground*bmodel, nsignal*smodel)");
 w.factory("SUM:model(nsig[100,0,10000]*sig_pdf, nbkg[1000,0,10000]*bkg_pdf)");  // for extended model

  • Fit the data: you need to call the RooAbsPdf::fitTo.
// fit the data and save its result. Use eventually the optional =Minuit2= minimiser
RooFitResult * res = pdf->fitTo(*data, Minimizer("Minuit2","Migrad"), Save(true) );
  • Plot the resulting fit function in the same plot.
  • Plot also the signal and background fit components with different colour and style
pdf->plotOn(plot); 
//draw the two separate pdf's
pdf->plotOn(plot, RooFit::Components("bkg_pdf"), RooFit::LineStyle(kDashed) );
pdf->plotOn(plot, RooFit::Components("sig_pdf"), RooFit::LineColor(kRed), RooFit::LineStyle(kDashed) );
plot->Draw();   // to show the RooPlot in the current ROOT Canvas

// macro to fit Higgs to gg spectrum

using namespace RooFit;

void fitHgg() {

   // read from the file and create a ROOT tree

   TTree tree("tree","tree");
   int nevt = tree.ReadFile("Hgg.txt","x");
   if (nevt <= 0) {       Error("fitHgg","Error reading data from input file ");       return;    }    std::cout << "Read " << nevt << " from the file " << std::endl;    // make the RooFit model     RooWorkspace w("w");    w.factory("x[110,160]");  // invariant mass        w.factory("nbackground[10000, 0, 10000]");    //w.factory("Exponential::z1(x, a1[-1,-10,0])");    w.var("nbackground")->setVal(nevt);
   w.var("nbackground")->setMin(0.1*nevt);
   w.var("nbackground")->setMax(10*nevt);

   // create exponential model as two components
   w.factory("a1[ 7.5, -500, 500]");
   w.factory("a2[-1.5, -500, 500]");
   w.factory("expr::z('-(a1*x/100 + a2*(x/100)^2)', a1, a2, x)");
   w.factory("Exponential::bmodel(z, 1)");

   // signal model   
   w.factory("nsignal[100, 0.0, 1000.0]");
   //w.factory("mass[%f, %f, %f]' % (massguess, massmin, massmax))
   w.factory("mass[130, 110, 150]");
   w.factory("width[1, 0.5, 5]");
   w.factory("Gaussian::smodel(x, mass, width)");
   RooAbsPdf * smodel = w.pdf("smodel");

   w.factory("SUM::model(nbackground*bmodel, nsignal*smodel)");
   RooAbsPdf * model = w.pdf("model");

   // create RooDataSet
   RooDataSet data("data","data",*w.var("x"),Import(tree) );
   
   RooFitResult * r = model->fitTo(data, Minimizer("Minuit2"),Save(true), Offset(true));

   // plot data and function

   RooPlot * plot = w.var("x")->frame();
   data.plotOn(plot);
   model->plotOn(plot);
   model->plotOn(plot, Components("bmodel"),LineStyle(kDashed));
   model->plotOn(plot, Components("smodel"),LineColor(kRed));

   plot->Draw();

}

  • Hgg Fit Result

fitHgg.pdf
Fit of the H->gg CMS data

Pointing handDo a scan of the likelihood (or profile likelihood) as function of the Higgs mass parameter.

Exercise 3: Fit to a Combined model

In this exercise we will create a combined model based on two channels and we will perform a simultaneous fit on both channels. We will learn how to create a RooSimultaneous p.d.f. and a combined data set, which can be fit for one (or more) common parameters.

  • The first channel is a Gaussian signal over an Exponential decay background. We assume that the location and width of the signal are known. The "mass" observable is distributed in range [0 - 10] and the signal has mean ~ 2 and sigma 0.3. The exponential decay of the background is 2. We assume we have 1000 background events.
  • The second channel will be similar to the first one, we will have the same signal model, but a different background model. We assume we have less and a flatter background. We have an exponential decay of 4 and 100 background events in total.
  • We assume the number of signal events in the first channel is 50 and in the second channel is 30. We introduce a common rate parameter mu expressing the signal strength.

We will so the following steps:

  • Create the models for the two channels
  • Create a combined model using simultaneous pdf (class RooSimultaneous).
  • Generate a combined data set, fit and plot the result

  • Create first the two separate background models and a common Gaussian signal model
  • Express the number of signal events in term of the signal strength mu: the number of signal events is the product of mu with a fixed number (50 for the first channel and 30 for the second).
RooWorkspace w("w"); 
w.factory("Exponential:bkg1_pdf(x[0,10], a1[-0.5,-2,-0.2])");
w.factory("Exponential:bkg2_pdf(x, a2[-0.25,-2,-0.2])");
w.factory("Gaussian:sig_pdf(x, mass[2], sigma[0.3])");
// express number of events as the    
w.factory("prod:nsig1(mu[1,0,5],xsec1[50])");
w.factory("prod:nsig2(mu,xsec2[30])");

w.factory("SUM:model1(nsig1*sig_pdf, nbkg1[1000,0,10000]*bkg1_pdf)");  // for extended model
w.factory("SUM:model2(nsig2*sig_pdf, nbkg2[100,0,10000]*bkg2_pdf)");  // for extended model
  • Make a simultaneous pdf * create first the category label to identify the channels (RooCategory) * use the factory operator SIMUL for creating the RooSimultaneous pdf.

// Create discrete observable to label channels
w.factory("index[channel1,channel2]");
// Create joint pdf (RooSimultaneous)
w.factory("SIMUL:jointModel(index,channel1=model1,channel2=model2)");
  • Generate the events. We can generate directly a combined data sets from the model, with the number of events fluctuating around the number of expected events in each channel.
  • Note that for creating a combined data set we need to add the Category variable in the data set as an additional observable
// generate combined data set
RooDataSet * data = pdf->generate( RooArgSet(*x,*index)); 
  • Plot the data. We make two plots representing the two channels. We need then to tell to the combined data set to select the channel we want
RooPlot * plot1 = x->frame();
RooPlot * plot2 = x->frame();
data->plotOn(plot1,RooFit::Cut("index==index::channel1"));
data->plotOn(plot2,RooFit::Cut("index==index::channel2")); 
  • Fit the data is as before using the fitTo method.
  • Plot the resulting model on the two data sets. We need to project the simultaneous pdf on the corresponding channel. This is done as following:
// fit 
 RooFitResult * r = pdf->fitTo(*data, RooFit::Save(true), RooFit::Minimizer("Minuit2","Migrad"));
 r->Print();

 // plotting (draw the two separate pdf's )
 pdf->plotOn(plot1,RooFit::ProjWData(*data),RooFit::Slice(*w.cat("index"),"channel1"));
 pdf->plotOn(plot2,RooFit::ProjWData(*data),RooFit::Slice(*w.cat("index"),"channel2"));

 // draw the two plots in two separate pads:
 c1 = new TCanvas();
 c1->Divide(1,2);
 c1->cd(1); plot1->Draw(); 
 c1->cd(2); plot2->Draw();

Code solution:

#include "RooWorkspace.h"
#include "RooAbsPdf.h"
#include "RooArgSet.h"
#include "RooCategory.h"
#include "RooDataSet.h"
#include "RooFitResult.h"
#include "RooPlot.h"
#include "RooRealVar.h"
#include "RooRandom.h"
#include "TCanvas.h"

#include "RooStats/ModelConfig.h"

using namespace RooFit; 

void CombinedModel() { 

   RooWorkspace w("w"); 

   w.factory("Exponential:bkg1_pdf(x[0,10], a1[-0.5,-2,-0.2])");
   w.factory("Gaussian:sig_pdf(x, mass[2], sigma[0.3])");
   
   w.factory("prod:nsig1(mu[1,0,5],xsec1[50])");

   w.factory("SUM:model1(nsig1*sig_pdf, nbkg1[1000,0,10000]*bkg1_pdf)");  // for extended model

   w.factory("Exponential:bkg2_pdf(x, a2[-0.25,-2,-0.2])");

   w.factory("prod:nsig2(mu,xsec2[30])");
   w.factory("SUM:model2(nsig2*sig_pdf, nbkg2[100,0,10000]*bkg2_pdf)");  // for extended model

   // Create discrete observable to label channels
   w.factory("index[channel1,channel2]");

   // Create joint pdf (RooSimultaneous)
   w.factory("SIMUL:jointModel(index,channel1=model1,channel2=model2)");

   RooAbsPdf * pdf = w.pdf("jointModel");
   RooRealVar * x = w.var("x");  // the observable
   RooCategory * index = w.cat("index");  // the category


   // generate the data (since it is exetended we can generate the global model
 
   // use fixed random numbers for reproducibility
   // use 0 for changing every time
   RooRandom::randomGenerator()->SetSeed(111);
   // generate binned 
   // plot the generate data in 50 bins (default is 100) 
   x->setBins(50);

   // generate events of joint model 
   // NB need to add also category as observable
   RooDataSet * data = pdf->generate( RooArgSet(*x,*index));  // will generate accordint to total S+B events
   data->SetName("data");
   w.import(*data);

   data->Print(); 


   RooPlot * plot1 = x->frame(Title("Channel 1"));
   RooPlot * plot2 = x->frame(Title("Channel 2"));
   data->plotOn(plot1,RooFit::Cut("index==index::channel1"));
   data->plotOn(plot2,RooFit::Cut("index==index::channel2"));
    // plot->Draw();

   RooFitResult * r = pdf->fitTo(*data, RooFit::Save(true), RooFit::Minimizer("Minuit2","Migrad"));
   r->Print();

   pdf->plotOn(plot1,RooFit::ProjWData(*data),RooFit::Slice(*w.cat("index"),"channel1"));
   pdf->plotOn(plot2,RooFit::ProjWData(*data),RooFit::Slice(*w.cat("index"),"channel2"));
  //draw the two separate pdf's
   pdf->paramOn(plot1,Layout(0.65,0.85,0.85),Parameters(RooArgSet(*w.var("a1"),*w.var("nbkg1"))));
   pdf->paramOn(plot2,Layout(0.65,0.85,0.85),Parameters(RooArgSet(*w.var("a2"),*w.var("nbkg2"))));
   pdf->paramOn(plot2,Layout(0.65,0.85,0.7),Parameters(RooArgSet(*w.var("mu"))));
   TCanvas * c1 = new TCanvas();
   c1->Divide(1,2);
   c1->cd(1); plot1->Draw(); 
   c1->cd(2); plot2->Draw();



  RooStats::ModelConfig mc("ModelConfig",&w);
   mc.SetPdf(*pdf);
   mc.SetParametersOfInterest(*w.var("mu"));
   mc.SetObservables(RooArgSet(*w.var("x"),*w.cat("index")));

   // define set of nuisance parameters
   w.defineSet("nuisParams","a1,nbkg1,a2,nbkg2");

   mc.SetNuisanceParameters(*w.set("nuisParams"));

   // set also the snapshot
   mc.SetSnapshot(*w.var("mu"));

   // import model in the workspace 
   w.import(mc);

   // write the workspace in the file
   TString fileName = "CombinedModel.root";
   w.writeToFile(fileName,true);
   cout << "model written to file " << fileName << endl;
}

  • Combined Model fit result:

CombinedModel.pdf
Fit of the Combined model of two channels using RooFit

-- LorenzoMoneta - 2015-03-23

Topic attachments
I Attachment History Action Size Date Who Comment
C source code filec CombinedModel.C r1 manage 3.3 K 2015-03-24 - 12:26 LorenzoMoneta Files for exercises 3 (CombinedModel)
PDFpdf CombinedModel.pdf r1 manage 23.7 K 2015-03-24 - 12:26 LorenzoMoneta Files for exercises 3 (CombinedModel)
C source code filec GaussianFit.C r1 manage 2.5 K 2015-03-24 - 09:05 LorenzoMoneta Files for exercises 1 (Gaussian fit)
PDFpdf GaussianFit_ROOT.pdf r1 manage 16.2 K 2015-03-24 - 09:05 LorenzoMoneta Files for exercises 1 (Gaussian fit)
PDFpdf GaussianRooFit.pdf r1 manage 22.5 K 2015-03-24 - 09:05 LorenzoMoneta Files for exercises 1 (Gaussian fit)
C source code filec GaussianToyStudy.C r1 manage 1.8 K 2015-03-24 - 16:38 LorenzoMoneta Toy study of the Gaussian fit (Exercise 1)
PDFpdf GaussianToyStudy.pdf r1 manage 29.8 K 2015-03-24 - 16:42 LorenzoMoneta Gaussian Toy study
Texttxt Hgg.txt r1 manage 330.5 K 2015-03-24 - 06:36 LorenzoMoneta Higgs data (text file)
C source code filec fitHgg.C r1 manage 1.7 K 2015-03-24 - 09:16 LorenzoMoneta  
PDFpdf fitHgg.pdf r1 manage 25.2 K 2015-03-24 - 09:18 LorenzoMoneta  
Edit | Attach | Watch | Print version | History: r6 < r5 < r4 < r3 < r2 | Backlinks | Raw View | WYSIWYG | More topic actions
Topic revision: r6 - 2015-03-26 - LorenzoMoneta
 
    • Cern Search Icon Cern Search
    • TWiki Search Icon TWiki Search
    • Google Search Icon Google Search

    RooStats All webs login

This site is powered by the TWiki collaboration platform Powered by PerlCopyright & 2008-2020 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback