RooStats tutorials

Set of RooStats tutorials given at CERN for LHCb the 26th April 2012: (agenda: Below is given some materials used during the RooStats hands-on sessions. More material has been showed at the Desy statistical school (see

Introduction to RooStats

See slides attached to the agenda in Indico:

Getting started with the software

A few advice words before starting. The tutorials requires the ROOT version 5.32.02.

  • work locally on your laptop (if possible) to reduce the WLAN usage,
  • if cannot work locally use it on lxplus. See later how to setup for the ROOT version on lxplus;
  • add in your macro (this will load automatically the Roofit and Roostats libraries):
           using namespace RooFit;
           using namespace RooStats;
  • avoid running a macro multiple times in the same ROOT CINT sessions (it might unfortunately crash sometimes if you do it).
  • note that Roostats methods start with capital letter (as in ROOT) while Roofit ones start with lower letter *Roostats calculator are quite verbose, you can suppress the output by doing:
RooMsgService::instance().setGlobalKillBelow(RooFit::WARNING) ;

Setting ROOT 5.32.02 using the CERN AFS installation on lxplus:

  • for bash and zsh shell :
    . /afs/
   cd  /afs/
   . bin/
  • for []tcsh shell:
   source /afs/
   source /afs/

Some links to documentation:

Terminology and Conventions

Here we give pragmatic definitions for a few basic concepts that we will use.
  • observable - something you measure in an experiment, for example a particle's momentum. Often, a function of measured quantities, for example an invariant mass of several particles
  • global observable or auxiliary observable - an observable from another measurement, for example, the integrated luminosity
  • model - a set of probability density functions (PDFs), which describe distributions of the observables or functions of observables
  • model parameter - any variable in your PDF expression, which is not an observable
  • parameter of interest (POI) - a model parameter that you study, for example a cross section of your signal process
  • nuisance parameter - every other model parameter, which is not your parameter of interest
  • data or dataset - a set of values of observables, either measured in an experiment, or simulated
  • likelihood - a model computed for a particular dataset
  • hypothesis - a particular model, with specified observables, POI, nuisance parameters, and prior PDFs (in case of Bayesian inference)
  • prior PDF - a probability density for an observable or a model parameter, which is known a priori, i.e. before a measurement is considered. This is a Bayesian concept exclusively. Prior has no meaning or place in a frequentist type of inference
  • Bayesian - a type of statistical inference that usually produces probability of the hypothesis given the data. Requires a prior.
  • frequentist - a type of statistical inference that usually produces probability of the data given the hypothesis.

Exercises for this tutorial

Below will given code snippets showing how to build some different analysis models and then on how to run some of the Roostats calculators

  • Poisson model: let's assume you observe n events in data while expecting b background event. We consider some systematic uncertainties in the background model value, b. We express this uncertainty as Gaussian.
  • Observable-based analysis: 30 Gaussian-distributed signal event over ~ 1000 exponential-distributed background. The "mass" observable is distributed in range [0 - 10] and the signal has mean ~ 2 and sigma 0.5. We assume that the location and width of the signal are known.


  • 68% CL 2-sided confidence interval and significance from the (profiled-) likelihood ratio
  • 68% credible interval using the BayesianCalculator or the MCMCCalculator
  • 95% CL upper-limit with the inverter method and CLs, first using the frequentist calculator (with toys) then asymptotically
  • A significance test using the FrequentistCalculator

We provide code snippet for creating the models, save them in a file and then for running the calculator after having read the workspace from the file. All the Roostats calculators take as input a data set and a Model Config object. It is therefore convenient to save the workspace (with model and data) in a ROOT file and then read it afterwards for running the calculators.

If you have problem creating the code for running the calculators, you can run the Standard Roostats tutorials (in $ROOTSYS/tutorials/roostats)

Tutorials you could use are:

They can be run as following from the Root prompt by passing the workspace file name (e.g. "SPlusBExpoModel.root" ), workspace name, model config name and data set name:

   .L $ROOTSYS/tutorials/roostats/StandardProfileLikelihoodDemo.C
   StandardProfileLikelihoodDemo("SPlusBExpoModel.root", "w", "ModelConfig", "", "data" )

Running StandardHypoTestInvDemo.C is a bit more complicated and is explained later.

Other tutorials go into more details to create various models and run the calculators. See them all at

Some of the following code here is extracted from exercises macro showed at the Desy Statistical school in April 2012. A tar file with the exercises can be downloaded from here

Create Poisson Counting model

Make a counting model: Poisson(nobs | s + b). Assume b is known with un uncertainty of 20%

Make first the workspace and write in the file. Use RooFit Factory to create pdf and variables

   int nobs = 3;   // number of observed events
   double b = 1; // number of background events
   double sigmab = 0.2;   // relative uncertainty in b

   RooWorkspace w("w",true);

   // make Poisson model * Gaussian constraint
   // Poisson of (n | s+b)

Set the variables values for the model (nobs, b and error on b)

   RooRealVar * obs = w.var("nobs");
   w.var("b0")->setConstant(true); // needed for being treated as global observables

   // use this to avoid too large ranges 
   //set value of observed events

make now data set and import it in the workspace

   // make data set with the namber of observed events
   RooDataSet data("data","", *obs );
   data.add(*obs );

Make now the ModelConfig needed for RooStats. Add also the snapshot of the parameter of interest defining the teasing hypothesis (needed later) and the global observables for the frequentist calculators and the prior pdf for Bayesian calculator

   ModelConfig mc("ModelConfig",&w);

   // these are needed for the hypothesis tests

   // make a prior pdf for Bayesian calculators

   // import model in the workspace 

And save now the workspace for further use

   TString fileName = "PoissonModelWithBackg.root"; 

   // write workspace in the file (recreate file if already existing)
   w.writeToFile(fileName, true);

A possible variant to the model is to use a Gamma (Poisson) constraint, as in the on/off problem. See the paper: "Evaluation of three methods for calculating statistical significance when incorporating a systematic uncertainty into a test of the background-only hypothesis for a Poisson process Authors: Robert D. Cousins, James T. Linnemann, Jordan Tucker NIM A 595 (2008) 480--50.

In this case the constraint is expressed as a Poisson pdf representing the sideband measurement (or equivantly by a Gamma pdf, if thinking Bayesian):

   // make model for on/off problem
   // Poisson of (n | s+ b)
   // use tau :  y = tau*b

Another possible option is to write instead the constraint as a log-normal. We could first use as parameter of interest instead of s, mu, the ratio of the observed cross section divided by the nominal one. We introduce also another nuisance parameter the efficiency. The model is then:

   // make model for on/off problem
   w.factory("prod:nsig( mu[0,3], xsec[2], eff[0.5,0,1])" );
   // Poisson of (n | s+ b)

   // add Gaussian constraint in b 
   // add a log-normal constraint in eff

  w.factory( "eff_nominal[0.5]" );  // nominal efficiency
  w.factory( "eff_kappa[1.20]" );  // relative error in efficiency ( log-normal parameter (1+ error)
  w.factory( "cexpr:eff('eff_nominal*pow(eff_kappa,beta_eff)',eff_kappa,beta_eff[0,-5,5])" );
  w.factory( "Gaussian:constraint_eff(glob_eff[0,-5,5],beta_eff,1)" );  
  // glob_efficiency is the global observables      

  // make global model


Run Profile Likelihood Calculator

After we have create a workspace and saved in a file we can run the Roostats calculator. We start using the ProfileLikelihoodCalculator. First we need to retrieve the workspace from the file:

  const char* infile =  "PoissonModelWithBackg.root"; 
  const char* workspaceName = "w",
  const char* modelConfigName = "ModelConfig";
  const char* dataName = "data";

  // First part is just to access the workspace file 

  // Check if example input file exists
  TFile *file = TFile::Open(infile);

  // get the workspace out of the file
  RooWorkspace* w = (RooWorkspace*) file->Get(workspaceName);

  // get the modelConfig out of the file
  ModelConfig* mc = (ModelConfig*) w->obj(modelConfigName);

  // get the modelConfig out of the file
  RooAbsData* data = w->data(dataName);

Now we can start using the ProfileLikelihoodCalculator. We create the class and run it to find and plot the 1-sigma (68.3%) confidence interval on the parameter of interest as specified in the model config

  ProfileLikelihoodCalculator pl(*data,*mc);
  pl.SetConfidenceLevel(0.683); // 68.3% interval
  LikelihoodInterval* interval = pl.GetInterval();

   // find the iterval on the first Parameter of Interest
  RooRealVar* firstPOI = (RooRealVar*) mc->GetParametersOfInterest()->first();

  double lowerLimit = interval->LowerLimit(*firstPOI);
  double upperLimit = interval->UpperLimit(*firstPOI);

  cout << "\n68% interval on " <<firstPOI->GetName()<<" is : ["<<
    lowerLimit << ", "<<
    upperLimit <<"] "<<endl;

We can make a plot of the profile likelihood function, to speedup in same case is useful to use the Draw option "tf1" and set a low number of points
  LikelihoodIntervalPlot * plot = new LikelihoodIntervalPlot(interval);
  //plot->SetRange(0,100);  // possible eventually to change ranges
  plot->SetNPoints(50);  // do not use too many points, it could become very slow for some models
  plot->Draw("");  // use option TF1 if too slow (plot.Draw("tf1")

Run Bayesian Calculator

On the same model we can run the Bayesian calculator. We create the class and configure it. We can set the type of interval (central, upper/lower limit or shortest) and the type of integration method

  BayesianCalculator bayesianCalc(*data,*mc);
  bayesianCalc.SetConfidenceLevel(0.683); // 68% interval

  // set the type of interval (not really needed for central which is the default)
  bayesianCalc.SetLeftSideTailFraction(0.5); // for central interval
  //bayesianCalc.SetLeftSideTailFraction(0.); // for upper limit
  //bayesianCalc.SetShortestInterval(); // for shortest interval

  // set the integration type (not really needed for the default ADAPTIVE)
  // possible alternative values are  "VEGAS" , "MISER", or "PLAIN"  (MC integration from libMathMore) 
  // "TOYMC" (toy MC integration, work when nuisances exist and they have a constraints pdf)
  TString integrationType = "";

  // this is needed if using TOYMC
  if (integrationType.Contains("TOYMC") ) { 
    RooAbsPdf * nuisPdf = RooStats::MakeNuisancePdf(*mc, "nuisance_pdf");
    if (nuisPdf) bayesianCalc.ForceNuisancePdf(*nuisPdf);


  // compute interval by scanning the posterior function
  // it is done by default when computing shortest intervals

  RooRealVar* firstPOI = (RooRealVar*) mc->GetParametersOfInterest()->first();

  SimpleInterval* interval = bayesianCalc.GetInterval();
  if (!interval) { 
     cout << "Error computing Bayesian interval - exit " << endl;

  double lowerLimit = interval->LowerLimit();
  double upperLimit = interval->UpperLimit();

  cout << "\n68% interval on " <<firstPOI->GetName()<<" is : ["<<
    lowerLimit << ", "<<
    upperLimit <<"] "<<endl;

  // draw plot of posterior function

  RooPlot * plot = bayesianCalc.GetPosteriorPlot();
  if (plot) plot->Draw();  

One can try to run using instead of uniform prior also a different prior. For example the Jeffrey prior 1/sqrt(s) or 1/s. To create Jeffrey prior do:



  • numerical integration can fail if range of integration is too large. Sometimes is better to reduce range of integration of nuisance parameter and also of the poi to a region where the posterior is non-zero. This is typically around the MLE. For example one can set a range on the nuisance of +/- n sigma of their MLE estimate.
  • sometimes is useful to avoid some singular point of interrogations

Create Model with shape: Gaussian peak plus an exponential background (Unbinned Model)

Here we make an unbinned model containing a Gaussian peak and an exponential background: First we create the pdf:

   RooWorkspace w("w"); 
   // use cexpr for compiled expressions
   w.factory("Exponential:bkg_pdf(x[0,10], a)");
   w.factory("Gaussian:sig_pdf(x, mass[2,0,10], sigma[0.5,0,10])");

   w.factory("SUM:model(s[0,100]*sig_pdf, b[0,1000000]*bkg_pdf)");  // for extended model


   RooAbsPdf * pdf = w.pdf("model");
   RooRealVar * x = w.var("x");  // the observable

Then we generate the data and we create the ModelConfig and then we write all in a file
   // generate the data (N = 1000)
   int N = 1000;
   // use fixed random numbers for reproducibility
   RooDataSet * data = pdf->generate( *x);  // will generate accordint to total S+B events


   ModelConfig mc("ModelConfig",&w);
   // define set of nuisance parameters


   // import model in the workspace 

   // plot the generate data
   RooPlot * pl = x->frame();

   // write the workspace in the file
   TString fileName = "SPlusBExpoModel.root";
   cout << "model written to file " << fileName << endl;

Optionally we can make the model more complex by adding for example some systematics in the sigma of the gaussian peak, by adding a Gaussian constraint on it as we have done for the previous model.

Run Bayesian MCMC Calculator

On this model we run the MCMC calculator. Reduce the number of iterations if it is too slow

  // this proposal function seems fairly robust
  SequentialProposal sp(0.1);

  MCMCCalculator mcmc(*data,*mc);
  mcmc.SetConfidenceLevel(0.683); // 683% interval
  //  mcmc.SetProposalFunction(*pf);
  mcmc.SetNumIters(100000);         // Metropolis-Hastings algorithm iterations
  mcmc.SetNumBurnInSteps(50);       // first N steps to be ignored as burn-in

  // default is the shortest interval.  here use central
  mcmc.SetLeftSideTailFraction(0.5); // for central Bayesian interval
  //mcmc.SetLeftSideTailFraction(0); // for one-sided Bayesian interval

  RooRealVar* firstPOI = (RooRealVar*) mc->GetParametersOfInterest()->first();

  MCMCInterval* interval = mcmc.GetInterval();

  // print out the iterval on the first Parameter of Interest
  cout << "\n68% interval on " <<firstPOI->GetName()<<" is : ["<<
    interval->LowerLimit(*firstPOI) << ", "<<
    interval->UpperLimit(*firstPOI) <<"] "<<endl;

  // make a plot of posterior function
  //TCanvas* c1 = 
  new TCanvas("IntervalPlot");
  MCMCIntervalPlot plot(*interval);

Run Hypothesis test Inverter

We run the hypothesis tests on the previous model (or the number counting). But first we need to create a background only model. We can copy it from the S+B model by setting the S value to zero:

      ModelConfig * sbModel  = (ModelConfig*) w->obj(modelSBName);
      RooRealVar* poi = (RooRealVar*) sbModel->GetParametersOfInterest()->first();

      ModelConfig * bModel = (ModelConfig*) sbModel->Clone();
      double oldval = poi->getVal();
      bModel->SetSnapshot( *poi  );

Then we create the Hypo Test calculator class. For example the frequentist calculator and we can set the number of toys to use. *NB. The class takes first alternate model then null model. In this case null is S+B

   FrequentistCalculator *  fc  = new FrequentistCalculator(*data, *bModel, *sbModel);
   fc->SetToys(1000,500);    // 1000 for null (S+B) , 50 for alt (B)

This is for the Hybrid Calculator, which requires a prior pdf if there are nuisance parameters:

   // // Hybrid calculator 
   HybridCalculator *  hc = new HybridCalculator(*data, *bModel, *sbModel);
   hc->SetToys(1000,500);    // 1000 for null (S+B) , 50 for alt (B)
   // check for nuisance prior pdf in case of nuisance parameters 
   if (sbModel->GetNuisanceParameters() && sbModel->GetNuisanceParameters()->getSize() > 0) {
      // for hybrid need nuisance pdf 
      RooAbsPdf * nuisPdf = RooStats::MakeNuisancePdf(*sbModel,"nuisancePdf_sbmodel");

This is for the Asymptotic Calculator

   // asymptotic calculator
   AsymptoticCalculator * ac = new AsymptoticCalculator(*data, *bModel, *sbModel);
   ac->SetOneSided(true);  // for one-side tests (limits)

We can now create the Inverter class

   // create hypotest inverter 
   // passing the desired calculator 
   HypoTestInverter calc(*ac);

   // set confidence level (e.g. 95% upper limits)
   // for CLS
   bool useCLs = true;

We need also to configure the ToyMCSampler and set the test statistics

   ToyMCSampler *toymcs = (ToyMCSampler*)calc.GetHypoTestCalculator()->GetTestStatSampler();
   // for number counting (extended pdf do not need this)
   // toymcs->SetNEventsPerToy(1);

   // profile likelihood test statistics 
   ProfileLikelihoodTestStat profll(*sbModel->GetPdf());
   // for CLs (bounded intervals) use one-sided profile likelihood
   if (useCLs) profll.SetOneSided(true);

   // ratio of profile likelihood - need to pass snapshot for the alt 
   // RatioOfProfiledLikelihoodsTestStat ropl(*sbModel->GetPdf(), *bModel->GetPdf(), bModel->GetSnapshot());

   // set the test statistic to use 
We can run now the inverter
   int npoints = 10;  // number of points to scan
   // min and max (better to choose smaller intervals)
   double poimin = poi->getMin();
   double poimax = poi->getMax();
   //poimin = 0; poimax=10;

   std::cout << "Doing a fixed scan  in interval : " << poimin << " , " << poimax << std::endl;

   HypoTestInverterResult * r = calc.GetInterval();
And finally we can query the result and make some plots:
   double upperLimit = r->UpperLimit();
   double ulError = r->UpperLimitEstimatedError();
   // double lowerLimit = r->LowerLimit();
   // double llError = r->LowerLimitEstimatedError();
   // if (lowerLimit < upperLimit*(1.- 1.E-4)) 
   //    std::cout << "The computed lower limit is: " << lowerLimit << " +/- " << llError << std::endl;
   std::cout << "The computed upper limit is: " << upperLimit << " +/- " << ulError << std::endl;
   // compute expected limit
   std::cout << "Expected upper limits, using the B (alternate) model : " << std::endl;
   std::cout << " expected limit (median) " << r->GetExpectedUpperLimit(0) << std::endl;
   std::cout << " expected limit (-1 sig) " << r->GetExpectedUpperLimit(-1) << std::endl;
   std::cout << " expected limit (+1 sig) " << r->GetExpectedUpperLimit(1) << std::endl;

   // plot now the result of the scan 

   HypoTestInverterPlot *plot = new HypoTestInverterPlot("HTI_Result_Plot","Feldman-Cousins Interval",r);

   // plot in a new canvas with style
   TCanvas * c1 = new TCanvas("HypoTestInverter Scan"); 

   plot->Draw("CLb 2CL");  // plot also CLb and CLs+b 
   //plot->Draw("OBS");  // plot only observed p-value

   // plot also in a new canvas the test statistics distributions 
   // plot test statistics distributions for the two hypothesis
   // when distribution is generated (case of FrequentistCalculators)
   const int n = r->ArraySize();
   if (n> 0 &&  r->GetResult(0)->GetNullDistribution() ) { 
      TCanvas * c2 = new TCanvas("Test Statistic Distributions","",2);
      if (n > 1) {
         int ny = TMath::CeilNint( sqrt(n) );
         int nx = TMath::CeilNint(double(n)/ny);
         c2->Divide( nx,ny);
      for (int i=0; i<n; i++) {
         if (n > 1) c2->cd(i+1);
         SamplingDistPlot * pl = plot->MakeTestStatPlot(i);


This is a macro in $ROOTSYS/tutorials/roostats which can run the hypothesis test inversion with all these options. From the root prompt you can do, for running the AsymptoticCalculator on the parametric unbinned model:

   .L $ROOTSYS/tutorials/roostats/StandardHypoTestInvDemo.C+

   StandardHypoTestInvDemo("SPlusBExpoModel.root", "w", "ModelConfig", "", "data", 2, 3, true, 20, 0, 100 )
You can look at the printed out observed and expected limits, and the CLs scan plot. Now run the full CLs calculation (it will take considerably more time but should still be manageable):
   .L $ROOTSYS/tutorials/roostats/StandardHypoTestInvDemo.C+

   StandardHypoTestInvDemo("SPlusBExpoModel.root", "w", "ModelConfig", "", "data", 0, 3, true, 10, 0, 100, 500, false )
We can look at the sampling distribution plots obtained.

If running on the number counting model you need to set to trye the last argument

   StandardHypoTestInvDemo("SPlusBExpoModel.root", "w", "ModelConfig", "", "data", 0, 3, true, 10, 0, 100, 500, true )
Here is the list of input arguments to the macro:
StandardHypoTestInvDemo(const char * infile = 0,
                        const char * wsName = "combined",
                        const char * modelSBName = "ModelConfig",
                        const char * modelBName = "",
                        const char * dataName = "obsData",                 
                        int calculatorType = 0,
                        int testStatType = 0, 
                        bool useCLs = true ,  
                        int npoints = 6,   
                        double poimin = 0,  
                        double poimax = 5, 
                        int ntoys=1000,
                        bool useNumberCounting = false,
                        const char * nuisPriorName = 0){
Where we have for the calculatorType:
  • type = 0 Freq calculator
  • type = 1 Hybrid calculator
  • type = 2 Asymptotic calculator

  • testStatType = 0 LEP
  • = 1 Tevatron
  • = 2 Profile Likelihood
  • = 3 Profile Likelihood one sided (i.e. = 0 if mu < mu_hat)

  • useCLs : scan for CLs (otherwise for CLs+b)

  • npoints: number of points to scan , for autoscan set npoints = -1

  • poimin,poimax: min/max value to scan in case of fixed scans (if min > max, try to find automatically)

  • ntoys: number of toys to use

  • useNumberCounting: set to true when using number counting events

  • nuisPriorName: name of prior for the nnuisance. This is often expressed as constraint term in the global model

Run Hypothesis test (Significance test)

On the same model we can run an hypothesis test to get the signal significance (number of sigma). In this case the null hypothesis is the background only model and the alternate is the S+B model.

Example running the frequentist calculator:

   FrequentistCalculator *  fcalc  = new FrequentistCalculator(*data, *sbModel, *bModel);
   fcalc->SetToys(1000,1000);    // 1000 for null (B) and 1000 for alt (S+B) 

   // create the test statistics
   ProfileLikelihoodTestStat profll(*sbModel->GetPdf());
   // use one-sided profile likelihood

   // configure  ToyMCSampler and set the test statistics
   ToyMCSampler *toymcs = (ToyMCSampler*)fcalc->GetTestStatSampler();

    // run the test
    HypoTestResult * r = fcalc->GetHypoTest();

    // plot test statistic distributions
    HypoTestPlot * plot = new HypoTestPlot(*r);

Optionally you can make, when using the shape model (Exponential background + Gaussian peak) a mass scan to get the p0 values as function of the mass. You loop on different mass values and then you run for each one of them the hypothesis test and you plot the p0 value (the null p-value) obtained from the hypothesis test.

We can run also the asymptotic calculator to get the asymptotic p-value but also the expected one.

   vector<double> masses;
   vector<double> p0values;
   vector<double> p0valuesAsymptotic;
   for( double mass=massMin; mass<=massMax; mass += (massMax-massMin)/5.0 )
      cout << endl << endl << "Running for mass: " << mass << endl << endl;
      w->var("mass")->setVal( mass );

      HypoTestResult* freqCalcResult = fcalc->GetHypoTest();
      HypoTestPlot p( *freqCalcResult );
      p.SetLogYaxis( true );
      c->SaveAs( "p0Plot.pdf" );
      masses.push_back( mass );
      p0values.push_back( freqCalcResult->NullPValue() );

      // asymptotic p-value and significance
      HypoTestResult* asymCalcResult = asymCalc->GetHypoTest();
      p0valuesAsymptotic.push_back( asymCalcResult->NullPValue()  );
      double expectedP0 = AsymptoticCalculator::GetExpectedPValues(  asymCalcResult->NullPValue(),  asymCalcResult->AlternatePValue(), 0, false);
      p0valuesExpected.push_back( expectedP0 );


-- LorenzoMoneta - 26-Apr-2012

Edit | Attach | Watch | Print version | History: r5 < r4 < r3 < r2 < r1 | Backlinks | Raw View | WYSIWYG | More topic actions
Topic revision: r5 - 2012-08-16 - LorenzoMoneta
    • Cern Search Icon Cern Search
    • TWiki Search Icon TWiki Search
    • Google Search Icon Google Search

    RooStats All webs login

This site is powered by the TWiki collaboration platform Powered by PerlCopyright & 2008-2023 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
or Ideas, requests, problems regarding TWiki? use Discourse or Send feedback