RooStats tutorials

A first set of RooStats tutorials were given at CERN on October 15-16th 2009: (agenda: http://indico.cern.ch/conferenceDisplay.py?confId=68912). Below is given some materials used during the RooStats hands-on sessions. The RooFit tutorials are available here.

Short introduction to RooStats

RooStats is a software package relying on RooFit for the description of model and data which in turn provides utilities for fitting, plotting, Monte-Carlo studies, etc. On top of RooFit, the RooStats package provides:

  • tools to perform a statistical data analysis with various approaches coded in:
    • ProfileLikelihoodCalculator,
    • HybridCalculator,
    • HypoTestInvertor,
    • BayesianCalculator,
    • MCMCCalculator,
    • NeymanConstruction and FeldmanCousins,
    • etc.
  • and utilities such as:
    • HLFactory, ModelConfig, BernsteinCorrection, SPlot, etc.

The RooStats package has been designed in common between Atlas (K. Cranmer), CMS (G. Schott), RooFit (W. Verkerke) and ROOT (L. Moneta) and is available from the ROOT releases since version 5.22. As of October 2009, the project is getting maturity, most of the for seen methods are included in ROOT version 5.25.02 while a few more developments are still planned. The link to the project web page is: https://twiki.cern.ch/twiki/bin/view/RooStats/WebHome

Getting started with the software

A few advice words before starting. Remember to:

  • work locally on your laptop (if possible) to reduce the WLAN usage,
  • compile your macros and run with: .L macro.C+ then macro();,
  • include: using namespace RooFit; using namespace RooStats;,
  • avoid running a macro multiple times in the same ROOT CINT sessions (it might unfortunately crash sometimes if you do).

Setting ROOT 5.25.02 using the CERN AFS installation:

  • on SLC4: /afs/cern.ch/sw/lcg/app/releases/ROOT/5.25.02/slc4_amd64_gcc34/root/bin/thisroot.csh or .sh
  • on SLC5: /afs/cern.ch/sw/lcg/app/releases/ROOT/5.25.02/x86_64-slc5-gcc34-opt/root/bin/thisroot.csh or .sh

Some links to documentation:

Suggestions of exercises for this tutorial

Below will given code snippets showing how to try different analysis models:

  • Poisson likelihood model: let's assume you observe 3 events in data while expecting 1 background event. We do not consider systematic uncertainties to start with. This problem could in principle be solved analytically but we'll use RooStats to solve it (we'll later make this analysis more complicated and then getting an analytical solution is not anymore easy while it is still with RooStats).
  • Observable-based analysis: 10 Gaussian-distributed signal event over 100 flat-distributed background. The "mass" observable is distributed in range [0 - 500] and the signal has mean 250 and sigma 15. The prior is assumed Gaussian distributed.
  • Assume on the above model a systematic uncertainty: 50% uncertainty on the background yield and/or 33% uncertainty on the sigma in the signal Gaussian model
  • Covered tomorrow: combination of analyses

Compute:

  • 68% CL 2-sided confidence interval and significance from the (profiled-) likelihood ratio
  • 95% CL upper-limit with the (profile-) likelihood method
  • Frequentist p-value in the S+B and B-only hypotheses, signal significance, CL_S ratio
  • Covered tomorrow: HypoTestInvertor and BayesianCalculator

RooStats tutorials macro (an updated version is provided below) give the solution. They work in two steps:

  • First specify the model; run for example with: rs500d_PrepareWorkspace_GaussOverFlat_withSystematics("WS_GaussOverFlat_withSystematics.root",0)
    • Different types of data generation are covered: binned/unbinned distributions, with Poisson-fluctuating or fixed (total or bin-by-bin) number of events
  • Secondly choose one of the macros applying statistical methods; for example: rs501_ProfileLikelihoodCalculator_limit("WS_GaussOverFlat_withSystematics.root")

Tutorials you could use are:

  • rs500a to rs500d for the model PDF specification,
  • rs501 and rs502 for the likelihood approach,
  • rs505 for the classical hypothesis test,
  • rs701 for the Bayesian credibility interval (tomorrow),
  • rs801 for classical upper limit (HypoTestInvertor class).
Other tutorials go into more details but those above have been optimized to make them straightforward.

The updated version of the above tutorials is available from: http://cern.ch/schott/public/updated_roostats_tutorials.zip On linux, unzip the files with: unzip updated_roostats_tutorials.zip

Specifying your analysis model

The model is handled by RooFit and in the examples below, the RooFit::RooWorkspace factory is used for that. A few examples are given:

Poisson likelihood model

// Use a RooWorkspace to store the PDF models, prior informations, list of parameters, ...
RooWorkspace myWS("myWS");

// Number of signal and background events
myWS.factory("S[2,0,10]"); // default value 2 and range [0,10]
myWS.factory("B[1]"); // value fixed to 1

// Observable (dummy observable used in case of Poisson likelihood function)
myWS.factory("x[0,1]"); // arbitrary range [0,1] 
myWS.var("x")->setBins(1);

// Signal and background distribution of the observable
myWS.factory("Uniform::sigPdf(x)");
myWS.factory("Uniform::bkgPdf(x)");

// S+B and B-only models (both extended PDFs)
myWS.factory("SUM::model(S*sigPdf,B*bkgPdf");
myWS.factory("ExtendPdf::modelBkg(bkgPdf,B)");

// Set random seed to make the generation reproducible
(RooRandom::randomGenerator())->SetSeed(4357);

// generate binned data with fixed number of events
RooAbsData* data = myWS.pdf("model")->generateBinned(*myWS.var("x"),myWS.var("S")->getVal()+myWS.var("B")->getVal(),Name("data"));

RooDataSetPoisson.png
Data generated from the Poisson likelihood model

Model with one discriminating variable: Gaussian signal over flat background

// Observable
myWS.factory("mass[0,500]"); // range [0,500]

// Signal and background distribution of the observable
myWS.factory("Gaussian::sigPdf(mass,250,15)");
myWS.factory("Uniform::bkgPdf(mass)");

RooDataSetGaussOverFlat.png
Data generated from the GaussOverFlat likelihood model

Adding nuisance parameters

// set B and sigma as a non-constant variable ans specify a range
myWS->var("B")->setConstant(false);
myWS->var("B")->setRange(0,200);
myWS->var("sigSigma")->setConstant(false);
myWS->var("sigSigma")->setRange(0,30);

// Other prior PDFs
myWS.factory("Gaussian::prior_sigSigma(sigSigma,15,5)") ;
myWS.factory("Gaussian::prior_B(B,100,50)");
myWS.factory("PROD::priorNuisance(prior_sigSigma,prior_B)");

Using the RooStats statistical methods

ProfileLikelihoodCalculator

Implement an instance of ProfileLikelihoodCalculator

Specify the data, the model and a list of parameters of interest. Set the size of the confidence interval:

// Set up the ProfileLikelihoodCalculator
ProfileLikelihoodCalculator plc(*data,*model,*POI);
// ProfileLikelihoodCalculator usually make intervals: the 95% CL one-sided upper-limit is the same as the two-sided upper-limit of a 90% CL interval
plc.SetTestSize(0.10);

Run and retrieve the results

Confidence interval:

model->fitTo(*data,SumW2Error(kFALSE));
LikelihoodInterval* interval = plc.GetInterval();

model->fitTo(*data,SumW2Error(kFALSE));
const double lowerLimit = interval->LowerLimit(*parameterOfInterest);
const double upperLimit = interval->UpperLimit(*parameterOfInterest);

Significance:

// Create a copy of the POI parameters to set the values to zero
RooArgSet nullparams;
nullparams.addClone(*parameterOfInterest);
((RooRealVar *) (nullparams.first()))->setVal(0);
plc.SetNullParameters(nullparams);

HypoTestResult* plcResult = plc.GetHypoTest();
const double significance = plcResult->Significance();

Making a plot of the results

LikelihoodIntervalPlot plot(interval);
plot.Draw();

There is currently a problem with profile-likelihood plots in presence of systematic uncertainty; the interval gets out right but the plot doesn't. Below is a patch to get the correct plot until this is fixed in RooStats:

// patch to profile-likelihood plot with systematics
TF1 * tmp = interval->GetLikelihoodRatio()->asTF(*parameterOfInterest);
// clone to sample the PL calculation
TF1 * f1 = (TF1 *) tmp->Clone();
f1->Draw("SAME");

ProfileLikelihoodCalculator UpperLimitPoisson.png
ProfileLikelihoodCalculator 95% CL upper limit

HybridCalculator

HybridCalculator performs sampling of the selected test statistics in both S+B and B-only hypotheses. Then it integrates this distribution to compute the p-values or the CL_S ratio.

Bayesian prior on nuisance parameters are integrated out (using a Monte-Carlo technique).

Other features are not documented here, such as: switching test statistics, merging results run independently on batch, etc.

Implement an instance of HybridCalculator

Specify the data and the null and alternative hypotheses models. More options can be specified later. See the rs201 macro for more details (but not today).

HybridCalculator hc("hc","HybridCalculator",*data,*modelSB,*modelB);
hc.SetNumberOfToys(5000);
hs.SetTestStatistics(1);

Run and retrieve the results

HybridResult* hcResult = hc.GetHypoTest();
double p_value_sb = hcResult->AlternatePValue();
double p_value_b = hcResult->NullPValue();
double cl_s = hcResult->CLs();
double significance = hcResult->Significance();

Making a plot of the results

HybridPlot* hcPlot = hcResult->GetPlot("hcPlot","p-Values plot",100);
hcPlot->Draw();

HybridResultPlot Poisson.png
Distribution of test statistics (likelihood-ratio) with HybridCalculator from the Poisson likelihood model example

HybridResultPlot GaussOverFlat.png
Distribution of test statistics (likelihood-ratio) with HybridCalculator from the GaussOverFlat likelihood model example

HypoTestInvertor

The HypoTestInvertor in ROOT 5.25.02 is using HybridCalculatorResult for different value of the parameter of interest until it identifies the value of the parameter of interest such as the alternate (ie. S+B) hypothesis is excluded at a given confidence level. In the next ROOT version HypoTestInvertor will be extended to also invert hypothesis tests from ProfileLikelihoodCalculator. The inversion is based on the CL_S ratio estimated by each hypothesis test. Usage of CL_SB has been make configurable for the next future release.

Patch needed for ROOT 5.25.02

In the rush for the ROOT release one line was forgotten in the constructor. A minor change fixes it for ROOT 5.25.02 (it has been fixed for the next release).

namespace Temp {
  class HypoTestInvertor : public RooStats::HypoTestInvertor {
  public:
    HypoTestInvertor(const char * name, const char * title, RooStats::HypoTestCalculator * calc, RooRealVar * var) :
      RooStats::HypoTestInvertor(name, title, calc, var){}
    virtual void SetModel( const RooStats::ModelConfig & ){}
    ClassDef(Temp::HypoTestInvertor,1);
  };
}
This patch requires to run macros in compiled mode exclusively.

Configuring the HypoTestInvertor

One starts by instantiating and configuring a HybridCalculator object:

HybridCalculator myhc("myhc","HybridCalculator",*data, totPdf, bkgPdf);
myhc.SetTestStatistics(2);
myhc.SetNumberOfToys(1000);

Implement an instance of HypoTestInvertor

Use the patch version of HypoTestInvertor in the Temp namespace.

Temp::HypoTestInvertor myInvertor("myInvertor","",&myhc,&parameterOfInterest);
myInvertor.SetTestSize(0.05);

Three types of running modes are currently available:

  • run hypothesis tests point-by-point: myInvertor.RunOnePoint(3.9);
  • run hypothesis in an interval for equally-spaced points: myInvertor.RunFixedScan(5,1,6); here the interval is [1,6] and 5 points will be tested
  • run hypothesis in auto mode until a target value is found close to the test size that had been set: myInvertor.RunAutoScan(1,6,0.005); here the interval is [1,6] and points will be tested until one is found with CL_S = 0.050 +/- 0.005.
It is possible to refine a result by running manually further points once you have a rough idea of the upper limit. At least two scan points are needed in order to start estimating the location of the upper/lower limit.

Perform the hypothesis test inversion and plot the results

HypoTestInvertorResult* results = myInvertor.GetInterval();
HypoTestInvertorPlot myInvertorPlot("myInvertorPlot","",results);
TGraph* gr1 = myInvertorPlot.MakePlot();

The following plot show an example of hypothesis test inversion based on CL_SB assuming 1 observed event and 1 expected background event. From the PDG 2008 table 32.3, the upper limit is expected to be 4.74 as was obtained with RooStats (substracting the 1 background event you get 3.74 on the plot below).

HypoTestInvertorPlot.png
HypoTestInvertorPlot overlayed to expected p-value as a function of the signal yield

BayesianCalculator

The BayesianCalculator class implements the Bayesian theorem. It also return a confidence interval for one single parameter of interest

Suggestions for the exercises:

  • You can compare the limits obtained by BayesianCalculator to those from HybridCalculator on the same model for both upper and lower limits.
  • You may also like to compare results with different prior on the parameter of interest (Uniform in S or 1/S, ...)

Implement an instance of HypoTestInvertor

The BayesianCalculator constructor inputs: the data, the PDF model, the list of parameters of interest and their prior PDF on the parameter of interest and nuisance parameters to integrate out in the pdf model.

BayesianCalculator bc(data,*model,RooArgSet(*POI),*priorPOI,&nuisanceParameters);

Compute the credibility interval

Set the confidence level of the credibility interval and compute it. Returns a SimpleInterval.

bc.SetTestSize(0.05);
SimpleInterval* interval = bc.GetInterval();
double lowerLimit = interval->LowerLimit();
double upperLimit = interval->UpperLimit();

Plotting the posterior probability

// In ROOT >5.25.02:
// bc.GetPosteriorPlot()->Draw();

// This is the current plot:
// bcalc.PlotPosterior();

// The code below produce a much nicer plot:
RooAbsPdf* fPosteriorPdf = bcalc.GetPosteriorPdf();
RooPlot* plot = POI->frame();
plot->SetTitle(TString("Posterior probability of parameter \"")+TString(POI->GetName())+TString("\""));  
fPosteriorPdf->plotOn(plot,RooFit::Range(interval->LowerLimit(),interval->UpperLimit(),kFALSE),RooFit::VLines(),RooFit::DrawOption("F"),RooFit::MoveToBack(),RooFit::FillColor(kGray));
fPosteriorPdf->plotOn(plot);
plot->GetYaxis()->SetTitle("posterior probability");
plot->Draw();

BayesianCalculatorPlot.png
BayesianPlot and 90% CL credibility interval

Comparing different methods

Profile Likelihood, Feldman Cousins and MCMC

Download exercises from here: http://cern.ch/cranmer/public/RooStats_CompareIntervals.tar or on lxplus ~cranmer/public/RooStats_CompareIntervals.tar

We can modify the 500d tutorial to have a floating mass, and fit a 2-d scan. This is already done in

  • rs500e_PrepareWorkspace_GaussOverFlat_withSystematics_floatingMass.C
and then we compare three types of intervals (profile likelihood, Bayesian using Markov Chain Monte Carlo, and a Feldman Cousins technique that deals with nuisance parameters via the 'profile construction').
  • rs501_ThreeTypesOfLimits.C
The blue contour is from profile likelihood, the other contour is from Markov Chain Monte Carlo, and the squares are for the points tested by the Feldman Cousins tool.
ThreeLimitsComparison.gif
Comparison of Profile Likelihood (blue), Feldman Cousins (red squares), and MCMC (black contour)

Running the same problem with BAT

A http://www-ekp.physik.uni-karlsruhe.de/~schott/BCRooInterface/ has been developed to the http://www.mppmu.mpg.de/bat/ (BAT). It is possible with it to run on the same problem as described above with Bayesian method using Markov Chain MC. Here is how to install BAT (on lxplus using the tcsh shell):

wget http://www.mppmu.mpg.de/bat/source/BAT-0.3.1.tar.gz
tar xvfz BAT-0.3.1.tar.gz
cd BAT-0.3.1
./configure --prefix=${PWD}
make
make install
setenv BATINSTALLDIR ${PWD}
setenv LD_LIBRARY_PATH ${LD_LIBRARY_PATH}:${BATINSTALLDIR}/lib
setenv CPATH ${BATINSTALLDIR}/include
cd models/RooInterface
make

Then copy the input ROOT file (containing the workspace, ie. WS_GaussOverFlat_withSystematics_floatingMass.root) in the RooInterface directory and run with:

./runRooInterface.exe WS_GaussOverFlat_withSystematics_floatingMass.root myWS
gv bat_plots.ps

We are lucky, it worked out-of-the-box since we have been using the the workspace preparation the same name conventions as the defaults sets in the BAT/RooInterface. Here is one of the plots of the results:

BAT ContourPlot.png
Posterior projection obtained with the BAT package after MCMC integration.

-- GregorySchott - 15-Oct-2009 -- KyleCranmer - 16-Oct-2009 -- GregorySchott - 29-Oct-2009

Topic attachments
I AttachmentSorted ascending History Action Size Date Who Comment
PNGpng BAT_ContourPlot.png r1 manage 138.7 K 2009-10-29 - 15:16 UnknownUser  
PNGpng BayesianCalculatorPlot.png r1 manage 15.6 K 2009-10-15 - 20:53 UnknownUser  
PNGpng HybridResultPlot_GaussOverFlat.png r1 manage 9.1 K 2009-10-16 - 08:57 UnknownUser  
PNGpng HybridResultPlot_Poisson.png r1 manage 8.8 K 2009-10-16 - 08:56 UnknownUser  
PNGpng HypoTestInvertorPlot.png r1 manage 11.4 K 2009-10-15 - 20:10 UnknownUser  
PNGpng ProfileLikelihoodCalculator_UpperLimitPoisson.png r1 manage 9.5 K 2009-10-16 - 08:56 UnknownUser  
PNGpng RooDataSetGaussOverFlat.png r1 manage 10.2 K 2009-10-16 - 08:56 UnknownUser  
PNGpng RooDataSetPoisson.png r1 manage 5.1 K 2009-10-16 - 08:56 UnknownUser  
GIFgif ThreeLimitsComparison.gif r1 manage 8.4 K 2009-10-16 - 13:38 KyleCranmer  
Edit | Attach | Watch | Print version | History: r11 < r10 < r9 < r8 < r7 | Backlinks | Raw View | WYSIWYG | More topic actions
Topic revision: r11 - 2009-10-29 - unknown
 
    • Cern Search Icon Cern Search
    • TWiki Search Icon TWiki Search
    • Google Search Icon Google Search

    RooStats All webs login

This site is powered by the TWiki collaboration platform Powered by PerlCopyright & 2008-2021 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
or Ideas, requests, problems regarding TWiki? use Discourse or Send feedback