RooStats Exercises

In these exercises we will learn to use the RooStats calculators to perform a statistical analysis on a model. We will use two types of models:

  • Observable-based analysis. This is the model we have used in the RooFit exercise 2. It is composed of 50 Gaussian-distributed signal event over ~ 1000 exponential-distributed background. The "mass" observable is distributed in range [0 - 10] and the signal has mean ~ 2 and sigma 0.5. We assume that the location and width of the signal are known. We can re-use the macro from the RooFit exercise to create the model.
  • Poisson model: let's assume you observe n events in data while expecting b background event. We consider some systematic uncertainties in the background model value, b. We express this uncertainty as Gaussian.
Optionally we can use a log-normal or a a Poisson/Gamma distribution for expressing this uncertainty. We could also extend the model adding other systematic uncertainties.

For these model we compute:

  • 68% CL 2-sided confidence interval from the (profiled-) likelihood ratio.
  • 68% credible interval using the MCMCCalculator or the Bayesian Calculator (based on numerical integration). On the Poisson model we will compute the 95% Bayesian upper limit.

For running the RooStats calculators we read the workspace from the ROOT file. All the Roostats calculators take as input a data set and a ModelConfig object. It is therefore convenient to save the workspace (with model and data) in a ROOT file and then read it afterwards for running the calculators.

If you have problem creating the code for running the calculators, you can run the provided solution or just the standard Roostats tutorials (in $ROOTSYS/tutorials/roostats)

Tutorials you could use are:

  • StandardProfileLikelihoodDemo.C
  • StandardBayesianNumericalDemo.C
  • StandardBayesianMCMCDemo.C

They can be run as following from the Root prompt by passing the workspace file name (e.g. GausExpModel.root ), workspace name, model config name and data set name:

 
root [0] .L $ROOTSYS/tutorials/roostats/StandardProfileLikelihoodDemo.C
root [1] StandardProfileLikelihoodDemo("SPlusBExpoModel.root", "w", "ModelConfig", "data" )

Other tutorials go into more details to create various models and run the calculators. See them all at http://root.cern.ch/root/html/tutorials/roostats/index.html.

Exercise 1: Create a ModelConfig from the Gaussian plus Exponential Model used in the RooFit Exercise 2.

We show first how to create the ModelConfig object which we need to import into the workspace in order to be able to run later the RooStats calculators. We need to specify in the ModelConfig:

  • the p.d.f of the model
  • the parameters of interest (in this case the number of signal events)
  • the nuisance parameters (the number of background events and the exponential shape parameter )
  • the observables (in this case x, the reconstructed mass)

We will see later, that if we have a model where some external knowledge on a parameter is introduced, we would need to define the model global observables.

Below is how we create a ModelConfig for the Gaussian signal plus Exponential background model:

RooStats::ModelConfig mc("ModelConfig",&w);
mc.SetPdf(*pdf);
mc.SetParametersOfInterest(*w.var("nsig"));
mc.SetObservables(*w.var("x"));
// define set of nuisance parameters
w.defineSet("nuisParams","a,nbkg");

mc.SetNuisanceParameters(*w.set("nuisParams"));

// import the ModelConfig in the workspace
w.import(mc); 

// write workspace in the file 
w.writeToFile("GausExpModel.root"); 
These lines must be included at the end of the macro we used for exercise 2 (they are already present in the solution).

Exercise 2: Profile Likelihood Calculator

In this exercise we will see how to run the RooStats Profile Likelihood calculator on the Gaussian signal plus Exponential background model. We will create a ROOT macro, doing the following:

  • Read the workspace from the file and get a pointer to the ModelConfig and data set objects.
  • Create the Profile likelihood calculator class
  • Run the calculator by getting the interval at the corresponding desired confidence level
  • Plot the interval and the profile likelihood function

Here is the code to read the workspace from the file. Remember to add at the beginning of the macro using namespace RooStats;.
using namespace RooStats;

void SimplePL(  
   const char* infile =  "GausExpModel.root", 
   const char* workspaceName = "w",
   const char* modelConfigName = "ModelConfig",
   const char* dataName = "data" )
{

  // open input file 
  TFile *file = TFile::Open(infile);
  if (!file) return;

  // get the workspace out of the file
  RooWorkspace* w = (RooWorkspace*) file->Get(workspaceName);


  // get the modelConfig out of the file
  RooStats::ModelConfig* mc = (RooStats::ModelConfig*) w->obj(modelConfigName);

  // get the modelConfig out of the file
  RooAbsData* data = w->data(dataName);
We create now the ProfileLikelihoodCalculator class passing the =ModelConfig object and a data set object (RooAbsData):
ProfileLikelihoodCalculator pl(*data,*mc);
Now the calculator has all information and can compute the interval. We need to set first the desired confidence level and then call GetInterval(…):
pl.SetConfidenceLevel(0.683); // 68% interval
LikelihoodInterval* interval = pl.GetInterval();
We can then retrieve the actual interval lower/upper value for the desired parameter (e.g. the parameter of interest):
// find the iterval on the first Parameter of Interest
RooRealVar* firstPOI = (RooRealVar*) mc->GetParametersOfInterest()->first();

double lowerLimit = interval->LowerLimit(*firstPOI);
double upperLimit = interval->UpperLimit(*firstPOI);


cout << "\n68% interval on " <<firstPOI->GetName()<<" is : ["<<
    lowerLimit << ", "<<
    upperLimit <<"] "<<endl;
At the end we can plot the interval and the profiled likelihood function in a ROOT Canvas using the LikelihoodIntervalPlot class:
LikelihoodIntervalPlot * plot = new LikelihoodIntervalPlot(interval);
plot->SetRange(0,150);  // change range to have interval visible (default range is variable range)
plot->Draw(); 
Note that for some complex model, which require time to evaluate, it can be convenient to use less points and also use the "tf1" option
plot->SetPoints(50);
plot->Draw("tf1"); 

using namespace RooStats;

void SimplePL(  
   const char* infile =  "HggModel.root", 
   const char* workspaceName = "w",
   const char* modelConfigName = "ModelConfig",
   const char* dataName = "obsData" )
{
  /////////////////////////////////////////////////////////////
  // First part is just to access the workspace file 
  ////////////////////////////////////////////////////////////

  // open input file 
  TFile *file = TFile::Open(infile);
  if (!file) return;

  // get the workspace out of the file
  RooWorkspace* w = (RooWorkspace*) file->Get(workspaceName);


  // get the modelConfig out of the file
  RooStats::ModelConfig* mc = (RooStats::ModelConfig*) w->obj(modelConfigName);

  // get the modelConfig out of the file
  RooAbsData* data = w->data(dataName);


  ProfileLikelihoodCalculator pl(*data,*mc);
  pl.SetConfidenceLevel(0.683); // 68% interval
  LikelihoodInterval* interval = pl.GetInterval();

   // find the iterval on the first Parameter of Interest
  RooRealVar* firstPOI = (RooRealVar*) mc->GetParametersOfInterest()->first();

  double lowerLimit = interval->LowerLimit(*firstPOI);
  double upperLimit = interval->UpperLimit(*firstPOI);


  cout << "\n68% interval on " <<firstPOI->GetName()<<" is : ["<<     lowerLimit << ", "<<     upperLimit <<"] "<<endl;   LikelihoodIntervalPlot * plot = new LikelihoodIntervalPlot(interval); //   if (TString(infile) == "GausExpModel.root") //      plot->SetRange(0,150);  // possible eventually to change ranges
  //plot->SetNPoints(50);  // do not use too many points, it could become very slow for some models
  plot->Draw("tf1");  // use option TF1 if too slow (plot.Draw("tf1")
  
}

  • Profile Likelihood result on Gaussian signal plus Exponential background model:

68% interval on nsig is : [60.3382, 97.0782] 
Error: (3) can't find PLInterval.png in RooStats

Exercise 3a: Bayesian MCMC Calculator

In this exercise we will learn to use the RooStats Bayesian calculators. We have the choice between using the MCMC calculator or the numerical one. We will start using the MCMC and optionally in the next exercise we will show how to use the numerical Bayesian calculator. We will use the same model used before for the Profile Likelihood calculator. What we need to do is very similar to the previous exercise:

  • Read the model from the file as in the previous exercise
  • Create the MCMCCalculator class
  • Configure the calculator setting the proposal function, number of iterations, etc..
  • Call the GetInterval function to compute the desired interval. The returned MCMCInterval class will contained the resulting Markov-Chain.
  • Plot the result using the MCMCIntervalPlot class

Assuming we have already the code for reading a model from the previous exercise, we create the MCMCCalculator class passing the =ModelConfig object and a data set object (RooAbsData) exactly as we did for the ProfileLikelihoodCalculator:
MCMCCalculator mcmc(*data,*mc);
mcmc.SetConfidenceLevel(0.683); // 683% interval
Configure the calculator.
  • We set the proposal function and the number of iterations. As proposal we use the SequentialProposal function. This function samples each parameter independently using a gaussian pdf with a sigma given by a fraction of the parameter range. We use in this case a value of 0.1 for the fraction. It is important to check that the acceptance of the proposal is not too low. In that case we are wasting iterations.
// Define proposal function. This proposal function seems fairly robust
SequentialProposal sp(0.1);
mcmc.SetProposalFunction(sp);
Set then the number of iterations. We use a not too large number of iterations to avoid waiting too long in this tutorial example.
// set number of iterations and initial burning steps 
mcmc.SetNumIters(100000);         // Metropolis-Hastings algorithm iterations
mcmc.SetNumBurnInSteps(1000);       // first N steps to be ignored as burn-in
Define the interval we want to compute. We will compute a central interval but we can compute
  • a shortest interval (default and the only one defined in a multi-dimensional posterior)
  • central interval (equal probability on both the left and right side)
  • one sided (lower or upper limits)
mcmc.SetLeftSideTailFraction(0.5); // 0.5 for central Bayesian interval  and 0 for upper limit
Run the calculator (obtain the Markov chain and retrieve the actual interval lower/upper value for the desired parameter (e.g. the parameter of interest) by integrating the posterior function:
MCMCInterval* interval = mcmc.GetInterval();

// print out the iterval on the first Parameter of Interest
RooRealVar* firstPOI = (RooRealVar*) mc->GetParametersOfInterest()->first();
cout << "\n68% interval on " <<firstPOI->GetName()<<" is : ["<<
             interval->LowerLimit(*firstPOI) << ", "<<
             interval->UpperLimit(*firstPOI) <<"] "<< endl;
At the end we plot the interval and the posterior function in a ROOT Canvas using the MCMCIntervalPlot class:
// make a plot of posterior function
new TCanvas("IntervalPlot");
MCMCIntervalPlot plot(*interval);
plot.Draw();

using namespace RooFit;
using namespace RooStats;

void SimpleMCMC( const char* infile =  "GausExpModel.root", 
                 const char* workspaceName = "w",
                 const char* modelConfigName = "ModelConfig",
                 const char* dataName = "data" )
{
  /////////////////////////////////////////////////////////////
  // First part is just to access the workspace file 
  ////////////////////////////////////////////////////////////

  // open input file 
  TFile *file = TFile::Open(infile);
  if (!file) return;

  // get the workspace out of the file
  RooWorkspace* w = (RooWorkspace*) file->Get(workspaceName);


  // get the modelConfig out of the file
  RooStats::ModelConfig* mc = (RooStats::ModelConfig*) w->obj(modelConfigName);

  // get the modelConfig out of the file
  RooAbsData* data = w->data(dataName);

  RooRealVar * nsig = w->var("nsig"); 
  if (nsig) nsig->setRange(0,200);

  MCMCCalculator mcmc(*data,*mc);
  mcmc.SetConfidenceLevel(0.683); // 68.3% interval

  // Define proposal function. This proposal function seems fairly robust
  SequentialProposal sp(0.1);
  mcmc.SetProposalFunction(sp);

  // set number of iterations and initial burning steps 
  mcmc.SetNumIters(100000);         // Metropolis-Hastings algorithm iterations
  mcmc.SetNumBurnInSteps(1000);       // first N steps to be ignored as burn-in

  // default is the shortest interval
  // here we use central interval
  mcmc.SetLeftSideTailFraction(0.5); // for central Bayesian interval
  //mcmc.SetLeftSideTailFraction(0); // for one-sided Bayesian interval

  // run the calculator
  MCMCInterval* interval = mcmc.GetInterval();

  // print out the iterval on the first Parameter of Interest
  RooRealVar* firstPOI = (RooRealVar*) mc->GetParametersOfInterest()->first();
  cout << "\n68% interval on " <<firstPOI->GetName()<<" is : ["<<     interval->LowerLimit(*firstPOI) << ", "<<     interval->UpperLimit(*firstPOI) <<"] "<<endl;


  // make a plot of posterior function
  new TCanvas("IntervalPlot");
  MCMCIntervalPlot plot(*interval);
  plot.Draw();

}

  • MCMC Bayesian result on Gaussian signal plus Exponential background model:

Metropolis-Hastings progress: ....................................................................................................
[#1] INFO:Eval -- Proposal acceptance rate: 10.469%
[#1] INFO:Eval -- Number of steps in chain: 10469
68% interval on nsig is : [61.245, 98.2044] 
Error: (3) can't find MCMCInterval.png in RooStats

Exercise 3b: Bayesian Numerical Calculator

We can try now to use also the numerical Bayesian calculator on the same model. As before we create the class and configure it setting for example the integration type or the number of iterations. Since the numerical integration can be tricky, it is better in this case to set before a reasonable range from the nuisance parameters, which will be integrated to obtain the marginalised posterior function. A sensible choice is to use for example an interval using 10-sigma around the best fit values for the nuisance parameters.

Note that we have not defined a prior distribution for the parameter of interest (the number of signal events). If a prior is not defined in the model, it is assumed that the prior distribution for the parameter of interest is the uniform distribution in the parameter range.

Again we create the BayesianCalculator class passing the =ModelConfig object and a data set object, RooAbsData:
BayesianCalculator bayesianCalc(*data,*mc);

// define reasanable range for nuisance parameters ( +/- 10 sigma around fitted values) to facilitate the integral
RooArgList nuisPar(*mc->GetNuisanceParameters()); 
for (int i = 0; i < nuisPar.getSize(); ++i) { 
   RooRealVar & par = (RooRealVar&) nuisPar[i];
   par.setRange(par.getVal()-10*par.getError(), par.getVal()+10*par.getError());
}

We can then configure the calculator:

// set the type of interval (not really needed for central which is the default)
bayesianCalc.SetLeftSideTailFraction(0.5); // for central interval
  //bayesianCalc.SetShortestInterval(); // for shortest interval
bayesianCalc.SetConfidenceLevel(0.683); // 68% interval

// set the integration type (default is "ADAPTIVE")
// possible alternative values are  "VEGAS" , "MISER", or "PLAIN"  but they require  libMathMore 
// or  "TOYMC" (toy MC integration, work when nuisances exist and they have a constraints pdf)
bayesianCalc.SetIntegrationType("TOYMC"); 

// limit the number of iterations to speed-up the integral 
bayesianCalc.SetNumIters(1000);

// compute interval by scanning the posterior function
// it is done by default when computing shortest intervals
bayesianCalc.SetScanOfPosterior(50);
We can then run the calculator and retrieve the result. The plot of the posterior can be retrieved directly from the class as a RooPlot object:
SimpleInterval* interval = bayesianCalc.GetInterval();
if (!interval) { 
     cout << "Error computing Bayesian interval - exit " << endl;
     return;
}

double lowerLimit = interval->LowerLimit();
double upperLimit = interval->UpperLimit();

RooRealVar* firstPOI = (RooRealVar*) mc->GetParametersOfInterest()->first();
cout << "\n68% interval on " <<firstPOI->GetName()<<" is : ["<<
             lowerLimit << ", "<<
             upperLimit <<"] "<<endl;

// draw plot of posterior function
RooPlot * plot = bayesianCalc.GetPosteriorPlot();
if (plot) plot->Draw();  

<span class='twikiAlert'>
     Error: File attachment at https://twiki.cern.ch/twiki/pub/RooStats/RooStatsExercisesMarch2015/SimpleBayes.C,%PARAM2% does not exist 
</span>

  • Numerical Bayesian result on Gaussian signal plus Exponential background model:

[#1] INFO:Eval -- BayesianCalculator::GetInterval - found a valid interval : [61.01 , 97.9247 ]

68% interval on nsig is : [61.01, 97.9247] 
Error: (3) can't find BayesInterval.png in RooStats

Exercise 4: Create a Poisson Counting Model

We create now a counting model based on Poisson Statistics. Let's suppose we observe nobs events when we expect nexp, where nexp = s+b (s is the number of signal events and b is the number of background events. The expected distribution for nexp is a Poisson : Poisson( nobs | s+b). s is the parameter we want to estimate or set a limit (the parameter of interest), while b is the nuisance parameter. b is not known exactly, but has uncertainty sigmab. The nominal value of b is = b0=. To express this uncertainty we add in the model a Gaussian constraint. We can interpret this term as having an additional measurement b0 with an uncertainty sigmab: i.e. we have a likelihood for that measurement =Gaussian( b0 | b, sigma). In case of Bayesian statistics we can interpret the constraint as a prior knowledge on the parameter b.

  • Make first the RooFit workspace using its factory syntax. Let's assume nobs=3, b0=1, sigmab=0.2.
  • Create the ModelConfig object and import in the workspace. We need to add in the ModelConfig also b0 as a "global observable=. The reason is that b0 needs to be treated as an auxiliary observable in case of frequentist statistics and varied when tossing pseudo-experiments.

We make first the RooFit workspace using its factory syntax. We have nobs=3, b0=1, sigmab=0.2.
int nobs = 3;   // number of observed events
double b0 = 1; // number of background events
double sigmab = 0.2;   // relative uncertainty in b

RooWorkspace w("w");

// make Poisson model * Gaussian constraint
w.factory("sum:nexp(s[3,0,15],b[1,0,10])");
// Poisson of (n | s+b)
w.factory("Poisson:pdf(nobs[0,50],nexp)");
// Gaussian constraint to express uncertainty in b
w.factory("Gaussian:constraint(b0[1,0,10],b,sigmab[0.2])");
// the total model p.d.f will be the product of the two
w.factory("PROD:model(pdf,constraint)");
We need also to define b0, the global observable, as a constant variable (this is needed when we fit the model). However, b0 often needs to have a range. w.var("b0")->setConstant(true);

After creating the model p.d.f we create the ModelConfig object and we set b0 as the "global observable=

ModelConfig mc("ModelConfig",&w);
mc.SetPdf(*w.pdf("model"));
mc.SetParametersOfInterest(*w.var("s"));
mc.SetObservables(*w.var("nobs"));
mc.SetNuisanceParameters(*w.var("b"));
// need now to set the global observable
mc.SetGlobalObservables(*w.var("b0"));
// this is needed for the hypothesis tests
mc.SetSnapshot(*w.var("s"));

// import model in the workspace 
w.import(mc);

We generate then an hypothetical observed data set. Since we have a counting model, the data set will contain only one event and the observable nobs will have our desired value (i.e. 3). Remember to import the data set in the workspace and then save the workspace in a ROOT file.

// make data set with the namber of observed events
RooDataSet data("data","", *w.var("nobs"));
w.var("nobs")->setVal(3);
data.add(*w.var("nobs") );
// import data set in workspace and save it in a file
w.import(data);
w.writeToFile("ComtingModel.root", true);

using namespace RooFit;
using namespace RooStats;

void CountingModel(  int nobs = 3,           // number of observed events
                     double b = 1,           // number of background events
                     double sigmab = 0.2 )   // relative uncertainty in b
{
   RooWorkspace w("w");
   
// make Poisson model * Gaussian constraint
   w.factory("sum:nexp(s[3,0,15],b[1,0,10])");
// Poisson of (n | s+b)
   w.factory("Poisson:pdf(nobs[0,50],nexp)");
   w.factory("Gaussian:constraint(b0[0,10],b,sigmab[1])");
   w.factory("PROD:model(pdf,constraint)");


   w.var("b0")->setVal(b);
   w.var("b0")->setConstant(true); // needed for being treated as global observables
   w.var("sigmab")->setVal(sigmab*b);  
   

   ModelConfig mc("ModelConfig",&w);
   mc.SetPdf(*w.pdf("model"));
   mc.SetParametersOfInterest(*w.var("s"));
   mc.SetObservables(*w.var("nobs"));
   mc.SetNuisanceParameters(*w.var("b"));

   // these are needed for the hypothesis tests
   mc.SetSnapshot(*w.var("s"));
   mc.SetGlobalObservables(*w.var("b0"));

   mc.Print();
   // import model in the workspace 
   w.import(mc);

   // make data set with the namber of observed events
   RooDataSet data("data","", *w.var("nobs"));
   w.var("nobs")->setVal(3);
   data.add(*w.var("nobs") );
   // import data set in workspace and save it in a file
   w.import(data);

   w.Print();

   TString fileName = "CountingModel.root"; 

   // write workspace in the file (recreate file if already existing)
   w.writeToFile(fileName, true);

}

Pointing hand Suppose you want to define the uncertainty in the background as log-normal. How do you do this ?

Pointing hand Suppose instead the background is estimated from another counting experiments. How do you model this ? (This model is often call on/off model)

You can look at this past RooStats exercise (RooStatsTutorialsAugust2012) for these solutions.

Exercise 5: Compute a 95% Upper Limit with the Counting Model

Using the previously defined Counting model we can compute a 95% upper limit on the signal parameter s. We could use :

  • The Bayesian Calculator or the MCMC Calculator
  • The Profile Likelihood Calculator

Suppose we use the Bayesian calculator:

In this case we can re-use the macro SimpleBayes.C. We just need to change these lines, setting the tip elf interval and give as input the right ROOT file containing the workspace.
//bayesianCalc.SetConfidenceLevel(0.68); // 68% interval
bayesianCalc.SetConfidenceLevel(0.95); // 95% interval

// set the type of interval 
//bayesianCalc.SetLeftSideTailFraction(0.5); // for central interval
bayesianCalc.SetLeftSideTailFraction(0.); // for upper limit

This is the result we will obtain on the Bayesian calculator:

  • Numerical Bayesian 95% Upper limit result on Counting Poisson model

[#1] INFO:Eval -- BayesianCalculator::GetInterval - found a valid interval : [0 , 6.80149 ]

95% interval on s is : [0, 6.80149] 
BayesUpperLimit.png
Posterior function for the counting model (nobs = 3, b = 1) obtained using the BayesianCalculator

Pointing hand Try to use the Profile Likelihood Calculator. Which result do you obtain ?

Pointing hand We can try to use a different prior on the signal instead of the default uniform in the case of the Bayesian calculator. For example we can try the Jeffrey reference prior, P(s) = 1./sqrt(s).

For using a different signal prior than the uniform, we need to define the prior pdf in the workspace and add it also in the ModelConfig so the Bayesian RooStats are aware of its existence:
// define as a pdf based on an expression
w.factory("EXPR::prior_s('1./sqrt(s)',s)");
// define prior in ModelConfig
mc.SetPriorPdf(*w.pdf("prior_s"));

Exercise 6: Compute the significance (Hypothesis Test)

The aim of this exercise is to compute the significance of observing a signal in the Gaussian plus Exponential model. It is better to re-generate the model using a lower value for the number of signal events (e.g. 50), in order not to have a too high significance, which will then be difficult to estimate if we use pseudo-experiments. To estimate the significance, we need to perform an hypothesis test. We want to disprove the null model, i.e the background only model against the alternate model, the background plus the signal. In RooStats, we do this by defining two two ModelConfig objects, one for the null model (the background only model in this case) and one for the alternate model (the signal plus background).

We have defined in the workspace saved in the ROOT file only the signal plus background model. We can create the background model from the signal plus background model by setting the value of nsig=0. We do this by cloning the S+B model and by defining a snapshot in the ModelConfig for nsig with a different value:

ModelConfig*  sbModel = (RooStats::ModelConfig*) w->obj(modelConfigName);
RooRealVar* poi = (RooRealVar*) sbModel->GetParametersOfInterest()->first();
// define the S+B snapshot (this is used for computing the expected significance)
poi->setVal(50);
sbModel->SetSnapshot(*poi);
// create B only model
ModelConfig * bModel = (ModelConfig*) sbModel->Clone();
bModel->SetName("B_only_model");      
poi->setVal(0);
bModel->SetSnapshot( *poi  );

Using the given models plus the observed data set, we can create a RooStats hypothesis test calculator to compute the significance. We have the choice between

  • Frequentist calculator
  • Hybrid Calculator
  • Asymptotic calculator

The first two require generate pseudo-experiment, while the Asymptotic calculator, requires only a fit to the two models. For the Frequentist and the Hybrid calculators we need to choose also the test statistics. We have various choices in RooStats. Since it is faster, we will use for the exercise the Asymptotic calculator. The Asymptotic calculator it is based on the Profile Likelihood test statistics, the one used at LHC (see slides for its definition). For testing the significance, we must use the one-side test statistic (we need to configure this explicitly in the class). Summarising, we will do:

  • create the AsymptoticCalculator class using the two models and the data set;
  • run the test of hypothesis using the GetHypoTest function.
  • Look at the result obtained as a HypoTestResult object
AsymptoticCalculator   ac(*data, *sbModel, *bModel);
ac.SetOneSidedDiscovery(true);  // for one-side discovery test

// this run the hypothesis test and print the result
HypoTestResult * result = ac.GetHypoTest();
result->Print();

<span class='twikiAlert'>
     Error: File attachment at https://twiki.cern.ch/twiki/pub/RooStats/RooStatsExercisesMarch2015/SimpleHypoTest.C,%PARAM2% does not exist 
</span>

  • Result of the significance test on the Gaussian Signal plus Background model (nsig=50, nbckg=1000):

Results HypoTestAsymptotic_result: 
 - Null p-value = 0.00118838
 - Significance = 3.0386
 - CL_b: 0.00118838
 - CL_s+b: 0.502432
 - CL_s: 422.787

Pointing hand Run the Frequentist calculator

For the Frequentist calculator we need to :
  • set the test statistic we want to use
  • set the number of toys for the null and alternate model
FrequentistCalculator   fc(*data, *sbModel, *bModel);
fc.SetToys(2000,500);    // 2000 for null (B) and 500 for alt (S+B) 

// create the test statistics
ProfileLikelihoodTestStat profll(*sbModel->GetPdf());
// use one-sided profile likelihood
profll.SetOneSidedDiscovery(true);

// configure  ToyMCSampler and set the test statistics
ToyMCSampler *toymcs = (ToyMCSampler*)fc.GetTestStatSampler();
toymcs->SetTestStatistic(&profll);
  
if (!sbModel->GetPdf()->canBeExtended())
   toymcs->SetNEventsPerToy(1);
 
We run now the hypothesis test
HypoTestResult * fqResult = fc.GetHypoTest();
fqResult->Print();
And we can also plot the test statistic distributions obtained from the pseudo-experiments for the two models. We use the HypTestPlot class for this.
// plot test statistic distributions
HypoTestPlot * plot = new HypoTestPlot(*fqResult);
plot->SetLogYaxis();
plot->Draw();

Use the macro SimpleHypoTest.C remove the return statement in the middle and run it again

  • Result of the significance test on the Gaussian Signal plus Background model (nsig=50, nbckg=1000) using the Frequentist calculator:

Results HypoTestCalculator_result: 
 - Null p-value = 0.001 +/- 0.000706753
 - Significance = 3.09023 +/- 0.2099 sigma
 - Number of Alt toys: 500
 - Number of Null toys: 2000
 - Test statistic evaluated on data: 4.61654
 - CL_b: 0.001 +/- 0.000706753
 - CL_s+b: 0.496 +/- 0.02236
 - CL_s: 496 +/- 351.262
Error: (3) can't find FrequentistHTResult.png in RooStats

Exercise 6b: Compute significance (p-value) as function of the signal mass

As an optional exercise to lear more about computing significance in RooStats, we will study the significance (or equivalent the p-value) we obtain in our Gaussian signal plus exponential background model as function of the signal mass hypothesis. For doing this we perform the test of hypothesis, using the AsymptoticCalculator, for several mass points. We can then plot the obtained p-value for the null hypothesis (p0). We will also estimate the expected significance for our given S+B model as function of its mass. The expected significance can be obtained using a dedicated function in the AsymptoticCalculator: AsymptoticCalculator::GetExpectedPValues using as input the observed p-values for the null and the alternate models. See the Asymptotic formulae paper for the details.

using namespace RooStats;
using namespace RooFit;

void p0Plot(const char* infile, const char * workspaceName = "w", const char * modelConfigName = "ModelConfig", const char * dataName= "obsData" )  { 


  /////////////////////////////////////////////////////////////
  // First part is just to access the workspace file 
  ////////////////////////////////////////////////////////////

  // Check if example input file exists
  TFile *file = TFile::Open(infile);

  // get the workspace out of the file
  RooWorkspace* w = (RooWorkspace*) file->Get(workspaceName);


  // get the modelConfig out of the file
  RooStats::ModelConfig* mc = (RooStats::ModelConfig*) w->obj(modelConfigName);

  // get the modelConfig out of the file
  RooAbsData* data = w->data(dataName);

  // get the modelConfig (S+B) out of the file
  // and create the B model from the S+B model
  ModelConfig*  sbModel = (RooStats::ModelConfig*) w->obj(modelConfigName);
  sbModel->SetName("S+B Model");      
  RooRealVar* poi = (RooRealVar*) sbModel->GetParametersOfInterest()->first();
  poi->setVal(200);  // set POI snapshot in S+B model for expected significance
  sbModel->SetSnapshot(*poi);
  ModelConfig * bModel = (ModelConfig*) sbModel->Clone();
  bModel->SetName("B Model");      
  poi->setVal(0);
  bModel->SetSnapshot( *poi  );

  vector<double> masses;
  vector<double> p0values;
  vector<double> p0valuesExpected;

  double massMin = 112;
  double massMax = 158;

  // loop on the mass values 

  int npoints = 30;
  for( double mass=massMin; mass<=massMax; mass += (massMax-massMin)/double(npoints) )   {                    cout << endl << endl << "Running for mass: " << mass << endl << endl;      w->var("mass")->setVal( mass );

     AsymptoticCalculator *  ac  = new AsymptoticCalculator(*data, *sbModel, *bModel);
     ac->SetOneSidedDiscovery(true);  // for one-side discovery test                                      
     AsymptoticCalculator::SetPrintLevel(-1);


     HypoTestResult* asymCalcResult = ac->GetHypoTest();
 
     asymCalcResult->Print();
     
     masses.push_back( mass );
     p0values.push_back( asymCalcResult->NullPValue() );
  
     double expectedP0 = AsymptoticCalculator::GetExpectedPValues(  asymCalcResult->NullPValue(),  asymCalcResult->AlternatePValue(), 0, false);
     p0valuesExpected.push_back( expectedP0 );
     std::cout << "expected p0 = " << expectedP0 << std::endl;   }   TGraph * graph1  = new TGraph(masses.size(),&masses[0],&p0values[0]);   TGraph * graph2  = new TGraph(masses.size(),&masses[0],&p0valuesExpected[0]);   graph1->SetMarkerStyle(20);
  graph1->Draw("APC");
  graph2->SetLineStyle(2);
  graph2->Draw("C");
  graph1->GetXaxis()->SetTitle("mass");
  graph1->GetYaxis()->SetTitle("p0 value");
  graph1->SetTitle("Significance vs Mass");
  graph1->SetMinimum(graph2->GetMinimum());
  graph1->SetLineColor(kBlue);
  graph2->SetLineColor(kRed);
  gPad->SetLogy(true);
  
     
}
Error: (3) can't find p0Plot.png in RooStats

Exercise 7: Compute Limits using Hypothesis Test Inversion (CLs Limits).

In our last exercise we will compute an upper limit using the frequentist hypothesis test inversion method. We need to perform the hypothesis test for different parameter of interest points and compute the corresponding p-values. Since we are interesting in computing a limit, the test null hypothesis, that we want to disprove, is the in this case the S+B model, while the alternate hypothesis is the B only model. It is important to remember this, when we construct the hypothesis test calculator.

For doing the inversion RooStats provides an helper class, HypoTestInverter, which is constructed using one of the HypoTest calculator (Frequentist, Hybrid or Asymptotic). To compute the limit we need to do:

  • Create/read a workspace with S+B and B only model. If the B model does not exist we can create as in the previous exercise (setting the poi to zero).
  • Create an HypoTest calculator but using as null the S+B model and as alt the B only model
  • Configure the calculator if needed (set test statistics, number of toys, etc..)
  • Create the HypoTestInverter class and configure it (e.g. to set the number and range of points to scan, to use CLs for computing the limit, etc…)
  • Run the Inverter using GetInterval which returns an HypoTestInverterResult object.
  • Look at the result (e.g. limit, expected limits and bands)
  • Plot the result of the scan and the test statistic distribution obtained at every point.

Suppose we use the AsymptotiCalculator. If we use the Frequentist is the same, just we need to set also the test statistics and number of toys
// asymptotic calculator
AsymptoticCalculator ac(*data, *bModel, *sbModel);
ac.SetOneSided(true);  // for one-side tests (limits)
AsymptoticCalculator::SetPrintLevel(-1);  // to avoid to much verbosity
We create now the Inverter class and configure it for CLs
// create hypotest inverter 
// passing the desired calculator 
HypoTestInverter calc(ac);

// set confidence level (e.g. 95% upper limits)
calc.SetConfidenceLevel(0.95);
  
// for CLS
calc.UseCLs(true);

We run now the Inverter:

int npoints = 10;  // number of points to scan
// min and max (better to choose smaller intervals)
double poimin = poi->getMin();
double poimax = poi->getMax();
//poimin = 0; poimax=10;

std::cout << "Doing a fixed scan  in interval : " << poimin << " , " << poimax << std::endl;
calc.SetFixedScan(npoints,poimin,poimax);

// run inverter  
HypoTestInverterResult * r = calc.GetInterval();

We can then retrieve from the obtained result object the observed limit value and also the expected limit value and bands (median +/- 1 sigma bands):

double upperLimit = r->UpperLimit();
// estimate of the error on the upper limit
double ulError = r->UpperLimitEstimatedError();

std::cout << "The computed upper limit is: " << upperLimit << " +/- " << ulError << std::endl;
  
// compute expected limit and bands 
std::cout << "Expected upper limits, using the B (alternate) model : " << std::endl;
std::cout << " expected limit (median) " << r->GetExpectedUpperLimit(0) << std::endl;
std::cout << " expected limit (-1 sig) " << r->GetExpectedUpperLimit(-1) << std::endl;
std::cout << " expected limit (+1 sig) " << r->GetExpectedUpperLimit(1) << std::endl;

Plot now the result of the scan using the HypoTestInverterPlot class:

HypoTestInverterPlot *plot = new HypoTestInverterPlot("HTI_Result_Plot","Feldman-Cousins Interval",r);

// plot in a new canvas 
TCanvas * c1 = new TCanvas("HypoTestInverter Scan"); 
c1->SetLogy(false);

plot->Draw("CLb 2CL");  // plot also CLb and CLs+b 
 //plot->Draw("OBS");  // plot only observed p-value

If we were using toys (e.g. FrequentistCalculator), we could plot in a different ROOT Canvas the test statistics distributions.

// plot test statistics distributions for the two hypothesis
// when distribution is generated (case of FrequentistCalculators)
const int n = r->ArraySize();
if (n> 0 &&  r->GetResult(0)->GetNullDistribution() ) {  
      TCanvas * c2 = new TCanvas("Test Statistic Distributions","",2);
      if (n > 1) {
         int ny = TMath::CeilNint( sqrt(n) );
         int nx = TMath::CeilNint(double(n)/ny);
         c2->Divide( nx,ny);
      }
      for (int i=0; i<n; i++) {
         if (n > 1) c2->cd(i+1);
         SamplingDistPlot * pl = plot->MakeTestStatPlot(i);
         pl->SetLogYaxis(true);
         pl->Draw();
      }
   }

<span class='twikiAlert'>
     Error: File attachment at https://twiki.cern.ch/twiki/pub/RooStats/RooStatsExercisesMarch2015/SimpleHypoTestInv.C,%PARAM2% does not exist 
</span>
Error: (3) can't find AsymptoticCls.png in RooStats

Exercise 7b: Using the StandardHypoTestInvDemo.C tutorial

This is a macro in $ROOTSYS/tutorials/roostats which can run the hypothesis test inversion with several possible options. We can find the tutorial code also here. Suppose we want to compute the 95% CLs limit on the parametric unbinned model (Gaussian plus Exponential background), that we have used in the significance exercise:

 
root [0] .L $ROOTSYS/tutorials/roostats/StandardHypoTestInvDemo.C+
root [1] StandardHypoTestInvDemo("GausExpModel.root", "w", "ModelConfig", "", "data", 2, 3, true, 20, 0, 150 )
The macro will print out the observed limit, expected limit and +/1,2 sigma bands/ It will also produce a p-value (CLs in this case) scan plot.

Next we can run the Frequentist calculation (it will take considerably more time, but we use only 500 toys):

 
root [1] StandardHypoTestInvDemo("GausExpModel.root", "w", "ModelConfig", "", "data", 0, 3, true, 10, 0, 100, 500, false )
Since we are using toys to estimate the test statistics distribution, the macro will produce the sampling distribution plots for each scanned point.

Note that, since we are running on the number counting model, we need to set to true the last option

 
root [1] StandardHypoTestInvDemo("CountingModel.root", "w", "ModelConfig", "", "data", 0, 3, true, 10, 0, 10, 500, true )
For reference, here is the list of input arguments to the macro:
StandardHypoTestInvDemo(const char * infile = 0,
                        const char * wsName = "combined",
                        const char * modelSBName = "ModelConfig",
                        const char * modelBName = "",
                        const char * dataName = "obsData",                 
                        int calculatorType = 0,
                        int testStatType = 0, 
                        bool useCLs = true ,  
                        int npoints = 6,   
                        double poimin = 0,  
                        double poimax = 5, 
                        int ntoys=1000,
                        bool useNumberCounting = false,
                        const char * nuisPriorName = 0){
Where we have for the calculatorType:
  • type = 0 Freq calculator
  • type = 1 Hybrid calculator
  • type = 2 Asymptotic calculator

  • testStatType = 0 LEP
  • = 1 Tevatron
  • = 2 Profile Likelihood
  • = 3 Profile Likelihood one sided (i.e. = 0 if mu < mu_hat)

  • useCLs : scan for CLs (otherwise for CLs+b)

  • npoints: number of points to scan , for autoscan set npoints = -1

  • poimin,poimax: min/max value to scan in case of fixed scans (if min > max, try to find automatically)

  • ntoys: number of toys to use

  • useNumberCounting: set to true when using number counting events

  • nuisPriorName: name of prior for the nnuisance. This is often expressed as constraint term in the global model

-- LorenzoMoneta - 2015-03-23

Edit | Attach | Watch | Print version | History: r5 | r4 < r3 < r2 < r1 | Backlinks | Raw View | Raw edit | More topic actions...
Topic revision: r1 - 2015-03-23 - LorenzoMoneta
 
    • Cern Search Icon Cern Search
    • TWiki Search Icon TWiki Search
    • Google Search Icon Google Search

    RooStats All webs login

This site is powered by the TWiki collaboration platform Powered by PerlCopyright &© 2008-2021 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
or Ideas, requests, problems regarding TWiki? use Discourse or Send feedback