RooStats tutorials

Set of RooStats tutorials given at Bonn (20-22 August 2012) at the school and workshop for limit settings at the LHC, see https://indico.desy.de/conferenceDisplay.py?confId=6083.

Below is given the required materials used during the RooStats tutorials.

Additional material has been showed at the Desy statistical school (see https://indico.desy.de/conferenceOtherViews.py?view=standard&confId=5065)

Introduction to RooStats

See slides attached to the agenda in Indico: https://indico.desy.de/contributionDisplay.py?contribId=1&confId=6083

Getting started with the software

A few advice words before starting. The tutorials requires the ROOT version 5.34.01 which is installed in the Virtual Machine of the school.

           using namespace RooFit;
           using namespace RooStats;
  • avoid running a macro multiple times in the same ROOT CINT sessions (it might unfortunately crash sometimes if you do it).
  • note that Roostats methods start with capital letter (as in ROOT) while Roofit ones start with lower letter *Roostats calculator are quite verbose, you can suppress the output by doing:
RooMsgService::instance().setGlobalKillBelow(RooFit::WARNING) ;

Some links to documentation:

Terminology and Conventions

Here we give pragmatic definitions for a few basic concepts that we will use.
  • observable - something you measure in an experiment, for example a particle's momentum. Often, a function of measured quantities, for example an invariant mass of several particles
  • global observable or auxiliary observable - an observable from another measurement, for example, the integrated luminosity
  • model - a set of probability density functions (PDFs), which describe distributions of the observables or functions of observables
  • model parameter - any variable in your PDF expression, which is not an observable
  • parameter of interest (POI) - a model parameter that you study, for example a cross section of your signal process
  • nuisance parameter - every other model parameter, which is not your parameter of interest
  • data or dataset - a set of values of observables, either measured in an experiment, or simulated
  • likelihood - a model computed for a particular dataset
  • hypothesis - a particular model, with specified observables, POI, nuisance parameters, and prior PDFs (in case of Bayesian inference)
  • prior PDF - a probability density for an observable or a model parameter, which is known a priori, i.e. before a measurement is considered. This is a Bayesian concept exclusively. Prior has no meaning or place in a frequentist type of inference
  • Bayesian - a type of statistical inference that usually produces probability of the hypothesis given the data. Requires a prior.
  • frequentist - a type of statistical inference that usually produces probability of the data given the hypothesis.

Exercises for this tutorial (Part 1)

Below will given code snippets showing how to build some different analysis models and then on how to run some of the Roostats calculators. FIrst it is suggested to build a model (a RooFit workspace) save it in a file and then read it back and run the RooStats code on it. Using the snippets of code, one can build a full ROOT macro which can be run interactively in ROOT or compiled using ACLIC. The models we are going to build are:

  • Observable-based analysis: 30 Gaussian-distributed signal event over ~ 1000 exponential-distributed background. The "mass" observable is distributed in range [0 - 10] and the signal has mean ~ 2 and sigma 0.5. We assume that the location and width of the signal are known. W
  • Poisson model: let's assume you observe n events in data while expecting b background event. We consider some systematic uncertainties in the background model value, b. We start expressing this uncertainty as Gaussian. We will see also how we can use a log-normal distribution for expressing this uncertainty or extend the model adding other systematic uncertainties.

Compute:

  • 68% CL 2-sided confidence interval from the (profiled-) likelihood ratio. As option one could compute also the significance using the profiled likelihood.
  • 68% credible interval using the MCMCCalculator or the Bayesian Calculator (based on numerical integration). Compute as well the 95% Bayesian upper limit.

We provide code snippet for creating the models, save them in a file and then for running the calculator after having read the workspace from the file. All the Roostats calculators take as input a data set and a Model Config object. It is therefore convenient to save the workspace (with model and data) in a ROOT file and then read it afterwards for running the calculators.

If you have problem creating the code for running the calculators, you can run the Standard Roostats tutorials (in $ROOTSYS/tutorials/roostats)

Tutorials you could use are:

They can be run as following from the Root prompt by passing the workspace file name (e.g. "SPlusBExpoModel.root" ), workspace name, model config name and data set name:

 
   .L $ROOTSYS/tutorials/roostats/StandardProfileLikelihoodDemo.C
   StandardProfileLikelihoodDemo("SPlusBExpoModel.root", "w", "ModelConfig", "data" )

Other tutorials go into more details to create various models and run the calculators. See them all at http://root.cern.ch/root/html/tutorials/roostats/index.html.

Some of the following code here is extracted from exercises macro showed at the Desy Statistical school in April 2012. You can look also at the tar file with the exercises shown at that school,

Create Model with shape: Gaussian peak plus an exponential background (Unbinned Model)

We start by making a model containing a Gaussian peak and an exponential background. The data are unbinned.

First we create the workspace and then the pdf (gaussian and exponential) using the factory.

   RooWorkspace w("w"); 
   w.factory("Exponential:bkg_pdf(x[0,10], a[-0.5,-1000,0])");
   w.factory("Gaussian:sig_pdf(x, mass[2], sigma[0.5])");
The mass and the sigma parameter of the signal Gaussian are kept constant. They do not have a range while the exponential parameter is not. Note that if you want, you can re-parametrize the exponential parameter as following (before creating the exponential pdf):
   // use cexpr for compiled expressions
   w.factory("expr:a('(-1./tau)',tau[2,0,100])");
   w.factory("Exponential:bkg_pdf(x[0,10], a)");

We can then sum the two pdf, according to the number of expected events for the signal and the background. Note that the resulting model will be extended, i.e. the total number of events will fluctuate according to a Poisson distribution

   w.factory("SUM:model(nsig[0,1000]*sig_pdf, nbkg[0,1000000]*bkg_pdf)");  // for extended model
Given the model, we can now generate some pseudo data, assuming that nsig = 30 and nbkg=1000. Afterwards we that we can fit this generated data set.

   RooAbsPdf * pdf = w.pdf("model");
   RooRealVar * x = w.var("x");  // the observable


   // generate the data (nsig = 30, nbkg=1000)
   w.var("nsig")->setVal(30);
   w.var("nbkg")->setVal(1000);
   // use fixed random numbers for reproducibility
   RooRandom::randomGenerator()->SetSeed(111);
   RooDataSet * data = pdf->generate( *x);  // will generate accordint to total S+B events
   data->SetName("data");
   w.import(*data);

   data->Print(); 
We can plot the generated data:

   // plot the generate data in 50 bins (default is 100) 
   x->setBins(50);
   RooPlot * plot = x->frame();
   data->plotOn(plot);
   plot->Draw();

We can now fit the data using RooFit

  pdf->fitTo(*data);
If you want to look at the result of the fit, and for example use a different minimizer, you can do:
RooFitResult * r = pdf->fitTo(*data, RooFit::Save(true), RooFit::Minimizer("Minuit2","Migrad"));
 r->Print();
We can then plot the fitted function and optionally draw also the two components separately and add the fit parameters in the plot
   pdf->plotOn(plot);
   pdf->plotOn(plot, RooFit::Components("bkg_pdf"), RooFit::LineStyle(kDashed) );
   pdf->plotOn(plot, RooFit::Components("sig_pdf"), RooFit::LineColor(kRed), RooFit::LineStyle(kDashed) );
   // add also the fit parameters
   pdf->paramOn(plot);
   plot->Draw();

Before using RooStats we need to define the configuration of the model, thus create the RooStats::ModelConfig class, specifying what is the global pdf, the parameter of interest, the nuisance parameters and the observables.


   RooStats::ModelConfig mc("ModelConfig",&w);
   mc.SetPdf(*pdf);
   mc.SetParametersOfInterest(*w.var("nsig"));
   mc.SetObservables(*w.var("x"));
   // define set of nuisance parameters
   w.defineSet("nuisParams","a,nbkg");

   mc.SetNuisanceParameters(*w.set("nuisParams"));
We then import the ModelConfig object in the workspace and we save it all the workspace in a file so we can re-use it later

   // import model in the workspace 
   w.import(mc);

   // write the workspace in the file
   TString fileName = "SPlusBExpoModel.root";
   w.writeToFile(fileName,true);
   cout << "model written to file " << fileName << endl;

Optionally we can make the model more complex by adding for example some systematics in the sigma of the gaussian peak and add a Gaussian constraint on this parameter. We will see this later, how we can add it in case of similar models, and we can leave this extension as future exercise.

Run Profile Likelihood Calculator

After we have create a workspace and saved it in a file we can run the Roostats calculators. We start using the ProfileLikelihoodCalculator. First we need to retrieve the workspace from the file, using some code which can be used also later when running different RooStats calculators


  const char* infile =  "SPlusBExpoModel.root"; 
  const char* workspaceName = "w";
  const char* modelConfigName = "ModelConfig";
  const char* dataName = "data";

  /////////////////////////////////////////////////////////////
  // First part is just to access the workspace file 
  ////////////////////////////////////////////////////////////

  // Check if example input file exists
  TFile *file = TFile::Open(infile);

  // get the workspace out of the file
  RooWorkspace* w = (RooWorkspace*) file->Get(workspaceName);


  // get the modelConfig out of the file
  ModelConfig* mc = (ModelConfig*) w->obj(modelConfigName);

  // get the modelConfig out of the file
  RooAbsData* data = w->data(dataName);


Now we can start using the ProfileLikelihoodCalculator. We create the calculator class and then run it to find the 1-sigma (68.3%) confidence interval on the parameter of interest as specified in the model config


  ProfileLikelihoodCalculator pl(*data,*mc);
  pl.SetConfidenceLevel(0.683); // 68.3% interval
  LikelihoodInterval* interval = pl.GetInterval();

   // find the iterval on the first Parameter of Interest (nsig)
  RooRealVar* firstPOI = (RooRealVar*) mc->GetParametersOfInterest()->first();

  double lowerLimit = interval->LowerLimit(*firstPOI);
  double upperLimit = interval->UpperLimit(*firstPOI);


  cout << "\n68% interval on " <<firstPOI->GetName()<<" is : ["<<
    lowerLimit << ", "<<
    upperLimit <<"] "<<endl;

We can make a plot of the profile likelihood function. In case of complex model to evaluate, we can speedup the drawing by using the Draw option "tf1" and setting a low number of points to sample the function to be plotted
  
  LikelihoodIntervalPlot * plot = new LikelihoodIntervalPlot(interval);
  //plot->SetRange(0,100);  // possible eventually to change ranges
  plot->SetNPoints(50);  // do not use too many points, it could become very slow for some models
  plot->Draw("");  // use option TF1 if too slow (plot.Draw("tf1")

Optionally you can try to generate a new model with the same shape, but with nsig=0 and compute in this case a 95% upper limit.

In addition one can try to compute with the model generate with a value of nsig>0 a discovery significance using the profile likelihood. For doing this, you need to use the ProfileLikelihoodCalculator as an hypothesis test calculator. We need to define in this case the value of the parameter of interest we want to test (i.e. nsig = 0) and save this value as a snapshot and provide this value to the calculator.

  // compute the significance w.r.t. to zero
  firstPOI->setVal(0);
  RooArgSet * nullSnap = (mc->GetParametersOfInterest())->snapshot(); 
  pl.SetNullParameters(*nullSnap);
  HypoTestResult * hr = pl.GetHypoTest();
  if (hr) {
     hr->Print();
     cout << "significance = " << hr->Significance() << endl;
     cout << "null pvalue  = " << hr->NullPValue() << endl;
     cout << "alt p value = " << hr->AlternatePValue() << endl;
}
We will see later that these parameter values used to identify the model to be tested can be saved in the ModelConfig using ModelConfig::SetSnaphot.

Run Bayesian MCMC Calculator

On the same model used before we can also run a Bayesian analysis using the MCMC calculator. You can try to reduce the number of iterations if it is too slow (or increase them if you see a not-stable result).


  // this proposal function seems fairly robust
  SequentialProposal sp(0.1);
 

  MCMCCalculator mcmc(*data,*mc);
  mcmc.SetConfidenceLevel(0.683); // 683% interval
  //  mcmc.SetProposalFunction(*pf);
  mcmc.SetProposalFunction(sp);
  mcmc.SetNumIters(100000);         // Metropolis-Hastings algorithm iterations
  mcmc.SetNumBurnInSteps(50);       // first N steps to be ignored as burn-in

  // default is the shortest interval.  here use central
  mcmc.SetLeftSideTailFraction(0.5); // for central Bayesian interval
  //mcmc.SetLeftSideTailFraction(0); // for one-sided Bayesian interval


  RooRealVar* firstPOI = (RooRealVar*) mc->GetParametersOfInterest()->first();
  //firstPOI->setMax(100);


  MCMCInterval* interval = mcmc.GetInterval();

  // print out the iterval on the first Parameter of Interest
  cout << "\n68% interval on " <<firstPOI->GetName()<<" is : ["<<
    interval->LowerLimit(*firstPOI) << ", "<<
    interval->UpperLimit(*firstPOI) <<"] "<<endl;


  // make a plot of posterior function
  //TCanvas* c1 = 
  new TCanvas("IntervalPlot");
  MCMCIntervalPlot plot(*interval);
  plot.Draw();

You can compare the obtained result, with what obtained using the Profile Likelihood.

To speed-up the computation time, we can re-generate the model with less events or generate the data binned, by passing the option RooFit::AllBinned(true) to the RooAbsPdf::generate method:

   RooDataSet * data = pdf->generate( *x, RooFit::AllBinned());

Run Bayesian Calculator

On the same model we could also run the Bayesian calculator. We create the class and configure it. Since the numerical integration can be tricky, it is better in this case to set a reasonable range from the nuisance parameters which will be integrated to obtain the marginalised posterior function. A sensible choice is to use for example an interval using 10-sigma around the best fit values for the nuisance parameters,

  RooRealVar *nuisPar = w->var("...");
  nuisPar->setRange(nuisPar->getVal() - 10*nuisPar->getError(), nuisPar->getVal() + 10* nuisPar->getError() \
);
It is also recommended to use this class on a model which has a binned dataset or not having too many events (<~ 100), otherwise it will take a long time to obtain the result.

We create the class can set the type of interval (central, upper/lower limit or shortest) and the type of integration method


  BayesianCalculator bayesianCalc(*data,*mc);
  bayesianCalc.SetConfidenceLevel(0.683); // 68% interval

  // set the type of interval (not really needed for central which is the default)
  bayesianCalc.SetLeftSideTailFraction(0.5); // for central interval
  //bayesianCalc.SetLeftSideTailFraction(0.); // for upper limit
  //bayesianCalc.SetShortestInterval(); // for shortest interval


  // set the integration type (not really needed for the default ADAPTIVE)
  // possible alternative values are  "VEGAS" , "MISER", or "PLAIN"  (MC integration from libMathMore) 
  // "TOYMC" (toy MC integration, work when nuisances exist and they have a constraints pdf)
  TString integrationType = "";

  // this is needed if using TOYMC
  if (integrationType.Contains("TOYMC") ) { 
    RooAbsPdf * nuisPdf = RooStats::MakeNuisancePdf(*mc, "nuisance_pdf");
    if (nuisPdf) bayesianCalc.ForceNuisancePdf(*nuisPdf);
  }

  bayesianCalc.SetIntegrationType(integrationType); 

  // compute interval by scanning the posterior function
  // it is done by default when computing shortest intervals
  bayesianCalc.SetScanOfPosterior(100);

  RooRealVar* firstPOI = (RooRealVar*) mc->GetParametersOfInterest()->first();

  SimpleInterval* interval = bayesianCalc.GetInterval();
  if (!interval) { 
     cout << "Error computing Bayesian interval - exit " << endl;
     return;
  }


  double lowerLimit = interval->LowerLimit();
  double upperLimit = interval->UpperLimit();


  cout << "\n68% interval on " <<firstPOI->GetName()<<" is : ["<<
    lowerLimit << ", "<<
    upperLimit <<"] "<<endl;

  // draw plot of posterior function

  RooPlot * plot = bayesianCalc.GetPosteriorPlot();
  if (plot) plot->Draw();  

Create Poisson Counting model

Make a counting model: Poisson(nobs | s + b). Assume b is known with un uncertainty of 20%

Make first the workspace and write in the file. Use RooFit Factory to create pdf and variables


   int nobs = 3;   // number of observed events
   double b = 1; // number of background events
   double sigmab = 0.2;   // relative uncertainty in b

   RooWorkspace w("w",true);

   // make Poisson model * Gaussian constraint
   w.factory("sum:nexp(s[3,0,15],b[1,0,10])");
   // Poisson of (n | s+b)
   w.factory("Poisson:pdf(nobs[0,50],nexp)");
   w.factory("Gaussian:constraint(b0[0,10],b,sigmab[1])");
   w.factory("PROD:model(pdf,constraint)");

 
Set the variables values for the model (nobs, b and error on b)

   RooRealVar * obs = w.var("nobs");
   w.var("b0")->setVal(b);
   w.var("b0")->setConstant(true); // needed for being treated as global observables
   w.var("sigmab")->setVal(sigmab*b);  

   // use this to avoid too large ranges 
   w.var("b")->setMax(b+10*sigmab);
   
   //set value of observed events
   obs->setVal(nobs);

make now data set and import it in the workspace


   // make data set with the namber of observed events
   RooDataSet data("data","", *obs );
   data.add(*obs );
   w.import(data);

   w.Print();
Make now the ModelConfig needed for RooStats. Add also the snapshot of the parameter of interest needed to define the null or alternate hypothesis (needed later) and the global observables for the frequentist calculators and the prior pdf for Bayesian calculator

   ModelConfig mc("ModelConfig",&w);
   mc.SetPdf(*w.pdf("model"));
   mc.SetParametersOfInterest(*w.var("s"));
   mc.SetObservables(*w.var("nobs"));
   mc.SetNuisanceParameters(*w.var("b"));

   // these are needed for the hypothesis tests
   mc.SetSnapshot(*w.var("s"));
   mc.SetGlobalObservables(*w.var("b0"));

   mc.Print();
   // import model in the workspace 
   w.import(mc);

And save now the workspace for further use

   TString fileName = "PoissonModelWithBackg.root"; 

   // write workspace in the file (recreate file if already existing)
   w.writeToFile(fileName, true);

On this Poisson model we can run then the Profile Likelihood and the Bayesian for computing the 68% interval and/or 95% upper limit. Note that when running the Bayesian calculator is also useful in this case to reduce the range of integration of nuisance parameter and also of the poi to a region where the posterior is non-zero. Optionally we can try in case of Bayesian method to use also a different prior for the signal instead of the default uniform one, like the Jeffrey prior for the Poisson model with background (P(s) = 1./sqrt(s+b) ). For defining a different prior, one needs to define in the ModelConfig the prior pdf using ModelConfig::SetPriorPdf.

A possible variant to the model is to use a Gamma (Poisson) constraint, as in the on/off problem. See the paper: "Evaluation of three methods for calculating statistical significance when incorporating a systematic uncertainty into a test of the background-only hypothesis for a Poisson process Authors: Robert D. Cousins, James T. Linnemann, Jordan Tucker http://arxiv.org/abs/physics/0702156 NIM A 595 (2008) 480--50.

In this case the constraint is expressed as a Poisson pdf representing the sideband measurement (or equivantly by a Gamma pdf, if thinking Bayesian):

   // make model for on/off problem

   int nobs = 3;   // number of observed events
   double b = 1; // number of background events
   double sigmab = 0.2;   // relative uncertainty in b

   RooWorkspace w("w",true);

   w.factory("sum:nexp(s[3,0,15],b[1,0,5])");
   // Poisson of (n | s+ b)
   w.factory("Poisson:pdf(nobs[0,50],nexp)");

   // use tau :  y = tau*b                                                                             
   w.factory("prod:y(tau[10],b)");
   w.factory("prod:y0(tau,b0[1,0,10])");
   w.factory("Poisson:off_pdf(y0,y)");
   w.factory("PROD:model(pdf,off_pdf)");

   RooRealVar * obs = w.var("nobs");
   w.var("b0")->setVal(b);
   w.var("b0")->setConstant(true); // needed for being treated as global observables   

   //set value of observed events                                                                      
   obs->setVal(nobs);

   // make data set with the namber of observed events                                                 
   RooDataSet data("data","", *obs );
   data.add(*obs );
   w.import(data);

   w.Print();                

 

Another possible option is to write instead the constraint as a log-normal. We could first use as parameter of interest instead of s, mu, the ratio of the observed cross section divided by the nominal one. We introduce also another nuisance parameter the efficiency. The model is then built as below. We need to start defining the efficiency constraint, using an extra variable beta_eff, which is defined as Gaussian, and we use as efficiency as the exp( beta_eff), using the fact that if a variable t is normal distributed, exp(t) is log-normal.



   w.factory( "eff_nominal[0.5]" );  // nominal efficiency                                             
   w.factory( "eff_kappa[1.20]" );  // relative error in efficiency ( log-normal parameter (1+ error)  
   w.factory( "expr:eff('eff_nominal*pow(eff_kappa,beta_eff)',eff_nominal,eff_kappa,beta_eff[0,-5,5])"\
 );
   w.factory( "Gaussian:constraint_eff(glob_eff[0,-5,5],beta_eff,1)" );
   // glob_efficiency is the global observables                                                        
   w.var("glob_eff")->setConstant(true);


   // make model for on/off problem                                                                    
   w.factory("prod:nsig( mu[0,15], xsec[2], eff)" );
   w.factory("sum:nexp(nsig,b[1,0,10])");
   // Poisson of (n | s+ b)                                                                            

   w.factory("Poisson:pdf(nobs[0,50],nexp)");

   // add off term                                                                                     
   // use tau :  y = tau*b                                                                             
   w.factory("prod:y(tau[10],b)");
   w.factory("prod:y0(tau,b0[1,0,10])");
   w.factory("Poisson:off_pdf(y0,y)");

   // make global model                                                                                
   w.factory("PROD:model(pdf,off_pdf,constraint_eff)");

   w.Print();

   RooRealVar * obs = w.var("nobs");
   w.var("b0")->setVal(b);
   w.var("b0")->setConstant(true); // needed for being treated as global observables                   



 
We need also to define the ModelConfig by adding beta_eff as extra nuisance parameters

   ModelConfig mc("ModelConfig",&w);
   mc.SetPdf(*w.pdf("model"));
   mc.SetParametersOfInterest(*w.var("mu"));
   mc.SetObservables(*w.var("nobs"));
   mc.SetNuisanceParameters(RooArgSet(*w.var("b"),*w.var("beta_eff")));

   // these are needed for the hypothesis tests                                                        
   mc.SetSnapshot(*w.var("mu"));
   mc.SetGlobalObservables(*w.var("b0"));


   mc.Print();
   // import model in the workspace                                                                    
   w.import(mc);

Exercises for this tutorial (Part 2)

Compute in this case using the same models generated before:

  • 95% CL upper-limit with the inverter method and CLs, first using the frequentist calculator (with toys) then using the asymptotic formulae
  • A significance test using the FrequentistCalculator

It is recommended to build also the ROOT macro in this case. Alternatively, one could use the standard Roostats tutorials.

*StandardHypoTestInvDemo.C for computing limit using the inverter *StandardHypoTestDemo.C for computing the significance

As for the other Standard tutorials, we can run these macros, by passing workspace file, workspace name, ModelConfig name and data set and in addition some extra options, which are documented at the end of this page.

Run Hypothesis test Inverter

We run the hypothesis tests on the previous model (or the number counting). But first we need to create a background only model. We can copy it from the S+B model by setting the S value to zero:


      ModelConfig*  sbModel = (ModelConfig*) w->obj(modelConfigName);
      RooRealVar* poi = (RooRealVar*) sbModel->GetParametersOfInterest()->first();
      ModelConfig * bModel = (ModelConfig*) sbModel->Clone();
      bModel->SetName(TString(sbModel->GetName())+TString("_with_poi_0"));
      double oldval = poi->getVal();
      poi->setVal(0);
      bModel->SetSnapshot( *poi  );
      poi->setVal(oldval);

Then we create the Hypo Test calculator class. We can choose between the FrequentistCalculator, the HybridCalculator or the AsymptoticCalculator. We give the code for both three cases: The Frequentist and the Hybrid uses toys top get the sampling distribution. You can than set the number of toys to be used. Note also that the hypo test calculator class takes first alternate model then null model. In this case, when we compute a limit the null is S+B.

   FrequentistCalculator *  fc  = new FrequentistCalculator(*data, *bModel, *sbModel);
   fc->SetToys(1000,500);    // 1000 for null (S+B) , 50 for alt (B)

  • Code for the HybridCalculator (in this case we need to provides pdf for the nuisance parameters)

   // // Hybrid calculator 
   HybridCalculator *  hc = new HybridCalculator(*data, *bModel, *sbModel);
   hc->SetToys(1000,500);    // 1000 for null (S+B) , 50 for alt (B)
   // check for nuisance prior pdf in case of nuisance parameters 
   if (sbModel->GetNuisanceParameters() && sbModel->GetNuisanceParameters()->getSize() > 0) {
      // for hybrid need nuisance pdf 
      RooAbsPdf * nuisPdf = RooStats::MakeNuisancePdf(*sbModel,"nuisancePdf_sbmodel");
      hc->ForcePriorNuisanceAlt(*nuisPdf);
      hc->ForcePriorNuisanceNull(*nuisPdf);
   }

  • Code for the Asymptotic Calculator
   // asymptotic calculator
   AsymptoticCalculator * ac = new AsymptoticCalculator(*data, *bModel, *sbModel);
   ac->SetOneSided(true);  // for one-side tests (limits)
   AsymptoticCalculator::SetPrintLevel(-1);

We can now create the Inverter class

   // create hypotest inverter 
   // passing the desired calculator 
   HypoTestInverter calc(*ac);

   // set confidence level (e.g. 95% upper limits)
   calc.SetConfidenceLevel(0.95);
  
   // for CLS
   bool useCLs = true;
   calc.UseCLs(useCLs);
   calc.SetVerbose(false);
 

We need also to configure the ToyMCSampler and set the test statistics. Note that in case we use the Poisson model, the pdf is not extended and we need to specify the number of events for each toy we generate, which in the Poisson model is 1.

  
   ToyMCSampler *toymcs = (ToyMCSampler*)calc.GetHypoTestCalculator()->GetTestStatSampler();
   // for number counting (extended pdf do not need this)
   // toymcs->SetNEventsPerToy(1);
   

   // profile likelihood test statistics 
   ProfileLikelihoodTestStat profll(*sbModel->GetPdf());
   // for CLs (bounded intervals) use one-sided profile likelihood
   if (useCLs) profll.SetOneSided(true);

   // ratio of profile likelihood - need to pass snapshot for the alt 
   // RatioOfProfiledLikelihoodsTestStat ropl(*sbModel->GetPdf(), *bModel->GetPdf(), bModel->GetSnapshot());
   

   // set the test statistic to use 
   toymcs->SetTestStatistic(&profll);

  // if the pdf is not extended (e.g. in the Poisson model)                                            
  // we need to set the number of events  
  if (!sbModel->GetPdf()->canBeExtended())
     toymcs->SetNEventsPerToy(1);
                                                             

 

We can run now the inverter

 
   int npoints = 10;  // number of points to scan
   // min and max (better to choose smaller intervals)
   double poimin = poi->getMin();
   double poimax = poi->getMax();
   //poimin = 0; poimax=10;

   std::cout << "Doing a fixed scan  in interval : " << poimin << " , " << poimax << std::endl;
   calc.SetFixedScan(npoints,poimin,poimax);
  

   HypoTestInverterResult * r = calc.GetInterval();
And finally we can query the result and make some plots:
 
  
   double upperLimit = r->UpperLimit();
   double ulError = r->UpperLimitEstimatedError();
   // double lowerLimit = r->LowerLimit();
   // double llError = r->LowerLimitEstimatedError();
   // if (lowerLimit < upperLimit*(1.- 1.E-4)) 
   //    std::cout << "The computed lower limit is: " << lowerLimit << " +/- " << llError << std::endl;
   std::cout << "The computed upper limit is: " << upperLimit << " +/- " << ulError << std::endl;
  
   // compute expected limit
   std::cout << "Expected upper limits, using the B (alternate) model : " << std::endl;
   std::cout << " expected limit (median) " << r->GetExpectedUpperLimit(0) << std::endl;
   std::cout << " expected limit (-1 sig) " << r->GetExpectedUpperLimit(-1) << std::endl;
   std::cout << " expected limit (+1 sig) " << r->GetExpectedUpperLimit(1) << std::endl;
  

   // plot now the result of the scan 

   HypoTestInverterPlot *plot = new HypoTestInverterPlot("HTI_Result_Plot","Feldman-Cousins Interval",r);

   // plot in a new canvas with style
   TCanvas * c1 = new TCanvas("HypoTestInverter Scan"); 
   c1->SetLogy(false);

   plot->Draw("CLb 2CL");  // plot also CLb and CLs+b 
   //plot->Draw("OBS");  // plot only observed p-value


   // plot also in a new canvas the test statistics distributions 
  
   // plot test statistics distributions for the two hypothesis
   // when distribution is generated (case of FrequentistCalculators)
   const int n = r->ArraySize();
   if (n> 0 &&  r->GetResult(0)->GetNullDistribution() ) { 
      TCanvas * c2 = new TCanvas("Test Statistic Distributions","",2);
      if (n > 1) {
         int ny = TMath::CeilNint( sqrt(n) );
         int nx = TMath::CeilNint(double(n)/ny);
         c2->Divide( nx,ny);
      }
      for (int i=0; i<n; i++) {
         if (n > 1) c2->cd(i+1);
         SamplingDistPlot * pl = plot->MakeTestStatPlot(i);
         pl->SetLogYaxis(true);
         pl->Draw();
      }
   }

StandardHypoTestInvDemo.C

This is a macro in $ROOTSYS/tutorials/roostats which can run the hypothesis test inversion with all these options. From the root prompt you can do, for running the AsymptoticCalculator on the parametric unbinned model:

 
   .L $ROOTSYS/tutorials/roostats/StandardHypoTestInvDemo.C+

   StandardHypoTestInvDemo("SPlusBExpoModel.root", "w", "ModelConfig", "", "data", 2, 3, true, 20, 0, 100 )
You can look at the printed out observed and expected limits, and the CLs scan plot. Now run the full CLs calculation (it will take considerably more time but should still be manageable):
 
   .L $ROOTSYS/tutorials/roostats/StandardHypoTestInvDemo.C+

   StandardHypoTestInvDemo("SPlusBExpoModel.root", "w", "ModelConfig", "", "data", 0, 3, true, 10, 0, 100, 500, false )
We can look at the sampling distribution plots obtained.

If running on the number counting model you need to set to trye the last argument

 
   StandardHypoTestInvDemo("SPlusBExpoModel.root", "w", "ModelConfig", "", "data", 0, 3, true, 10, 0, 100, 500, true )
Here is the list of input arguments to the macro:
StandardHypoTestInvDemo(const char * infile = 0,
                        const char * wsName = "combined",
                        const char * modelSBName = "ModelConfig",
                        const char * modelBName = "",
                        const char * dataName = "obsData",                 
                        int calculatorType = 0,
                        int testStatType = 0, 
                        bool useCLs = true ,  
                        int npoints = 6,   
                        double poimin = 0,  
                        double poimax = 5, 
                        int ntoys=1000,
                        bool useNumberCounting = false,
                        const char * nuisPriorName = 0){
Where we have for the calculatorType:
  • type = 0 Freq calculator
  • type = 1 Hybrid calculator
  • type = 2 Asymptotic calculator

  • testStatType = 0 LEP
  • = 1 Tevatron
  • = 2 Profile Likelihood
  • = 3 Profile Likelihood one sided (i.e. = 0 if mu < mu_hat)

  • useCLs : scan for CLs (otherwise for CLs+b)

  • npoints: number of points to scan , for autoscan set npoints = -1

  • poimin,poimax: min/max value to scan in case of fixed scans (if min > max, try to find automatically)

  • ntoys: number of toys to use

  • useNumberCounting: set to true when using number counting events

  • nuisPriorName: name of prior for the nnuisance. This is often expressed as constraint term in the global model

Run Hypothesis test (Significance test)

On the same model we can run an hypothesis test to get the signal significance (number of sigma). In this case the null hypothesis is the background only model and the alternate is the S+B model.

Example running the frequentist calculator:

   FrequentistCalculator *  fc  = new FrequentistCalculator(*data, *sbModel, *bModel);
   fc->SetToys(1000,1000);    // 1000 for null (B) and 1000 for alt (S+B) 

   // create the test statistics
   ProfileLikelihoodTestStat profll(*sbModel->GetPdf());
   // use one-sided profile likelihood
    profll.SetOneSidedDiscovery(true);

   // configure  ToyMCSampler and set the test statistics
   ToyMCSampler *toymcs = (ToyMCSampler*)fc->GetTestStatSampler();
    toymcs->SetTestStatistic(&profll);
  
   if (!sbModel->GetPdf()->canBeExtended())
     toymcs->SetNEventsPerToy(1);
 
   // run the test
    HypoTestResult * r = fc->GetHypoTest();
    r->Print();

    // plot test statistic distributions
    HypoTestPlot * plot = new HypoTestPlot(*r);
    plot->Draw();

Optionally you can make, when using the shape model (Exponential background + Gaussian peak) a mass scan to get the p0 values as function of the mass. You loop on different mass values and then you run for each one of them the hypothesis test and you plot the p0 value (the null p-value) obtained from the hypothesis test.

We can run also the asymptotic calculator to get the asymptotic p-value and the expected one to be faster:

   vector<double> masses;
  vector<double> p0values;
  vector<double> p0valuesExpected;

  double massMin = 0.5;
  double massMax = 8.5;


  for( double mass=massMin; mass<=massMax; mass += (massMax-massMin)/40.0 )
  {


     cout << endl << endl << "Running for mass: " << mass << endl << endl;
     w->var("mass")->setVal( mass );

     AsymptoticCalculator *  ac  = new AsymptoticCalculator(*data, *sbModel, *bModel);
     ac->SetOneSidedDiscovery(true);  // for one-side discovery test                                  \
                                                                                                       
     AsymptoticCalculator::SetPrintLevel(-1);


     HypoTestResult* asymCalcResult = ac->GetHypoTest();

     asymCalcResult->Print();

     masses.push_back( mass );
     p0values.push_back( asymCalcResult->NullPValue() );

     double expectedP0 = AsymptoticCalculator::GetExpectedPValues(  asymCalcResult->NullPValue(),  asy\
mCalcResult->AlternatePValue(), 0, false);
     p0valuesExpected.push_back( expectedP0 );
  }

  
You can then display in a graph the obtained observed and expected p-values:

  TGraph * graph1  = new TGraph(masses.size(),&masses[0],&p0values[0]);
  TGraph * graph2  = new TGraph(masses.size(),&masses[0],&p0valuesExpected[0]);

  graph1->SetMarkerStyle(20);
  graph1->Draw("APC");
  graph2->SetLineStyle(2);
  graph2->Draw("C");
  gPad->SetLogy(true);

Combination of Model

As last example we make a combined model from two different channels. We assume we have two gaussian signal plus exponential background in two different channels, where the signal parameters are common, while the background parameters are different for the two channels. We use the factory tool to build the combined pdf, the RooSimultaneous for the two channel. We will generate some pseudo-data for the two channels and we will make a combined fit and then save this workspace in a file. The workspace can then be used by any of the RooStats calculator we have described before.

We start making the pdf for each single channel. Note we redefine nsig = mu * xsec where xsec is different for the two channel and is given, while mu is the parameter of interest of the model

   RooWorkspace w("w");

   w.factory("Exponential:bkg1_pdf(x[0,10], a1[-0.5,-2,-0.2])");
   w.factory("Gaussian:sig_pdf(x, mass[2], sigma[0.5])");

   w.factory("prod:nsig1(mu[1,0,5],xsec1[30])");

   w.factory("SUM:model1(nsig1*sig_pdf, nbkg1[1000,0,10000]*bkg1_pdf)");  // for extended model        

   w.factory("Exponential:bkg2_pdf(x, a2[-0.5,-2,-0.2])");

   w.factory("prod:nsig2(mu,xsec2[10])");
   w.factory("SUM:model2(nsig2*sig_pdf, nbkg2[100,0,10000]*bkg2_pdf)");  // for extended model         

Now we can generate the joint model


   // Create discrete observable to label channels                                                     
   w.factory("index[channel1,channel2]");

   // Create joint pdf (RooSimultaneous)                                                               
   w.factory("SIMUL:jointModel(index,channel1=model1,channel2=model2)");

We generate also a data set from the created model. Note that we need to add to the observables also the category


   RooAbsPdf * pdf = w.pdf("jointModel");
   RooRealVar * x = w.var("x");  // the observable                                                     
   RooCategory * index = w.cat("index");  // the category                                              


   // generate the data (since it is exetended we can generate the global model                        

   // use fixed random numbers for reproducibility                                                     
   RooRandom::randomGenerator()->SetSeed(111);
   // generate binned                                                                                  
   // plot the generate data in 50 bins (default is 100)                                               
   x->setBins(50);

   // generate events of joint model                                                                   
   // NB need to add also category as observable                                                       
   RooDataSet * data = pdf->generate( RooArgSet(*x,*index), AllBinned());  // will generate accordint \
to total S+B events                                                                                    
   data->SetName("data");
   w.import(*data);

Here is an example on how we can fit the model and also plot the resulting pdf for each channel separately


   RooPlot * plot1 = x->frame();
   RooPlot * plot2 = x->frame();
   data->plotOn(plot1,RooFit::Cut("index==index::channel1"));
   data->plotOn(plot2,RooFit::Cut("index==index::channel2"));
                                                                               
   RooFitResult * r = pdf->fitTo(*data, RooFit::Save(true), RooFit::Minimizer("Minuit2","Migrad"));
   r->Print();

   pdf->plotOn(plot1,RooFit::ProjWData(*data),RooFit::Slice(*w.cat("index"),"channel1"));
   pdf->plotOn(plot2,RooFit::ProjWData(*data),RooFit::Slice(*w.cat("index"),"channel2"));
  //draw the two separate pdf's                                                                        
   pdf->paramOn(plot1);
   c1 = new TCanvas();
   c1->Divide(1,2);
   c1->cd(1); plot1->Draw();
   c1->cd(2); plot2->Draw();

We then create ModelConfig and save all workspace in a file to be used later

  RooStats::ModelConfig mc("ModelConfig",&w);
   mc.SetPdf(*pdf);
   mc.SetParametersOfInterest(*w.var("mu"));
   mc.SetObservables(RooArgSet(*w.var("x"),*w.cat("index")));

   // define set of nuisance parameters                                                                
   w.defineSet("nuisParams","a1,nbkg1,a2,nbkg2");

   mc.SetNuisanceParameters(*w.set("nuisParams"));

   // set also the snapshot                                                                            
   mc.SetSnapshot(*w.var("mu"));

   // import model in the workspace                                                                    
   w.import(mc);

   // write the workspace in the file                                                                  
   TString fileName = "CombinedModel.root";
   w.writeToFile(fileName,true);
   cout << "model written to file " << fileName << endl;

We can then use this workspace on the different RooStats calculators.

-- LorenzoMoneta - 26-Apr-2012

Topic revision: r9 - 2012-08-21 - LorenzoMoneta
 
    • Cern Search Icon Cern Search
    • TWiki Search Icon TWiki Search
    • Google Search Icon Google Search

    RooStats All webs login

This site is powered by the TWiki collaboration platform Powered by PerlCopyright & 2008-2018 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback