TWiki> RooStats Web>RooStats>RooStatsTutorialsAugust2012 (2012-08-21, LorenzoMoneta)

# RooStats tutorials

Set of RooStats tutorials given at Bonn (20-22 August 2012) at the school and workshop for limit settings at the LHC, see https://indico.desy.de/conferenceDisplay.py?confId=6083 .

Below is given the required materials used during the RooStats tutorials.

Additional material has been showed at the Desy statistical school (see https://indico.desy.de/conferenceOtherViews.py?view=standard&confId=5065 )

## Introduction to RooStats

See slides attached to the agenda in Indico: https://indico.desy.de/contributionDisplay.py?contribId=1&confId=6083 ### Getting started with the software

A few advice words before starting. The tutorials requires the ROOT version 5.34.01 which is installed in the Virtual Machine of the school.

```           using namespace RooFit;
using namespace RooStats;
```
• avoid running a macro multiple times in the same ROOT CINT sessions (it might unfortunately crash sometimes if you do it).
• note that Roostats methods start with capital letter (as in ROOT) while Roofit ones start with lower letter *Roostats calculator are quite verbose, you can suppress the output by doing:
```RooMsgService::instance().setGlobalKillBelow(RooFit::WARNING) ;
```

### Terminology and Conventions

Here we give pragmatic definitions for a few basic concepts that we will use.
• observable - something you measure in an experiment, for example a particle's momentum. Often, a function of measured quantities, for example an invariant mass of several particles
• global observable or auxiliary observable - an observable from another measurement, for example, the integrated luminosity
• model - a set of probability density functions (PDFs), which describe distributions of the observables or functions of observables
• model parameter - any variable in your PDF expression, which is not an observable
• parameter of interest (POI) - a model parameter that you study, for example a cross section of your signal process
• nuisance parameter - every other model parameter, which is not your parameter of interest
• data or dataset - a set of values of observables, either measured in an experiment, or simulated
• likelihood - a model computed for a particular dataset
• hypothesis - a particular model, with specified observables, POI, nuisance parameters, and prior PDFs (in case of Bayesian inference)
• prior PDF - a probability density for an observable or a model parameter, which is known a priori, i.e. before a measurement is considered. This is a Bayesian concept exclusively. Prior has no meaning or place in a frequentist type of inference
• Bayesian - a type of statistical inference that usually produces probability of the hypothesis given the data. Requires a prior.
• frequentist - a type of statistical inference that usually produces probability of the data given the hypothesis.

## Exercises for this tutorial (Part 1)

Below will given code snippets showing how to build some different analysis models and then on how to run some of the Roostats calculators. FIrst it is suggested to build a model (a RooFit workspace) save it in a file and then read it back and run the RooStats code on it. Using the snippets of code, one can build a full ROOT macro which can be run interactively in ROOT or compiled using ACLIC. The models we are going to build are:

• Observable-based analysis: 30 Gaussian-distributed signal event over ~ 1000 exponential-distributed background. The "mass" observable is distributed in range [0 - 10] and the signal has mean ~ 2 and sigma 0.5. We assume that the location and width of the signal are known. W
• Poisson model: let's assume you observe n events in data while expecting b background event. We consider some systematic uncertainties in the background model value, b. We start expressing this uncertainty as Gaussian. We will see also how we can use a log-normal distribution for expressing this uncertainty or extend the model adding other systematic uncertainties.

Compute:

• 68% CL 2-sided confidence interval from the (profiled-) likelihood ratio. As option one could compute also the significance using the profiled likelihood.
• 68% credible interval using the MCMCCalculator or the Bayesian Calculator (based on numerical integration). Compute as well the 95% Bayesian upper limit.

We provide code snippet for creating the models, save them in a file and then for running the calculator after having read the workspace from the file. All the Roostats calculators take as input a data set and a Model Config object. It is therefore convenient to save the workspace (with model and data) in a ROOT file and then read it afterwards for running the calculators.

If you have problem creating the code for running the calculators, you can run the Standard Roostats tutorials (in \$ROOTSYS/tutorials/roostats)

Tutorials you could use are:

They can be run as following from the Root prompt by passing the workspace file name (e.g. "SPlusBExpoModel.root" ), workspace name, model config name and data set name:

```
.L \$ROOTSYS/tutorials/roostats/StandardProfileLikelihoodDemo.C
StandardProfileLikelihoodDemo("SPlusBExpoModel.root", "w", "ModelConfig", "data" )
```

Other tutorials go into more details to create various models and run the calculators. See them all at http://root.cern.ch/root/html/tutorials/roostats/index.html .

Some of the following code here is extracted from exercises macro showed at the Desy Statistical school in April 2012. You can look also at the tar file with the exercises shown at that school,

### Create Model with shape: Gaussian peak plus an exponential background (Unbinned Model)

We start by making a model containing a Gaussian peak and an exponential background. The data are unbinned.

First we create the workspace and then the pdf (gaussian and exponential) using the factory.

```   RooWorkspace w("w");
w.factory("Exponential:bkg_pdf(x[0,10], a[-0.5,-1000,0])");
w.factory("Gaussian:sig_pdf(x, mass, sigma[0.5])");
```
The mass and the sigma parameter of the signal Gaussian are kept constant. They do not have a range while the exponential parameter is not. Note that if you want, you can re-parametrize the exponential parameter as following (before creating the exponential pdf):
```   // use cexpr for compiled expressions
w.factory("expr:a('(-1./tau)',tau[2,0,100])");
w.factory("Exponential:bkg_pdf(x[0,10], a)");
```

We can then sum the two pdf, according to the number of expected events for the signal and the background. Note that the resulting model will be extended, i.e. the total number of events will fluctuate according to a Poisson distribution

```   w.factory("SUM:model(nsig[0,1000]*sig_pdf, nbkg[0,1000000]*bkg_pdf)");  // for extended model
```
Given the model, we can now generate some pseudo data, assuming that nsig = 30 and nbkg=1000. Afterwards we that we can fit this generated data set.

```   RooAbsPdf * pdf = w.pdf("model");
RooRealVar * x = w.var("x");  // the observable

// generate the data (nsig = 30, nbkg=1000)
w.var("nsig")->setVal(30);
w.var("nbkg")->setVal(1000);
// use fixed random numbers for reproducibility
RooRandom::randomGenerator()->SetSeed(111);
RooDataSet * data = pdf->generate( *x);  // will generate accordint to total S+B events
data->SetName("data");
w.import(*data);

data->Print();
```
We can plot the generated data:

```   // plot the generate data in 50 bins (default is 100)
x->setBins(50);
RooPlot * plot = x->frame();
data->plotOn(plot);
plot->Draw();
```

We can now fit the data using RooFit

```  pdf->fitTo(*data);
```
If you want to look at the result of the fit, and for example use a different minimizer, you can do:
```RooFitResult * r = pdf->fitTo(*data, RooFit::Save(true), RooFit::Minimizer("Minuit2","Migrad"));
r->Print();
```
We can then plot the fitted function and optionally draw also the two components separately and add the fit parameters in the plot
```   pdf->plotOn(plot);
pdf->plotOn(plot, RooFit::Components("bkg_pdf"), RooFit::LineStyle(kDashed) );
pdf->plotOn(plot, RooFit::Components("sig_pdf"), RooFit::LineColor(kRed), RooFit::LineStyle(kDashed) );
// add also the fit parameters
pdf->paramOn(plot);
plot->Draw();

```

Before using RooStats we need to define the configuration of the model, thus create the RooStats::ModelConfig class, specifying what is the global pdf, the parameter of interest, the nuisance parameters and the observables.

```
RooStats::ModelConfig mc("ModelConfig",&w);
mc.SetPdf(*pdf);
mc.SetParametersOfInterest(*w.var("nsig"));
mc.SetObservables(*w.var("x"));
// define set of nuisance parameters
w.defineSet("nuisParams","a,nbkg");

mc.SetNuisanceParameters(*w.set("nuisParams"));
```
We then import the ModelConfig object in the workspace and we save it all the workspace in a file so we can re-use it later

```   // import model in the workspace
w.import(mc);

// write the workspace in the file
TString fileName = "SPlusBExpoModel.root";
w.writeToFile(fileName,true);
cout << "model written to file " << fileName << endl;
```

Optionally we can make the model more complex by adding for example some systematics in the sigma of the gaussian peak and add a Gaussian constraint on this parameter. We will see this later, how we can add it in case of similar models, and we can leave this extension as future exercise.

### Run Profile Likelihood Calculator

After we have create a workspace and saved it in a file we can run the Roostats calculators. We start using the ProfileLikelihoodCalculator. First we need to retrieve the workspace from the file, using some code which can be used also later when running different RooStats calculators

```
const char* infile =  "SPlusBExpoModel.root";
const char* workspaceName = "w";
const char* modelConfigName = "ModelConfig";
const char* dataName = "data";

/////////////////////////////////////////////////////////////
// First part is just to access the workspace file
////////////////////////////////////////////////////////////

// Check if example input file exists
TFile *file = TFile::Open(infile);

// get the workspace out of the file
RooWorkspace* w = (RooWorkspace*) file->Get(workspaceName);

// get the modelConfig out of the file
ModelConfig* mc = (ModelConfig*) w->obj(modelConfigName);

// get the modelConfig out of the file
RooAbsData* data = w->data(dataName);

```

Now we can start using the ProfileLikelihoodCalculator. We create the calculator class and then run it to find the 1-sigma (68.3%) confidence interval on the parameter of interest as specified in the model config

```
ProfileLikelihoodCalculator pl(*data,*mc);
pl.SetConfidenceLevel(0.683); // 68.3% interval
LikelihoodInterval* interval = pl.GetInterval();

// find the iterval on the first Parameter of Interest (nsig)
RooRealVar* firstPOI = (RooRealVar*) mc->GetParametersOfInterest()->first();

double lowerLimit = interval->LowerLimit(*firstPOI);
double upperLimit = interval->UpperLimit(*firstPOI);

cout << "\n68% interval on " <<firstPOI->GetName()<<" is : ["<<
lowerLimit << ", "<<
upperLimit <<"] "<<endl;

```
We can make a plot of the profile likelihood function. In case of complex model to evaluate, we can speedup the drawing by using the Draw option "tf1" and setting a low number of points to sample the function to be plotted
```
LikelihoodIntervalPlot * plot = new LikelihoodIntervalPlot(interval);
//plot->SetRange(0,100);  // possible eventually to change ranges
plot->SetNPoints(50);  // do not use too many points, it could become very slow for some models
plot->Draw("");  // use option TF1 if too slow (plot.Draw("tf1")
```

Optionally you can try to generate a new model with the same shape, but with nsig=0 and compute in this case a 95% upper limit.

In addition one can try to compute with the model generate with a value of nsig>0 a discovery significance using the profile likelihood. For doing this, you need to use the ProfileLikelihoodCalculator as an hypothesis test calculator. We need to define in this case the value of the parameter of interest we want to test (i.e. nsig = 0) and save this value as a snapshot and provide this value to the calculator.

```  // compute the significance w.r.t. to zero
firstPOI->setVal(0);
RooArgSet * nullSnap = (mc->GetParametersOfInterest())->snapshot();
pl.SetNullParameters(*nullSnap);
HypoTestResult * hr = pl.GetHypoTest();
if (hr) {
hr->Print();
cout << "significance = " << hr->Significance() << endl;
cout << "null pvalue  = " << hr->NullPValue() << endl;
cout << "alt p value = " << hr->AlternatePValue() << endl;
}
```
We will see later that these parameter values used to identify the model to be tested can be saved in the ModelConfig using ModelConfig::SetSnaphot.

### Run Bayesian MCMC Calculator

On the same model used before we can also run a Bayesian analysis using the MCMC calculator. You can try to reduce the number of iterations if it is too slow (or increase them if you see a not-stable result).

```
// this proposal function seems fairly robust
SequentialProposal sp(0.1);

MCMCCalculator mcmc(*data,*mc);
mcmc.SetConfidenceLevel(0.683); // 683% interval
//  mcmc.SetProposalFunction(*pf);
mcmc.SetProposalFunction(sp);
mcmc.SetNumIters(100000);         // Metropolis-Hastings algorithm iterations
mcmc.SetNumBurnInSteps(50);       // first N steps to be ignored as burn-in

// default is the shortest interval.  here use central
mcmc.SetLeftSideTailFraction(0.5); // for central Bayesian interval
//mcmc.SetLeftSideTailFraction(0); // for one-sided Bayesian interval

RooRealVar* firstPOI = (RooRealVar*) mc->GetParametersOfInterest()->first();
//firstPOI->setMax(100);

MCMCInterval* interval = mcmc.GetInterval();

// print out the iterval on the first Parameter of Interest
cout << "\n68% interval on " <<firstPOI->GetName()<<" is : ["<<
interval->LowerLimit(*firstPOI) << ", "<<
interval->UpperLimit(*firstPOI) <<"] "<<endl;

// make a plot of posterior function
//TCanvas* c1 =
new TCanvas("IntervalPlot");
MCMCIntervalPlot plot(*interval);
plot.Draw();

```

You can compare the obtained result, with what obtained using the Profile Likelihood.

To speed-up the computation time, we can re-generate the model with less events or generate the data binned, by passing the option RooFit::AllBinned(true) to the RooAbsPdf::generate method:

```   RooDataSet * data = pdf->generate( *x, RooFit::AllBinned());
```

### Run Bayesian Calculator

On the same model we could also run the Bayesian calculator. We create the class and configure it. Since the numerical integration can be tricky, it is better in this case to set a reasonable range from the nuisance parameters which will be integrated to obtain the marginalised posterior function. A sensible choice is to use for example an interval using 10-sigma around the best fit values for the nuisance parameters,

```  RooRealVar *nuisPar = w->var("...");
nuisPar->setRange(nuisPar->getVal() - 10*nuisPar->getError(), nuisPar->getVal() + 10* nuisPar->getError() \
);
```
It is also recommended to use this class on a model which has a binned dataset or not having too many events (<~ 100), otherwise it will take a long time to obtain the result.

We create the class can set the type of interval (central, upper/lower limit or shortest) and the type of integration method

```
BayesianCalculator bayesianCalc(*data,*mc);
bayesianCalc.SetConfidenceLevel(0.683); // 68% interval

// set the type of interval (not really needed for central which is the default)
bayesianCalc.SetLeftSideTailFraction(0.5); // for central interval
//bayesianCalc.SetLeftSideTailFraction(0.); // for upper limit
//bayesianCalc.SetShortestInterval(); // for shortest interval

// set the integration type (not really needed for the default ADAPTIVE)
// possible alternative values are  "VEGAS" , "MISER", or "PLAIN"  (MC integration from libMathMore)
// "TOYMC" (toy MC integration, work when nuisances exist and they have a constraints pdf)
TString integrationType = "";

// this is needed if using TOYMC
if (integrationType.Contains("TOYMC") ) {
RooAbsPdf * nuisPdf = RooStats::MakeNuisancePdf(*mc, "nuisance_pdf");
if (nuisPdf) bayesianCalc.ForceNuisancePdf(*nuisPdf);
}

bayesianCalc.SetIntegrationType(integrationType);

// compute interval by scanning the posterior function
// it is done by default when computing shortest intervals
bayesianCalc.SetScanOfPosterior(100);

RooRealVar* firstPOI = (RooRealVar*) mc->GetParametersOfInterest()->first();

SimpleInterval* interval = bayesianCalc.GetInterval();
if (!interval) {
cout << "Error computing Bayesian interval - exit " << endl;
return;
}

double lowerLimit = interval->LowerLimit();
double upperLimit = interval->UpperLimit();

cout << "\n68% interval on " <<firstPOI->GetName()<<" is : ["<<
lowerLimit << ", "<<
upperLimit <<"] "<<endl;

// draw plot of posterior function

RooPlot * plot = bayesianCalc.GetPosteriorPlot();
if (plot) plot->Draw();

```

### Create Poisson Counting model

Make a counting model: Poisson(nobs | s + b). Assume b is known with un uncertainty of 20%

Make first the workspace and write in the file. Use RooFit Factory to create pdf and variables

```
int nobs = 3;   // number of observed events
double b = 1; // number of background events
double sigmab = 0.2;   // relative uncertainty in b

RooWorkspace w("w",true);

// make Poisson model * Gaussian constraint
w.factory("sum:nexp(s[3,0,15],b[1,0,10])");
// Poisson of (n | s+b)
w.factory("Poisson:pdf(nobs[0,50],nexp)");
w.factory("Gaussian:constraint(b0[0,10],b,sigmab)");
w.factory("PROD:model(pdf,constraint)");

```
Set the variables values for the model (nobs, b and error on b)

```   RooRealVar * obs = w.var("nobs");
w.var("b0")->setVal(b);
w.var("b0")->setConstant(true); // needed for being treated as global observables
w.var("sigmab")->setVal(sigmab*b);

// use this to avoid too large ranges
w.var("b")->setMax(b+10*sigmab);

//set value of observed events
obs->setVal(nobs);
```

make now data set and import it in the workspace

```
// make data set with the namber of observed events
RooDataSet data("data","", *obs );
w.import(data);

w.Print();
```
Make now the ModelConfig needed for RooStats. Add also the snapshot of the parameter of interest needed to define the null or alternate hypothesis (needed later) and the global observables for the frequentist calculators and the prior pdf for Bayesian calculator
```
ModelConfig mc("ModelConfig",&w);
mc.SetPdf(*w.pdf("model"));
mc.SetParametersOfInterest(*w.var("s"));
mc.SetObservables(*w.var("nobs"));
mc.SetNuisanceParameters(*w.var("b"));

// these are needed for the hypothesis tests
mc.SetSnapshot(*w.var("s"));
mc.SetGlobalObservables(*w.var("b0"));

mc.Print();
// import model in the workspace
w.import(mc);
```

And save now the workspace for further use

```   TString fileName = "PoissonModelWithBackg.root";

// write workspace in the file (recreate file if already existing)
w.writeToFile(fileName, true);

```
On this Poisson model we can run then the Profile Likelihood and the Bayesian for computing the 68% interval and/or 95% upper limit. Note that when running the Bayesian calculator is also useful in this case to reduce the range of integration of nuisance parameter and also of the poi to a region where the posterior is non-zero. Optionally we can try in case of Bayesian method to use also a different prior for the signal instead of the default uniform one, like the Jeffrey prior for the Poisson model with background (P(s) = 1./sqrt(s+b) ). For defining a different prior, one needs to define in the ModelConfig the prior pdf using ModelConfig::SetPriorPdf.

A possible variant to the model is to use a Gamma (Poisson) constraint, as in the on/off problem. See the paper: "Evaluation of three methods for calculating statistical significance when incorporating a systematic uncertainty into a test of the background-only hypothesis for a Poisson process Authors: Robert D. Cousins, James T. Linnemann, Jordan Tucker http://arxiv.org/abs/physics/0702156 NIM A 595 (2008) 480--50.

In this case the constraint is expressed as a Poisson pdf representing the sideband measurement (or equivantly by a Gamma pdf, if thinking Bayesian):

```   // make model for on/off problem

int nobs = 3;   // number of observed events
double b = 1; // number of background events
double sigmab = 0.2;   // relative uncertainty in b

RooWorkspace w("w",true);

w.factory("sum:nexp(s[3,0,15],b[1,0,5])");
// Poisson of (n | s+ b)
w.factory("Poisson:pdf(nobs[0,50],nexp)");

// use tau :  y = tau*b
w.factory("prod:y(tau,b)");
w.factory("prod:y0(tau,b0[1,0,10])");
w.factory("Poisson:off_pdf(y0,y)");
w.factory("PROD:model(pdf,off_pdf)");

RooRealVar * obs = w.var("nobs");
w.var("b0")->setVal(b);
w.var("b0")->setConstant(true); // needed for being treated as global observables

//set value of observed events
obs->setVal(nobs);

// make data set with the namber of observed events
RooDataSet data("data","", *obs );
w.import(data);

w.Print();

```

Another possible option is to write instead the constraint as a log-normal. We could first use as parameter of interest instead of s, mu, the ratio of the observed cross section divided by the nominal one. We introduce also another nuisance parameter the efficiency. The model is then built as below. We need to start defining the efficiency constraint, using an extra variable beta_eff, which is defined as Gaussian, and we use as efficiency as the exp( beta_eff), using the fact that if a variable t is normal distributed, exp(t) is log-normal.

```

w.factory( "eff_nominal[0.5]" );  // nominal efficiency
w.factory( "eff_kappa[1.20]" );  // relative error in efficiency ( log-normal parameter (1+ error)
w.factory( "expr:eff('eff_nominal*pow(eff_kappa,beta_eff)',eff_nominal,eff_kappa,beta_eff[0,-5,5])"\
);
w.factory( "Gaussian:constraint_eff(glob_eff[0,-5,5],beta_eff,1)" );
// glob_efficiency is the global observables
w.var("glob_eff")->setConstant(true);

// make model for on/off problem
w.factory("prod:nsig( mu[0,15], xsec, eff)" );
w.factory("sum:nexp(nsig,b[1,0,10])");
// Poisson of (n | s+ b)

w.factory("Poisson:pdf(nobs[0,50],nexp)");

// use tau :  y = tau*b
w.factory("prod:y(tau,b)");
w.factory("prod:y0(tau,b0[1,0,10])");
w.factory("Poisson:off_pdf(y0,y)");

// make global model
w.factory("PROD:model(pdf,off_pdf,constraint_eff)");

w.Print();

RooRealVar * obs = w.var("nobs");
w.var("b0")->setVal(b);
w.var("b0")->setConstant(true); // needed for being treated as global observables

```
We need also to define the ModelConfig by adding beta_eff as extra nuisance parameters

```   ModelConfig mc("ModelConfig",&w);
mc.SetPdf(*w.pdf("model"));
mc.SetParametersOfInterest(*w.var("mu"));
mc.SetObservables(*w.var("nobs"));
mc.SetNuisanceParameters(RooArgSet(*w.var("b"),*w.var("beta_eff")));

// these are needed for the hypothesis tests
mc.SetSnapshot(*w.var("mu"));
mc.SetGlobalObservables(*w.var("b0"));

mc.Print();
// import model in the workspace
w.import(mc);
```

## Exercises for this tutorial (Part 2)

Compute in this case using the same models generated before:

• 95% CL upper-limit with the inverter method and CLs, first using the frequentist calculator (with toys) then using the asymptotic formulae
• A significance test using the FrequentistCalculator

It is recommended to build also the ROOT macro in this case. Alternatively, one could use the standard Roostats tutorials.

*StandardHypoTestInvDemo.C for computing limit using the inverter *StandardHypoTestDemo.C for computing the significance

As for the other Standard tutorials, we can run these macros, by passing workspace file, workspace name, ModelConfig name and data set and in addition some extra options, which are documented at the end of this page.

### Run Hypothesis test Inverter

We run the hypothesis tests on the previous model (or the number counting). But first we need to create a background only model. We can copy it from the S+B model by setting the S value to zero:

```
ModelConfig*  sbModel = (ModelConfig*) w->obj(modelConfigName);
RooRealVar* poi = (RooRealVar*) sbModel->GetParametersOfInterest()->first();
ModelConfig * bModel = (ModelConfig*) sbModel->Clone();
bModel->SetName(TString(sbModel->GetName())+TString("_with_poi_0"));
double oldval = poi->getVal();
poi->setVal(0);
bModel->SetSnapshot( *poi  );
poi->setVal(oldval);

```

Then we create the Hypo Test calculator class. We can choose between the FrequentistCalculator, the HybridCalculator or the AsymptoticCalculator. We give the code for both three cases: The Frequentist and the Hybrid uses toys top get the sampling distribution. You can than set the number of toys to be used. Note also that the hypo test calculator class takes first alternate model then null model. In this case, when we compute a limit the null is S+B.

```   FrequentistCalculator *  fc  = new FrequentistCalculator(*data, *bModel, *sbModel);
fc->SetToys(1000,500);    // 1000 for null (S+B) , 50 for alt (B)
```

• Code for the HybridCalculator (in this case we need to provides pdf for the nuisance parameters)

```   // // Hybrid calculator
HybridCalculator *  hc = new HybridCalculator(*data, *bModel, *sbModel);
hc->SetToys(1000,500);    // 1000 for null (S+B) , 50 for alt (B)
// check for nuisance prior pdf in case of nuisance parameters
if (sbModel->GetNuisanceParameters() && sbModel->GetNuisanceParameters()->getSize() > 0) {
// for hybrid need nuisance pdf
RooAbsPdf * nuisPdf = RooStats::MakeNuisancePdf(*sbModel,"nuisancePdf_sbmodel");
hc->ForcePriorNuisanceAlt(*nuisPdf);
hc->ForcePriorNuisanceNull(*nuisPdf);
}
```

• Code for the Asymptotic Calculator
```   // asymptotic calculator
AsymptoticCalculator * ac = new AsymptoticCalculator(*data, *bModel, *sbModel);
ac->SetOneSided(true);  // for one-side tests (limits)
AsymptoticCalculator::SetPrintLevel(-1);
```

We can now create the Inverter class

```   // create hypotest inverter
// passing the desired calculator
HypoTestInverter calc(*ac);

// set confidence level (e.g. 95% upper limits)
calc.SetConfidenceLevel(0.95);

// for CLS
bool useCLs = true;
calc.UseCLs(useCLs);
calc.SetVerbose(false);
```

We need also to configure the ToyMCSampler and set the test statistics. Note that in case we use the Poisson model, the pdf is not extended and we need to specify the number of events for each toy we generate, which in the Poisson model is 1.

```
ToyMCSampler *toymcs = (ToyMCSampler*)calc.GetHypoTestCalculator()->GetTestStatSampler();
// for number counting (extended pdf do not need this)
// toymcs->SetNEventsPerToy(1);

// profile likelihood test statistics
ProfileLikelihoodTestStat profll(*sbModel->GetPdf());
// for CLs (bounded intervals) use one-sided profile likelihood
if (useCLs) profll.SetOneSided(true);

// ratio of profile likelihood - need to pass snapshot for the alt
// RatioOfProfiledLikelihoodsTestStat ropl(*sbModel->GetPdf(), *bModel->GetPdf(), bModel->GetSnapshot());

// set the test statistic to use
toymcs->SetTestStatistic(&profll);

// if the pdf is not extended (e.g. in the Poisson model)
// we need to set the number of events
if (!sbModel->GetPdf()->canBeExtended())
toymcs->SetNEventsPerToy(1);

```

```

```

We can run now the inverter

```
int npoints = 10;  // number of points to scan
// min and max (better to choose smaller intervals)
double poimin = poi->getMin();
double poimax = poi->getMax();
//poimin = 0; poimax=10;

std::cout << "Doing a fixed scan  in interval : " << poimin << " , " << poimax << std::endl;
calc.SetFixedScan(npoints,poimin,poimax);

HypoTestInverterResult * r = calc.GetInterval();
```
And finally we can query the result and make some plots:
```

double upperLimit = r->UpperLimit();
double ulError = r->UpperLimitEstimatedError();
// double lowerLimit = r->LowerLimit();
// double llError = r->LowerLimitEstimatedError();
// if (lowerLimit < upperLimit*(1.- 1.E-4))
//    std::cout << "The computed lower limit is: " << lowerLimit << " +/- " << llError << std::endl;
std::cout << "The computed upper limit is: " << upperLimit << " +/- " << ulError << std::endl;

// compute expected limit
std::cout << "Expected upper limits, using the B (alternate) model : " << std::endl;
std::cout << " expected limit (median) " << r->GetExpectedUpperLimit(0) << std::endl;
std::cout << " expected limit (-1 sig) " << r->GetExpectedUpperLimit(-1) << std::endl;
std::cout << " expected limit (+1 sig) " << r->GetExpectedUpperLimit(1) << std::endl;

// plot now the result of the scan

HypoTestInverterPlot *plot = new HypoTestInverterPlot("HTI_Result_Plot","Feldman-Cousins Interval",r);

// plot in a new canvas with style
TCanvas * c1 = new TCanvas("HypoTestInverter Scan");
c1->SetLogy(false);

plot->Draw("CLb 2CL");  // plot also CLb and CLs+b
//plot->Draw("OBS");  // plot only observed p-value

// plot also in a new canvas the test statistics distributions

// plot test statistics distributions for the two hypothesis
// when distribution is generated (case of FrequentistCalculators)
const int n = r->ArraySize();
if (n> 0 &&  r->GetResult(0)->GetNullDistribution() ) {
TCanvas * c2 = new TCanvas("Test Statistic Distributions","",2);
if (n > 1) {
int ny = TMath::CeilNint( sqrt(n) );
int nx = TMath::CeilNint(double(n)/ny);
c2->Divide( nx,ny);
}
for (int i=0; i<n; i++) {
if (n > 1) c2->cd(i+1);
SamplingDistPlot * pl = plot->MakeTestStatPlot(i);
pl->SetLogYaxis(true);
pl->Draw();
}
}

```

### StandardHypoTestInvDemo.C

This is a macro in \$ROOTSYS/tutorials/roostats which can run the hypothesis test inversion with all these options. From the root prompt you can do, for running the AsymptoticCalculator on the parametric unbinned model:

```
.L \$ROOTSYS/tutorials/roostats/StandardHypoTestInvDemo.C+

StandardHypoTestInvDemo("SPlusBExpoModel.root", "w", "ModelConfig", "", "data", 2, 3, true, 20, 0, 100 )
```
You can look at the printed out observed and expected limits, and the CLs scan plot. Now run the full CLs calculation (it will take considerably more time but should still be manageable):
```
.L \$ROOTSYS/tutorials/roostats/StandardHypoTestInvDemo.C+

StandardHypoTestInvDemo("SPlusBExpoModel.root", "w", "ModelConfig", "", "data", 0, 3, true, 10, 0, 100, 500, false )
```
We can look at the sampling distribution plots obtained.

If running on the number counting model you need to set to trye the last argument

```
StandardHypoTestInvDemo("SPlusBExpoModel.root", "w", "ModelConfig", "", "data", 0, 3, true, 10, 0, 100, 500, true )
```
Here is the list of input arguments to the macro:
```StandardHypoTestInvDemo(const char * infile = 0,
const char * wsName = "combined",
const char * modelSBName = "ModelConfig",
const char * modelBName = "",
const char * dataName = "obsData",
int calculatorType = 0,
int testStatType = 0,
bool useCLs = true ,
int npoints = 6,
double poimin = 0,
double poimax = 5,
int ntoys=1000,
bool useNumberCounting = false,
const char * nuisPriorName = 0){
```
Where we have for the calculatorType:
• type = 0 Freq calculator
• type = 1 Hybrid calculator
• type = 2 Asymptotic calculator

• testStatType = 0 LEP
• = 1 Tevatron
• = 2 Profile Likelihood
• = 3 Profile Likelihood one sided (i.e. = 0 if mu < mu_hat)

• useCLs : scan for CLs (otherwise for CLs+b)

• npoints: number of points to scan , for autoscan set npoints = -1

• poimin,poimax: min/max value to scan in case of fixed scans (if min > max, try to find automatically)

• ntoys: number of toys to use

• useNumberCounting: set to true when using number counting events

• nuisPriorName: name of prior for the nnuisance. This is often expressed as constraint term in the global model

### Run Hypothesis test (Significance test)

On the same model we can run an hypothesis test to get the signal significance (number of sigma). In this case the null hypothesis is the background only model and the alternate is the S+B model.

Example running the frequentist calculator:

```   FrequentistCalculator *  fc  = new FrequentistCalculator(*data, *sbModel, *bModel);
fc->SetToys(1000,1000);    // 1000 for null (B) and 1000 for alt (S+B)

// create the test statistics
ProfileLikelihoodTestStat profll(*sbModel->GetPdf());
// use one-sided profile likelihood
profll.SetOneSidedDiscovery(true);

// configure  ToyMCSampler and set the test statistics
ToyMCSampler *toymcs = (ToyMCSampler*)fc->GetTestStatSampler();
toymcs->SetTestStatistic(&profll);

if (!sbModel->GetPdf()->canBeExtended())
toymcs->SetNEventsPerToy(1);

// run the test
HypoTestResult * r = fc->GetHypoTest();
r->Print();

// plot test statistic distributions
HypoTestPlot * plot = new HypoTestPlot(*r);
plot->Draw();
```

Optionally you can make, when using the shape model (Exponential background + Gaussian peak) a mass scan to get the p0 values as function of the mass. You loop on different mass values and then you run for each one of them the hypothesis test and you plot the p0 value (the null p-value) obtained from the hypothesis test.

We can run also the asymptotic calculator to get the asymptotic p-value and the expected one to be faster:

```   vector<double> masses;
vector<double> p0values;
vector<double> p0valuesExpected;

double massMin = 0.5;
double massMax = 8.5;

for( double mass=massMin; mass<=massMax; mass += (massMax-massMin)/40.0 )
{

cout << endl << endl << "Running for mass: " << mass << endl << endl;
w->var("mass")->setVal( mass );

AsymptoticCalculator *  ac  = new AsymptoticCalculator(*data, *sbModel, *bModel);
ac->SetOneSidedDiscovery(true);  // for one-side discovery test                                  \

AsymptoticCalculator::SetPrintLevel(-1);

HypoTestResult* asymCalcResult = ac->GetHypoTest();

asymCalcResult->Print();

masses.push_back( mass );
p0values.push_back( asymCalcResult->NullPValue() );

double expectedP0 = AsymptoticCalculator::GetExpectedPValues(  asymCalcResult->NullPValue(),  asy\
mCalcResult->AlternatePValue(), 0, false);
p0valuesExpected.push_back( expectedP0 );
}

```
You can then display in a graph the obtained observed and expected p-values:

```  TGraph * graph1  = new TGraph(masses.size(),&masses,&p0values);
TGraph * graph2  = new TGraph(masses.size(),&masses,&p0valuesExpected);

graph1->SetMarkerStyle(20);
graph1->Draw("APC");
graph2->SetLineStyle(2);
graph2->Draw("C");
```

## Combination of Model

As last example we make a combined model from two different channels. We assume we have two gaussian signal plus exponential background in two different channels, where the signal parameters are common, while the background parameters are different for the two channels. We use the factory tool to build the combined pdf, the RooSimultaneous for the two channel. We will generate some pseudo-data for the two channels and we will make a combined fit and then save this workspace in a file. The workspace can then be used by any of the RooStats calculator we have described before.

We start making the pdf for each single channel. Note we redefine nsig = mu * xsec where xsec is different for the two channel and is given, while mu is the parameter of interest of the model

```   RooWorkspace w("w");

w.factory("Exponential:bkg1_pdf(x[0,10], a1[-0.5,-2,-0.2])");
w.factory("Gaussian:sig_pdf(x, mass, sigma[0.5])");

w.factory("prod:nsig1(mu[1,0,5],xsec1)");

w.factory("SUM:model1(nsig1*sig_pdf, nbkg1[1000,0,10000]*bkg1_pdf)");  // for extended model

w.factory("Exponential:bkg2_pdf(x, a2[-0.5,-2,-0.2])");

w.factory("prod:nsig2(mu,xsec2)");
w.factory("SUM:model2(nsig2*sig_pdf, nbkg2[100,0,10000]*bkg2_pdf)");  // for extended model
```

Now we can generate the joint model

```
// Create discrete observable to label channels
w.factory("index[channel1,channel2]");

// Create joint pdf (RooSimultaneous)
w.factory("SIMUL:jointModel(index,channel1=model1,channel2=model2)");
```

We generate also a data set from the created model. Note that we need to add to the observables also the category

```
RooAbsPdf * pdf = w.pdf("jointModel");
RooRealVar * x = w.var("x");  // the observable
RooCategory * index = w.cat("index");  // the category

// generate the data (since it is exetended we can generate the global model

// use fixed random numbers for reproducibility
RooRandom::randomGenerator()->SetSeed(111);
// generate binned
// plot the generate data in 50 bins (default is 100)
x->setBins(50);

// generate events of joint model
// NB need to add also category as observable
RooDataSet * data = pdf->generate( RooArgSet(*x,*index), AllBinned());  // will generate accordint \
to total S+B events
data->SetName("data");
w.import(*data);

```
Here is an example on how we can fit the model and also plot the resulting pdf for each channel separately

```
RooPlot * plot1 = x->frame();
RooPlot * plot2 = x->frame();
data->plotOn(plot1,RooFit::Cut("index==index::channel1"));
data->plotOn(plot2,RooFit::Cut("index==index::channel2"));

RooFitResult * r = pdf->fitTo(*data, RooFit::Save(true), RooFit::Minimizer("Minuit2","Migrad"));
r->Print();

pdf->plotOn(plot1,RooFit::ProjWData(*data),RooFit::Slice(*w.cat("index"),"channel1"));
pdf->plotOn(plot2,RooFit::ProjWData(*data),RooFit::Slice(*w.cat("index"),"channel2"));
//draw the two separate pdf's
pdf->paramOn(plot1);
c1 = new TCanvas();
c1->Divide(1,2);
c1->cd(1); plot1->Draw();
c1->cd(2); plot2->Draw();

```

We then create ModelConfig and save all workspace in a file to be used later

```  RooStats::ModelConfig mc("ModelConfig",&w);
mc.SetPdf(*pdf);
mc.SetParametersOfInterest(*w.var("mu"));
mc.SetObservables(RooArgSet(*w.var("x"),*w.cat("index")));

// define set of nuisance parameters
w.defineSet("nuisParams","a1,nbkg1,a2,nbkg2");

mc.SetNuisanceParameters(*w.set("nuisParams"));

// set also the snapshot
mc.SetSnapshot(*w.var("mu"));

// import model in the workspace
w.import(mc);

// write the workspace in the file
TString fileName = "CombinedModel.root";
w.writeToFile(fileName,true);
cout << "model written to file " << fileName << endl;
```

We can then use this workspace on the different RooStats calculators.

-- LorenzoMoneta - 26-Apr-2012

Topic revision: r9 - 2012-08-21 - LorenzoMoneta Log In  Cern Search TWiki Search Google Search RooStats All webs Copyright &© 2008-2022 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
or Ideas, requests, problems regarding TWiki? use Discourse or Send feedback