A first set of RooStats tutorials were given at CERN on October 15-16th 2009: (agenda: http://indico.cern.ch/conferenceDisplay.py?confId=68912). Below is given some materials used during the RooStats hands-on sessions. The RooFit tutorials are available here.
RooStats is a software package relying on RooFit for the description of model and data which in turn provides utilities for fitting, plotting, Monte-Carlo studies, etc. On top of RooFit, the RooStats package provides:
ProfileLikelihoodCalculator
,
HybridCalculator
,
HypoTestInvertor
,
BayesianCalculator
,
MCMCCalculator
,
NeymanConstruction
and FeldmanCousins
,
HLFactory
, ModelConfig
, BernsteinCorrection
, SPlot
, etc.
The RooStats package has been designed in common between Atlas (K. Cranmer), CMS (G. Schott), RooFit (W. Verkerke) and ROOT (L. Moneta) and is available from the ROOT releases since version 5.22. As of October 2009, the project is getting maturity, most of the for seen methods are included in ROOT version 5.25.02 while a few more developments are still planned. The link to the project web page is: https://twiki.cern.ch/twiki/bin/view/RooStats/WebHome
A few advice words before starting. Remember to:
.L macro.C+
then macro();
,
using namespace RooFit; using namespace RooStats;
,
Setting ROOT 5.25.02 using the CERN AFS installation:
/afs/cern.ch/sw/lcg/app/releases/ROOT/5.25.02/slc4_amd64_gcc34/root/bin/thisroot.csh
or .sh
/afs/cern.ch/sw/lcg/app/releases/ROOT/5.25.02/x86_64-slc5-gcc34-opt/root/bin/thisroot.csh
or .sh
Some links to documentation:
$ROOTSYS/tutorials
Below will given code snippets showing how to try different analysis models:
Compute:
HypoTestInvertor
and BayesianCalculator
RooStats tutorials macro (an updated version is provided below) give the solution. They work in two steps:
rs500d_PrepareWorkspace_GaussOverFlat_withSystematics("WS_GaussOverFlat_withSystematics.root",0)
rs501_ProfileLikelihoodCalculator_limit("WS_GaussOverFlat_withSystematics.root")
Tutorials you could use are:
rs500a
to rs500d
for the model PDF specification,
rs501
and rs502
for the likelihood approach,
rs505
for the classical hypothesis test,
rs701
for the Bayesian credibility interval (tomorrow),
rs801
for classical upper limit (HypoTestInvertor
class).
The updated version of the above tutorials is available from:
http://cern.ch/schott/public/updated_roostats_tutorials.zip
On linux, unzip the files with: unzip updated_roostats_tutorials.zip
The model is handled by RooFit and in the examples below, the RooFit::RooWorkspace factory is used for that. A few examples are given:
// Use a RooWorkspace to store the PDF models, prior informations, list of parameters, ... RooWorkspace myWS("myWS"); // Number of signal and background events myWS.factory("S[2,0,10]"); // default value 2 and range [0,10] myWS.factory("B[1]"); // value fixed to 1 // Observable (dummy observable used in case of Poisson likelihood function) myWS.factory("x[0,1]"); // arbitrary range [0,1] myWS.var("x")->setBins(1); // Signal and background distribution of the observable myWS.factory("Uniform::sigPdf(x)"); myWS.factory("Uniform::bkgPdf(x)"); // S+B and B-only models (both extended PDFs) myWS.factory("SUM::model(S*sigPdf,B*bkgPdf"); myWS.factory("ExtendPdf::modelBkg(bkgPdf,B)"); // Set random seed to make the generation reproducible (RooRandom::randomGenerator())->SetSeed(4357); // generate binned data with fixed number of events RooAbsData* data = myWS.pdf("model")->generateBinned(*myWS.var("x"),myWS.var("S")->getVal()+myWS.var("B")->getVal(),Name("data"));
// Observable myWS.factory("mass[0,500]"); // range [0,500] // Signal and background distribution of the observable myWS.factory("Gaussian::sigPdf(mass,250,15)"); myWS.factory("Uniform::bkgPdf(mass)");
// set B and sigma as a non-constant variable ans specify a range myWS->var("B")->setConstant(false); myWS->var("B")->setRange(0,200); myWS->var("sigSigma")->setConstant(false); myWS->var("sigSigma")->setRange(0,30); // Other prior PDFs myWS.factory("Gaussian::prior_sigSigma(sigSigma,15,5)") ; myWS.factory("Gaussian::prior_B(B,100,50)"); myWS.factory("PROD::priorNuisance(prior_sigSigma,prior_B)");
Specify the data, the model and a list of parameters of interest. Set the size of the confidence interval:
// Set up the ProfileLikelihoodCalculator ProfileLikelihoodCalculator plc(*data,*model,*POI); // ProfileLikelihoodCalculator usually make intervals: the 95% CL one-sided upper-limit is the same as the two-sided upper-limit of a 90% CL interval plc.SetTestSize(0.10);
Confidence interval:
model->fitTo(*data,SumW2Error(kFALSE)); LikelihoodInterval* interval = plc.GetInterval(); model->fitTo(*data,SumW2Error(kFALSE)); const double lowerLimit = interval->LowerLimit(*parameterOfInterest); const double upperLimit = interval->UpperLimit(*parameterOfInterest);
Significance:
// Create a copy of the POI parameters to set the values to zero RooArgSet nullparams; nullparams.addClone(*parameterOfInterest); ((RooRealVar *) (nullparams.first()))->setVal(0); plc.SetNullParameters(nullparams); HypoTestResult* plcResult = plc.GetHypoTest(); const double significance = plcResult->Significance();
LikelihoodIntervalPlot plot(interval); plot.Draw();
There is currently a problem with profile-likelihood plots in presence of systematic uncertainty; the interval gets out right but the plot doesn't. Below is a patch to get the correct plot until this is fixed in RooStats:
// patch to profile-likelihood plot with systematics TF1 * tmp = interval->GetLikelihoodRatio()->asTF(*parameterOfInterest); // clone to sample the PL calculation TF1 * f1 = (TF1 *) tmp->Clone(); f1->Draw("SAME");
HybridCalculator
performs sampling of the selected test statistics in both S+B and B-only hypotheses. Then it integrates this distribution to compute the p-values or the CL_S ratio.
Bayesian prior on nuisance parameters are integrated out (using a Monte-Carlo technique).
Other features are not documented here, such as: switching test statistics, merging results run independently on batch, etc.
Specify the data and the null and alternative hypotheses models. More options can be specified later. See the rs201 macro for more details (but not today).
HybridCalculator hc("hc","HybridCalculator",*data,*modelSB,*modelB); hc.SetNumberOfToys(5000); hs.SetTestStatistics(1);
HybridResult* hcResult = hc.GetHypoTest(); double p_value_sb = hcResult->AlternatePValue(); double p_value_b = hcResult->NullPValue(); double cl_s = hcResult->CLs(); double significance = hcResult->Significance();
HybridPlot* hcPlot = hcResult->GetPlot("hcPlot","p-Values plot",100); hcPlot->Draw();
The HypoTestInvertor
in ROOT 5.25.02 is using HybridCalculatorResult
for different value of the parameter of interest until it identifies the value of the parameter of interest such as the alternate (ie. S+B) hypothesis is excluded at a given confidence level. In the next ROOT version HypoTestInvertor
will be extended to also invert hypothesis tests from ProfileLikelihoodCalculator
. The inversion is based on the CL_S ratio estimated by each hypothesis test. Usage of CL_SB has been make configurable for the next future release.
In the rush for the ROOT release one line was forgotten in the constructor. A minor change fixes it for ROOT 5.25.02 (it has been fixed for the next release).
namespace Temp { class HypoTestInvertor : public RooStats::HypoTestInvertor { public: HypoTestInvertor(const char * name, const char * title, RooStats::HypoTestCalculator * calc, RooRealVar * var) : RooStats::HypoTestInvertor(name, title, calc, var){} virtual void SetModel( const RooStats::ModelConfig & ){} ClassDef(Temp::HypoTestInvertor,1); }; }This patch requires to run macros in compiled mode exclusively.
One starts by instantiating and configuring a HybridCalculator
object:
HybridCalculator myhc("myhc","HybridCalculator",*data, totPdf, bkgPdf); myhc.SetTestStatistics(2); myhc.SetNumberOfToys(1000);
Use the patch version of HypoTestInvertor
in the Temp
namespace.
Temp::HypoTestInvertor myInvertor("myInvertor","",&myhc,¶meterOfInterest); myInvertor.SetTestSize(0.05);
Three types of running modes are currently available:
myInvertor.RunOnePoint(3.9);
myInvertor.RunFixedScan(5,1,6);
here the interval is [1,6] and 5 points will be tested
myInvertor.RunAutoScan(1,6,0.005);
here the interval is [1,6] and points will be tested until one is found with CL_S = 0.050 +/- 0.005.
HypoTestInvertorResult* results = myInvertor.GetInterval(); HypoTestInvertorPlot myInvertorPlot("myInvertorPlot","",results); TGraph* gr1 = myInvertorPlot.MakePlot();
The following plot show an example of hypothesis test inversion based on CL_SB assuming 1 observed event and 1 expected background event. From the PDG 2008 table 32.3, the upper limit is expected to be 4.74 as was obtained with RooStats (substracting the 1 background event you get 3.74 on the plot below).
The BayesianCalculator
class implements the Bayesian theorem. It also return a confidence interval for one single parameter of interest
Suggestions for the exercises:
BayesianCalculator
to those from HybridCalculator
on the same model for both upper and lower limits.
The BayesianCalculator
constructor inputs: the data, the PDF model, the list of parameters of interest and their prior PDF on the parameter of interest and nuisance parameters to integrate out in the pdf model.
BayesianCalculator bc(data,*model,RooArgSet(*POI),*priorPOI,&nuisanceParameters);
Set the confidence level of the credibility interval and compute it. Returns a SimpleInterval
.
bc.SetTestSize(0.05); SimpleInterval* interval = bc.GetInterval(); double lowerLimit = interval->LowerLimit(); double upperLimit = interval->UpperLimit();
// In ROOT >5.25.02: // bc.GetPosteriorPlot()->Draw(); // This is the current plot: // bcalc.PlotPosterior(); // The code below produce a much nicer plot: RooAbsPdf* fPosteriorPdf = bcalc.GetPosteriorPdf(); RooPlot* plot = POI->frame(); plot->SetTitle(TString("Posterior probability of parameter \"")+TString(POI->GetName())+TString("\"")); fPosteriorPdf->plotOn(plot,RooFit::Range(interval->LowerLimit(),interval->UpperLimit(),kFALSE),RooFit::VLines(),RooFit::DrawOption("F"),RooFit::MoveToBack(),RooFit::FillColor(kGray)); fPosteriorPdf->plotOn(plot); plot->GetYaxis()->SetTitle("posterior probability"); plot->Draw();
Download exercises from here:
http://cern.ch/cranmer/public/RooStats_CompareIntervals.tar
or on lxplus
~cranmer/public/RooStats_CompareIntervals.tar
We can modify the 500d tutorial to have a floating mass, and fit a 2-d scan. This is already done in
rs500e_PrepareWorkspace_GaussOverFlat_withSystematics_floatingMass.C
rs501_ThreeTypesOfLimits.C
A http://www-ekp.physik.uni-karlsruhe.de/~schott/BCRooInterface/ has been developed to the http://www.mppmu.mpg.de/bat/ (BAT). It is possible with it to run on the same problem as described above with Bayesian method using Markov Chain MC. Here is how to install BAT (on lxplus using the tcsh shell):
wget http://www.mppmu.mpg.de/bat/source/BAT-0.3.1.tar.gz tar xvfz BAT-0.3.1.tar.gz cd BAT-0.3.1 ./configure --prefix=${PWD} make make install setenv BATINSTALLDIR ${PWD} setenv LD_LIBRARY_PATH ${LD_LIBRARY_PATH}:${BATINSTALLDIR}/lib setenv CPATH ${BATINSTALLDIR}/include cd models/RooInterface make
Then copy the input ROOT file (containing the workspace, ie. WS_GaussOverFlat_withSystematics_floatingMass.root
) in the RooInterface
directory and run with:
./runRooInterface.exe WS_GaussOverFlat_withSystematics_floatingMass.root myWS gv bat_plots.ps
We are lucky, it worked out-of-the-box since we have been using the the workspace preparation the same name conventions as the defaults sets in the BAT/RooInterface
. Here is one of the plots of the results:
-- GregorySchott - 15-Oct-2009 -- KyleCranmer - 16-Oct-2009 -- GregorySchott - 29-Oct-2009
I | Attachment | History | Action | Size | Date | Who | Comment |
---|---|---|---|---|---|---|---|
png | BAT_ContourPlot.png | r1 | manage | 138.7 K | 2009-10-29 - 15:16 | UnknownUser | |
png | BayesianCalculatorPlot.png | r1 | manage | 15.6 K | 2009-10-15 - 20:53 | UnknownUser | |
png | HybridResultPlot_GaussOverFlat.png | r1 | manage | 9.1 K | 2009-10-16 - 08:57 | UnknownUser | |
png | HybridResultPlot_Poisson.png | r1 | manage | 8.8 K | 2009-10-16 - 08:56 | UnknownUser | |
png | HypoTestInvertorPlot.png | r1 | manage | 11.4 K | 2009-10-15 - 20:10 | UnknownUser | |
png | ProfileLikelihoodCalculator_UpperLimitPoisson.png | r1 | manage | 9.5 K | 2009-10-16 - 08:56 | UnknownUser | |
png | RooDataSetGaussOverFlat.png | r1 | manage | 10.2 K | 2009-10-16 - 08:56 | UnknownUser | |
png | RooDataSetPoisson.png | r1 | manage | 5.1 K | 2009-10-16 - 08:56 | UnknownUser | |
gif | ThreeLimitsComparison.gif | r1 | manage | 8.4 K | 2009-10-16 - 13:38 | KyleCranmer |