RooRealVar v("v","My Variable",5,0,10); //value is 5, range is 0->10 v.setVal(6); //change value v.setMin(-5); //change min v.setMax(11); //change max v.setConstant(true); //make a constant |

RooCategory c("c","My Discrete var"); c.defineType("val1",1); c.defineType("val2",5); c.setIndex(1); //to set using the type index c.setLabel("val2"); //to set using the name of the type |

RooAddition add("add","my add",RooArgSet(v1,v2)); RooFormulaVar add2("add","alternative add", "@0+@1",RooArgList(v1,v2)); //the same thing |

Note the use of the RooArgList, since is an ordered version of a RooArgSet... the ordering is important clearly to get the formula right.

RooWorkspace wYou can do:

Old way | New way | |

Variables | RooRealVar n("n","number of events",10,0,5000) | w.factory("n[10,0,5000]") |

Var Addition | RooAddition mean("mean","s+b",RooArgSet(s,b) | w.factory("sum::mean(s[5,0,100],b[5,0,100])") |

Var Product | RooProduct myProduct("myProduct","myProduct",RooArgSet(v1,v2)) | w.factory("prod::myProduct(v1,v2)") |

Generic Function | RooFormulaVar myFunc("myFunc","","@0*@1+sin(@2)",RooArgList(v1,v2,v3)) | w.factory("expr::myFunc('v1*v2+sin(v3)',v1,v2,v3) w.factory("expr::myFunc('@0*@1+sin(@2)',{v1,v2,v3})") |

Pdfs | RooPoisson pois("pois","my poisson",n,mean) | w.factory("Poisson::pois(n,mean)") |

Pdf Addition | RooAddPdf model("model","my model",RooArgList(pdf1,pdf2),coeff1) | w.factory("SUM::model(coeff1[0.5,0,1]*pdf1,pdf2)") |

Pdf Product | RooProdPdf model("model","my model",RooArgSet(pdf1,pdf2)) | w.factory("PROD::model(pdf1,pdf2)") |

Generic pdf | RooGenericPdf.... | w.factory("EXPR::myPdf('a+b',a,b)") |

Category | RooCategory c("c","");c.defineType("val1=3"); | w.factory("c[val1=3]"); |

You can then access the variables (as pointers) through the workspace:

w.var("n")->Print() w.function("mean") w.pdf("model") w.cat("c")To add manually created stuff, use the

Roostats likes to make use of arg sets (RooArgSet), and these can be defined, stored, and accessed through the workspace like this:

w.defineSet("observables","n") w.defineSet("mySet","s,b") w.set("observables"); //pointer to observables RooArgSet

To inspect your workspace, just call

w.Print()on it, and it will show you whats in there!

RooDataSet data("myData","myData",*w.var("n"))If we wanted a dataset with many observables, we define a set in the workspace and then construct from that:

RooDataSet data2("myData2","myData2",*w.set("mySet"))You could add the dataset to the workspace as well, but it usually isn't necessary.... the use of the workspace was only to exploit the

w.import(data); //NOTE this makes a copy of your dataset, modifying data wont change the dataset in the workspace w.data("myData"); //gives you a pointer to it for future use/modificationsTo set the value of your "n" you will set the value of the var that is included in the dataset and then

w.var("n")->setVal(40) data.add(*w.var("n"))Inspecting the dataset can be done like this:

data.Print("v"); //tells you how many entries and what variables are in the dataset data.get(0)->Print("v"); //gets the 0th entry of the dataset (if it exists), which is a RooArgSet, and prints the values of all the vars in that set

1. The way the first arg is added to the set determines if the set **owns** the args. addOwned(..) will activate ownership (all subsequent calls must be addOwned). add(..) will activate nonOwnership (all subsequent calls must be add). addClone(...) creates a clone of the arg (arg->Clone()) and takes ownership of that clone. Therefore you cannot addClone followed by add, or vice versa - i.e. a set with a clone must own all its args (set->isOwning() == true).

2. If it owns, it will try to delete the args when it is destroyed

3. All constructors (including the copy constructor) will use "add", i.e. the RooArgSet you make will not own the args it is "copying" from the rvalue collection.

Basically, you can run into all sorts of troubles if you dont keep track of ownership, for example:

RooArgSet* set1 = new RooArgSet; set1->addOwned(*someVar); //someVar should not be deleted by anything else RooArgSet set2(*set1); //asking for trouble... if set1 is now deleted (or if it had been on the stack and goes out of scope) then you have a bunch of pointers to bad objects set1->releaseOwnership(); //now it wont delete the args when it is deleted set2.takeOwnership(); //if we dont do this, we have a potential memory leak, if nothing deletes the variable. delete set1; //this is now ok, set2 has ownership

You can take a 'snapshot' of any RooAbsCollection, which returns an owning collection of clones.

The value of a plotted pdf is NEvt times binWidth times getVal(observables), where observables are the variables of the frame, and are used are integrated over to give the denominator of the pdf value

I start with one channel, a simple counting experiment (i.e. one bin), one single background estimate with an uncertainty (this uncertainty would include everything, including lumi uncertainty if the estimate includes MC estimates), luminosity with an uncertainty, and an overall acceptance estimate, with uncertainty.

Description | Quantity | Measurement |

Luminosity | 5.0 +/- (3.9%) fb-1 | |

Number of observed events | 61 | |

Background estimate | 41.7 +/- 4.6 | |

Signal Acceptance | 0.71 +/- 0.09 |

Here's a quick estimate for what the cross-section should come out to be:

Here I introduce the convention that parameters (i.e. anything that is allowed to float during the likelihood maximization) are primed. So here I have used gaussian constraint terms for each of the nuisance parameters, based on the measured/estimated values my analysis had for them. Therefore the groupings for each term in this pdf are: data = Parameters of interest = Nuisance parameters =

Here is the code to construct this model inside a workspace. You are required to define a range for each of the variables involved, otherwise the numbers are constants, unable to fluctuate as part of the fitting procedure to follow. You could use infinities (given by `+INF`

and `-INF`

) for the limits but it is better to pick sensible values, e.g. lower limit of 0 - this gets called a **truncated gaussian** but is only significantly different from a regular gaussian if the mean of the gaussian is within a few st.dev of 0, which isn't the case in this example.

using namespace RooFit; //construct the model RooWorkspace w("w"); w.factory("Gaussian::constraint_b(nuisance_b[41.7,0,100],41.7,4.6)"); //constrained b to be positive - "truncated gaussian" w.factory("Gaussian::constraint_acc(nuisance_acc[0.71,0,1],0.71,0.09)"); //constrained acc in range 0-1 w.factory("Gaussian::constraint_lumi(nuisance_lumi[5.0,0.0,10.0],5.0,0.195)"); //constrained lumi from 0 to 10.0 w.factory("prod::s(sigma[0,100],nuisance_lumi,nuisance_acc)"); //constructs a function w.factory("sum::mean(s,nuisance_b)"); //another function w.factory("Poisson::pois(n[61,0,100],mean)"); //a pdf (special function) w.factory("PROD::model(pois,constraint_b,constraint_lumi,constraint_acc)"); //another pdf //define RooArgSets for convenience w.defineSet("obs","n"); //observables w.defineSet("poi","sigma"); //parameters of interest w.defineSet("np","nuisance_b,nuisance_lumi,nuisance_acc"); //nuisance parameters RooDataSet data("data", "data", *w.set("obs")); data.add(*w.set("obs")); //actually add the data |

The prod and sum lines of the model could have been combined into one generic function line using either of the following:

w.factory("expr::mean('@0*@1*@2+@3',{sigma[0,100],nuisance_lumi,nuisance_acc,nuisance_b})"); w.factory("expr::mean('sigma*nuisance_lumi*nuisance_acc+nuisance_b',sigma[0,100],nuisance_lumi,nuisance_acc,nuisance_b)"); |

The whole model could in fact be done in one line, because the factory method allows terms to be nested...

w.factory("PROD::model(Poisson::pois(n[61,0,100],expr::mean('sigma*nuisance_lumi*nuisance_acc+nuisance_b',sigma[0,100],nuisance_lumi[5.0,0.0,10.0],nuisance_acc[0.71,0,1],nuisance_b[41.7,0,100])),\ Gaussian::constraint_b(nuisance_b,41.7,4.6),Gaussian::constraint_acc(nuisance_acc,0.71,0.09),Gaussian::constraint_lumi(nuisance_lumi,5.0,0.195))"); |

To do this in Roofit, ask the model to create the NLL...

RooAbsReal* nll = w.pdf("model")->createNLL(data); |

[#1] INFO:Minization -- Including the following contraint terms in minimization: (constraint_b,constraint_lumi,constraint_acc) |

root [20] nll->Print("v") --- RooAbsArg --- .... Servers: (0x5d13d10,V-) RooNLLVar::nll_model_data "-log(likelihood)" (0x5d639f0,V-) RooConstraintSum::nll_model_data_constr "nllCons" Proxies: !set -> 1) nll_model_data 2) nll_model_data_constr root[21] nll->getComponents()->find("nll_model_data")->Print(); //etc etc |

RooPlot* frame = w.var("sigma")->frame(); nll->plotOn(frame,ShiftToZero()); //the ShiftToZero option puts the minimum at 0 on the y-axis frame->Draw();We can see the minimum is about where we expect it to be (about 5), but lets run the minimizer (minuit) to find it for sure, and also evaluate the uncertainty on the parameter estimate for sigma. Here you have three choices:

where is the parameter of interest, are the other parameters, are the values of the nuisance parameters that maximize the likelihood for that particular value of (these are called the **profile maximum likelihood estimators**) and the denominator is simply the maximum likelihood value and acts as a normalization.

After forming this pll function, the errors are taken as the point where two times the negative log of this function is equal to 1. This comes from statistics results suggesting is distributed so 1 sigma of the distribution is located less than 1. So the lower and upper errors are just the limits of this region.

RooAbsReal* pll = nll->createProfile(*w.set("poi")) pll->plotOn(frame,LineColor(kRed)) frame->Draw()I've zoomed in and drawn a line on the plot to show where we expect the errors to be (pll is just the negative log of the profile likelihood ratio, hence the line at 0.5 instead of 1). So by eye we are expecting sigma to be about 5.5, with lower limit of about 3.0 and upper limit of about 8.0.

If we run the minimizer now, followed by minos for the errors...

RooMinuit mini(*nll); mini.minos(*w.set("poi")); //could give no arg list and it will calculate minos errors for all the parameters |

The maximum likelihood estimator (MLE) values for the parameters get propagated back to the variables inside the RooWorkspace, and the minos asymmetric errors get calculated for the poi. You'll see this in the printout, but you can save the information to a "RooFitResult" object like this:

RooFitResult* res = mini.save("myResult","My Result"); res->status(); //should be 0 for success! |

To access the info about sigma:

w.var("sigma")->getVal();w.var("sigma")->getErrorLo(); w.var("sigma")->getErrorHi(); |

So drawing these low and high limits on our plot we get this (note that `getErrorLo()`

returns a negative number by convention):

l.DrawLine(w.var("sigma")->getVal()+w.var("sigma")->getErrorLo(),0,w.var("sigma")->getVal()+w.var("sigma")->getErrorLo(),0.5); l.DrawLine(w.var("sigma")->getVal()+w.var("sigma")->getErrorHi(),0,w.var("sigma")->getVal()+w.var("sigma")->getErrorHi(),0.5) |

Note that all this can be accomplished in one line, using the **fitTo** method, where you fit the original model to the data - the default roofit fitting method is to maximize the likelihood function constructed from the dataset you give to the model....

RooFitResult* res = w.pdf("model")->fitTo(data,Minos(*w.set("poi")),Save(),Hesse(false)); |

So the final result I have for this model is which is in line with our quick estimate from earlier.

{ using namespace RooFit; //construct the model RooWorkspace w("w"); w.factory("Gaussian::constraint_b(nuisance_b[41.7,0,100],41.7,4.6)"); //constrained b to be positive - "truncated gaussian" w.factory("Gaussian::constraint_acc(nuisance_acc[0.71,0,1],0.71,0.09)"); //constrained acc in range 0-1 w.factory("Gaussian::constraint_lumi(nuisance_lumi[5.0,0.0,10.0],5.0,0.195)"); //constrained lumi from 0 to 10.0 w.factory("prod::s(sigma[0,100],nuisance_lumi,nuisance_acc)"); w.factory("sum::mean(s,nuisance_b)"); w.factory("Poisson::pois(n[61,0,100],mean)"); w.factory("PROD::model(pois,constraint_b,constraint_lumi,constraint_acc)"); //define RooArgSets for convenience w.defineSet("obs","n"); //observables w.defineSet("poi","sigma"); //parameters of interest w.defineSet("np","nuisance_b,nuisance_lumi,nuisance_acc"); //nuisance parameters RooDataSet data("data", "data", *w.set("obs")); data.add(*w.set("obs")); //actually add the data RooFitResult* res = w.pdf("model")->fitTo(data,Minos(*w.set("poi")),Save(),Hesse(false)); if(res->status()==0) { w.var("sigma")->Print(); } else { cout << "Likelihood maximization failed" << endl; } } |

where is the profile likelihood ratio, and is the poi. This means each different value of the poi has its very own test statistic. Low values of indicate good agreement with that choice of . We need to find the that has its corresponding for the **observed** data at a p-value of 0.05, using the distribution of assuming that that value of is correct. Mathematically, we want to find satisfying:

The is to indicate that when generating toy mc (or otherwise) in order to build up the distribution of the test statistic, you should choose all the nuisance parameter values to be the ones that they took for the conditional MLE when you formed the likelihood using the observed data and fix the poi value to . I.e. you generate pseudo-data for all the observables in your model, but to do this you have to choose values for all the parameters, which you do in the way described here. But then for each time you have generated pseudo-data (toy mc) you then allow the parameters to float as usual for when you determine the value of the test statistic.

We start from the example problem in section 1. When we computed and drew the profile likelihood ratio we were drawing the observed value of each of the test statistics (without a factor of 2) for each possible value of . So clearly where this curve (the red curve) is 0 is where there is the most compatibility between that particular model and the observed data. So at what point along the poi axis is the observed test statistic at the p-value=0.05 critical point?

One way is to use the asymptotic formulae for the distribution of our test statistic (Cowan, Cranmer, Gross, Vitells).

The distribution is non-central distributed, with non-centrality parameter: where is the st.dev of the unconditional MLE (the mean value of is obviously the value , since we would expect on average for the best fit value of to come out as the underlying value that is used to generate the distribution). Thankfully we do not need to determine just yet because we want to know the distribution in the case where - so this gives a non-centrality parameter of 0, and so the test statistic is distributed according to a central (i.e. regular) distribution with 1 d.o.f; aka Wilk's theorem. All this holds for the case of one poi, mutiple poi I will come to later.

So the p-value quantity is just given in ROOT by a chi2 distribution with 1 dof:

p_mu = TMath::Prob(t_obs,1.); // = 1 - (2*ROOT::Math::normal_cdf(sqrt(t_obs),1.) - 1); |

(Remember, a chi^2 value of x corresponds to being sqrt(x) st.dev away from the centre of a normal distribution, so we are computing the fraction of the normal distribution within sqrt(x) st.devs of 0, and subtracting that from 1).

So taking the problem from section 1, we can plot these p-values easily with the following code:

RooAbsReal* nll = w.pdf("model")->createNLL(data); RooAbsReal* pll = nll->createProfile(*w.set("poi")); RooFormulaVar p_mu("p_mu","p_{#mu} using asymptotic formulae","TMath::Prob(2*@0,1.)",RooArgList(*pll)); RooPlot* frame = w.var("sigma")->frame(Range(0,30)); p_mu.plotOn(frame,LineColor(kGreen)); frame->Draw(); TLine l;l.SetLineStyle(2); l.DrawLine(0,0.05,30,0.05); |

To give this:

So where the curve intersects the dashed line are the 95% CL limits (CL(s+b) limits). You can extract the location of this as follows:

double limit = p_mu.findRoot(*w.var("sigma"),0,30,0.05); |

Note that if you had multiple roots to find, you would need an iterative procedure to find them all, one by one - a simple while loop would do. Here I get about 11.5 as the upper limit

This was drawn by making a custom function in RooFit, where the value of the function is set to 0 for values below the MLE. To make a custom function, first do:

RooClassFactory::makeFunction("RooOneSidedProfileLL","profileLL") |

Then in the `evaluate()`

method of the generated cxx file, add:

Double_t pllVal = profileLL; //need to evaluate profileLL first const RooProfileLL* ll = dynamic_cast |

You'll need to also add the include for the RooProfileLL class: `#include "RooProfileLL.h"`

Then you can compile that class and use it like this:

.L RooOneSidedProfileLL.cxx+ RooOneSidedProfileLL opll("opll","opll",*pll); //pll is the pointer to your profilell function made earlier opll.plotOn(frame,LineColor(kMagenta)); |

This new test statistic has a different asymptotic formula for its distribution to the previous test statistic : it is the sum of a delta function (at 0) and a chi^2 distribution, with a factor of 0.5 in front of both. This makes sense: if you throw toys (pseudo-data) under the assumption of the given value of , we expect the unconditional MLE to fluctuate according to a gaussian with mean = . So if you want the distribution of for this same value of , half the time you'll get exactly 0 from the test statistic (the came out bigger than ) and half the time it will be the same as what would have given you.

How does this alter the cumulative distribution and the subsequent p-value? The p-value is the fraction of the distribution further from 0 than the observed test statistic value. But now half of the distribution has shifted to sit exactly at 0 (because of the delta function). So the cumulative distribution (i.e. integrating from 0 up to some value) is half whatever it was before, plus 0.5 from the delta function that appears at 0. The cumulative distributions are thus given in terms of the cumulative distribution of a normal distribution, as:

Test statistic | Cumulative Distribution | Code |

`2*ROOT::Math::normal_cdf(sqrt(t_obs),1.) - 1;` |
||

`ROOT::Math::normal_cdf(sqrt(t_obs),1.);` |

And the p-values as thus given by 1-Cumulative Distribution:

Test statistic | P-Value | Code | Value of test-stat @ pVal=0.05 |

`2*(1.-ROOT::Math::normal_cdf(sqrt(t_obs),1.));` or `TMath::Prob(t_obs,1.);` |
(delta-log-likelihood value is t_mu/2 = (1.959^2)/2 = 1.92) | ||

`ROOT::Math::normal_cdf_c(sqrt(t_obs),1.);` |

(Note that the value of the sqrt of the test statistic that gives you a p-value of 0.05 corresponds to the one-tailed or two-tailed nSigma points for a 95% interval. I.e. 95% of a normal distribution is within +/-1.959sigma .... so the two-sided test stat corresponds to the 95% confidence interval where the 5% outside the interval is two-tailed. The is correspondingly one-tailed: 90% of normal distribution is within +/-1.64sigma, so only 5% is above the upper limit i.e. one-tailed.)

So drawing this new p-value on our plot....

RooFormulaVar p_mu_q("p_mu_q","p_{#mu} from one-sided pll asymptotic formulae","ROOT::Math::normal_cdf_c(sqrt(2*@0),1.)",RooArgList(opll)); p_mu_q.plotOn(frame,LineColor(kCyan)); |

and with a bit of tidying up, we see the following plot:

So we see that the one-sided upper limit is slightly smaller than the two-sided upper limit. Generally the latter is favoured in the case where you actually want to set an upper limit on new physics. The former is fine for confidence intervals (i.e. 1sigma errors) on actual parameter estimates, as done in section 1.
In fact, a slightly different test statistic is used in ATLAS for upper limits on new physics, which I will call the **bounded one-sided test statistic**. This is for cases where the null hypothesis value of the poi is believed to be the lowest value it could physically take (typically this would be 0 if your poi was the signal strength for a new physics signal), and so cases where your observed data (or the pseudo-data used to build up the test statistic distribution) give an *unphysical* MLE for your poi (less than the null hypothesis value) are modified to use the likelihood function value where the poi has been set to its null hypothesis value.

where is the null value of the poi (**Note:** the literature often defines this quantity to be instead, where now represents the probability of observing greater agreement with the null hypothesis than the observed data. I break with the convention because it seems more natural to think of p-values always being defined to the same side of the observed test statistic). When setting an upper limit on an alternative hypothesis, you want the p-value for the alternative hypothesis to be small, but simultaneously want the p-value of the null hypothesis to be big (i.e. the null is right and the alternative is wrong). So you to penalise the cases where the null hypothesis p-value comes out small as well, so for CLs we find the value of where the following quantity is 0.05:

So in the cases where the alternative hypothesis test statistic distribution actually sits pretty much on top of the null hypothesis distribution (i.e. no sensitivity), then the p values come out very similar, and this new p-value therefore becomes very large, no matter how small the alternative hypothesis p-value is.

If we want to use the asymptotic formulae we will need a way to estimate , which is the st.dev. of , which we can do using the asimov dataset, as will be explained in chapter 4.

{ using namespace RooFit; gROOT->ProcessLine(".L RooOneSidedProfileLL.cxx+"); //see details in section for how to make this file //construct the model RooWorkspace w("w"); //all in one line!! w.factory("PROD::model(\ Poisson::pois(n[61,0,100], \ expr::mean('sigma*nuisance_lumi*nuisance_acc+nuisance_b', \ sigma[0,100], nuisance_lumi[5.0,0.0,10.0],nuisance_acc[0.71,0,1],nuisance_b[41.7,0,100])), \ Gaussian::constraint_b(nuisance_b,41.7,4.6), \ Gaussian::constraint_acc(nuisance_acc,0.71,0.09), \ Gaussian::constraint_lumi(nuisance_lumi,5.0,0.195) \ )"); //define RooArgSets for convenience w.defineSet("obs","n"); //observables w.defineSet("poi","sigma"); //parameters of interest w.defineSet("np","nuisance_b,nuisance_lumi,nuisance_acc"); //nuisance parameters RooDataSet data("data", "data", *w.set("obs")); data.add(*w.set("obs")); //actually add the data RooAbsReal* nll = w.pdf("model")->createNLL(data); RooProfileLL* pll = dynamic_cast |

In the model we have done so far, we have been calling the number of events our observable. But instead, lets move to our observable being a new quantity, called "dummy", and instead move the number of events "n" to be the number of events in our dataset. "Dummy" always takes the value of 1, for every event in our dataset - we'll use a RooCategory to represent it. So what does our dataset look like for the example problem in section 1? We had nObs=61, so to represent this in our new setup we have:

w.factory("dummy[val=1]"); w.defineSet("obs","dummy"); RooDataSet data("data","data", *w.set("obs")); for(int i=0;i<61;i++) data.add(*w.cat("dummy")); |

Now we need to modify our model a little bit - for any given event, the pdf for the "dummy" observable is just going to be Probability(dummy=1)=1. The interesting part now moves into what is the probability of us getting this many events (entries) in our dataset. What we need to form is what RooFit calls an **Extended PDF**. These can be constructed using the RooExtendPdf class, but can also be done via the RooAddPdf class (and hence the SUM method of a RooWorkspace factory) where you specify that you want to sum up a pdf with a co-efficient in front of it. If the number of coefficients matches the number of pdfs you are summing up, then the pdf is "extended" and effectively acquires an additional poisson term in front of it to describe the number of events expected. Specifically, our model looks like this:

Note that M is not a pdf - it is not normalized to 1, it is normalized to , whatever that value might be.

In code, we do it like this:

w.factory("expr::mean('sigma*nuisance_lumi*nuisance_acc+nuisance_b', \ sigma[0,100], nuisance_lumi[5.0,0.0,10.0],nuisance_acc[0.71,0,1],nuisance_b[41.7,0,100])"); w.factory("SUM::model(mean*PROD::constraints(Gaussian::constraint_b(nuisance_b,41.7,4.6), \ Gaussian::constraint_acc(nuisance_acc,0.71,0.09), \ Gaussian::constraint_lumi(nuisance_lumi,5.0,0.195) \ ))"); |

The generated "RooAddPdf" model is different to our previous model, in that we can now do:

w.pdf("model")->expectedEvents(w.set("obs")) |

and this tells use, for whatever the current values of the pdf parameters, what the expected number of events is i.e. what the value of "mean" is:

w.func("mean")->getVal(); //will have the same value as expectedEvents(..) above |

Why have we done this? We're paving the way to have our dataset actually hold a continuous value observable instead of just dummy, where we could include the pdf for this continuous observable multiplicatively into our model. More on that later.

At this stage you should be able to proceed as in Chapter 1, and fit your data to the model and extract the MLE of your signal cross-section. The only thing you have to add is that you must tell RooFit to explicitly include the constraint terms for each of the nuisance parameters (if you don't, it will try to be clever and discard them before running minuit, and that wont get you very far!!). Thankfully we already defned the RooArgSet "np" of all the nuisance parameters, so we're good to go:

RooFitResult* res = w.pdf("model")->fitTo(data,Minos(*w.set("poi")),Constrain(*w.set("np")),Save(),Hesse(false)); |

So here's how we define a Global Observable: it is an observable which is the same value for all entries in your dataset. When we come to generating pseudo-data, we should only need to generate the global observables once per pseudo-dataset... you don't want the values to change between the rows of your dataset. When working with extended pdfs, a global observable is a pdf parameter that you will want to be held constant at some particular value (its observed value, or the value it takes for your pseudo-data) when you run fits with your model (on the observed data, or the pseudo-data), but you want the the value of this parameter to be able to change each time you generate new pseudo-data. So in our model, what are the Global Observables? The nominal value of the luminosity is one: if a parallel universe you ran their own version of your analysis on their parallel universe LHC, they would probably measure a different nominal value of the luminosity. The same goes for your nominal background estimate and nominal acceptance (they clearly have uncertainties, they're effectively an observable in what is called an **auxiliary measurement**. Your parallel universe you may have got slightly different MC from their generator and therefore predicted a slightly different background component, and so on).

OK so we go back to our model and throw in that these nominal values are no longer constants, but are additional parameters (with a range) that we hold constant when running fits. Before doing this, see that when we ask our pdf what its parameters are, we get the following:

root [23] w.pdf("model")->getParameters(w.set("obs"))->Print(); //the argument of getParameters is to tell the model which parameters are not in fact parameters but are actually observables RooArgSet::parameters = (nuisance_acc,nuisance_b,nuisance_lumi,sigma) |

Now if we go back and modify the model to read:

w.factory("SUM::model(mean*PROD::constraints(Gaussian::constraint_b(nuisance_b,b0[41.7,0,100],4.6), \ Gaussian::constraint_acc(nuisance_acc,acc0[0.71,0,1],0.09), \ Gaussian::constraint_lumi(nuisance_lumi,lumi0[5.0,0,10],0.195) \ ))"); |

When we ask for the parameters of the model we now get:

RooArgSet::parameters = (acc0,b0,lumi0,nuisance_acc,nuisance_b,nuisance_lumi,sigma) |

Having defined the poi, the nuisance parameters and the observables as "sets" in the workspace, we can now automatically say which parameters are global observables: they're the parameters that are neither nuisance parameters, poi, or observables that will appear in the RooDataSet):

RooArgSet* globalObs = w.pdf("model")->getParameters(RooArgSet(RooArgSet(*w.set("obs"),*w.set("np")),*w.set("poi"))); globalObs->Print(); RooArgSet::parameters = (acc0,b0,lumi0) |

If you tried to run a fit as this stage, you'll be in big trouble... you need to hold the global observables constant because they aren't going to be constrained by your dataset you're fitting to (the dataset doesn't have the nominal values for these observables in the rows of the dataset)....

TIterator* iter = globalObs->createIterator(); for (RooAbsArg *a = (RooAbsArg *) iter->Next(); a != 0; a = (RooAbsArg *) iter->Next()) { RooRealVar *rrv = dynamic_cast<RooRealVar*>(a); if(rrv) rrv->setConstant(true); } |

Note: In RooStats (if we ever get there ), the user is asked to specify the observables, global observables, and the poi - the nuisance parameters are the ones that are automatically determined.

.

So to generate a pseudo-dataset, we first need to generate pseudo-data values for the global observables (the observables not present in the RooDataSet we plan to make). RooFit gives you a method to do this:

RooRandom::randomGenerator()->SetSeed(111); //in the interest of reproducibility of results RooDataSet* pseudoGlobals = w.pdf("model")->generateSimGlobal(*globalObs,1); //the 1 is to generate one set of values only pseudoGlobals->Print() DataStore modelData (Generated From model) Contains 1 entries Observables: 1) acc0 = 0.698192 C L(0 - 1) "acc0" 2) b0 = 38.6157 C L(0 - 100) "b0" 3) lumi0 = 5.07694 C L(0 - 10) "lumi0" |

But we should keep in mind that these toy MC values of the global observables were generated with a particular set of values for the nuisance parameters (and the poi) in the model. The recommendation is to use the conditional MLE values of the nuisance parameters, for the particular values of the POI that we are generating pseudo-data under the assumption of, constructed using the observed data. I.e. we should use . A bayesian approach would involve randomizing the nuisance parameter value.... something we can come back to later.

So, lets first conditionally maximize the nuisance parameters, holding the poi constant (the global observables should be set to their nominal values still, and also held constant). We'll do this for a poi value of, say, 10fb.

w.var("sigma")->setVal(10); w.var("sigma")->setConstant(true); RooFitResult* res = w.pdf("model")->fitTo(data,Minos(false),Constrain(*w.set("np")),Save(),Hesse(false)); //conditional MLE nuisance values have now been set w.set("np")->Print("v"); //optional - to see the conditional MLE values for the poi=10 parameter point w.var("sigma")->setConstant(false); |

So now we're ready to generate our toyMC global observables. This time I get:

RooDataSet* pseudoGlobals = w.pdf("model")->generateSimGlobal(*globalObs,1); //the 1 is to generate one set of values only pseudoGlobals->Print() DataStore modelData (Generated From model) Contains 1 entries Observables: 1) acc0 = 0.641359 C L(0 - 1) "acc0" 2) b0 = 35.6624 C L(0 - 100) "b0" 3) lumi0 = 5.04248 C L(0 - 10) "lumi0" |

As expected, for a poi value higher than the MLE estimate for it, we see the background estimate has gone down compared to our nominal value... this has been caused by the nuisance parameter having a conditional MLE value pushed down by the large poi value.

We now need to set the global observables in the model to these values, and then generate our actual (nonglobal) observables:

RooArgSet* allVars = w.pdf("model")->getVariables(); const RooArgSet* pseudoGlobalValues = pseudoGlobals->get(0); allVars->assignValueOnly(*pseudoGlobals->get(0)); //this is a clever way to copy all the generated global observable values over to the model... it will change values of the global observable parameters, even if the parameters have already been setConstant delete allVars; delete pseudoGlobals;//cleanup |

Now you're ready to generate the pseudo-dataset. (.... this may not be needed... need to check....->>Make sure your global observables have been setConstant before doing this - you want them to stay fixed to the pseudo-values you have just generated while you generate the pseudo-values for your observables)

RooDataSet* pseudoData = w.pdf("model")->generate(*w.set("obs"),Extended()); |

We need the "Extended" option, otherwise the number of events generated will always be equal to the expected number of events, which is whatever the value of "mean" is at this point. What you get is actually a dataset with 1 entry, but it has a weight of x, where x is the pseudo number of events. This is just an alternative way to work with a dataset, rather than having a dataset with no "weight" field, so all entries have weight=1.

pseudoData->Print() RooDataSet::wu[,weight:weight] = 1 entries (60 weighted) |

We then use this pseudo-dataset (60 events, with acc0, b0 and lumi0 nominal values set to the values generated above) to evaluate the test statistic. Lets evaluate the one-sided profileLL that we invented in chapter 2.

RooAbsReal* pseudo_nll = w.pdf("model")->createNLL(*pseudoData,Constrain(*w.set("np"))); RooProfileLL* pseudo_pll = dynamic_cast |

This looks like this:

We can choose to evaluate this test statistic at a value of the poi we would like to test. We've generated it under , so lets evaluate it at . To evaluate the test statistic just do: w.var("sigma")->setVal(15);double q_mu = pseudo_opll.getVal()*2; //we didn't include the factor of two in our custom function |

But note: **The explicit evaluation of this test statistic causes a minimization to occur, which modifies the values of the model parameters (non-observables)**. So you need to make sure before you generate any further pseudoData you set the parameters back to where you want them. So just after you did the fit with the real data, take a snapshot of the nuisance parameters and use that (along with your chosen value of the poi) to reset the parameters before each pseudoData generation:

RooArgSet* npValues = w.set("np")->snapshot(); ... for(int i=0;i<nToys;i++) { RooArgSet* allVars = w.pdf("model")-<getVariables(); allVars->assignValueOnly(*npValues);w.var("sigma")->setVal(10); //reset the poi and nuisance parameters ... } |

So in this way we can sample our test statistic (at whatever value of we want) under the assumption of a particular value of the poi=. You could then histogram these values to take a look at the sampling distribution of your test statistic. But instead we'll use this opportunity to use our **first ever roostats class**: SamplingDistribution (and SamplingDistPlot, which visualizes a SamplingDistribution).

A SamplingDistribution just holds a set of values (which may be weighted, but if no weight given then the weight is assumed=1) that define some distribution. I.e. its an unbinned histogram :-P. We construct it using a vector of doubles (with an optional second vector of doubles to represent the weights, which we don't need for this example)...

using namespace RooStats; SamplingDistribution sd("opll_sd","One-sided pll test stat @ sigma=10",testStatValues,"q_{#mu}"); |

`testStatValues`

is your vector of values, the rest of the arguments are just convenient names, titles, and variable names, which I have yet to figure out the use of!

We then visualize the SamplingDistribution by adding it to a SamplingDistPlot.

SamplingDistPlot sdp; sdp.AddSamplingDistribution(&sd); sdp.SetLineColor(kMagenta,&sd); //for example, to change the coloring sdp.SetLineWidth(2); sdp.AddLine(observedTestStatValue,0,observedTestStatValue,0.2,"q_{#mu}^{obs}"); //where I saved the value from earlier TCanvas c2; //to draw the plot to a new canvas sdp.Draw(); cout << "P-value = " << 1.-sd.CDF(observedTestStatValue) << endl; //to get the p-value of the observation |

This gives me (after doing 2000 toy MC pseudo-data generations):

This is the distribution . The pvalue I get is 0.0715. So the observed data doesn't appear very compatible with the hypothesis of . We conclude this from using a test statistic that was really designed for testing a hypothesis of , but a small p-value is a small p-value, so that's that.

We start with the example problem we have been using throughout this tutorial, but using the form of the model where the global observables have clearly been promoted to "observable" status in the Gaussian constraint terms (i.e. a true frequentist construction). We'll keep the model in its extended form, and this time set up the observed data using the "weights" feature of a RooDataSet:

RooWorkspace w("w"); w.factory("expr::mean('sigma*nuisance_lumi*nuisance_acc+nuisance_b', \ sigma[0,100], nuisance_lumi[5.0,0.0,10.0],nuisance_acc[0.71,0,1],nuisance_b[41.7,0,100])"); w.factory("SUM::model(mean*PROD::constraints(Gaussian::constraint_b(nuisance_b,b0[41.7,0,100],4.6), \ Gaussian::constraint_acc(acc0[0.71,0,1],nuisance_acc,0.09), \ Gaussian::constraint_lumi(lumi0[5.0,0,10],nuisance_lumi,0.195) \ ))"); w.factory("n[61.0,0,2000]"); //the weight variable for the dataset entry //define RooArgSets for convenience w.defineSet("obs",""); //observables w.defineSet("poi","sigma"); //parameters of interest w.defineSet("np","nuisance_b,nuisance_lumi,nuisance_acc"); //nuisance parameters RooDataSet data("data","data", *w.set("obs"), "n"); //n is the weight variable data.add(*w.set("obs"),61.0); //add an entry with weight=61.0 //identify the global observable parameters and make them constant (to behave like observables in fits) RooArgSet* globalObs = w.pdf("model")->getParameters(RooArgSet(RooArgSet(*w.set("obs"),*w.set("np")),*w.set("poi"))); TIterator* iter = globalObs->createIterator(); for (RooAbsArg *a = (RooAbsArg *) iter->Next(); a != 0; a = (RooAbsArg *) iter->Next()) { RooRealVar *rrv = dynamic_cast |

Now we find the nuisance parameter values for the value of the poi that we wish to assume, which we can do by holding the poi constant and running the fit...

w.var("sigma")->setVal(10); w.var("sigma")->setConstant(true); RooFitResult* res = w.pdf("model")->fitTo(data,Minos(false),Constrain(*w.set("np")),Save(),Hesse(false)); w.set("np")->Print("v"); w.var("sigma")->setConstant(false); RooArgSet* np_cmle = w.set("np")->snapshot(); //conditional MLE values for mu=10 |

The values I got are:

1) 0x1fd49ef0 RooRealVar:: nuisance_b = 38.707 +/- 4.18845 L(0 - 100) "nuisance_b" 2) 0x1fd49190 RooRealVar:: nuisance_lumi = 4.96521 +/- 0.19396 L(0 - 10) "nuisance_lumi" 3) 0x1fd49840 RooRealVar:: nuisance_acc = 0.652868 +/- 0.0822396 L(0 - 1) "nuisance_acc" |

These are therefore the expected values for b0, lumi0 and acc0 - i.e. we should use these values for our asimov dataset. The corresponding expected number of events, which will be our nObs for the Asimov dataset, is given by the value of the "mean" function in our model:

w.function("mean").getVal(); //this gave me 71.11 with the above nuisance parameter values |

It's not a problem that this value is non-integer. So our Asimov Dataset is given by: nObs=71.11, b=38.707, lumi=4.965, acc=0.653. Lets plug make this dataset (which includes manually setting the global observables in the model), and check that fitting the model to this dataset gives us back :

RooDataSet asimovData("asimovData","Asimov Dataset", *w.set("obs"), "n"); asimovData.add(*w.set("obs"),w.function("mean")->getVal()); w.var("lumi0")->setVal(np_cmle->getRealValue("nuisance_lumi")); w.var("b0")->setVal(np_cmle->getRealValue("nuisance_b")); w.var("acc0")->setVal(np_cmle->getRealValue("nuisance_acc")); RooFitResult* res = w.pdf("model")->fitTo(asimovData,Minos(false),Constrain(*w.set("np")),Save(),Hesse(false)); w.var("sigma")->Print(); |

This gives me:

RooRealVar::sigma = 10 +/- 3.28692 L(0 - 100) |

Hurray, everything works and we have our Asimov dataset!

This dataset is useful because we can now extract for use in the asymptotic formulae, by using the relationship:

Where is the profile likelihood ratio formed with the asimov dataset. So lets evaluate this quantity and extract our estimate for . But which value of do we choose to calculate it at? Does it matter? Lets express as a function, just to see how stable things are across the possible values...

RooAbsReal* asimovNLL = w.pdf("model")->createNLL(asimovData,Constrain(*w.set("np"))); RooProfileLL* asimovPLL = dynamic_cast |

And I get:

so not exactly independent of the choice of (remember, "sigma" on the x-axis is the parameter of interest in this model, aka ). The recommendation is toNow we're ready to use asymptotic formulae to calculate CLs limits!

where is the test statistic you choose to form (for a CLs upper limit, this will probably be a one-sided test statistic like in fact). So we effectively need to pick a value of , evaluate its (asymptotic) distribution under the assumption of , and again under the assumption of . Then calculate the p-values from the position of the observed value of the test statistic, and compute the above fraction. If its too big, we need to move to a value of further away from , which will hopefully lead to a smaller value for , without significantly altering the value of (we are setting an upper limit because we believe the observed data is compatible with the hypothesis, so the p-value for any test statistic under this hypothesis we expect to be large~0.5).

I | Attachment | History | Action | Size | Date | Who | Comment |
---|---|---|---|---|---|---|---|

png | nllPlot.png | r1 | manage | 10.4 K | 2013-01-05 - 15:30 | WillButtinger | |

png | nllpllPlot.png | r2 r1 | manage | 11.2 K | 2013-01-05 - 16:06 | WillButtinger | |

png | nllpllPlot2.png | r1 | manage | 11.4 K | 2013-01-05 - 16:43 | WillButtinger | |

png | opll_pseudoData.png | r1 | manage | 14.0 K | 2013-01-13 - 19:02 | WillButtinger | |

png | pValuePlot.png | r1 | manage | 11.1 K | 2013-01-07 - 19:21 | WillButtinger | |

png | pValuePlot2.png | r1 | manage | 13.1 K | 2013-01-12 - 15:18 | WillButtinger | |

png | qmu_testStat.png | r2 r1 | manage | 14.3 K | 2013-01-07 - 19:28 | WillButtinger | |

png | qmu_testStat2.png | r1 | manage | 16.2 K | 2013-01-12 - 15:23 | WillButtinger | |

png | qmu_tmu_testStat.png | r1 | manage | 17.5 K | 2013-01-07 - 20:34 | WillButtinger | |

png | qmu_tmu_testStat2.png | r1 | manage | 17.5 K | 2013-01-12 - 15:27 | WillButtinger | |

png | sampDistPlot.png | r1 | manage | 5.6 K | 2013-01-13 - 20:00 | WillButtinger | |

png | sigmaMuHat.png | r1 | manage | 9.2 K | 2013-01-15 - 12:16 | WillButtinger |

Topic revision: r29 - 2017-03-06 - WillButtinger

**Webs**

- ABATBEA
- ACPP
- ADCgroup
- AEGIS
- AfricaMap
- AgileInfrastructure
- ALICE
- AliceEbyE
- AliceSPD
- AliceSSD
- AliceTOF
- AliFemto
- ALPHA
- ArdaGrid
- ASACUSA
- AthenaFCalTBAna
- Atlas
- AtlasLBNL
- AXIALPET
- CAE
- CALICE
- CDS
- CENF
- CERNSearch
- CLIC
- Cloud
- CloudServices
- CMS
- Controls
- CTA
- CvmFS
- DB
- DefaultWeb
- DESgroup
- DPHEP
- DM-LHC
- DSSGroup
- EGEE
- EgeePtf
- ELFms
- EMI
- ETICS
- FIOgroup
- FlukaTeam
- Frontier
- Gaudi
- GeneratorServices
- GuidesInfo
- HardwareLabs
- HCC
- HEPIX
- ILCBDSColl
- ILCTPC
- IMWG
- Inspire
- IPv6
- IT
- ItCommTeam
- ITCoord
- ITdeptTechForum
- ITDRP
- ITGT
- ITSDC
- LAr
- LCG
- LCGAAWorkbook
- Leade
- LHCAccess
- LHCAtHome
- LHCb
- LHCgas
- LHCONE
- LHCOPN
- LinuxSupport
- Main
- Medipix
- Messaging
- MPGD
- NA49
- NA61
- NA62
- NTOF
- Openlab
- PDBService
- Persistency
- PESgroup
- Plugins
- PSAccess
- PSBUpgrade
- R2Eproject
- RCTF
- RD42
- RFCond12
- RFLowLevel
- ROXIE
- Sandbox
- SocialActivities
- SPI
- SRMDev
- SSM
- Student
- SuperComputing
- Support
- SwfCatalogue
- TMVA
- TOTEM
- TWiki
- UNOSAT
- Virtualization
- VOBox
- WITCH
- XTCA

Welcome Guest

Copyright &© 2008-2021 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.

or Ideas, requests, problems regarding TWiki? use Discourse or Send feedback

or Ideas, requests, problems regarding TWiki? use Discourse or Send feedback