-- IroKoletsou - 2020-10-05

The Run2 WZ analysis fitting code used at LAPP, based on RooFit

This code, based on RooFit, is developed by Louis Portales

Code's git directory at CERN:

https://gitlab.cern.ch/lportale/wzfit

In this directory you'll find both the tools to perform the fit and the tools that allow you to interpret the results and extract information such as the uncertainty ranking, the pulls, etc. You just have to clone it into your work directory.

Initial setup:

In order to perform the template fit and obtain the mus and their uncertainties, as well as the significance, you can use a simple ROOT setup:

  • setupATLAS
  • lsetup "root 6.20.06-x86_64-centos7-gcc8-opt"

or, in order to have the same setup as the LAPP code:

  • setupATLAS
  • asetup 21.2.114,AnalysisBase

But this setup doesn't allow you to use the TRooFit tools (an expansion pack with further functionalities), which are essential for the interpretation of the fit. In order to use them, you need to set up TRooFit as described here:

https://gitlab.cern.ch/will/TRooFit

Nevertheless, the AthAnalysis version that you use must not conflict with your ROOT version. If the compilation doesn't work properly, try a more recent AthAnalysis release than the one recommended in those instructions. For the moment, the following setup works:

  • mkdir build source
  • cd build
  • acmSetup AthAnalysis,21.2.142
  • acm clone_project TRooFit will/TRooFit
  • acm find_packages
  • acm compile

You can do this in your home directory for example.

Temporary starting point: construction of the templates

Ideally, the starting point of the fit should be histograms produced directly by the LAPP analysis code (https://twiki.cern.ch/twiki/bin/view/Main/LappDibosonAnalyses). Nevertheless, as long as the templates are not yet optimized, the starting point of this part of the analysis is the set of ntuples produced by the same code, which are used here to build the template histograms.

The macro that constructs the templates is:

fitScripts/mkTemplates_baseline.C

To run this macro (and get the templates for all regions and all processes), simply do the following (remember to modify the dirOut directory first):

.L mkTemplates_baseline.C

main()

You can create the templates either with or without the object systematics (Bool_t doSys = kFALSE;) and the modelling uncertainty (Bool_t doMod = kFALSE;), as illustrated just below.
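For instance, to produce the object-systematics templates but not the modelling ones, the two flags at the top of the macro would read (a minimal illustration using the flag names quoted above):

Bool_t doSys = kTRUE;   // also build the templates for the object systematics
Bool_t doMod = kFALSE;  // skip the modelling-uncertainty templates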

****NOTE: this macro will be modified in the following days, first to simplify it and then to properly add the templates for the signal sub-regions

  • Theory uncertainties are not implemented yet:
    - the QCD scale uncertainty is only applied as a flat uncertainty on the QCD template, so no extra histogram is needed (yet)
    - PDF uncertainties are not available yet
    - the modelling uncertainty has to be implemented

  • The list of object uncertainties is found here:

fitScripts/SystList.C (in case of modification, one has to take care to use the same names as given in the LAPP code)

For example g_sysALL contains the list of all the object systematics.

Nevertheless, in the case of the EW, QCD and ttV processes, you can't run all the systematics simultaneously. You have to split them into three sublists (for example g_sysALL1) that you define in SystList.C. Then you run mkTemplates_baseline.C three times, replacing g_sysALL by g_sysALL1, 2 and 3 (but remember not to run on the nominal again for runs 2 and 3, for example by adding if(isys==0) continue; for the second and third run).
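As an illustration of the skip-nominal guard, the second and third passes could look like the following sketch (the actual loop structure in the macro may differ, and makeTemplates is a placeholder for the per-systematic template code):

// Sketch of the systematics loop for the second pass (g_sysALL2):
for (unsigned int isys = 0; isys < g_sysALL2.size(); ++isys) {
    if (isys == 0) continue;  // index 0 is the nominal, already produced in the first pass
    makeTemplates( g_sysALL2.at(isys) );  // placeholder: build the templates for this systematic
}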

NOTE about the MET and JER systematics: as you can see in the macro, after creating the templates, two additional functions are called for the samples created with systematics:

if(doSys) mkMETTemplates( (dirOut+fileIn).Data() );
if(doSys) mkJERTemplates( (dirOut+fileIn).Data() );

These functions symmetrize the MET and JER templates created in the previous step. When you run in a single step, using g_sysALL, everything can be done at once, as seen in the macro. But if you run with g_sysALL1 to 3, you can proceed as follows:

Create the templates separately for the three categories of systematics, commenting out the lines referring to MET and JER; hadd the outputs so as to have only one root file; then run the standalone versions of the two macros. For example:

.L mkMETtemplates.C

mkMET_main()

  • As long as we work with MC-estimated fakes, everything is correct and already normalized in the ntuples. If we start working with MM fakes, this will have to be modified.

In the end, in your dirOut directory, you'll find one root file per process, each containing all the templates for that process (one per region). These are the templates that will be used in the fit.

Smoothing of the uncertainty templates

After having created the templates, you can apply a smoothing procedure for the treatment of the systematics.

****NOTE: this is ongoing work; it is not detailed here for the moment and the macros are not in git yet

You can either apply a simple smoothing, using fitScripts/mkSmoothSys.C (but problems have been noticed with the resulting templates),

or explore probably more interesting tools, such as the one in fitScripts/mkSmoothSys_tool.C.

The latter script is based on the CommonSystSmoothingTool, developed by the Exotics group, which needs to be compiled beforehand. The instructions to do so are on the tool's git page ( https://gitlab.cern.ch/atlas-phys/exot/CommonSystSmoothingTool ) and have been tested in our context.

If you do that, the histogram names will be prefixed with smooth* or default*, so you have to specify this prefix inside FitParameters.C, for example as sketched below.
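As an illustration (the variable name below is hypothetical; use whatever FitParameters.C actually expects), the prefix could be set along these lines:

// Hypothetical setting in FitParameters.C: prefix of the histogram names to read
TString g_histPrefix = "smooth";   // or "default" for histograms left unsmoothed by the tool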

Set the Parameters of the Fit

While creating the workspace, you have to set some preliminary parameters such as:

  • if you want to blind all or part of the regions
  • which regions to include
  • if you want to extract differential measurements
  • which sources of systematics to include in the fit, if any
  • which root files to use (the root files containing the templates created during the previous step)

All these are very clear inside the parametrization file:

fitScripts/FitParameters.C

but there are a few things that have to be clarified:

  • If you set BLIND_all=true, the Asimov templates are created automatically. But if you only blind the SR, then you also have to create the Asimov data in this region. This is done using the mkAsimovData.C macro, where you can also rescale some of the templates; this procedure is in any case far from optimal.
  • If you set doOldSetup=true, you'll have exactly the same conditions as in the previous run, which means every mu floating in its CR and in the SR simultaneously. If you want to use the old setup with minor modifications, you can do this in mkWorkspace.C, as will be explained in the following.
  • If you set doExport=false, you will only create the workspace and not proceed to the actual fit. This is nevertheless what is generally done for anything beyond simple tests, as you will anyway perform the fit at a later stage, using the RooFit tools. We generally set it to true only for fast tests giving the mus, their uncertainties and the significance, using some approximations. A sketch of a typical combination of these flags is given below.
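As a minimal sketch, using only the three flags discussed above (their exact types and any other settings are defined in FitParameters.C itself), a typical blinded, workspace-only configuration would be:

// Sketch of a typical flag combination in fitScripts/FitParameters.C:
bool BLIND_all  = true;   // blind everything: the Asimov templates are created automatically
bool doOldSetup = true;   // reproduce the previous-round conditions (each mu floating in its CR and the SR)
bool doExport   = false;  // only build the workspace; the fit is performed later with the RooFit tools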

****NOTE: some work will be done here soon, in order to properly take into account the SR separation into sub-regions

Constructing the workspace

In order to construct the workspace, you have to use:

fitScripts/mkWorkspace.C

First you have to create the channels - in our case, every region included in the fit corresponds to a channel. You don't have to modify anything here, as you have already defined the regions to include in the fit in FitParameters.C.

Then you define the nuisance parameters:

- Concerning the statistical uncertainty, activated using ActivateStatError(), one can set a threshold, currently set to 0.001, in order not to take into account overly small statistical uncertainties for some samples (see the sketch just below).
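For reference, in the HistFactory API this corresponds to calls of the following kind (a sketch; the channel and sample object names are assumptions, and mkWorkspace.C may organize them differently):

// Enable per-bin MC statistical uncertainties, ignoring relative errors below the threshold:
chan.SetStatErrorConfig( 0.001, "Poisson" );  // threshold and constraint type for this channel
sample.ActivateStatError();                   // adds the gamma NPs for this sample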

- The object uncertainties are also taken from SystList.C, and the ones that will be included in the fit are defined in FitParameters.C, so there is no need to modify anything here, unless a new source of uncertainty is added.

- Theory uncertainties have to be included. As explained above, for the moment the QCD scale is included as a uniform uncertainty on the QCD template:

meas.AddUniformSyst( "QCDscale_qcd" ); 

if(process=="qcd") sample->AddOverallSys( "QCDscale_qcd",0.8,1.3);
This might be modified later. The uniform NP is a more conservative definition, but might yield instabilities in the fits. By commenting out the AddUniformSyst() call, a Gaussian NP will be used instead, which might be sufficient, as sketched below.
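Concretely, the Gaussian alternative amounts to (a sketch based on the two lines above):

// meas.AddUniformSyst( "QCDscale_qcd" );  // commented out: the default Gaussian constraint is used
if(process=="qcd") sample->AddOverallSys( "QCDscale_qcd",0.8,1.3);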

  • Concerning the correlation of the uncertainties, this is technically encoded in the name of the applied uncertainty. For instance, in the lines:

if( process=="ew" || process=="qcd" ){
	if(strstr(channel.c_str(),"SignalRegion")) 
	    sample->AddHistoSys ( ("genshapemodelling_"+process).data() , histname , fname , ("genshapemodelling_"+process+"__1up/").data() , histname , fname , ("genshapemodelling_"+process+"__1down/").data() );
	if(strstr(channel.c_str(),"CR"))
	    sample->AddHistoSys ( ("genshapemodelling_"+process).data() , histname , fname , ("shapemodelling_"+process+"__1up/").data() , histname , fname , ("shapemodelling_"+process+"__1down/").data() );
    }

****NOTE: the histograms retrieved for the "CR" case currently do not have the right normalisation. This must be corrected.

One can see an uncertainty applied only to the EW and QCD templates and only in the SR and the QCD-R. The naming of the uncertainty includes the process, which makes the modelling uncertainty completely decorrelated between the two processes - which is what we want. On the other hand, this uncertainty is completely correlated between the SR and the QCD-R (again because of the naming), which is precisely the interest of using the control regions. In the hypothetical case that one would like to introduce a source of uncertainty that is independent between the regions, the way to do this is to include "SR", for example, in the name of the uncertainty, as sketched below.
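For instance, a sketch mirroring the SR call above (the "_SR" suffix is illustrative):

// An NP carrying "_SR" in its name is independent from the one used in the CRs:
if(strstr(channel.c_str(),"SignalRegion"))
    sample->AddHistoSys ( ("genshapemodelling_"+process+"_SR").data() , histname , fname , ("genshapemodelling_"+process+"__1up/").data() , histname , fname , ("genshapemodelling_"+process+"__1down/").data() );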

Then you have to define which mu is measured in which channel. For example, you can see that, if you follow the old setup, mu_QCD floats in both the SR and the QCD region:

WZQCD . AddNormFactor( "mu_WZQCD" , 1 , 0 , 5 );

To avoid this, the flag g_doNoMuBkg can for example be set to false.

If you don't use the old setup, you have to define all this by hand.

****NOTE: some work will be done here soon, in order to properly take into account the SR separation into sub-regions

Finally you have to introduce the nuisance parameters separately by process. The theory uncertainties have to be added.

In order to construct the workspace, just do root mkWorkspace.C. You will then see that a new Workspace directory is created, containing several root files. The one that interests us the most is:

Workspace_combined_WZjjEW_XsecFit_model.root

Interpretation of the results without TRooFit

Some interpretation studies can be done without using TRooFit.

  • compute the statistical significance

In order to do that, use:

fitScripts/PostProcessing.C

In this macro, one has to specify the output rootfile and the parameter for which the significance is estimated (it doesn't have to be the POI; the same can be done for every mu introduced in the fit). In order to do so, one has to replace SigXsecOverSM with any parameter (such as mu_QCD, but using the same name as in mkWorkspace.C) both in the definition and in the step where it is fixed to zero (in order to test the null hypothesis and compute the p-value and the significance):

RooRealVar* mu = w->var("SigXsecOverSM"); //Get POI
RooArgSet poi(*mu);                       //Tell it is the POI
RooArgSet *nullParams = (RooArgSet*) poi.snapshot(); //Define the parameters for the null 
nullParams->setRealValue("SigXsecOverSM", 0.);       //Set its value to 0 
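For example, to estimate the significance for the QCD normalization instead of the POI (assuming the NormFactor name mu_WZQCD defined in mkWorkspace.C), the same lines become:

RooRealVar* mu = w->var("mu_WZQCD"); //Get the parameter to test instead of the POI
RooArgSet poi(*mu);
RooArgSet *nullParams = (RooArgSet*) poi.snapshot();
nullParams->setRealValue("mu_WZQCD", 0.);  //Set it to 0 for the null hypothesis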

This macro also allows you to compute the impact of a systematic uncertainty on the significance. If you add, for instance, these lines:

// NP list
TString nameNP[2] = {"alpha_modelling_ew", "alpha_modelling_qcd"};
  
// Fix desired NPs for impact check
for (int iv = 0; iv < 2; iv++) 
    w->var(nameNP[iv])->setConstant(kTRUE);    

you can exclude from the fit the sources of interest that you want to study and see the change in the significance.

  • Similarly, you can check the impact of a group of variables on the mu uncertainty by removing them from the fit (for the ranking of the sources of uncertainty). For this, you can use:

fitScripts/mkNPrank_cat.C

which gives the table with the importance of every uncertainty category.

If you want to add the possibility of fixing NP categories to any of the scripts, you can include fixNP.C in that script, which gives you access to the following functions:

void fixAll(RooWorkspace *w, bool doConst);
void fixTheoQCD(RooWorkspace *w, bool doConst);
void fixTheoEW(RooWorkspace *w, bool doConst);
void fixBkgNorm(RooWorkspace *w, bool doConst);
void fixMCstat(RooWorkspace *w, bool doConst);
void fixJER(RooWorkspace *w, bool doConst);
void fixJES(RooWorkspace *w, bool doConst);
void fixEgamma(RooWorkspace *w, bool doConst);
void fixMuon(RooWorkspace *w, bool doConst);
void fixMET(RooWorkspace *w, bool doConst);
void fixPRW(RooWorkspace *w, bool doConst);

Each of these functions takes as input a workspace and a boolean specifying whether you want to fix or un-fix the NPs of the given category.
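For instance, to evaluate the impact of the JER uncertainties on a result, one can fix them before a fit and release them afterwards (a short sketch using the functions above):

fixJER(w, true);   // set all JER NPs constant, removing them from the fit
// ... run the fit and compare the mu uncertainty with the nominal one ...
fixJER(w, false);  // float the JER NPs again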

  • create the log likelihood minimization plot

In order to do that, use:

fitScripts/mkNLLplot.C

This is long, as the computation must be done for every mu point. It is generally a good idea to run it on the interactive queues at CC-IN2P3, in order not to risk your session being closed while the script runs.

Similar plots can be drawn for all NPs as a sanity check, using

fitScripts/mkNLLplot.C

This can be run in batch using the steering files in fitScripts/NLLbatch, and calling

./NLLbatch/sendBatch.sh

Interpretation of the results using TRooFit

Using TRooFit directly allows you to make a very extensive interpretation of the fit in a very straightforward way, as you can see in:

troofit/troofitExample.C

As you can see, you have to use the root file constructed during the workspace creation. Then you can directly, for instance, print out the pre-fit yields.

Then you actually perform the fit using the workspace (w->fitTo()), which means that when executing mkWorkspace.C you don't have to actually perform the fit (doExport=false).

Then all the information is given with a very simple command, for example:

  • You can print out the post-fit yields (with full post-fit uncertainties) using: w->Print("channels samples yields");

When you do that, you'll also print out the pulls of the uncertainties. This is not to be used for the communication of the results, but can serve as a first sanity check:

For example, alpha uncertainties (Gaussian), such as the background normalizations, should have a constraint close to 0 with an uncertainty close to 1, while gamma uncertainties (Poisson) should have a pull close to 1 with small uncertainties.

  • You can make the pull plot using: w->Draw("pull");

and anything else described here:

https://cernbox.cern.ch/index.php/s/tzyUPsHqBSgKK1A

Send Batch for a full NLL minimization plot

When you run the full fit with all the uncertainties, the NLL plot takes a very long time to produce. One should send a batch job, after setting the right ROOT version (steeringFit.sub):

in fitScripts/:

./sendBatch.sh
