Statistics is used everywhere and is something you don't want to miss. Here is some of my understanding.

I will try to make this topic as rich and complete as possible.

Statistics


Significance

The significance measures the discrepancy between your data and your bkg-only model. You expect the data to look like s+b, so how different it looks compared to the assumption that there is no signal gives you the power to reject the bkg-only model.

Some asymptotic formulae: https://cds.cern.ch/record/2643488/files/ATL-COM-GEN-2018-026.pdf
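
As a minimal sketch (not code from the note above), the well-known asymptotic median discovery significance for a single counting experiment, $ Z = \sqrt{2((s+b)\ln(1+s/b) - s)} $, can be computed like this; the yields s and b are made-up illustration values:

<verbatim>
import math

def asimov_significance(s, b):
    """Median expected discovery significance for a counting experiment,
    using the asymptotic (Asimov) formula Z = sqrt(2*((s+b)*ln(1+s/b) - s))."""
    return math.sqrt(2.0 * ((s + b) * math.log(1.0 + s / b) - s))

# Made-up example: 10 signal events on top of 100 background events.
print(asimov_significance(s=10.0, b=100.0))  # ~0.98 sigma
# In the s << b limit this approaches the familiar s/sqrt(b):
print(10.0 / math.sqrt(100.0))               # 1.0
</verbatim>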

Errors

What is Statistical Error?

The statistical error is the uncertainty that comes from the finite size of your data sample (random fluctuations); it shrinks as you collect more data, typically as $ 1/\sqrt{N} $.

What is Systematic Error?

The systematic error is the uncertainty that comes from imperfect knowledge of your measurement model (calibrations, theory inputs, method choices); it does not shrink simply by collecting more data.

Auxiliary measurements provide the uncertainties on variables used in the analysis. Explicitly, such variables are things like the luminosity, calibration factors, and scale factors inside a certain pt vs eta bin of an object; implicitly, they can be "the effect of changing your smearing algorithm", "changing your underlying parton distribution function", etc. They are given by the auxiliary measurement as a ±1 sigma deviation.

The 1 sigma variation is usually assigned a Gaussian constraint, $ \mathrm{Gaus}(\theta; \mathrm{mean}=0, \sigma=1) $, and the prediction becomes something like $ \mathrm{nominal} \times (1 + \mathrm{variation} \times \theta) $ (the simplest, linear case). This is based on the assumption that a 1 sigma variation of the parameter has a 1 sigma variation impact on the final result.
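
As a sketch of how this enters the likelihood (the function names and numbers here are illustrative assumptions, not any framework's API): the nuisance parameter $ \theta $ scales the prediction linearly, and the unit-Gaussian constraint contributes $ 0.5\,\theta^2 $ to the NLL:

<verbatim>
import math

def predicted_yield(nominal, variation, theta):
    # Simplest linear response: theta = +1 scales the yield by (1 + variation).
    return nominal * (1.0 + variation * theta)

def nll(n_obs, nominal, variation, theta):
    # Poisson term for the observed yield (constant ln(n!) dropped) ...
    n_exp = predicted_yield(nominal, variation, theta)
    poisson = n_exp - n_obs * math.log(n_exp)
    # ... plus the unit-Gaussian constraint Gaus(theta; 0, 1),
    # which contributes 0.5 * theta^2 (constants dropped).
    return poisson + 0.5 * theta**2

# Made-up example: a 5% systematic on a nominal yield of 100.
print(nll(n_obs=105.0, nominal=100.0, variation=0.05, theta=1.0))
</verbatim>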

See here for an introduction to systematic uncertainties. It also introduces the profile likelihood, which absorbs the nuisance parameters:
[ Pekka K. Sinervo, Definition and Treatment of Systematic Uncertainties in High Energy Physics and Astrophysics ].


What is a fit?

A fit is a procedure that finds the minimum (or maximum) of a certain metric, quantifying the ability of your model to describe the data.
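
A minimal sketch of such a fit, for a made-up single-bin counting model whose only parameter is the signal strength mu (using scipy as the minimizer; none of this is a specific framework's API):

<verbatim>
import math
from scipy.optimize import minimize_scalar

S, B, N_OBS = 10.0, 100.0, 115.0  # made-up signal, background, observed yields

def nll(mu):
    # Poisson NLL for one bin with expected yield mu*S + B (ln(n!) dropped).
    n_exp = mu * S + B
    return n_exp - N_OBS * math.log(n_exp)

# Minimize the NLL to find the best-fit signal strength.
result = minimize_scalar(nll, bounds=(-5.0, 10.0), method="bounded")
print("best-fit mu:", result.x)   # analytic answer: (N_OBS - B) / S = 1.5
</verbatim>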

Best Fit Interpretation

Overconstraint/Underconstraint

By convention, errors are calculated as the 1 sigma deviation from nominal, so nuisance parameters are usually assigned a Gaussian constraint with sigma=1. If the error estimated by the fitting algorithm is less than 1, the nuisance parameter is overconstrained; if it is above 1, it is underconstrained.

The estimated errors on the nuisance parameters given by the fitting algorithm are what the algorithm considers the 1 sigma error. So if a parameter is overconstrained at, for example, 0.9, the algorithm thinks the error should be 0.9 × the 1 sigma variation. In this case you might have overestimated the input uncertainty (i.e. it was too conservative), but it could also mean that your response model is too simple.
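
As an illustration of an overconstrained nuisance parameter (a made-up single-bin setup using the iminuit Python frontend to Minuit2, not any specific analysis framework): with very high statistics the data pin down theta far better than the ±1 sigma auxiliary measurement, so the post-fit error comes out well below 1:

<verbatim>
import math
from iminuit import Minuit

NOMINAL, VARIATION, N_OBS = 1_000_000.0, 0.10, 1_000_000.0  # made-up numbers

def nll(theta):
    n_exp = NOMINAL * (1.0 + VARIATION * theta)
    # Poisson term plus the unit-Gaussian constraint on theta.
    return (n_exp - N_OBS * math.log(n_exp)) + 0.5 * theta**2

m = Minuit(nll, theta=0.0)
m.errordef = Minuit.LIKELIHOOD  # Delta(NLL) = 0.5 defines the 1 sigma error
m.migrad()
print(m.errors["theta"])  # ~0.01, far below 1: the fit overconstrains theta
</verbatim>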

See more here: [ W. Verkerke, Practical Statistics - Part III ]

Technical procedure of fitting

Fitting algorithms

Algorithms of estimating errors

Errors on a POI or other fitting parameter are given by the following (see the sketch after this list):

Migrad: quick estimate, from the approximate Hessian built up during minimization.

Hesse: errors from the inverse of the matrix of second derivatives (the Hessian) at the best-fit point. This assumes a parabolic NLL shape.

Minos: finds the intersections of the profile scan of the POI with min_NLL + 0.5, allowing asymmetric errors.

Minuit2: the C++ re-implementation of MINUIT that provides the Migrad, Hesse, and Minos algorithms above (used e.g. by ROOT and iminuit).
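
A minimal sketch of these steps, using the iminuit Python frontend to Minuit2 and the same made-up single-bin counting model as in the fit sketch above:

<verbatim>
import math
from iminuit import Minuit

S, B, N_OBS = 10.0, 100.0, 115.0       # made-up yields

def nll(mu):
    n_exp = mu * S + B                 # expected yield for signal strength mu
    return n_exp - N_OBS * math.log(n_exp)

m = Minuit(nll, mu=1.0)
m.errordef = Minuit.LIKELIHOOD         # Delta(NLL) = 0.5 defines 1 sigma

m.migrad()   # minimization, with a quick error estimate as a by-product
m.hesse()    # parabolic errors from the inverse Hessian
m.minos()    # asymmetric errors from the min_NLL + 0.5 crossings

print(m.values["mu"], m.errors["mu"])                # best fit and Hesse error
print(m.merrors["mu"].lower, m.merrors["mu"].upper)  # Minos errors
</verbatim>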

Examples of metrics

  • Likelihood $ \mathcal{L} $: the product of the probabilities of each individual measurement (usually the yield in a bin), serving as the combined probability of observing this data.
  • $ NLL $: negative log likelihood, $ -\ln(\mathcal{L}) $. It has some nice properties in fitting (sums instead of products, better numerical behavior).
  • $ \chi^2 $: it is simply $ 2\,NLL $ (up to a constant) when $ \mathcal{L} $ is Gaussian-like, and it is used in simple fitting; see the numeric check after this list.
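
A quick numeric check of the last point, as a minimal sketch assuming independent Gaussian measurements with known errors (all numbers made up):

<verbatim>
import math

# Made-up measurements y with Gaussian errors sigma, and model predictions f.
y     = [10.2,  9.8, 10.5]
sigma = [ 0.5,  0.4,  0.6]
f     = [10.0, 10.0, 10.0]

chi2 = sum(((yi - fi) / si) ** 2 for yi, fi, si in zip(y, f, sigma))

# NLL of the product of Gaussians, keeping all the constant terms.
nll = sum(0.5 * ((yi - fi) / si) ** 2 + math.log(si * math.sqrt(2.0 * math.pi))
          for yi, fi, si in zip(y, f, sigma))
const = sum(math.log(si * math.sqrt(2.0 * math.pi)) for si in sigma)

print(chi2, 2.0 * nll - 2.0 * const)  # identical: chi2 = 2*NLL up to a constant
</verbatim>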

Some tutorials

-- RongkunWang - 2017-09-04