In this section we summarize the interpretation of, and give some recommendations for, the most common alternatives used to compare data to MC distributions. Some initial remarks:

- Binning data into a histogram always implies a loss of information; if you plan to do any further statistical interpretation you should try to use the original unbinned distributions. More details are given in other sections of this twiki: bin size, wide bins and error bars on counts.
- The recommendations outlined here are hence intended for presentational purposes and are in some cases approximate. If you want to go beyond that, they need to be validated for your particular problem.
- In the following, we will aim to define “adequate” error bars in the sense of providing a confidence interval with a coverage close to 68.3%, symmetric in coverage but potentially asymmetric in length around the mean value.

Frequently, one wants to divide the observed data by the expected yield from simulation, bin by bin. **If we can treat the bins as independent** and the error bars define an appropriate confidence interval (we will assume the common 68.3% CL, but this can be extended to any CL), we can compare at a glance the departure from the horizontal line at one to find “anomalies”. **Note that correlation between bins**, statistical or systematic, **can easily invalidate such comparisons**.

If you choose this approach, it is recommended to divide data by simulation rather than the opposite, because in most cases the ratio is better behaved, the simulation usually having the larger number of events. It is also important to remark that a **ratio of quantities** (see FMRatioCI) **is often ill-behaved, and subtraction-based quantities, like pulls, have better statistical properties.**

**Gaussian regime**. If both data and MC have a large enough number of events, Gaussian approximations can be used. If, additionally, the errors are significantly smaller than the means, one can safely use error propagation. When the errors become comparable in size to the means, you might want to use the more precise treatment described in RatioOfGaussians.
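As an illustrative sketch of the Gaussian-regime case (the function name is ours, not part of any library), first-order error propagation for the ratio r = d/m gives σ_r = |r|·√((σ_d/d)² + (σ_m/m)²):

```python
import math

def ratio_error_propagation(d, sd, m, sm):
    """First-order (Gaussian) error propagation for r = d / m.

    Valid when sd << d and sm << m; when the errors become comparable
    to the means, the more precise treatment (RatioOfGaussians) is needed.
    """
    r = d / m
    sr = abs(r) * math.sqrt((sd / d) ** 2 + (sm / m) ** 2)
    return r, sr

# Example: 100 +- 10 observed events over an MC expectation of 400 +- 5
r, sr = ratio_error_propagation(100.0, 10.0, 400.0, 5.0)
```

Note that the relative errors add in quadrature, so the term with the larger relative error (here the data) dominates the ratio error.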

**Poisson regime**. If the number of counts is small for the data or for one of the (relevant) MC contributions, error propagation starts to fail (see FMRatioCI for details). **Note that propagating the so-called Poisson errors (Garwood intervals) does not provide good coverage**.

Often we cannot directly use the expression for the ratio of two Poisson counts because the denominator is more complex: the simulation includes contributions from different samples, which implies a linear combination of Poisson variables, possibly with weights. An approximate approach that proves to provide reasonably good coverage (reminder: this is only for drawing an error bar) is the following:

- Replace the denominator by a constant times a Poisson distribution such that both the mean and the variance are preserved. This leads to a simple two-equation system, with the constant and the Poisson mean as unknowns.
- Calculate a confidence interval for the ratio of the data and this approximated Poisson distribution, either with the mid-P approach (providing a coverage closer to 68.3% but sometimes smaller) or with the exact Clopper-Pearson construction (which guarantees ≥ 68.3%).
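The two steps above can be sketched as follows (a sketch under our assumptions, not a definitive implementation; the function names are ours). Matching c·λ = mean and c²·λ = variance gives c = var/mean and λ = mean²/var; the Clopper-Pearson interval for a ratio of two Poisson means then follows from the conditional binomial construction, n1 | n1+n2 ~ Binomial(n, p) with p = r/(1+r):

```python
from scipy.stats import beta

def effective_poisson(mean, var):
    """Replace a weighted-MC sum by c * Poisson(lam), preserving mean and
    variance: c*lam = mean and c**2*lam = var -> c = var/mean, lam = mean**2/var."""
    c = var / mean
    lam = mean * mean / var
    return c, lam

def ratio_cp_interval(n1, n2, cl=0.683):
    """Exact Clopper-Pearson interval for lambda1/lambda2 given Poisson counts
    n1, n2, via the conditional binomial and its beta-quantile bounds."""
    a, n = 1.0 - cl, n1 + n2
    p_lo = 0.0 if n1 == 0 else beta.ppf(a / 2, n1, n - n1 + 1)
    p_hi = 1.0 if n1 == n else beta.ppf(1 - a / 2, n1 + 1, n - n1)
    return p_lo / (1 - p_lo), p_hi / (1 - p_hi)   # odds p/(1-p) = lambda1/lambda2

# Example: 8 data events over an MC stack with mean 10.0 and variance 2.5
c, lam = effective_poisson(10.0, 2.5)       # c = 0.25, lam = 40.0 effective entries
lo, hi = ratio_cp_interval(8, round(lam))   # interval for lambda_data/lambda_eff
lo_dmc, hi_dmc = lo / c, hi / c             # converted to data / (MC yield)
```

The mid-P variant would replace the exact tail probabilities by tail probabilities counting only half the observed outcome, giving slightly shorter bars; for plotting, the ROOT calls below do either variant directly.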

This is implemented in ROOT as:

- `TGraphAsymmErrors::Divide(h1, h2, "pois")` for exact Clopper-Pearson
- `TGraphAsymmErrors::Divide(h1, h2, "pois midp")` for mid-P

An alternative, also widely used, approach consists in rescaling both the data counts and their errors (Poisson errors if appropriate) by the MC expectation. Note that this only represents a change of scale if the MC has negligible error. In this case, again, departures from 1 can be easily identified. The MC is treated in a similar way, rescaled by its own counts; hence its central value is 1 by definition and the rescaled errors provide a band around this fixed value. Note that this approach implies a different statistical interpretation: the error bars or band on the data and the MC represent the 68.3% CI for each of them separately. Using a compatibility criterion of bar and band “touching” results in overcoverage, being equivalent to adding the errors linearly rather than in quadrature (more details can be found here).
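The rescaling can be sketched as below (an illustration under a Gaussian approximation for the data error; the function name is ours): data points become n/pred with error √n/pred, while the MC band sits at 1 with half-width σ_MC/pred:

```python
import math

def rescaled_points(n_data, pred, pred_err):
    """Rescale data and MC by the MC expectation in a given bin:
    data -> n/pred with error sqrt(n)/pred (Gaussian approximation;
    an asymmetric Poisson interval could be rescaled the same way),
    MC   -> fixed at 1 with a band of half-width pred_err/pred."""
    point = n_data / pred
    point_err = math.sqrt(n_data) / pred
    band = pred_err / pred
    return point, point_err, band

# Example bin: 90 observed events, MC expectation 100.0 +- 4.0
p, pe, band = rescaled_points(90, 100.0, 4.0)
```

Because the data bar and the MC band each carry their own 68.3% interval, requiring them to touch is a looser criterion than combining the two errors in quadrature, hence the overcoverage noted above.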

Still another way of visually presenting the data vs. MC comparison consists in calculating, bin by bin, the difference between the data and the MC, divided by some estimate of the error. Again, we must stress that this may be interesting for presentational purposes, but in most cases it involves approximations; hence it is **highly recommended to use the original distributions to perform any further statistical test.**

This quantity can only be interpreted as a “pull” following a normal distribution with μ=0 and σ=1 when both data and MC are purely Gaussian and all errors are taken into account.

Although this is not the case for a low number of counts, a number of methods are proposed below that approximately produce the desired output.

The so-called “standard error”, using the straightforward definition (ndata−pred)/σ, provides an approximate answer when taking σ=√pred, where pred represents the prediction with the corresponding scales (i.e. pred = xs·lumi·Nmc). Note this is not the statistical error on the MC prediction, but the statistical error on the data as estimated from the MC prediction. With this construction, and if the prediction is made with a sizeable number of simulated events, even for small average expectations the resulting distribution is expected to have μ=0 and σ=1, although not necessarily a Gaussian shape. Two important notes. First, using σ=√ndata, where ndata is the observed number of data events, can lead to significantly biased results for a low number of counts. Second, it is **not correct to use as σ the “Poisson errors”** (Garwood intervals), since this produces distributions with a standard deviation that can be significantly lower than one and with biased means.
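The bias from the σ=√ndata choice can be checked exactly by summing the pull moments over the Poisson law (a sketch with our own function name; σ is floored at √1 for n=0 to avoid dividing by zero, which is itself an arbitrary convention):

```python
from math import exp, sqrt

def pull_moments(pred, sigma_from, nmax=100):
    """Exact mean and standard deviation of the pull (n - pred)/sigma
    for n ~ Poisson(pred), summing over the truncated pmf.
    sigma_from: 'pred' -> sigma = sqrt(pred); 'data' -> sigma = sqrt(max(n, 1))."""
    w = exp(-pred)              # Poisson pmf at n = 0, updated recursively
    m1 = m2 = 0.0
    for n in range(nmax + 1):
        if n > 0:
            w *= pred / n       # pmf(n) = pmf(n-1) * pred / n
        s = sqrt(pred) if sigma_from == "pred" else sqrt(max(n, 1))
        z = (n - pred) / s
        m1 += w * z
        m2 += w * z * z
    return m1, sqrt(m2 - m1 * m1)

mu_p, sd_p = pull_moments(3.0, "pred")   # sigma = sqrt(pred): mean 0, std 1
mu_d, sd_d = pull_moments(3.0, "data")   # sigma = sqrt(n): visibly biased
```

With σ=√pred the mean and standard deviation come out exactly 0 and 1 (since E[n]=Var[n]=pred), while σ=√ndata yields a negative mean and an inflated width at pred=3, in line with the first note above.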

The standard error provides a simple and sensible result, in the sense of producing a distribution whose average is close to 0 and whose variance is close to 1, but in some cases with a shape not fully compatible with a Gaussian, as shown in the plots below. A more precise result, especially in the tails, can be obtained by calculating the p-value for the observed counts assuming a Poisson law with μ=pred and transforming it into a one-sided z-score. This can be achieved in ROOT with `sqrt(2)*TMath::ErfInverse(-1+2*ROOT::Math::poisson_cdf(ndata,pred))` and in R with `qnorm(ppois(ndata,pred))`.
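The same z-score can be written in Python with scipy (the function name is ours); since Φ(x) = (1 + erf(x/√2))/2, the quantile Φ⁻¹(c) equals √2·erfinv(2c−1), so this is the same quantity as the ROOT and R expressions:

```python
from scipy.stats import norm, poisson

def poisson_zscore(n_data, pred):
    """One-sided z-score from the Poisson p-value:
    z = Phi^-1( P(X <= n_data) ) for X ~ Poisson(pred).
    Equivalent to sqrt(2)*TMath::ErfInverse(-1+2*cdf) in ROOT
    and to qnorm(ppois(ndata, pred)) in R."""
    return float(norm.ppf(poisson.cdf(n_data, pred)))

z_low = poisson_zscore(2, 3.0)    # deficit: 2 observed vs 3 expected -> z < 0
z_high = poisson_zscore(10, 3.0)  # large excess -> strongly positive z
```

Note the discreteness of the Poisson cdf means the z-score jumps between adjacent values of ndata; a mid-P variant (counting only half the probability of the observed bin) would smooth this at the cost of the one-sided interpretation.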

An example is shown below for the case of a variable driven by Poisson statistics with mean=3. The pulls obtained in each case are compared with the ideal normal law.

*Original distribution for a Poisson law of mean=3*

Pull definition according to four different methods: direct calculation with the error calculated from MC (top left); direct calculation with the error calculated from data (top right); direct calculation with the error taken from the Garwood interval (bottom left); z-score calculated from the Poisson p-value (bottom right).

-- FranciscoMatorras - 2017-06-09



Topic revision: r2 - 2017-06-15 - FranciscoMatorras


