This is a TWiki page describing how to scale/weight events, using the tools provided, in order to simulate different pileup conditions with the same data.
In real data it is likely we will have several different run conditions with different pileup factors due to the luminosity. It is unlikely that we will generate sufficient Monte Carlo for every different run condition to satisfy a complete offline analysis.
In the future the LHC is expected to go through several upgrades, which will necessitate an LHCb upgrade for higher-luminosity running. Several scenarios are proposed, and it is unlikely we will generate sufficient Monte Carlo for every proposed set of conditions. By the time the upgrade arrives, we will have a large amount of real data, which we should be able to reweight/resample to approximate high-luminosity running.
We will have two types of tool: Resampling and Reweighting. The Reweighting tool has been provided as a TupleTool (Doxygen TupleToolMCInteractions). In the future the function of this tool will be copied into a simpler GaudiTool, and a resampling tool will also be provided.
The number of interactions (I) in an event follows a Poisson distribution, with a mean (mu) proportional to the luminosity (L) and to the total cross-section (sigma), and inversely proportional to the bunch crossing frequency (f): mu = L*sigma/f. The mean (mu) for DC06 data was 0.68 interactions per event. The mean (mu) for the MC09 data is 1 interaction per event. For more details on the initial run conditions see Gloria's talk Tues 11/05/09.
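As an illustrative sketch (plain Python, not an LHCb tool; the function name is invented for this example), the Poisson probability of observing I interactions for a given mean can be written as:

```python
import math

def poisson_pmf(n_int, mu):
    """P(I = n | mu) = mu**n * exp(-mu) / n!"""
    return mu ** n_int * math.exp(-mu) / math.factorial(n_int)

# Fraction of zero-interaction (empty) crossings under DC06 (mu = 0.68)
# and MC09 (mu = 1) conditions.
p_empty_dc06 = poisson_pmf(0, 0.68)  # exp(-0.68)
p_empty_mc09 = poisson_pmf(0, 1.0)   # exp(-1)
```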
To approximate a dataset with a different mean, one can either resample the existing dataset, throwing away some events to build up the desired Poisson distribution, or reweight event-by-event by a factor (w) to scale the effective number of interactions in the sample at hand.
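A minimal sketch of the event-by-event weight (the function name is illustrative, not the TupleTool API): to move a sample generated with mean mu1 to an effective mean mu2, an event containing I interactions is weighted by the ratio of the two Poisson probabilities.

```python
import math

def pileup_weight(n_int, mu_old, mu_new):
    """w(I) = P(I; mu_new) / P(I; mu_old)
            = (mu_new / mu_old)**I * exp(mu_old - mu_new)"""
    return (mu_new / mu_old) ** n_int * math.exp(mu_old - mu_new)

# Example: reweight MC09 (mu = 1) to approximate DC06 (mu = 0.68).
# Here mu_new < mu_old, so the weights fall with I; scaling the other
# way (mu_new > mu_old) gives weights larger than one at high I.
weights = [pileup_weight(i, 1.0, 0.68) for i in range(5)]
```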
Resampling and Reweighting are useful in different circumstances. In the case of infinite statistics, both approaches are equivalent. In the case of limited statistics, Reweighting gives equal or smaller statistical errors, since no events are discarded. If you wish to make a dataset which "looks like" the expected real data, such that CPU times and retention rates can be calculated correctly, or if you cannot use event-by-event weights in your analysis, Resampling may be required.
Warning: in the case that you would like to approximate a larger number of interactions per event (larger mu), the necessary event-by-event weights (w) are larger than one. This is a curious case from a statistical standpoint.
In the limit of large statistics this is no problem, and likewise when the weights (w) are of order 1, for example when you want to scale up by only a small amount. In the limit of small statistics, or of large changes in the mean, the weights can become very large for large (I). These large weights must be handled correctly in your analysis, either by propagating the errors or by placing a cut-off on the distribution.
In the case of Resampling, you cannot compensate for events which were never generated (w > 1). A point must then be chosen at which to normalise the distribution, which gives the true Poisson distribution below this point, and the original Poisson distribution, as an approximation, above it.
This is equivalent to redefining: for (w > x) w = x; an approach which can be taken if you are not too interested in the events with large weights and small statistics.
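The cut-off above can be sketched in a couple of lines (pure illustration, not part of the provided tools):

```python
def cap_weights(weights, x):
    """Apply the cut-off: for (w > x) w = x."""
    return [min(w, x) for w in weights]

capped = cap_weights([0.5, 1.2, 4.7], 2.0)  # -> [0.5, 1.2, 2.0]
```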
In the limit of large statistics, of small (w), of only small changes in (w), or where each (w) has a similar number of events, scaling your usual Poisson error by the average weight is an adequate representation of the new error.
In the limit of small statistics, where each (w) has only a small number of events, the dataset corresponding to each different (I), or each different (w), should be treated independently. A correctly calculated likelihood distribution or chi2 minimisation could then be used to calculate the error exactly.
In the middle ground, where some bins have large statistics and some don't, treating each (I) independently and summing the errors in quadrature will give an adequate estimate of the total error.
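As a sketch of that middle-ground recipe (names are illustrative): treat the yield in each interaction-multiplicity bin independently and add the weighted Poisson errors in quadrature.

```python
import math

def weighted_yield_error(counts, weights):
    """counts[i]  = number of events with I = i interactions,
    weights[i] = event weight w(I = i).
    Error = sqrt( sum_I (w_I * sqrt(N_I))**2 )
          = sqrt( sum_I w_I**2 * N_I )."""
    return math.sqrt(sum(w * w * n for n, w in zip(counts, weights)))

err = weighted_yield_error([100, 40, 10], [1.0, 1.5, 2.5])
```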
If you have scaled from Lumi (L1) to Lumi (L2), by applying event-by-event weights, the effective integrated luminosity of the sample is multiplied by the ratio (L2/L1). In the case that L2>L1, some event weights will be greater than one, such that the same relationship holds.
In the case that the weights have been normalised by some factor (x), the same scaling applies: the effective integrated luminosity is multiplied by L2/(L1.x).
If the event weights have been scaled by Resampling, a normalisation factor (x) will have been chosen, and the approximate integrated lumi will similarly be multiplied by L2/(L1.x). Again, when scaling L2>L1, (x) will probably be a large positive number, and the dataset will contain many fewer events. This approximation holds well for L1>L2, and while L2/L1 is of order 1. For larger scalings each bin in (I) or (w) should be treated independently and the effective luminosity calculated by the user on a case-by-case basis.
If you have scaled the Lumi AND the bunch crossing rate/cross-section, the calculation is the same: the ratio L2/(L1.x) still applies, but it will no longer be equal to mu2/(mu1.x). Also, the length of time taken to accumulate this luminosity might be different due to the changed bunch-crossing rate. Care should be taken.
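The luminosity bookkeeping above can be sketched as follows (the function name and argument names are invented for this example):

```python
def effective_lumi(lumi_sample, l1, l2, x=1.0):
    """Effective integrated luminosity of a sample reweighted from
    conditions with luminosity l1 to l2, weights normalised by (x):
    L_eff = L_sample * l2 / (l1 * x)."""
    return lumi_sample * l2 / (l1 * x)

# Doubling the luminosity doubles the effective integrated lumi,
# unless the weights were normalised by x = 2, which cancels it out.
```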
-- RobLambert - 18 May 2009