Calibration Requirements using Reconstructed Data

This page summarizes the various requirements for calibration use cases needing reconstructed data and the possible implementations.

Reminder of the processing flow

  1. The event filter farm (EFF) writes out events at 2 kHz.
  2. All events pass through the monitoring farm (MF).
    1. Some (~50 Hz) are reconstructed in the MF for monitoring purposes.
  3. The data is written in parallel to 4 files. Each file collects about 60000 events (2 GB) in 2 minutes (see the arithmetic sketch after this list).
  4. The raw data is copied to CASTOR and distributed to the Tier1s.
  5. The raw data is reconstructed at the Tier1s after the green light has been given for reconstruction. This can be a few days after data taking.
  6. The reconstructed data is stripped once enough reconstructed data is available. This can be several days (weeks?) after data taking.
  7. Steps 5-6 are repeated if needed.
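
As a sanity check, here is a minimal arithmetic sketch (plain Python; all numbers are taken from the list above) showing that the quoted figures are self-consistent:

    total_rate_hz = 2000          # EFF output rate (step 1)
    n_files = 4                   # parallel files (step 3)
    events_per_file = 60_000
    file_size_gb = 2.0

    rate_per_file_hz = total_rate_hz / n_files           # 500 Hz per file
    fill_time_s = events_per_file / rate_per_file_hz     # 120 s = 2 minutes
    event_size_kb = file_size_gb * 1024**2 / events_per_file   # ~35 kB per event

    print(f"{rate_per_file_hz:.0f} Hz per file, "
          f"file closes every {fill_time_s / 60:.0f} min, "
          f"~{event_size_kb:.0f} kB per event")

At 500 Hz per file, a 60000-event file indeed closes every 2 minutes, at roughly 35 kB per event.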

The question discussed on this page is where and how we perform the calibrations that require reconstructed data.

Sources of reconstructed data

Monitoring Farm

In the monitoring farm we will reconstruct of the order of 50 Hz of events using Brunel. Which events are to be reconstructed is defined by the routing bits (a sketch of such a selection follows the list below). This data is used to produce histograms that will be analysed in real time and stored.
  • All monitoring done at this level is in real time.
  • It is not foreseen to save this data, but it could be done.
    • The data is not in ROOT format, so the easiest would be to save it as an MDF file. Such a file could then only be read back with the same version of the event model.
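
A hedged sketch of what the routing-bit selection amounts to (plain Python; the bit number and the bit-word access are invented for illustration, this page does not define them):

    RECO_FOR_MONITORING_BIT = 35   # assumed bit number, purely illustrative

    def selected_for_monitoring(routing_bits: int) -> bool:
        """True if the routing-bit word flags the event for MF reconstruction."""
        return bool(routing_bits & (1 << RECO_FOR_MONITORING_BIT))

    # an event with bits 0 and 35 set is selected; one with only bit 0 is not
    assert selected_for_monitoring((1 << 35) | 1)
    assert not selected_for_monitoring(1)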

Hot stream

A special calibration stream, already mentioned by the Streaming Task Force, has been advocated: a low rate of "hot" events suitable for calibration purposes, such as alignment or PID, would be forked off the standard data flow, reconstructed, and made available to experts for analysis.
  • This data would have to be reconstructed, probably at the pit (PLUS farm).

Analysis

From this point the MF and hot streams are equivalent.
  • The PLUS farm could be used to analyse them. There is a buffer of 30 TB at the pit (i.e. 10^9 events, or 6 days of 100% efficient running at 2 kHz; see the cross-check sketch after this list), of which some could be used for this data. In principle data is deleted after some time, but one could pin down some data for later use. This data should be used quickly and not kept for a long time anyway. It is still possible to copy some of the data to scratch space or a laptop if needed.
  • One could migrate to castor.
    • Even distribute to Tier1s?
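
A quick cross-check of the buffer figures quoted above (plain Python; the ~35 kB per event comes from the file-size arithmetic earlier on this page):

    buffer_tb = 30
    event_size_kb = 35       # ~2 GB / 60000 events, see above
    rate_hz = 2000

    events_in_buffer = buffer_tb * 1024**3 / event_size_kb   # ~0.9e9 events
    days_of_running = events_in_buffer / rate_hz / 86400     # ~5-6 days
    print(f"~{events_in_buffer:.1e} events, ~{days_of_running:.1f} days at 100% efficiency")

This reproduces the ~10^9 events and ~6 days quoted above to within rounding.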
It all depends on the timescale during which we need this data. In 2008 we are likely to need all the data all the time. But what about 2009? Will we ever look at this data once the processing has been done?

Offline reconstructed data

The 2 kHz data will be distributed to the Tier1s and reconstructed there. This will happen only after the green light has been given by the DQ team, typically after a day or so. Any monitoring that does not need immediate feedback to the detector, or that is input to the reconstruction, could run there. The output will be histograms, which will be shipped back to CERN (a merging sketch follows). The output of the reconstruction is rDST files. It is not foreseen to run jobs on this data. The reconstruction is done at the file level, i.e. on about 60000 events from the same 2-minute time interval.
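
A sketch of how the per-job histogram files could be combined once shipped back to CERN, using ROOT's TFileMerger through PyROOT; the file names and the one-file-per-job layout are assumptions, not the real production bookkeeping:

    import ROOT

    merger = ROOT.TFileMerger()
    merger.OutputFile("run1234_monitoring_merged.root")
    for job in range(10):                      # one histogram file per job (assumed)
        merger.AddFile(f"run1234_job{job:03d}_hists.root")
    merger.Merge()                             # sums histograms with matching names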

Stripping

After enough reconstructed rDST data has been collected at a Tier1 the stripping is run. This can happen a long time after the reconstruction, and no ordering of the events is guaranteed. Monitoring tasks can be performed there as well, although it is probably more practical to write out the events of interest to a DST and analyse them later in a user job (see the sketch below).
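
A purely illustrative sketch of that "write out the events of interest" step; the event structure and the selection are hypothetical stand-ins, not the LHCb stripping framework:

    def interesting(event) -> bool:
        """Placeholder selection; a real stripping line cuts on physics quantities."""
        return event.get("selected_lines", []) != []

    def strip(events, write_dst):
        """Append the events of interest to the output DST (write_dst is a stand-in)."""
        for event in events:
            if interesting(event):
                write_dst(event)

    # toy usage
    selected = []
    strip([{"selected_lines": ["DstarLine"]}, {"selected_lines": []}], selected.append)
    assert len(selected) == 1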

Users

Alignment

The alignment group would like to use a ~24h "grace period" to provide alignment constants and cross-check what they are doing: update the alignment constants and then produce monitoring histograms again to check that the new alignment makes sense on a sample that is representative of the full run. This cannot easily be done in a monitoring farm. The important point is that the calibration must run in phase with the (re)processing, but it does not need to be real time.

It should be possible to redo this alignment before a reprocessing. A sample of reconstructed events representative of a given run is needed to recompute the alignment constants (see the sketch below).
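
A conceptual sketch of this align-and-validate loop; every function below is a hypothetical placeholder, not LHCb code:

    def derive_alignment(sample, constants):
        """Placeholder for the alignment fit."""
        return constants   # a real fitter would return updated constants

    def monitor(sample, constants):
        """Placeholder: re-reconstruct the sample and fill monitoring histograms."""
        return {}

    def alignment_ok(histograms):
        """Placeholder quality check on the monitoring histograms."""
        return True

    def align_and_validate(sample, constants, max_iterations=3):
        for _ in range(max_iterations):
            constants = derive_alignment(sample, constants)
            if alignment_ok(monitor(sample, constants)):
                return constants   # safe to feed into the (re)processing
        raise RuntimeError("alignment did not converge within the grace period")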

RICH calibration

See the RICH DataQuality TWiki page.

Muon ID

Offline processing

Typically the mass scale, i.e. the magnetic field calibration, will be determined to full precision only at the stripping level.

Some calibrations will require detailed user analyses. Typical examples are the D* and Lambda PID calibrations (an illustrative calculation follows the list below).

  • In principle these calibrations determine high level conditions, like the mis-ID rate.
  • In general the data quality flag in the bookkeeping should not depend on these jobs.
  • One must ensure that all data samples are surveyed by the appropriate jobs.
  • If possible these calibrations should be done automatically in the processing step.
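
To make the first bullet concrete, here is an illustrative calculation of a mis-ID rate with its binomial uncertainty; the counts are invented for the example, not measured values:

    import math

    def misid_rate(n_misidentified: int, n_total: int):
        """Mis-ID rate with a simple binomial uncertainty."""
        rate = n_misidentified / n_total
        return rate, math.sqrt(rate * (1 - rate) / n_total)

    rate, err = misid_rate(n_misidentified=420, n_total=50_000)   # invented counts
    print(f"pi -> K mis-ID rate: {rate:.4f} +/- {err:.4f}")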

Other sources of calibration

Online calibration

Some quantities will be monitored and calibrated using the monitoring farm. The results will be put in the conditions database for use in the processing (a stand-in sketch follows). See the Monitoring Farm section above.
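
A loudly hypothetical sketch of publishing such a result with an interval of validity; sqlite3 stands in for the real conditions database, and the condition name and value are invented:

    import sqlite3, time

    db = sqlite3.connect("conditions_standin.db")
    db.execute("""CREATE TABLE IF NOT EXISTS conditions
                  (name TEXT, value REAL, valid_from REAL, valid_until REAL)""")
    now = time.time()
    db.execute("INSERT INTO conditions VALUES (?, ?, ?, ?)",
               ("Ecal/PedestalShift", 0.37, now, now + 3600))   # invented name/value
    db.commit()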

Calibration farm

This is a special farm (so far a single node) that has access to special calibration events which are not saved. The calorimeter is the only user so far.

Conclusion

-- PatrickKoppenburg - 15 Jul 2008
