Difference: CalibrationRequirements (1 vs. 7)

Revision 7 - 2018-09-23 - MarcoCattaneo

Line: 1 to 1
Changed:
<
<
META TOPICPARENT name="DataQuality"
>
>
META TOPICPARENT name="LHCbInternal.DataQuality"
 

Calibration Requirements using Reconstructed Data

The DQ pages have moved to https://lbtwiki.cern.ch/bin/view/Computing/DataQuality.

Revision 6 - 2008-09-25 - unknown

Line: 1 to 1
 
META TOPICPARENT name="DataQuality"

Calibration Requirements using Reconstructed Data

Changed:
<
<
This page summarizes the various requirements for calibration use cases needing reconstructed data and the possible implementations.
>
>
The DQ pages have moved to https://lbtwiki.cern.ch/bin/view/Computing/DataQuality.
 
Deleted:
<
<

Reminder of the processing flow

  1. The event filter farm (EFF) writes out events at 2 kHz.
  2. They all go through the monitoring farm (MF).
    1. Some (~50 Hz) are reconstructed in the MF for monitoring purposes.
  3. The data is written in parallel to 4 files. Each file collects about 60000 events (2 GB) in 2 minutes (a back-of-the-envelope check of these numbers is sketched after this list).
  4. The raw data is copied to castor and distributed to Tier1s.
  5. The raw data is reconstructed at Tier1s after the green light has been given for reconstruction. This can be a few days after data taking.
  6. The reconstructed data is stripped once enough reconstructed data is available. This can be several days (weeks?) after data taking.
  7. Steps 5-6 are repeated if needed.
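
As a quick sanity check of the figures in step 3 (a rough sketch, not an official number: the 2 kHz rate, 4 parallel files, 60000 events and 2 GB per file are taken from the list above; the per-event size is derived from them):

<verbatim>
# Back-of-the-envelope check of the quoted file-writing figures.
output_rate_hz   = 2000     # EFF output rate (step 1)
n_parallel_files = 4        # files written in parallel (step 3)
events_per_file  = 60000    # quoted events per file
file_size_gb     = 2.0      # quoted file size

# Time to fill one file: each file sees 1/4 of the total rate.
fill_time_min = events_per_file / (output_rate_hz / n_parallel_files) / 60.0
print("Time to fill one file: %.0f min" % fill_time_min)      # ~2 min, as quoted

# Implied average raw event size.
event_size_kb = file_size_gb * 1e6 / events_per_file
print("Implied event size: ~%.0f kB/event" % event_size_kb)   # ~33 kB
</verbatim>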

The question discussed on this page is where and how we perform the calibrations that require reconstructed data.

Sources of reconstructed data

Monitoring Farm

In the monitoring farm we will reconstruct on the order of 50 Hz using Brunel. Which events are to be reconstructed is defined by the routing bits (a purely illustrative sketch of such a selection follows this list). This data is used to produce histograms that will be analysed in real time and stored.
  • All monitoring done at this level is in real time.
  • It is not foreseen to save this data, but it could be done.
    • The data is not in ROOT format, so the easiest would be to save it as an MDF file. Such a file could then only be read back with the same version of the event model.
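
Purely as an illustration of the mechanism (the bit numbers and helper below are hypothetical, not the actual LHCb routing-bit assignments or APIs), the selection amounts to a bit-mask test on each event's routing word:

<verbatim>
# Illustrative only: picking events for MF reconstruction via routing bits.
# The bit assignments here are hypothetical, not the real routing-bit map.
HYPOTHETICAL_ALIGNMENT_BIT = 40
HYPOTHETICAL_PID_BIT       = 41

def wants_mf_reconstruction(routing_word, bits=(HYPOTHETICAL_ALIGNMENT_BIT, HYPOTHETICAL_PID_BIT)):
    """Return True if any of the requested routing bits is set in the event's routing word."""
    return any(routing_word & (1 << b) for b in bits)

# An event with (hypothetical) bit 41 set would be picked up for reconstruction.
print(wants_mf_reconstruction(1 << 41))   # True
</verbatim>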

Hot stream

A special calibration stream, already mentioned by the Streaming Task Force, was advocated. We could have a low rate of "hot" events suitable for calibration purposes, like alignment or PID, to be forked off the standard data flow, reconstructed and made available to experts for analysis.
  • This data would have to be reconstructed, probably at the pit (PLUS farm).

Analysis

From this point the MF and hot streams are equivalent.
  • The PLUS farm could be used to analyse them. There is a buffer of 30 TB at the pit (i.e. ~10^9 events, or about 6 days of 100% efficient running at 2 kHz; see the arithmetic sketch below), of which some could be used for this data. In principle data is deleted after some time, but one could pin some data for later use. This data should be used quickly and not kept for a long time anyway. It is still possible to copy some of the data to scratch space or a laptop if needed.
  • One could migrate to castor.
    • Even distribute to Tier1s?
It all depends on the timescale during which we need this data. In 2008 we are likely to need all the data all the time. But what about 2009? Will we ever look at this data once the processing has been done?
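
The buffer figures quoted above can be checked in the same spirit (a rough sketch: the 30 TB and 2 kHz come from the bullet above; the ~33 kB/event is the average size implied by the 2 GB / 60000-event files mentioned earlier on this page):

<verbatim>
# How long the 30 TB pit buffer lasts at nominal rate (decimal units, rough).
buffer_tb     = 30.0      # disk buffer at the pit
event_size_kb = 33.0      # ~2 GB / 60000 events
rate_hz       = 2000      # EFF output rate

n_events      = buffer_tb * 1e9 / event_size_kb       # ~1e9 events
lifetime_days = n_events / rate_hz / 86400.0           # ~5-6 days at 100% efficiency
print("Buffer holds ~%.1e events, i.e. ~%.1f days at %d Hz" % (n_events, lifetime_days, rate_hz))
</verbatim>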

Offline reconstructed data

The 2 kHz data will be distributed to Tier1s and reconstructed there. This will happen only after the green light has been given by the DQ team, typically after a day or so. Any monitoring that neither needs immediate feedback to the detector nor serves as input to the reconstruction could run there. The output will be histograms, which will be shipped back to CERN. The output of the reconstruction is rDST files. It is not foreseen to run jobs on this data. The reconstruction is done at the file level, i.e. on about 60000 events from the same 2-minute time interval.

Stripping

After enough reconstructed rDST data has been collected at a Tier1 the stripping is run. This can happen a long time after the reconstruction, and no ordering of the events is guaranteed. Monitoring tasks can be performed there as well, although it is probably more practical to write out the events of interest to a DST and analyse them later as a user job.

Users

Alignment

The alignment group would like to use a ~24h "grace period" to provide alignment constants and cross-check what they are doing: update the alignment constants and then again produce monitoring histograms to check that the new alignment makes sense on a sample that is representative of the full run. This cannot easily be done in the monitoring farm. The important point is that the calibration must run in phase with the (re)processing, but it does not need to be real-time.

It should be possible to redo this alignment before a reprocessing. A sample of reconstructed events representative of a given run is required in order to redo the alignment constants if necessary.

RICH calibration

RICH DataQuality TWiki

Muon ID

Offline processing

Typically the mass scale, i.e. the magnetic field, will be determined to full precision only at the stripping level.

Some calibrations will need detailed user analyses to be made. Typical examples are the D* and Lambda PID calibrations.

  • In principle these calibrations determine high level conditions, like the mis-ID rate.
  • In general the data quality flag in the bookkeeping should not depend on these jobs.
  • One must ensure that all data samples are surveyed by the appropriate jobs.
  • If possible these calibrations should be done automatically in the processing step.

Other sources of calibration

Online calibration

Some quantities will be monitored and calibrated using the monitoring farm. The result will be put in the conditions database for use in the processing. See the Monitoring Farm section above.

Calibration farm

This is a special farm (so far consisting of one node) that has access to special calibration events which are not saved. The calorimeter is the only user so far.

Conclusion

  -- PatrickKoppenburg - 15 Jul 2008

Revision 5 - 2008-08-14 - unknown

Line: 1 to 1
 
META TOPICPARENT name="DataQuality"

Calibration Requirements using Reconstructed Data

Line: 52 to 52
 

Muon ID

Added:
>
>

Offline processing

Typically the mass scale, i.e. the magnetic field, will be determined to full precision only at the stripping level.

Some calibrations will need detailed user analyses to be made. Typical examples are the D* and Lambda PID calibrations.

  • In principle these calibrations determine high level conditions, like the mis-ID rate.
  • In general the data quality flag in the bookkeeping should not depend on these jobs.
  • One must ensure that all data samples are surveyed by the appropriate jobs.
  • If possible these calibrations should be done automatically in the processing step.

Other sources of calibration

Online calibration

Some quantities will be monitored and calibrated using the monitoring farm. The result will be put in the conditions database for use in the processing. See the Monitoring Farm section above.

Calibration farm

This is a special farm (so far consisting of one node) that has access to special calibration events which are not saved. The calorimeter is the only user so far.
 

Conclusion

Revision 4 - 2008-08-07 - unknown

Line: 1 to 1
 
META TOPICPARENT name="DataQuality"

Calibration Requirements using Reconstructed Data

Line: 30 to 30
 
  • This data would have to be reconstructed, probably at the pit (PLUS farm).

Analysis

Changed:
<
<
From this point the the MF and hot streams are equivalent.
>
>
From this point the MF and hot streams are equivalent.
 
  • The PLUS farm could be used to analyse them. There is a buffer of 30 TB at the pit (i.e. 10^9 events, or 6 days of 100% efficient running at 2 kHz), of which some could be used for this data. In principle data is deleted after some time but one could pin down some data for later use. This data should be used quickly and not kept for long time anyway. It is still possible to copy some of the data to some scratch space or laptop if needed.
  • One could migrate to castor.
    • Even distribute to Tier1s?

Revision 3 - 2008-07-30 - UlrichKerzel

Line: 1 to 1
 
META TOPICPARENT name="DataQuality"

Calibration Requirements using Reconstructed Data

Line: 48 to 48
 It should be possible to redo this alignment before a reprocessing. A sample of reconstructed events representative of a given run is needed to redo the alignment constants if needed.

RICH calibration

Added:
>
>
RICH DataQuality TWiki
 

Muon ID

Revision 2 - 2008-07-15 - unknown

Line: 1 to 1
 
META TOPICPARENT name="DataQuality"

Calibration Requirements using Reconstructed Data

Line: 8 to 8
 

Reminder of the processing flow

  1. The event filter farm (EFF) writes out 2 kHz.
Changed:
<
<
  1. They all go through the monitoring farm (MF) .
>
>
  1. They all go through the monitoring farm (MF) .
 
    1. Some (~50 Hz) are reconstructed in the MF for monitoring purposes.
Changed:
<
<
  1. The raw data is copied to castor and distributed to Tier1s.
  2. The raw data is reconstructed at Tier1s after the green light has been given for reconstruction. This can be a few days after data taking.
  3. The reconstructed data is stripped once enough reconstructed data is available. This can be several days (weeks?) after data taking.
  4. Steps 4-5 are repeated if needed.
>
>
  1. The data is written in parallel to 4 files. Each file collects about 60000 events (2 GB) in 2 minutes.
  2. The raw data is copied to castor and distributed to Tier1s.
  3. The raw data is reconstructed at Tier1s after the green light has been given for reconstruction. This can be a few days after data taking.
  4. The reconstructed data is stripped once enough reconstructed data is available. This can be several days (weeks?) after data taking.
  5. Steps 5-6 are repeated if needed.
 
Changed:
<
<
The question discussed in this page is where and how we perform the calibrations that require reconstructyed
>
>
The question discussed in this page is where and how we perform the calibrations that require reconstructed
 

Sources of reconstructed data

Deleted:
<
<
Last update: Jul 15 2008.
 

Monitoring Farm

In the monitoring farmwe will reconstruct order of 50 Hz using Brunel. Which events are to be reconstructed is defined by the routing bits. This data is used to produce histograms that will be analysed in real time and stored.
  • All monitoring done at this level is in real time.
  • It is not foreseen to save this data, but it could be done.
Changed:
<
<
    • The data is not in root format, so the most easy would be to save it as a MSF file. Such a format could then only be read in by the same version of the event model.
>
>
    • The data is not in root format, so the most easy would be to save it as a MDF file. Such a format could then only be read in by the same version of the event model.
 

Hot stream

Changed:
<
<
A special calibration stream, already mentioned by the Streaming Task Force, was advocated. We would like a lot rate of "hot" events suitable for calibration purposes, like alignment or PID, to be forked off the standard data flow, reconstructed and made available to experts for analysis.
>
>
A special calibration stream, already mentioned by the Streaming Task Force, was advocated. We could have a low rate of "hot" events suitable for calibration purposes, like alignment or PID, to be forked off the standard data flow, reconstructed and made available to experts for analysis.
 
  • This data would have to be reconstructed, probably at the pit (PLUS farm).
Changed:
<
<

Distribution

From this point the the MF and hot streams are reconstructed and equivalent.
>
>

Analysis

From this point the the MF and hot streams are equivalent.
 
  • The PLUS farm could be used to analyse them. There is a buffer of 30 TB at the pit (i.e. 10^9 events, or 6 days of 100% efficient running at 2 kHz), of which some could be used for this data. In principle data is deleted after some time but one could pin down some data for later use. This data should be used quickly and not kept for long time anyway. It is still possible to copy some of the data to some scratch space or laptop if needed.
  • One could migrate to castor.
    • Even distribute to Tier1s?
It all depends on the timescale during which we need this data. In 2008 we are likely to need all the data all the time. But what about 2009? Will we ever look at this data once the processing has been done?
Changed:
<
<

Online calibration

Some quantities will be monitored and calibrated using the monitoring farm. The result will be put in the condition database for use in the processing. See above.

Calibration farm

This is special farm (so far of one node) that has access to special calibration events which are not saved. The calorimeter is the only user so far.
>
>

Offline reconstructed data

The 2kHz data will be distributed to Tier1s and reconstructed there. This will happen only after the green light has been given by the DQ team, typically after a day or so. Any monitoring that does not need immediate feedback to the detector or is input to the reconstruction could run there. The output will be histograms which will be shipped back to CERN. The output of the reconstruction are rDST files. It is not foreseen to run jobs on this data. The reconstruction is done at the file level, i.e. about 60000 events from the same time interval of 2 minutes.

Stripping

After enough reconstructed rDST data has been collected on a Tier1 the stripping is run. This can happen a long time after the reconstruction and there is no ordering of the events guaranteed. Monitoring tasks can be performed there as well, although it is probably more practical to write out the events of interest to a DST and analyse them later as a user job.
 
Added:
>
>

Users

 

Alignment

Changed:
<
<
Last update: Jul 8 2008.

Contact person: Wouter Hulsbergen

The alignment of the tracking stations can be monitored online, but if a problem is found one cannot re-run the alignment in the monitoring farm. One needs fully reconstructed events. The hot stream would be ideal.

  • Where would it run?
    • plus farm?
      • Could the monitoring farm save more information (residuals...), allowing the alignment not to have to redo all the track fits?
    • lxbatch?
    • Grid?

The alignment group needs for instance to run several times on reconstructed events.

RICH refractive index calibration

Last update: Jun 20 2008.

It needs tracks. Is this similar to Alignment or can it be done on the Brunel jobs running in the monitoring farm?

Offline processing

Last update_: Jun 20 2008.
>
>
The alignment group would like to use a ~24h "grace period" to provide alignment constants and cross-check what they are doing: update alignment constants and then again produce monitoring histograms to check that the new alignment makes sense on a sample that is representative for the full run. This cannot easily be done in a monitoring farm. The important point is that the calibration must run in phase with the (re)processing, but it does not need to be real-time.
 
Changed:
<
<
Typically the mass scale, i.e. the magnetic field will be determined to the full precision at the stripping level only.
>
>
It should be possible to redo this alignment before a reprocessing. A sample of reconstructed events representative of a given run is needed to redo the alignment constants if needed.
 
Changed:
<
<

Calibration user jobs

Last update: Jun 20 2008.
>
>

RICH calibration

 
Changed:
<
<
Some calibrations will need detailed user analyses to be made. Typical examples are the D* and Lambda PID calibrations.
  • In principle these calibrations determine high level conditions, like the mis-ID rate.
  • In general the data quality flag in the bookkeeping should not depend on these jobs.
  • One must ensure that all data samples are surveyed by the appropriate jobs.
  • If possible these calibrations should be done automatically in the processing step.
>
>

Muon ID

 
Added:
>
>

Conclusion

  -- PatrickKoppenburg - 15 Jul 2008

Revision 1 - 2008-07-15 - unknown

Line: 1 to 1
Added:
>
>
META TOPICPARENT name="DataQuality"

Calibration Requirements using Reconstructed Data

This page summarizes the various requirements for calibration use cases needing reconstructed data and the possible implementations.

Reminder of the processing flow

  1. The event filter farm (EFF) writes out 2 kHz.
  2. They all go through the monitoring farm (MF) .
    1. Some (~50 Hz) are reconstructed in the MF for monitoring purposes.
  3. The raw data is copied to castor and distributed to Tier1s.
  4. The raw data is reconstructed at Tier1s after the green light has been given for reconstruction. This can be a few days after data taking.
  5. The reconstructed data is stripped once enough reconstructed data is available. This can be several days (weeks?) after data taking.
  6. Steps 4-5 are repeated if needed.

The question discussed in this page is where and how we perform the calibrations that require reconstructyed

Sources of reconstructed data

Last update: Jul 15 2008.

Monitoring Farm

In the monitoring farmwe will reconstruct order of 50 Hz using Brunel. Which events are to be reconstructed is defined by the routing bits. This data is used to produce histograms that will be analysed in real time and stored.
  • All monitoring done at this level is in real time.
  • It is not foreseen to save this data, but it could be done.
    • The data is not in root format, so the most easy would be to save it as a MSF file. Such a format could then only be read in by the same version of the event model.

Hot stream

A special calibration stream, already mentioned by the Streaming Task Force, was advocated. We would like a lot rate of "hot" events suitable for calibration purposes, like alignment or PID, to be forked off the standard data flow, reconstructed and made available to experts for analysis.
  • This data would have to be reconstructed, probably at the pit (PLUS farm).

Distribution

From this point the the MF and hot streams are reconstructed and equivalent.
  • The PLUS farm could be used to analyse them. There is a buffer of 30 TB at the pit (i.e. 10^9 events, or 6 days of 100% efficient running at 2 kHz), of which some could be used for this data. In principle data is deleted after some time but one could pin down some data for later use. This data should be used quickly and not kept for long time anyway. It is still possible to copy some of the data to some scratch space or laptop if needed.
  • One could migrate to castor.
    • Even distribute to Tier1s?
It all depends on the timescale during which we need this data. In 2008 we are likely to need all the data all the time. But what about 2009? Will we ever look at this data once the processing has been done?

Online calibration

Some quantities will be monitored and calibrated using the monitoring farm. The result will be put in the condition database for use in the processing. See above.

Calibration farm

This is special farm (so far of one node) that has access to special calibration events which are not saved. The calorimeter is the only user so far.

Alignment

Last update: Jul 8 2008.

Contact person: Wouter Hulsbergen

The alignment of the tracking stations can be monitored online, but if a problem is found one cannot re-run the alignment in the monitoring farm. One needs fully reconstructed events. The hot stream would be ideal.

  • Where would it run?
    • plus farm?
      • Could the monitoring farm save more information (residuals...), allowing the alignment not to have to redo all the track fits?
    • lxbatch?
    • Grid?

The alignment group needs for instance to run several times on reconstructed events.

RICH refractive index calibration

Last update: Jun 20 2008.

It needs tracks. Is this similar to Alignment or can it be done on the Brunel jobs running in the monitoring farm?

Offline processing

Last update_: Jun 20 2008.

Typically the mass scale, i.e. the magnetic field will be determined to the full precision at the stripping level only.

Calibration user jobs

Last update: Jun 20 2008.

Some calibrations will need detailed user analyses to be made. Typical examples are the D* and Lambda PID calibrations.

  • In principle these calibrations determine high level conditions, like the mis-ID rate.
  • In general the data quality flag in the bookkeeping should not depend on these jobs.
  • One must ensure that all data samples are surveyed by the appropriate jobs.
  • If possible these calibrations should be done automatically in the processing step.

-- PatrickKoppenburg - 15 Jul 2008

 