Difference: StreamingTaskForce (29 vs. 30)

Revision 30 (2008-02-25) - AnatolySolomin


Introduction

When data is collected from the LHCb detector, the raw data will be transferred in quasi real time to the LHCb associated Tier 1 sites for the reconstruction to produce rDST files. The rDST files are used for stripping jobs where events are selected for physics analysis. Events selected in this way are written into DST files and distributed in identical copies to all the Tier 1 sites. These files are then accessible for physics analysis by individual collaborators. The stripping stage might be repeated several times a year with refined selection algorithms.
  This report examines the needs and requirements of streaming at the data collection level as well as in the stripping process. We also look at how the information for the stripping should be made persistent and what bookkeeping
information is required. Several use cases are analyzed for the development of
 a set of recommendations.

The work leading to this report is based on the streaming task force remit

 Streaming Task Force Hypernews.

Definition of words

  • A stream refers to the collection of events that are stored in the same physical file for a given run period. Not to be confused with I/O streams in a purely computing context (e.g. streaming of objects into a ROOT file).
  • A selection is the output of a given selection algorithm during the stripping. There will be one or more selections in a given stream. It is expected that a selection should have a typical (large) size of 10⁶ (10⁷) events in 2 fb⁻¹. This means a reduction factor of 2×10⁴ (10³) compared to the 2 kHz input stream or an equivalent rate of 0.1 (1.0) Hz.
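These selection-size numbers can be cross-checked with a few lines of arithmetic; the sketch below assumes the ~10⁷ s of running time per 2 fb⁻¹ that the quoted 0.1 (1.0) Hz rates imply:

```python
# Sanity check of the selection-size numbers: a 2 kHz input stream over
# ~1e7 s (the running time implied by the quoted rates for 2 fb^-1)
# yields 2e10 events; a selection of 1e6 (1e7) events then gives the
# quoted reduction factors and equivalent rates.

INPUT_RATE_HZ = 2000.0     # stripping input stream
SECONDS_PER_2FB = 1e7      # assumed running time for 2 fb^-1

def selection_numbers(selected_events: float):
    """Return (reduction_factor, equivalent_rate_hz) for a selection."""
    total_events = INPUT_RATE_HZ * SECONDS_PER_2FB
    return total_events / selected_events, selected_events / SECONDS_PER_2FB

print(selection_numbers(1e6))  # (20000.0, 0.1)  i.e. 2 x 10^4, 0.1 Hz
print(selection_numbers(1e7))  # (2000.0, 1.0)   i.e. 10^3, 1.0 Hz
```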
 

Use cases

A set of use cases to capture the requirements for the streaming were
analyzed.
  The analysis related to the individual use cases is documented in the Wiki pages related to the streaming task force.

Experience from other experiments

Other experiments with large data volumes have valuable experience. Below are two examples of what is done elsewhere.
 

D0

 time but there is no difference in the type of events going to each of them. The stream is buffered until the first stream has finished processing the run and updated the conditions. It is also checked that the new conditions
have migrated to the remote centers and that they (by manual inspection) look
 reasonable. When the green light is given (typically in less than 24h) the reconstruction takes place at 4 remote sites (hence the 4 files above).
 

BaBar

In BaBar there are a few different streams from the detector. A few for
detector calibration like e⁺e⁻ → e⁺e⁻ (Bhabha events)
 are prescaled to give the correct rate independent of luminosity. The dominant stream where nearly all physics come from is the hadronic stream. This large stream is not processed until the calibration constants are ready from the processing of the calibration streams for a given run.
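Such prescaling amounts to scaling the "accept 1 in N" factor with the instantaneous rate so that the accepted rate stays constant. A minimal sketch of the idea; the 10 Hz target and the function names are illustrative, not BaBar's actual values or code:

```python
import math

def prescale_factor(instantaneous_rate_hz: float, target_rate_hz: float) -> int:
    """Accept 1 in N events, with N growing as the raw (luminosity-
    dependent) rate rises, so the accepted rate stays near the target."""
    return max(1, math.ceil(instantaneous_rate_hz / target_rate_hz))

def accepted_rate(instantaneous_rate_hz: float, target_rate_hz: float) -> float:
    return instantaneous_rate_hz / prescale_factor(instantaneous_rate_hz, target_rate_hz)

# When the raw Bhabha rate doubles with luminosity, the prescale doubles
# too and the accepted rate is unchanged:
for raw in (400.0, 800.0):
    print(raw, prescale_factor(raw, 10.0), accepted_rate(raw, 10.0))
```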
BaBar initially operated with a system of rolling calibrations where
 calibrations for a given run n were used for the reconstruction of run
n+1, using the so-called 'AllEvents' stream. In this way the full statistics was available for the calibrations and there was no double processing of events, but the conditions were always one run late. A consequence of this setup was that runs had to be processed sequentially, in chronological order, introducing scaling problems. The scaling problems were worsened by the fact that individual runs were processed on large farms of CPUs, and harvesting the calibration data, originating from the large number of jobs running in parallel, introduced a severe limit on the scalability of the processing farm. These limits on scalability were successfully removed by splitting the process of rolling calibrations from the processing of the data. Since the calibration only requires a very small fraction of the events recorded, these events could easily be separated by the trigger. Next, this calibration stream is processed (in chronological order) as before, producing a rolling calibration. As the event rate is limited, scaling of this 'prompt calibration' pass is not a problem. Once the calibration constants for a given run have been determined in this way and have been propagated into a conditions database, the processing of the 'main stream' for that run is possible. Note that in this system the processing of the main physics data uses the calibration constants obtained from the same run, and the processing of the 'main stream' is not restricted to a strict sequential, chronological order, but can be done for each run independently, on a collection of computing farms. This allows for easy scaling of the processing.
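The split described above reduces to a simple scheduling rule: the calibration pass is strictly sequential, while the bulk pass only waits for its own run's constants. A sketch with illustrative names (not BaBar code):

```python
# Sketch of the BaBar-style split: the low-rate calibration stream is
# processed strictly in run order to produce rolling constants; once the
# constants for a run are in the conditions store, the bulk stream for
# that same run can be processed independently, in any order.

conditions = {}          # run number -> calibration constants
calibrated_through = 0   # highest run with constants available

def prompt_calibration(run: int) -> None:
    """Must be called in chronological run order."""
    global calibrated_through
    assert run == calibrated_through + 1, "calibration pass is sequential"
    conditions[run] = f"constants-for-run-{run}"  # stand-in for real constants
    calibrated_through = run

def process_main_stream(run: int) -> str:
    """May run for any calibrated run, in any order, on any farm."""
    if run not in conditions:
        raise RuntimeError(f"run {run} not yet calibrated")
    return f"run {run} reconstructed with {conditions[run]}"

for run in (1, 2, 3):
    prompt_calibration(run)
print(process_main_stream(3))  # out-of-order bulk processing is fine
print(process_main_stream(1))
```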
  The reconstructed data is fed into a subsequent stripping job that writes out DST files. On the order of 100 files are written with some of them containing
 only ever been done on the full hadronic sample.

Proposal

Here follow the recommendations of the task force.
 

Streams from detector

A single bulk stream should be written from the online farm. The advantage of this compared to a solution where several streams are written based on triggers
  detector stream there is no way to avoid duplication of events. To sort this out later in an analysis would be error prone.
The disadvantages are:
 
  • It becomes harder to reprocess a smaller amount of the dataset according to the HLT selections (it might involve sparse reading). Experience from past experiments shows that this rarely happens.
  • It is not possible to give special priority to a specific high priority analysis with a narrow exclusive trigger. As nearly every analysis will
rely on larger selections for their result (normalization to J/Ψ signal, flavor tagging calibration) this seems in any case an unlikely
  scenario.
With more exclusive HLT selections later in the lifetime of LHCb the arguments might change and could at that point force a rethink.
  Many experiments use a hot stream for providing calibration and monitoring of the detector as described in the sections on how streams are treated in
BaBar and D0. In LHCb this should be completely covered within the monitoring
 farm. To be able to debug problems with alignment and calibration performed in the monitoring farm a facility should be developed to persist the events used for this task. These events would effectively be a second very low rate
stream. The events would only be useful for debugging the behavior of tasks
 carried out in the monitoring farm.

Processing timing

To avoid a backlog it is required that the time between when data is collected and reconstructed is kept to a minimum. As the first stripping will take place at the same time this means that all calibration required for this has to be
done in the monitoring farm. It is advisable to delay the processing
 for a short period (8 hours?) allowing shifters to give a green light for reconstruction. If problems are discovered a run will be marked as bad and the reconstruction postponed or abandoned.

Number of streams in stripping

Considering the low level of overlap between different selections, as
documented in the appendix on correlations, it is a clear recommendation that we group selections into a
 small number of streams. This has some clear advantages compared to a single stream:
  • Limited sparse reading of files. All selections will make up 10% or more
  by the ROOT team, with the introduction of a flag to disable reading of the streaming information.
The disadvantages are very limited:
 
  • An analysis might cover more than one stream making it harder to deal
with double counting of events. Let's take the Bs → μ⁺μ⁻ analysis as an example. The signal will come from the two-body stream while the BR normalization will come from the J/Ψ stream. In this case the double counting doesn't matter, though, so the objection is not real. If the signal itself is extracted from more than one stream there is a design error in the stripping for that analysis.
 
  • Data will be duplicated. According to the analysis based on the DC04 TDR selections the duplication will be very limited. If we are limited in available disk space we should reconsider the mirroring of all stripped
 reading.

In total one might expect around 30 streams from the stripping, each with
around 10⁷ events in 2 fb⁻¹ of integrated
 luminosity. This can be broken down as:
  • Around 20 physics analysis streams of 10⁷ events each. There will
  most likely be significant variation in size between the individual streams.
  • Random events that will be used for developing new selections. To get
reasonable statistics for a selection with a reduction factor of 10⁵, a sample of 10⁷ events will be required. This will make it equivalent to a single large selection.
 
  • A stream for understanding the trigger. This stream is likely to have a large overlap with the physics streams but for efficient trigger studies this can't be avoided.
Line: 230 to 247
  particle identification.
  • A stream with random triggers after L0 to allow for the development of new code in the HLT. As a narrow exclusive HLT trigger might have a
rejection factor of 10⁵ (corresponding to 10 Hz) a sample of 10⁷ events is again a reasonable size.
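Taken together, the breakdown above implies a rough event budget; a back-of-the-envelope sketch, using the same ~10⁷ s per 2 fb⁻¹ assumption as the quoted rates and ignoring the (limited) duplication between streams:

```python
# Rough totals for the stripping output: around 30 streams of roughly
# 1e7 events each in 2 fb^-1, against the 2e10 events entering the
# stripping (2 kHz over ~1e7 s).

N_STREAMS = 30
EVENTS_PER_STREAM = 1e7
INPUT_EVENTS = 2000.0 * 1e7   # 2 kHz stripping input over ~1e7 s

total_stripped = N_STREAMS * EVENTS_PER_STREAM
print(total_stripped)                 # 3e8 stripped events in total
print(total_stripped / INPUT_EVENTS)  # ~1.5% of the input retained
```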
 

Monte Carlo data

Data from inclusive and "cocktail" simulations will pass through the stripping
 management overheads involved in doing anything else will be excessive.

Meta data in relation to selection and stripping

As outlined in the use cases every analysis requires additional information about what is analyzed apart from the information in the events themselves.
 

Bookkeeping information required

From a database with the meta data from the stripping it should be possible to:
  data spread across several files and a problem could be present with just one of them.
  • For an arbitrary list of files that went into a selection obtain some _B
counting_ numbers that can be used for normalizing branching ratios. This
  number might be calculated during the stripping phase.
  • To correct the above numbers when a given file turns unreadable (i.e. should know exactly which runs contributed to a given file).
  • When the stripping was performed to be able to recover the exact conditions used during the stripping.
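The B-counting correction above is straightforward once the bookkeeping records which runs contributed how much to each file. A minimal sketch with hypothetical structures, not the actual LHCb bookkeeping schema:

```python
# Illustrative sketch of the requirement above: the bookkeeping records
# the per-run B-count contributed to each stripped file, so the
# normalization can be corrected when a file turns unreadable.
# File names and numbers are invented for the example.

# file name -> {run number: B-count contributed by that run}
bookkeeping = {
    "stream_twobody_001.dst": {1001: 5000, 1002: 4800},
    "stream_twobody_002.dst": {1003: 5100},
}

def b_count(files, unreadable=()):
    """Total B-count for an arbitrary file list, dropping lost files."""
    return sum(
        sum(per_run.values())
        for name, per_run in bookkeeping.items()
        if name in files and name not in unreadable
    )

all_files = list(bookkeeping)
print(b_count(all_files))                                         # 14900
print(b_count(all_files, unreadable={"stream_twobody_002.dst"}))  # 9800
```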
  It is urgent to start a review of exactly what extra information is required for this type of bookkeeping information as well as how the information is
 change at a time after the stripping has taken place. Also it might be required to identify which runs were classified as good at some time in the past to judge if some past analysis was affected by what was later identified
as bad data. When selecting data for an analysis this information should be available thus putting a requirement on the bookkeeping system to be able to interrogate the conditions.
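The "classified as good at some time in the past" requirement amounts to keeping a time-stamped history of quality flags rather than only the latest value. A minimal sketch with invented structures, not an existing LHCb interface:

```python
# Sketch of a data-quality history that can be interrogated "as of" a
# past date: each flag is stored with the time it was set, so the flag
# a past analysis actually saw can be recovered.  Runs, times and flags
# here are invented for the example.

# run -> list of (time_flag_was_set, flag), in increasing time order
quality_history = {
    2001: [(10, "GOOD"), (50, "BAD")],   # reclassified as bad at t=50
    2002: [(12, "GOOD")],
}

def flag_as_of(run: int, when: int) -> str:
    """Latest flag set at or before `when`; 'UNKNOWN' if none yet."""
    flags = [f for t, f in quality_history.get(run, []) if t <= when]
    return flags[-1] if flags else "UNKNOWN"

print(flag_as_of(2001, 30))  # an analysis at t=30 saw this run as GOOD
print(flag_as_of(2001, 60))  # after the reclassification it is BAD
```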
 

Procedure for including selections in the stripping

The note
LHCb-2004-031 describes the (somewhat obsolete) guidelines to follow when providing
 a new selection and there are released Python tools that check these
guidelines. However, the experience with organizing stripping jobs is poor:
 for DC04 only 3 out of 29 preselections were compliant in the tests and for DC06 it is a long battle to obtain a stripping job with sufficient reduction
and with fast enough execution time. To ease the organization:
 
  • Tools should be provided that automate the subscription of a selection to the stripping.
  • The actual cuts applied in the selections
 