Difference: StreamingTaskForce (1 vs. 31)

Revision 31 (2008-02-27) - AnatolySolomin

Line: 1 to 1
 
META TOPICPARENT name="ComputingModel"
Line: 167 to 167
  past experiments shows that this rarely happens.
  • It is not possible to give special priority to a specific high priority analysis with a narrow exclusive trigger. As nearly every analysis will
Changed:
<
<
rely on larger selections for their result (normalization to J/Ψ
>
>
rely on larger selections for their result (normalization to J/ψ
  signal, flavor tagging calibration) this seems in any case an unlikely scenario.
Line: 215 to 215
 with double counting of events. Let's take the B_s → μ⁺μ⁻ analysis as an example. The signal will come from the two-body stream while the BR normalization
Changed:
<
<
will come from the J/Ψ stream. In this case the double
>
>
will come from the J/ψ stream. In this case the double
  counting doesn't matter, though, so the objection is not real. If the signal itself is extracted from more than one stream there is a design error in the stripping for that analysis.

Revision 30 (2008-02-25) - AnatolySolomin

Line: 1 to 1
 
META TOPICPARENT name="ComputingModel"
Line: 15 to 15
 
<!-- PDFSTART -->

Introduction

Changed:
<
<
When data is collected from the LHCb detector, the raw data will be transferred in quasi real time to the LHCb associated Tier 1 sites for the reconstruction to produce rDST files. The rDST files are used for stripping jobs where events are selected for physics analysis. Events selected in this way are written into DST files and distributed in identical copies to all the Tier 1 sites. These files are then accessible for physics analysis by individual collaborators. The stripping stage might be repeated several times a year with refined selection algorithms.
>
>
When data is collected from the LHCb detector, the raw data will be transferred in quasi real time to the LHCb associated Tier 1 sites for the reconstruction to produce rDST files. The rDST files are used for stripping jobs where events are selected for physics analysis. Events selected in this way are written into DST files and distributed in identical copies to all the Tier 1 sites. These files are then accessible for physics analysis by individual collaborators. The stripping stage might be repeated several times a year with refined selection algorithms.
  This report examines the needs and requirements of streaming at the data collection level as well as in the stripping process. We also look at how the information for the stripping should be made persistent and what bookkeeping
Changed:
<
<
information is required. Several use cases are analysed for the development of
>
>
information is required. Several use cases are analyzed for the development of
 a set of recommendations.

The work leading to this report is based on the streaming task force remit

Line: 38 to 38
 Streaming Task Force Hypernews.

Definition of words

Changed:
<
<
  • A stream refers to the collection of events that are stored in the same physical file for a given run period. Not to be confused with I/O streams in a purely computing context (e.g. streaming of objects into a Root file)
  • A selection is the output of a given selection during the stripping. There will be one or more selections in a given stream. It is expected that a selection should have a typical (large) size of 10^6 (10^7) events in 2 fb⁻¹. This means a reduction factor of 2 × 10^4 (10^3) compared to the 2 kHz input stream or an equivalent rate of 0.1 (1.0) Hz.
>
>
  • A stream refers to the collection of events that are stored in the same physical file for a given run period. Not to be confused with I/O streams in a purely computing context (e.g. streaming of objects into a Root file).
  • A selection is the output of a given selection during the stripping. There will be one or more selections in a given stream. It is expected that a selection should have a typical (large) size of 10^6 (10^7) events in 2 fb⁻¹. This means a reduction factor of 2 × 10^4 (10^3) compared to the 2 kHz input stream or an equivalent rate of 0.1 (1.0) Hz (see the worked numbers below).
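To make the definition above concrete, here is a small worked example of those numbers. It is only a sketch; the assumption that 2 fb⁻¹ corresponds to roughly 10^7 seconds of running (one nominal year) is not stated in this report.

<verbatim>
# Worked numbers for the definition of a "selection" above.
# Assumption (not from this report): 2 fb^-1 ~ 1e7 s of running.
hlt_rate_hz = 2000.0        # 2 kHz input stream from the HLT
seconds_per_2fb = 1.0e7     # assumed live time corresponding to 2 fb^-1

events_on_tape = hlt_rate_hz * seconds_per_2fb   # ~2e10 events in 2 fb^-1

for selection_size in (1.0e6, 1.0e7):
    reduction = events_on_tape / selection_size  # 2e4 and 2e3
    rate_hz = selection_size / seconds_per_2fb   # 0.1 Hz and 1.0 Hz
    print(f"{selection_size:.0e} events: reduction {reduction:.0e}, rate {rate_hz:.1f} Hz")
</verbatim>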
 

Use cases

A set of use cases to capture the requirements for the streaming were
Changed:
<
<
analysed:
>
>
analyzed:
  The analysis related to the individual use cases is documented in the Wiki pages related to the streaming task force.

Experience from other experiments

Changed:
<
<
Other experiments with large data volumes have valuable experience. Below are two examples of what is done elsewhere.
>
>
Other experiments with large data volumes have valuable experience. Below are two examples of what is done elsewhere.
 

D0

Line: 73 to 88
 time but there is no difference in the type of events going to each of them. The stream is buffered until the first stream has finished processing the run and updated the conditions. It is also checked that the new conditions
Changed:
<
<
have migrated to the remote centres and that they (by manual inspection) look
>
>
have migrated to the remote centers and that they (by manual inspection) look
 reasonable. When the green light is given (typically in less than 24h) the reconstruction takes place at 4 remote sites (hence the 4 files above).
Line: 89 to 104
 

BaBar

In BaBar there are a few different streams from the detector. A few for
Changed:
<
<
detector calibration like $e^+ e^- \rightarrow e^+ e^-$ (Bhabha events)
>
>
detector calibration like e⁺e⁻ → e⁺e⁻ (Bhabha events)
 are prescaled to give the correct rate independent of luminosity. The dominant stream where nearly all physics comes from is the hadronic stream. This large stream is not processed until the calibration constants are ready from the processing of the calibration streams for a given run.
Changed:
<
<
BaBar initailly operated with a system of rolling calibrations where
>
>
BaBar initially operated with a system of rolling calibrations where
 calibrations for a given run n were used for the reconstruction of run
Changed:
<
<
n+1, using the socalled 'AllEvents' stream. In this way the full statistics was available for the calibrations, there was no double processing of events but the conditions were always one run late. A consequence of this setup was that runs had to be processed sequentially, in chronological order, introducing scaling problems. The scaling problems were worsened by the fact that individual runs were processed on large farms of CPUs, and harvesting the calibration data, originating from the large number of jobs running in parallel, introduced a severe limit on the scalability of the processing farm. These limits on scalability were succesfully removed by splitting the process of rolling calibrations from the processing of the data. Since the calibration only requires a very small fraction of the events recorded, these events could easily be separated by the trigger. Next this calibration stream is processed (in chronological order) as before, producing a rolling calibration. As the event rate is limited, scaling of this 'prompt calibration' pass is not a problem. Once the calibration constants for a given run have been determined in this way and have been propagated into a conditions database, the processing of the 'main stream' for that run is possible. Note that in this system the processing of the main physics data uses the calibrations constants obtained from the same run, and the processing of the 'main stream' is not restricted to a strict sequential, chronological order, but can be done for each run independently, on a collection of computing farms. This allows for easy scaling of the processing.
>
>
n+1, using the so-called 'AllEvents' stream. In this way the full statistics was available for the calibrations and there was no double processing of events, but the conditions were always one run late. A consequence of this setup was that runs had to be processed sequentially, in chronological order, introducing scaling problems. The scaling problems were worsened by the fact that individual runs were processed on large farms of CPUs, and harvesting the calibration data, originating from the large number of jobs running in parallel, introduced a severe limit on the scalability of the processing farm. These limits on scalability were successfully removed by splitting the process of rolling calibrations from the processing of the data. Since the calibration only requires a very small fraction of the events recorded, these events could easily be separated by the trigger. Next, this calibration stream is processed (in chronological order) as before, producing a rolling calibration. As the event rate is limited, scaling of this 'prompt calibration' pass is not a problem. Once the calibration constants for a given run have been determined in this way and have been propagated into a conditions database, the processing of the 'main stream' for that run is possible. Note that in this system the processing of the main physics data uses the calibration constants obtained from the same run, and the processing of the 'main stream' is not restricted to a strict sequential, chronological order, but can be done for each run independently, on a collection of computing farms. This allows for easy scaling of the processing.
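The split described above can be summarised in a short sketch. This is only an illustration with invented function names, not BaBar's actual framework: the prompt calibration pass runs over the small calibration stream strictly in run order, while the main stream of each run only requires that run's constants and can therefore be processed in any order on any farm.

<verbatim>
import random

conditions_db = {}  # run number -> calibration constants (illustrative only)

def calibrate(run):
    """Derive constants from the small calibration stream of one run."""
    return {"alignment": f"constants-for-run-{run}"}

def process_main_stream(run):
    """Reconstruct the bulk data of a run using that run's own constants."""
    constants = conditions_db[run]  # only dependency: constants must exist
    return f"run {run} reconstructed with {constants['alignment']}"

runs = [1001, 1002, 1003, 1004]

# Pass 1: rolling calibration over the low-rate calibration stream,
# strictly in chronological order.
for run in runs:
    conditions_db[run] = calibrate(run)

# Pass 2: main stream. Each run is independent, so the jobs can run in any
# order and on any farm (shuffled here to make the point).
for run in random.sample(runs, len(runs)):
    print(process_main_stream(run))
</verbatim>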
  The reconstructed data is fed into a subsequent stripping job that writes out DST files. On the order of 100 files are written with some of them containing
Line: 134 to 151
 only ever been done on the full hadronic sample.

Proposal

Changed:
<
<
Here follows the recomendations of the task force.
>
>
Here follow the recommendations of the task force.
 

Streams from detector

A single bulk stream should be written from the online farm. The advantage of this compared to a solution where several streams are written based on triggers
Line: 144 to 161
  detector stream there is no way to avoid duplication of events. To sort this out later in an analysis would be error prone.
Changed:
<
<
The disadvantages are
>
>
The disadvantages are:
 
  • It becomes harder to reprocess a smaller amount of the dataset according to the HLT selections (it might involve sparse reading). Experience from past experiments shows that this rarely happens.
  • It is not possible to give special priority to a specific high priority analysis with a narrow exclusive trigger. As nearly every analysis will
Changed:
<
<
rely on larger selections for their result (normalisation to J/$\Psi$ signal, flavour tagging calibration) this seems in any case an unlikely
>
>
rely on larger selections for their result (normalization to J/Ψ signal, flavor tagging calibration) this seems in any case an unlikely
  scenario.
Changed:
<
<
With more exclusive HLT selections later in the lifetime of LHCb the arguments might change and could at that point force a rethink.
>
>
With more exclusive HLT selections later in the lifetime of LHCb the arguments might change and could at that point force a rethink.
  Many experiments use a hot stream for providing calibration and monitoring of the detector as described in the sections on how streams are treated in
Changed:
<
<
BaBar and D0. In LHCb this should be completely covered within the monitoring
>
>
BaBar and D0. In LHCb this should be completely covered within the monitoring
 farm. To be able to debug problems with alignment and calibration performed in the monitoring farm a facility should be developed to persist the events used for this task. These events would effectively be a second very low rate
Changed:
<
<
stream. The events would only be useful for debugging the behaviour of tasks
>
>
stream. The events would only be useful for debugging the behavior of tasks
 carried out in the monitoring farm.

Processing timing

To avoid a backlog, the time between data collection and reconstruction must be kept to a minimum. As the first stripping will take place at the same time, this means that all calibration required for this has to be
Changed:
<
<
done in the monitoring farm. It is adviseable to delay the processing
>
>
done in the monitoring farm. It is advisable to delay the processing
 for a short period (8 hours?), allowing shifters to give a green light for reconstruction. If problems are discovered, the run will be marked as bad and the reconstruction postponed or abandoned.
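A minimal sketch of such a gate is given below; the 8-hour figure is the one quoted above (still with a question mark), and the run states and function names are invented for illustration.

<verbatim>
from datetime import datetime, timedelta

HOLD_PERIOD = timedelta(hours=8)  # delay quoted above, still to be confirmed

def ready_for_reconstruction(run_end, shifter_verdict, now=None):
    """Decide whether a run may enter reconstruction and first stripping.

    shifter_verdict is one of "good", "bad" or "pending" (invented states):
    a "bad" run is postponed or abandoned, an explicit "good" is an early
    green light, and a "pending" run simply waits out the hold period.
    """
    now = now or datetime.utcnow()
    if shifter_verdict == "bad":
        return False
    if shifter_verdict == "good":
        return True
    return now - run_end >= HOLD_PERIOD

# A run that ended 9 hours ago with no shifter verdict may be processed.
print(ready_for_reconstruction(datetime.utcnow() - timedelta(hours=9), "pending"))
</verbatim>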

Number of streams in stripping

Considering the low level of overlap between different selections, as
Changed:
<
<
documented in the page the appendix on correlations, it is a clear recommendation that we group selections into a
>
>
documented in the appendix on correlations, it is a clear recommendation that we group selections into a
 small number of streams. This has some clear advantages compared to a single stream:
  • Limited sparse reading of files. All selections will make up 10% or more
Line: 192 to 210
  by the ROOT team, with the introduction of a flag to disable reading of the streaming information.
Changed:
<
<
The disadvantages are very limited.
>
>
The disadvantages are very limited:
 
  • An analysis might cover more than one stream making it harder to deal
Changed:
<
<
with double counting of events. Lets take the Bs $\rightarrow$ μ+μ- analysis as an example. The signal will come from the two-body stream while the BR normalisation will come from the J/$\Psi$ stream. In this case the double counting doesn't matter though so the objection is not real. If the signal itself is extracted from more than one stream there is a design error in the stripping for that analysis.
>
>
with double counting of events. Let's take the B_s → μ⁺μ⁻ analysis as an example. The signal will come from the two-body stream while the BR normalization will come from the J/Ψ stream. In this case the double counting doesn't matter, though, so the objection is not real. If the signal itself is extracted from more than one stream there is a design error in the stripping for that analysis.
 
  • Data will be duplicated. According to the analysis based on the DC04 TDR selections the duplication will be very limited. If we are limited in available disk space we should reconsider the mirroring of all stripped
Line: 215 to 233
 reading.
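The automatic part of the grouping could look roughly like the sketch below: selections that share many events are merged into the same stream, and any selection that would fall below about 10% of its stream is flagged for the manual pass. The selection names, sizes and overlap fractions are invented; the real input is the correlation table from the appendix.

<verbatim>
# Greedy grouping sketch. Inputs are invented; the real overlap fractions
# come from the automatically produced correlation table.
selection_sizes = {"Bd2PiPi": 1.0e6, "Bs2KK": 0.8e6, "Bs2MuMu": 0.15e6,
                   "Jpsi2MuMu": 8.0e6, "Bd2JpsiKs": 1.5e6}

# Fraction of shared events between two selections (symmetric, sparse).
overlap = {("Bd2PiPi", "Bs2KK"): 0.35, ("Jpsi2MuMu", "Bd2JpsiKs"): 0.60,
           ("Bs2MuMu", "Bd2PiPi"): 0.30}

OVERLAP_CUT = 0.25  # merge selections whose overlap exceeds this

streams = []        # each stream is a set of selection names
for name in selection_sizes:
    for stream in streams:
        if any(overlap.get((name, other), overlap.get((other, name), 0.0))
               > OVERLAP_CUT for other in stream):
            stream.add(name)
            break
    else:
        streams.append({name})

# Flag selections below ~10% of their stream (would reintroduce sparse reading).
for stream in streams:
    total = sum(selection_sizes[s] for s in stream)
    too_small = [s for s in stream if selection_sizes[s] / total < 0.10]
    print(sorted(stream), "-> below 10%:", too_small or "none")
</verbatim>

In this invented example the dimuon selection ends up below the 10% mark of the two-body stream, which is exactly the kind of case the manual grouping pass would then resolve.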

In total one might expect around 30 streams from the stripping, each with

Changed:
<
<
around $10^7$ events in 2 inverse fb of integrated
>
>
around 10^7 events in 2 fb⁻¹ of integrated
 luminosity. This can be broken down as:
Changed:
<
<
  • Around 20 physics analysis streams of $10^7$ events each. There will
>
>
  • Around 20 physics analysis streams of 10^7 events each. There will
  most likely be significant variation in size between the individual streams.
  • Random events that will be used for developing new selections. To get
Changed:
<
<
reasonable statistics for a selection with a reduction factor of $10^5$ a sample of $10^7$ events will be required. This will make it equivalent to a single large selection.
>
>
reasonable statistics for a selection with a reduction factor of 10^5, a sample of 10^7 events will be required. This will make it equivalent to a single large selection.
 
  • A stream for understanding the trigger. This stream is likely to have a large overlap with the physics streams but for efficient trigger studies this can't be avoided.
Line: 230 to 247
  particle identification.
  • A stream with random triggers after L0 to allow for the development of new code in the HLT. As a narrow exclusive HLT trigger might have a
Changed:
<
<
rejection factor of $10^5$ (corresponding to 10 Hz) a sample of $10^7$ is again a reasonable size.
>
>
rejection factor of 10^5 (corresponding to 10 Hz), a sample of 10^7 events is again a reasonable size.
 

Monte Carlo data

Data from inclusive and "cocktail" simulations will pass through the stripping
Line: 241 to 257
 management overheads involved in doing anything else will be excessive.

Meta data in relation to selection and stripping

Changed:
<
<
As outlined in the use cases every analysis requires additional information about what is analysed apart from the information in the events themselves.
>
>
As outlined in the use cases every analysis requires additional information about what is analyzed apart from the information in the events themselves.
 

Bookkeeping information required

From a database with the meta data from the stripping it should be possible to (see the sketch after the list):
Line: 250 to 267
  data spread across several files and a problem could be present with just one of them.
  • For an arbitrary list of files that went into a selection obtain some _B
Changed:
<
<
counting_ numbers that can be used for normalising branching ratios. This
>
>
counting_ numbers that can be used for normalizing branching ratios. This
  number might be calculated during the stripping phase.
Changed:
<
<
  • To correct the above numbers when a given file turns unreadable (i.e. should know exactly which runs contributed to a given file).
  • When the stripping was performed to be able to recover the exact conditions used during the stripping.
>
>
  • To correct the above numbers when a given file turns unreadable (i.e. should know exactly which runs contributed to a given file).
  • When the stripping was performed to be able to recover the exact conditions used during the stripping.
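The sketch below shows what such a query interface could look like; the table layout, field names and numbers are invented and are not the LHCb bookkeeping schema. It covers the run-to-file traceability, the B counting numbers for branching-ratio normalization, and a record of the stripping version and conditions tag needed to recover the exact setup later.

<verbatim>
import sqlite3

# Invented schema: which runs went into which stripped file of which
# selection, the B-counting number per (file, run), and the stripping setup.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE stripping_files (
    file_id TEXT, selection TEXT, run INTEGER,
    b_count REAL, stripping_version TEXT, conditions_tag TEXT);
""")
db.executemany(
    "INSERT INTO stripping_files VALUES (?, ?, ?, ?, ?, ?)",
    [("dst_0001", "TwoBody", 1001, 1.2e7, "v2", "cond-2008-02"),
     ("dst_0001", "TwoBody", 1002, 0.9e7, "v2", "cond-2008-02"),
     ("dst_0002", "TwoBody", 1003, 1.1e7, "v2", "cond-2008-02")])

def runs_in_file(file_id):
    """Which runs contributed to a file (needed to correct for a lost file)."""
    return [r for (r,) in db.execute(
        "SELECT run FROM stripping_files WHERE file_id = ?", (file_id,))]

def b_count(files):
    """B counting number for an arbitrary list of files (BR normalization)."""
    marks = ",".join("?" * len(files))
    (total,) = db.execute(
        f"SELECT SUM(b_count) FROM stripping_files WHERE file_id IN ({marks})",
        files).fetchone()
    return total

def files_for_run(run):
    """All files a problematic run ended up in, possibly across selections."""
    return db.execute("SELECT DISTINCT file_id, selection FROM stripping_files "
                      "WHERE run = ?", (run,)).fetchall()

def stripping_conditions(file_id):
    """Stripping version and conditions tag, to recover the exact setup."""
    return db.execute("SELECT DISTINCT stripping_version, conditions_tag "
                      "FROM stripping_files WHERE file_id = ?", (file_id,)).fetchall()

print(runs_in_file("dst_0001"))           # [1001, 1002]
print(b_count(["dst_0001", "dst_0002"]))  # 32000000.0
print(files_for_run(1002))                # [('dst_0001', 'TwoBody')]
print(stripping_conditions("dst_0001"))   # [('v2', 'cond-2008-02')]
</verbatim>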
  It is urgent to start a review of exactly what extra information is required for this type of bookkeeping information as well as how the information is
Line: 277 to 296
 change at a time after the stripping has taken place. Also it might be required to identify which runs were classified as good at some time in the past to judge if some past analysis was affected by what was later identified
Changed:
<
<
as bad data. When selecting data for an analysis this information should be available thus putting a requirement on the bookkeeping system to be able to interrogate the conditions.
>
>
as bad data. When selecting data for an analysis this information should be available thus putting a requirement on the bookkeeping system to be able to interrogate the conditions.
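One way to make the "as it was known then" query concrete is sketched below; the data layout and function are invented illustrations of the idea, not the actual LHCb Conditions database interface. Each classification carries the time from which it was declared, so one can ask both for the current verdict and for the verdict as it stood when an old analysis was done.

<verbatim>
from bisect import bisect_right

# Invented structure: per run, a time-ordered list of (declared_at, flag)
# entries; flags could later be refined per subsystem (e.g. "no-RICH").
run_quality_history = {
    2042: [("2008-01-10", "good"), ("2008-03-01", "bad")],  # reclassified later
    2043: [("2008-01-10", "good")],
}

def quality(run, as_of):
    """Classification of a run as it was known at time `as_of` (ISO date)."""
    history = run_quality_history.get(run, [])
    times = [declared for declared, _ in history]
    idx = bisect_right(times, as_of)
    return history[idx - 1][1] if idx else "unknown"

# Today run 2042 is bad, but an analysis done in February legitimately used it.
print(quality(2042, "2008-04-01"))  # bad
print(quality(2042, "2008-02-01"))  # good
print(quality(2043, "2008-02-01"))  # good
</verbatim>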
 

Procedure for including selections in the stripping

The note

Changed:
<
<
LHCb-2004-031 describes the (somewhat obsolete) guidelines to follow when providing
>
>
LHCb-2004-031 describes the (somewhat obsolete) guidelines to follow when providing
 a new selection and there are released Python tools that check these
Changed:
<
<
guidelines. However, the experience with organising stripping jobs is poor:
>
>
guidelines. However, the experience with organizing stripping jobs is poor:
 for DC04 only 3 out of 29 preselections were compliant in the tests and for DC06 it is a long battle to obtain a stripping job with sufficient reduction
Changed:
<
<
and with fast enough execution time. To ease the organisation:
>
>
and with fast enough execution time. To ease the organization:
 
  • Tools should be provided that automate the subscription of a selection to the stripping.
  • The actual cuts applied in the selections

Revision 29 (2008-02-25) - AnatolySolomin

Line: 1 to 1
 
META TOPICPARENT name="ComputingModel"
Line: 41 to 41
 
  • A stream refers to the collection of events that are stored in the same physical file for a given run period. Not to be confused with I/O streams in a purely computing context (e.g. streaming of objects into a Root file)
  • A selection is the output of a given selection during the stripping. There will be one or more selections in a given stream. It is
Changed:
<
<
expected that a selection should have a typical (large) size of $10^6$ ($10^7$) events in 2 inverse fb. This means a reduction reduction factor of 2 x $10^4$ ($10^3$) compared to the 2 kHz input
>
>
expected that a selection should have a typical (large) size of 10^6 (10^7) events in 2 fb⁻¹. This means a reduction factor of 2 × 10^4 (10^3) compared to the 2 kHz input
  stream or an equivalent rate of 0.1 (1.0) Hz.

Use cases

Revision 28 (2007-11-07) - TWikiGuest

Line: 1 to 1
 
META TOPICPARENT name="ComputingModel"
Line: 326 to 326
 

Revision 26 (2007-01-24) - UlrikEgede

Line: 1 to 1
 
META TOPICPARENT name="ComputingModel"
Line: 35 to 35
  More background for the discussions leading to the recommendations in this report can be found in the
Changed:
<
<
[[https://hypernews.cern.ch/HyperNews/LHCb/get/streamingTaskForce.html][Streaming Task Force Hypernews]].
>
>
Streaming Task Force Hypernews.
 

Definition of words

  • A stream refers to the collection of events that are stored in the same physical file for a given run period. Not to be confused with I/O streams in a purely computing context (e.g. streaming of objects into a Root file)
  • A selection is the output of a given selection during the stripping. There will be one or more selections in a given stream. It is
Changed:
<
<
expected that a selection should have a typical (large) size of $10^6 (10^7)$ events in 2 ifb. This means a reduction reduction factor of $2 \times 10^4 (10^3)$ compared to the 2 kHz input
>
>
expected that a selection should have a typical (large) size of $10^6$ ($10^7$) events in 2 inverse fb. This means a reduction reduction factor of 2 x $10^4$ ($10^3$) compared to the 2 kHz input
  stream or an equivalent rate of 0.1 (1.0) Hz.

Use cases

A set of use cases to capture the requirements for the streaming were analysed:
Changed:
<
<
>
>
  The analysis related to the individual use cases is documented in the Wiki pages related to the streaming task force.

Experience from other experiments

Changed:
<
<
Other experiments with large data volumes will have valuable experience. Below follows an overview of what is done elsewhere.
>
>
Other experiments with large data volumes have valuable experience. Below are two examples of what is done elsewhere.
 

D0

In D0 the data from the detector has two streams. The first stream is of very low rate and selected in their L3 trigger. It is reconstructed more or less

Changed:
<
<
straight away and its use seems similar to the tasks we will perform in the
>
>
straight away and its use is similar to the tasks we will perform in the
 monitoring farm. The second stream contains all triggered data (including all of the first stream). Internally the stream is written to 4 files at any given time but there is no difference in the type of events going to each of
Line: 152 to 151
  past experiments shows that this rarely happens.
  • It is not possible to give special priority to a specific high priority analysis with a narrow exclusive trigger. As nearly every analysis will
Changed:
<
<
rely on larger selections for their result (normalisation to $J/\Psi$
>
>
rely on larger selections for their result (normalisation to J/$\Psi$
  signal, flavour tagging calibration) this seems in any case an unlikely scenario.
Changed:
<
<
With later more exclusive HLT selections the arguments might change and could
>
>
With more exclusive HLT selections later in the lifetime of LHCb the arguments might change and could
 at that point force a rethink.

Many experiments use a hot stream for providing calibration and monitoring

Line: 179 to 178
 

Number of streams in stripping

Considering the low level of overlap between different selections, as
Changed:
<
<
documented in the page [[STFSelectionCorrelations][the appendix on correlations]], it is a clear recommendation that we group selections into a
>
>
documented in the page the appendix on correlations, it is a clear recommendation that we group selections into a
 small number of streams. This has some clear advantages compared to a single stream:
  • Limited sparse reading of files. All selections will make up 10% or more
Line: 197 to 195
  The disadvantages are very limited.
  • An analysis might cover more than one stream making it harder to deal
Changed:
<
<
with double counting of events. Lets take the $B_s \rightarrow \mu^+    mu^-$ analysis as an example. The signal will come from the two-body stream while the BR normalisation will come from the $J/\Psi$ stream. In
>
>
with double counting of events. Lets take the Bs $\rightarrow$ μ+μ- analysis as an example. The signal will come from the two-body stream while the BR normalisation will come from the J/$\Psi$ stream. In
  this case the double counting doesn't matter though so the objection is not real. If the signal itself is extracted from more than one stream there is a design error in the stripping for that analysis.
Line: 216 to 215
 given selection should form less than 10% of a stream to avoid too sparse reading.
Changed:
<
<
In total one might expect around 20 streams from the stripping, each with around $10^7$ events in 2 ifb of integrated
>
>
In total one might expect around 30 streams from the stripping, each with around $10^7$ events in 2 inverse fb of integrated
 luminosity. This can be broken down as:
  • Around 20 physics analysis streams of $10^7$ events each. There will most likely be significant variation in size between the individual streams.
Line: 227 to 226
  equivalent to a single large selection.
  • A stream for understanding the trigger. This stream is likely to have a large overlap with the physics streams but for efficient trigger studies
Changed:
<
<
this is can't be avoided.
>
>
this can't be avoided.
 
  • A few streams for detailed calibration of alignment, tracking and particle identification.
  • A stream with random triggers after L0 to allow for the development of
Line: 293 to 292
 
  • Tools should be provided that automate the subscription of a selection to the stripping.
  • The actual cuts applied in the selections
Changed:
<
<
should be considered as the responsibility of the physics WGs
>
>
should be considered as the responsibility of the physics WGs.
 
  • We suggest the nomination of stripping coordinators in each WG. They are likely to be the same person as the "standard particles" coordinators.
  • If a subscribed selection fails automatic tests for a new round of
Line: 307 to 306
 

Revision 25 (2006-12-12) - MarcoCattaneo

Line: 1 to 1
 
META TOPICPARENT name="ComputingModel"
Line: 15 to 15
 
<!-- PDFSTART -->

Introduction

Changed:
<
<
When data is collected from the LHCb detector the raw data will in quasi real time be transferred to the LHCb associated Tier 1 sites for the reconstruction
>
>
When data is collected from the LHCb detector, the raw data will be transferred in quasi real time to the LHCb associated Tier 1 sites for the reconstruction
 to produce rDST files. The rDST files are used for stripping jobs where events
Changed:
<
<
are selected for physics analysis. Event selected in this way are written into
>
>
are selected for physics analysis. Events selected in this way are written into
 DST files and distributed in identical copies to all the Tier 1 sites. These files are then accessible for physics analysis by individual collaborators. The stripping stage might be repeated several times a year with
Line: 26 to 26
  This report examines the needs and requirements of streaming at the data collection level as well as in the stripping process. We also look at how the
Changed:
<
<
information for the stripping should be persisted and what bookkeeping
>
>
information for the stripping should be made persistent and what bookkeeping
 information is required. Several use cases are analysed for the development of a set of recommendations.
Line: 43 to 43
 
  • A selection is the output of a given selection during the stripping. There will be one or more selections in a given stream. It is expected that a selection should have a typical (large) size of $10^6 (10^7)$ events in 2 ifb. This means a reduction
Changed:
<
<
reduction factor of $2 \times 10^4 (10^3)$ compared to the 2 kHz ingoing
>
>
reduction factor of $2 \times 10^4 (10^3)$ compared to the 2 kHz input
  stream or an equivalent rate of 0.1 (1.0) Hz.

Use cases

Changed:
<
<
A set of use cases to catch the requirements for the streaming were
>
>
A set of use cases to capture the requirements for the streaming were
 analysed:
Line: 79 to 79
 reasonable. When the green light is given (typically in less than 24h) the reconstruction takes place at 4 remote sites (hence the 4 files above).
Changed:
<
<
For analysis jobs there is a stripping proceedure which selects events in the DST files but do not make copies of them. So an analysis will read something
>
>
For analysis jobs there is a stripping procedure which selects events in the DST files but does not make copies of them. So an analysis will read something
 similar to our ETC files. This aspect is not working well. A huge load is experienced on the data servers due to large overheads in connection with reading sparse data.

Until now reprocessing of a specific type of physics data has not been done

Changed:
<
<
but a reprocessing of all B triggers are planned. This will require reading
>
>
but a reprocessing of all B triggers is planned. This will require reading
 sparse events once from the stream with all the raw data from the detector.

BaBar

Line: 104 to 104
 was no double processing of events but the conditions were always one run late. A consequence of this setup was that runs had to be processed sequentially, in chronological order, introducing scaling problems. The scaling
Changed:
<
<
problems where worsened by the fact that individual runs were processed on
>
>
problems were worsened by the fact that individual runs were processed on
 large farms of CPUs, and harvesting the calibration data, originating from the large number of jobs running in parallel, introduced a severe limit on the scalability of the processing farm. These limits on scalability were succesfully removed by splitting the process of rolling calibrations from the processing of the data. Since the calibration only requires a very small fraction of the events recorded,
Changed:
<
<
these events could easily be seperated by the trigger. Next this calibration
>
>
these events could easily be separated by the trigger. Next this calibration
 stream is processed (in chronological order) as before, producing a rolling calibration. As the event rate is limited, scaling of this 'prompt calibration' pass is not a problem. Once the calibration constants for a given
Line: 126 to 126
  The reconstructed data is fed into a subsequent stripping job that writes out DST files. On the order of 100 files are written with some of them containing
Changed:
<
<
multiple selections. One of the streams contain all hadronic events. If a
>
>
multiple selections. One of the streams contains all hadronic events. If a
 selection has either low priority or if its rejection rate is too poor an ETC file is written instead with pointers into the stream containing all hadronic events.
Line: 140 to 140
 

Streams from detector

A single bulk stream should be written from the online farm. The advantage of this compared to a solution where several streams are written based on triggers
Changed:
<
<
are:
>
>
is:
 
  • Event duplication is in all cases avoided within a single subsequent selection. If a selection involves picking events from more than one detector stream there is no way to avoid duplication of events. To sort
Line: 169 to 169
 carried out in the monitoring farm.

Processing timing

Changed:
<
<
To avoid a backlog it is required that the time between data is collected
>
>
To avoid a backlog it is required that the time between when data is collected
 and reconstructed is kept to a minimum. As the first stripping will take place at the same time this means that all calibration required for this has to be done in the monitoring farm. It is adviseable to delay the processing for a short period (8 hours?) allowing shifters to give a green light for reconstruction. If problems are discovered a run will be marked as bad and the
Changed:
<
<
reconstruction abandoned.
>
>
reconstruction postponed or abandoned.
 

Number of streams in stripping

Changed:
<
<
Considering the low level of overlap between different selections as
>
>
Considering the low level of overlap between different selections, as
 documented in the page [[STFSelectionCorrelations][the appendix on
Changed:
<
<
correlations]] it is a clear recomendation that we group selections into a small number of streams. Compared to a single stream it has some clear advantages:
>
>
correlations]], it is a clear recommendation that we group selections into a small number of streams. This has some clear advantages compared to a single stream:
 
  • Limited sparse reading of files. All selections will make up 10% or more of a given file.
  • No need to use ETC files as part of the
Line: 191 to 191
 
  • There are no overheads associated with sparse data access. Currently there are large I/O overheads in reading single events (32kB per TES container), but also large CPU overheads when Root opens a file
Changed:
<
<
(reading of dictionaries etc.). This lattter problem is being addressed
>
>
(reading of dictionaries etc.). This latter problem is being addressed
  by the ROOT team, with the introduction of a flag to disable reading of the streaming information.
Line: 212 to 212
 The appendix on correlations shows that it will be fairly easy to divide the data into streams. The full correlation table can be created automatically followed by a manual grouping based mainly
Changed:
<
<
on the correlations but also on analysis that naturally belong together. No
>
>
on the correlations but also on analyses that naturally belong together. No
 given selection should form less than 10% of a stream to avoid too sparse reading.
Line: 236 to 236
  is again a reasonable size.

Monte Carlo data

Changed:
<
<
Data from inclusive and "coctail" simulations will pass through the stripping
>
>
Data from inclusive and "cocktail" simulations will pass through the stripping
 process as well. To avoid complicating the system it is recommended to process
Changed:
<
<
these events in the same way as the data. Why this will produce some selections that are irellevant for the simulation sample being processed the
>
>
these events in the same way as the data. While this will produce some selections that are irrelevant for the simulation sample being processed, the
 management overheads involved in doing anything else will be excessive.

Meta data in relation to selection and stripping

Line: 254 to 254
 
  • For an arbitrary list of files that went into a selection obtain some B counting numbers that can be used for normalising branching ratios. This number might be calculated during the stripping phase.
Changed:
<
<
  • To correct the above numbers when a given file turns unreadable (ie should know exactly which runs contributed to a given file).
>
>
  • To correct the above numbers when a given file turns unreadable (i.e. should know exactly which runs contributed to a given file).
 
  • When the stripping was performed to be able to recover the exact conditions used during the stripping.

It is urgent to start a review of exactly what extra information is required

Line: 273 to 273
 confusion.

Identification of good and bad runs. The definition of bad might need to be

Changed:
<
<
more fine grained as some analysis will be able to cope with spcific problems
>
>
more fine-grained as some analyses will be able to cope with specific problems
 (like no RICH info). This information belongs in the Conditions database rather than in the bookkeeping as the classification of good and bad might change at a time after the stripping has taken place. Also it might be required to identify which runs were classified as good at some time in the past to judge if some past analysis was affected by what was later identified
Changed:
<
<
as bad data. when selecting data for an analysis this information should be available thus putting a requirement on the bookkeeping system to be able to interrogate the conditions.
>
>
as bad data. When selecting data for an analysis this information should be available thus putting a requirement on the bookkeeping system to be able to interrogate the conditions.
 

Procedure for including selections in the stripping

The note

Changed:
<
<
LHCb-2004-031 describe the (somewhat obsolete) guidelines to follow when providing
>
>
LHCb-2004-031 describes the (somewhat obsolete) guidelines to follow when providing
 a new selection and there are released Python tools that check these guidelines. However, the experience with organising stripping jobs is poor: for DC04 only 3 out of 29 preselections were compliant in the tests and for

Revision 24 (2006-12-12) - GerhardRaven

Line: 1 to 1
 
META TOPICPARENT name="ComputingModel"
Line: 99 to 99
  BaBar initailly operated with a system of rolling calibrations where calibrations for a given run n were used for the reconstruction of run
Changed:
<
<
n+1. In this way the full statistics was used for the calibrations, there
>
>
n+1, using the socalled 'AllEvents' stream. In this way the full statistics was available for the calibrations, there
 was no double processing of events but the conditions were always one run
Changed:
<
<
late. The system was abandoned as it was hard to administrate, was undefined for the first run after a break and could migrate problems in one bad run into the next good run.
>
>
late. A consequence of this setup was that runs had to be processed sequentially, in chronological order, introducing scaling problems. The scaling problems where worsened by the fact that individual runs were processed on large farms of CPUs, and harvesting the calibration data, originating from the large number of jobs running in parallel, introduced a severe limit on the scalability of the processing farm. These limits on scalability were succesfully removed by splitting the process of rolling calibrations from the processing of the data. Since the calibration only requires a very small fraction of the events recorded, these events could easily be seperated by the trigger. Next this calibration stream is processed (in chronological order) as before, producing a rolling calibration. As the event rate is limited, scaling of this 'prompt calibration' pass is not a problem. Once the calibration constants for a given run have been determined in this way and have been propagated into a conditions database, the processing of the 'main stream' for that run is possible. Note that in this system the processing of the main physics data uses the calibrations constants obtained from the same run, and the processing of the 'main stream' is not restricted to a strict sequential, chronological order, but can be done for each run independently, on a collection of computing farms. This allows for easy scaling of the processing.
  The reconstructed data is fed into a subsequent stripping job that writes out DST files. On the order of 100 files are written with some of them containing

Revision 23 (2006-12-12) - UlrikEgede

Line: 1 to 1
 
META TOPICPARENT name="ComputingModel"
Line: 14 to 14
 
<!-- PDFSTART -->

Introduction

Deleted:
<
<
The task force should examine the needs and requirements of streaming at the trigger/DAQ level as well as in the stripping process. The information required to perform analysis for the stripping should be revisited and the consequences for the workflows and the information stored should be reported. The issue surrounding the archiving of selection and reconstruction and how to correlate them with the code version used should be addressed. Extensive use cases should be given to support chosen solutions.
 
Changed:
<
<
You can also read the full text of the streaming task force remit.

All discussions within the group take place in the Hypernews.

>
>
When data is collected from the LHCb detector the raw data will in quasi real time be transferred to the LHCb associated Tier 1 sites for the reconstruction to produce rDST files. The rDST files are used for stripping jobs where events are selected for physics analysis. Event selected in this way are written into DST files and distributed in identical copies to all the Tier 1 sites. These files are then accessible for physics analysis by individual collaborators. The stripping stage might be repeated several times a year with refined selection algorithms.

This report examines the needs and requirements of streaming at the data collection level as well as in the stripping process. We also look at how the information for the stripping should be persisted and what bookkeeping information is required. Several use cases are analysed for the development of a set of recommendations.

The work leading to this report is based on the streaming task force remit available as an appendix.

More background for the discussions leading to the recommendations in this report can be found in the [[https://hypernews.cern.ch/HyperNews/LHCb/get/streamingTaskForce.html][Streaming Task Force Hypernews]].

 

Definition of words

  • A stream refers to the collection of events that are stored in the same physical file for a given run period. Not to be confused with I/O streams in a purely computing context (e.g. streaming of objects into a Root file)
Changed:
<
<
  • A selection is the output of a given selection during the stripping. There will be one or more selections in a given stream.
>
>
  • A selection is the output of a given selection during the stripping. There will be one or more selections in a given stream. It is expected that a selection should have a typical (large) size of $10^6 (10^7)$ events in 2 ifb. This means a reduction reduction factor of $2 \times 10^4 (10^3)$ compared to the 2 kHz ingoing stream or an equivalent rate of 0.1 (1.0) Hz.
 

Use cases

Changed:
<
<
A set of use cases to catch the requirements for the streaming. For each of them the way they will access the data, the information required and the implications on all the areas of the streaming task force remit should be commented on.

Additional issues

In addition to the issues raised in the streaming task force remit the following issues should be considered:
  • How to account for the luminosity corresponding to the data in a given selection.
  • There is a need to understand the limitations from the computing side in terms of duplication of data and limits on the total number of streams.
  • The initial running period should be considered as a special case.
  • Ease of carrying out calibration/analysis tasks should be the driving issue for the final proposal.
  • We have to understand (measure) the overheads associated with sparse data access. According to Markus Frank, there are large I/O overheads in reading single events (~32kB per TES container), but also large CPU overheads when Root opens a file (reading of dictionaries etc.). This lattter problem is being addressed by the Root team, with the introduction of a flag to disable reading of the streaming information.
  • Develop a methodology of how to group selections into streams.
>
>
A set of use cases to catch the requirements for the streaming were analysed:

The analysis related to the individual use cases is documented in the Wiki pages related to the streaming task force.

 

Experience from other experiments

Other experiments with large data volumes will have valuable experience. Below follows an overview of what is done elsewhere.

D0

Deleted:
<
<
In D0 the data from the detector has two streams.

The first stream is of very low rate and selected in their L3 trigger. It is reconstructed more or less straight away (seems similar to the tasks we might use the monitoring farm for).

The second stream contains all triggered data (so includes all of the first stream). Internally the stream is written to 4 files at any given time but there is no difference in the type of events going to each of them. This stream is buffered until the first stream has finished processing the run and updated the conditions. It is also checked that the new conditions have migrated to the remote centres and that they (by manual inspection) look reasonable. When the green light is given (typically in less than 24h) the reconstruction takes place at 4 remote sites (hence the 4 files above).

 
Changed:
<
<
For analysis jobs there is a stripping proceedure which selects events in the DST files but do not make copies of them. So an analysis will read something similar to our ETC files. This aspect is not working well. A huge load is experienced on the data servers due to large overheads in connection with reading sparse data.

Until now reprocessing of a specific type of physics data has not been done alone but a reprocessing of all B triggers are planned.

>
>
In D0 the data from the detector has two streams. The first stream is of very low rate and selected in their L3 trigger. It is reconstructed more or less straight away and its use seems similar to the tasks we will perform in the monitoring farm. The second stream contains all triggered data (including all of the first stream). Internally the stream is written to 4 files at any given time but there is no difference in the type of events going to each of them. The stream is buffered until the first stream has finished processing the run and updated the conditions. It is also checked that the new conditions have migrated to the remote centres and that they (by manual inspection) look reasonable. When the green light is given (typically in less than 24h) the reconstruction takes place at 4 remote sites (hence the 4 files above).

For analysis jobs there is a stripping proceedure which selects events in the DST files but do not make copies of them. So an analysis will read something similar to our ETC files. This aspect is not working well. A huge load is experienced on the data servers due to large overheads in connection with reading sparse data.

Until now reprocessing of a specific type of physics data has not been done but a reprocessing of all B triggers are planned. This will require reading sparse events once from the stream with all the raw data from the detector.

 

BaBar

Changed:
<
<
In BaBar there are a few different streams from the detector. A few for detector calibration like e+ e- -> e+ e- (Bhabha events) which are prescaled to give the correct rate independent of luminosity. The dominant stream where nearly all physics come from is the multi-hadron stream. The large multi-hadron stream is not processed until the calibration constants are ready from the processing of the calibration events.

BaBar initailly operated with a system of rolling calibrations where calibrations for a given run n were used for the reconstruction of run n+1. In this way the full statistics was used for the calibrations, there was no double processing of events but the conditions were always one run late. The system was abandoned as it was hard to administrate, was undefined for the first run after a break and could migrate problems in one bad run into the next good run.

The reconstructed data is fed into a subsequent stripping job that writes out DST files. The order of 25 files are written with some of them containing multiple selections. One of the streams contain all multi-hadron events. If a selection has either low priority or if its rejection rate is too poor an ETC file is written instead with pointers into the stream containing all events.

Data are stripped multiple times to reflect new and updated selections. Total reprocessing was frequent in the beginning but can now be years apart. It has only ever been done on the full multi-hadron sample.

>
>
In BaBar there are a few different streams from the detector. A few for detector calibration like $e^+ e^- \rightarrow e^+ e^-$ (Bhabha events) are prescaled to give the correct rate independent of luminosity. The dominant stream where nearly all physics come from is the hadronic stream. This large stream is not processed until the calibration constants are ready from the processing of the calibration streams for a given run.

BaBar initailly operated with a system of rolling calibrations where calibrations for a given run n were used for the reconstruction of run n+1. In this way the full statistics was used for the calibrations, there was no double processing of events but the conditions were always one run late. The system was abandoned as it was hard to administrate, was undefined for the first run after a break and could migrate problems in one bad run into the next good run.

The reconstructed data is fed into a subsequent stripping job that writes out DST files. On the order of 100 files are written with some of them containing multiple selections. One of the streams contain all hadronic events. If a selection has either low priority or if its rejection rate is too poor an ETC file is written instead with pointers into the stream containing all hadronic events.

Data are stripped multiple times to reflect new and updated selections. Total reprocessing was frequent in the beginning but can now be years apart. It has only ever been done on the full hadronic sample.

 

Proposal

Here follows the recomendations of the task force.

Streams from detector

Changed:
<
<
A single bulk stream should be written from the online farm. The advantage of this compared to a solution where several stream are written based on triggers are:
  • Event duplication is in all cases avoided within a single subsequent selection. If a selection involves picking events from more than one detector stream there is no way to avoid duplication of events. To sort this out later in an analysis would be error prone.
The disadvantages are
  • It becomes harder to reprocess a smaller amount of the dataset according to the HLT selections (it might involve sparse reading). Experience from past experiments shows that this rarely happens.
  • It is not possible to give special priority to a specific high priority analysis with a narrow exclusive trigger. As nearly every analysis will rely on larger selections for their result (normalisation to J/Psi signal, flavour tagging calibration) this seems an unlikely scenario to ever become possible.
>
>
A single bulk stream should be written from the online farm. The advantages of this, compared to a solution where several streams are written based on triggers, are:
  • Event duplication is in all cases avoided within a single subsequent selection. If a selection involves picking events from more than one detector stream there is no way to avoid duplication of events; sorting this out later in an analysis would be error-prone.
 
Changed:
<
<
With later more exclusive HLT selections the arguments might change and could at that point force a rethink.

Many experiments use a hot stream for providing calibration and monitoring of the detector. See the BaBar and D0 sections belwo. In LHCb this should be completely covered within the monitoring farm.

>
>
The disadvantages are
  • It becomes harder to reprocess a smaller amount of the dataset according to the HLT selections (it might involve sparse reading). Experience from past experiments shows that this rarely happens.
  • It is not possible to give special priority to a specific high-priority analysis with a narrow exclusive trigger. As nearly every analysis will rely on larger selections for their result (normalisation to $J/\psi$ signal, flavour tagging calibration) this seems in any case an unlikely scenario.

With later more exclusive HLT selections the arguments might change and could at that point force a rethink.

Many experiments use a hot stream for providing calibration and monitoring of the detector, as described in the sections on how streams are treated in BaBar and D0. In LHCb this should be completely covered within the monitoring farm. To be able to debug problems with the alignment and calibration performed in the monitoring farm, a facility should be developed to persist the events used for this task. These events would effectively form a second, very low rate stream, and would only be useful for debugging the behaviour of tasks carried out in the monitoring farm.
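A minimal sketch of such a facility is given below, purely for illustration: events consumed by a monitoring-farm task are written out with a large prescale so that the task's behaviour can be reproduced offline. The prescale value, class names and writer interface are placeholders, not an existing LHCb component.

<verbatim>
# Sketch only: persist a prescaled fraction of the events used by a
# monitoring-farm task (alignment, calibration) as a very-low-rate
# debug stream.  All names here are illustrative.

class DebugStreamWriter(object):
    def __init__(self, writer, prescale=1000):
        self.writer = writer        # anything with a .write(event) method
        self.prescale = prescale
        self.seen = 0

    def maybe_persist(self, event):
        self.seen += 1
        if self.seen % self.prescale == 0:
            self.writer.write(event)   # becomes the low-rate debug stream

class ListWriter(object):
    def __init__(self):
        self.events = []
    def write(self, event):
        self.events.append(event)

writer = ListWriter()
debug = DebugStreamWriter(writer, prescale=1000)
for i in range(10000):                 # events used by the alignment task
    debug.maybe_persist({"event": i})
print(len(writer.events))              # 10 events kept out of 10000
</verbatim>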

 

Processing timing

Changed:
<
<
To avoid a backlog it seems required that the time between data is collected and reconstructed is kept to a minimum. As the first stripping will take place at the same time this means that all calibration required for this has to be done in the monitoring farm. It might be adviseable to delay the processing for a short period (8 hours?) allowing shifters to give a green light for reconstruction. If problems are discovered a run will be marked as bad and the reconstruction abandoned. A short time delay will also allow conditions required for the reconstruction to migrate to the T1 sites
>
>
To avoid a backlog it is required that the time between data collection and reconstruction is kept to a minimum. As the first stripping will take place at the same time, this means that all calibration required for it has to be done in the monitoring farm. It is advisable to delay the processing for a short period (8 hours?) allowing shifters to give a green light for reconstruction. If problems are discovered the run will be marked as bad and the reconstruction abandoned.
 

Number of streams in stripping

Changed:
<
<
Considering the low level of overlap between different selections as documented in the page STFSelectionCorrelations it is a clear recomendation that we group selections into a small number of streams. Compared to a single stream it has some clear advantages:
  • Limited sparse reading of files. All selections will make up 10% or more of a given file.
  • No need to use ETC files as part of the stripping. This will make data management on the Grid much easier (no need to know the location of files pointed to as well).
>
>
Considering the low level of overlap between different selections, as documented in [[STFSelectionCorrelations][the appendix on correlations]], it is a clear recommendation that we group selections into a small number of streams. Compared to a single stream this has some clear advantages:
  • Limited sparse reading of files. All selections will make up 10% or more of a given file.
  • No need to use ETC files as part of the stripping. This will make data management on the Grid much easier (no need to know the location of the files pointed to as well).
  • The overheads associated with sparse data access are avoided. Currently there are large I/O overheads in reading single events (32 kB per TES container), but also large CPU overheads when ROOT opens a file (reading of dictionaries etc.). This latter problem is being addressed by the ROOT team, with the introduction of a flag to disable reading of the streaming information.
 The disadvantages are very limited.
Changed:
<
<
  • An analysis might cover more than one stream making it harder to deal with double counting of events. Lets take the Bs->mu mu analysis as an example. The signal will come from the two-body stream while the BR normalisation will come from the J/Psi stream. In this case the double counting doesn't matter though so the objection is not real. If the signal itself is extracted from more than one stream there is a design error in the stripping for that analysis.
  • Data will be duplicated. According to the analysis based on the DC04 TDR selections the duplication will be very limited. If we are limited in available disk space we should reconsider the mirroring of all stripped data to all T1's instead (making all data available at 5 out of 6 sites will save 17% disk space).
>
>
  • An analysis might cover more than one stream, making it harder to deal with double counting of events. Let's take the $B_s \rightarrow \mu^+ \mu^-$ analysis as an example. The signal will come from the two-body stream while the BR normalisation will come from the $J/\psi$ stream. In this case the double counting doesn't matter, so the objection is not real. If the signal itself is extracted from more than one stream there is a design error in the stripping for that analysis.
  • Data will be duplicated. According to the analysis based on the DC04 TDR selections the duplication will be very limited. If we are limited in available disk space we should reconsider the mirroring of all stripped data to all T1's instead (making all data available at 5 out of 6 sites will save 17% disk space).

The appendix on correlations shows that it will be fairly easy to divide the data into streams. The full correlation table can be created automatically, followed by a manual grouping based mainly on the correlations but also on analyses that naturally belong together. No given selection should form less than 10% of a stream, to avoid too sparse reading.
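Purely as an illustration of how the overlap information could be used to propose a first grouping (the actual grouping would still be done manually as described above), here is a minimal sketch. It assumes selections are available as sets of (run, event) identifiers; the names, toy data and overlap threshold are invented for the example.

<verbatim>
# Sketch only: group stripping selections into streams from their
# pairwise event overlap.  Not the actual LHCb tooling.

def overlap_fraction(a, b):
    """Fraction of the smaller selection that is shared with the other."""
    if not a or not b:
        return 0.0
    return len(a & b) / float(min(len(a), len(b)))

def group_into_streams(selections, threshold=0.3):
    """Greedy grouping: merge selections whose overlap exceeds 'threshold'.

    'selections' is a dict {name: set_of_event_ids}.  Returns a list of
    streams, each a list of selection names.
    """
    streams = []          # each entry: [names, union_of_events]
    for name, events in sorted(selections.items(), key=lambda kv: -len(kv[1])):
        for stream in streams:
            if overlap_fraction(events, stream[1]) > threshold:
                stream[0].append(name)
                stream[1] |= events
                break
        else:
            streams.append([[name], set(events)])
    return [names for names, _ in streams]

# Toy data: the two-body and Bs->mumu selections overlap strongly and end
# up together, while the J/psi selection forms its own stream.
toy = {
    "TwoBody": {(1, i) for i in range(1000)},
    "Bs2MuMu": {(1, i) for i in range(900, 1100)},
    "Jpsi":    {(2, i) for i in range(5000)},
}
print(group_into_streams(toy))   # [['Jpsi'], ['TwoBody', 'Bs2MuMu']]
</verbatim>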

In total one might expect around 20 streams from the stripping, each with around $10^7$ events in 2 fb$^{-1}$ of integrated luminosity. This can be broken down as:

  • Around 20 physics analysis streams of $10^7$ events each. There will most likely be significant variation in size between the individual streams.
  • Random events that will be used for developing new selections. To get reasonable statistics for a selection with a reduction factor of $10^5$ a sample of $10^7$ events will be required (see the estimate after this list). This will make it equivalent to a single large selection.
  • A stream for understanding the trigger. This stream is likely to have a large overlap with the physics streams, but for efficient trigger studies this can't be avoided.
  • A few streams for detailed calibration of alignment, tracking and particle identification.
  • A stream with random triggers after L0 to allow for the development of new code in the HLT. As a narrow exclusive HLT trigger might have a rejection factor of $10^5$ (corresponding to 10 Hz) a sample of $10^7$ is again a reasonable size.
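As a rough illustration of why $10^7$ events is a reasonable sample size (a back-of-the-envelope estimate, not a number taken from the computing model): for a selection or trigger line with a rejection factor of $R = 10^5$, a development sample of $N = 10^7$ random events yields $N/R = 100$ accepted events, corresponding to a relative statistical uncertainty of $1/\sqrt{100} = 10\%$ on the measured retention.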

Monte Carlo data

Data from inclusive and "cocktail" simulations will pass through the stripping process as well. To avoid complicating the system it is recommended to process these events in the same way as the data. While this will produce some selections that are irrelevant for the simulation sample being processed, the management overheads involved in doing anything else would be excessive.
 

Meta data in relation to selection and stripping

As outlined in the use cases, every analysis requires additional information about what is analysed, apart from the information in the events themselves.
Changed:
<
<

Information required in bookkeeping database

The bookkeeping database will be the essential place for meta data on selections for a given analysis. The following tasks should be possible:
  • Get the exact runs that went into a given selection
  • For a list of runs obtain the equivalent luminosity and B counting numbers.
>
>

Bookkeeping information required

From a database with the meta data from the stripping it should be possible to (a minimal sketch of such queries follows the list):
  • Get a list of the exact files that went into a given selection. This might not translate directly into runs as a given run will have its rDST data spread across several files and a problem could be present with just one of them.
  • For an arbitrary list of files that went into a selection obtain some B counting numbers that can be used for normalising branching ratios. This number might be calculated during the stripping phase.
 
  • To correct the above numbers when a given file turns out to be unreadable (i.e. it should be known exactly which runs contributed to a given file).
  • When the stripping was performed to be able to recover the exact conditions used during the stripping.
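The sketch below illustrates the kind of queries listed above with an in-memory SQLite database. The table and column names are invented for the illustration; the real LHCb bookkeeping schema is not shown or implied here.

<verbatim>
# Sketch only: bookkeeping-style queries on invented tables.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE files     (file TEXT PRIMARY KEY, selection TEXT,
                            stripping_tag TEXT, b_count INTEGER);
    CREATE TABLE file_runs (file TEXT, run INTEGER);
""")

# Toy content: two DST files for a hypothetical 'TwoBody' selection.
db.executemany("INSERT INTO files VALUES (?,?,?,?)",
               [("dst_0001", "TwoBody", "Strip-v1-Cond-v3", 12000),
                ("dst_0002", "TwoBody", "Strip-v1-Cond-v3", 11000)])
db.executemany("INSERT INTO file_runs VALUES (?,?)",
               [("dst_0001", 101), ("dst_0001", 102), ("dst_0002", 103)])

def files_for_selection(selection):
    return [r[0] for r in db.execute(
        "SELECT file FROM files WHERE selection=?", (selection,))]

def b_count(files):
    """B-counting normalisation for an arbitrary list of files."""
    marks = ",".join("?" * len(files))
    return db.execute("SELECT SUM(b_count) FROM files WHERE file IN (%s)"
                      % marks, files).fetchone()[0]

def runs_in_file(f):
    """Needed to correct the numbers if a file becomes unreadable."""
    return [r[0] for r in db.execute(
        "SELECT run FROM file_runs WHERE file=?", (f,))]

good_files = files_for_selection("TwoBody")
print("all files: %s, B count: %d" % (good_files, b_count(good_files)))
# dst_0002 turns out to be unreadable: drop it and correct the normalisation.
good_files.remove("dst_0002")
print("remaining: %s, B count: %d, lost runs: %s"
      % (good_files, b_count(good_files), runs_in_file("dst_0002")))
</verbatim>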
Changed:
<
<
It seems urgent to start a review of exactly what extra information is required in the bookkeeping as well as how the information is accessed from the command line, from Ganga, from within a Gaudi job etc. A working solution for this should be in place for the first data.
>
>
It is urgent to start a review of exactly what extra information is required for this type of bookkeeping, as well as how the information is accessed from the command line, from Ganga, from within a Gaudi job, etc. A working solution for this should be in place for the first data.
 

Information required in Conditions database

Changed:
<
<
  • Trigger conditions for any event. Preferably this should be in the form of a simple identifier to a set of trigger conditions. What the identifier corresponds to will be stored in CVS. An identifier should never be re-used in later releases for a different set of trigger conditions to avoid confusion.
  • Identification of good and bad runs. The definition of bad might need to be more fine grained as some analysis will be able to cope with spcific problems (like no RICH info).
>
>
The following information is required from the conditions database during the analysis phase.

Trigger conditions for any event should be stored. Preferably this should be in the form of a simple identifier referring to a set of trigger conditions. What the identifier corresponds to will be stored in CVS. An identifier should never be re-used in later releases for a different set of trigger conditions, to avoid confusion.
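A minimal sketch of the identifier idea follows; the class name, the example identifier and the cut strings are invented for illustration and do not describe the actual LHCb implementation. The key property argued for above is that an existing identifier can never be re-bound to a different configuration.

<verbatim>
# Sketch only: identifier -> frozen set of trigger conditions.
class TriggerConfigRegistry(object):
    def __init__(self):
        self._configs = {}          # identifier -> frozenset of cut strings

    def register(self, identifier, conditions):
        conditions = frozenset(conditions)
        if identifier in self._configs and self._configs[identifier] != conditions:
            raise ValueError("identifier %r already used for a different "
                             "set of trigger conditions" % identifier)
        self._configs[identifier] = conditions
        return identifier

    def lookup(self, identifier):
        return self._configs[identifier]

registry = TriggerConfigRegistry()
registry.register(0x2006A, ["L0_MuonPt>1.3GeV", "HLT_TwoBody"])
print(sorted(registry.lookup(0x2006A)))
# Re-using 0x2006A with different cuts would raise ValueError.
</verbatim>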

Identification of good and bad runs. The definition of bad might need to be more fine-grained, as some analyses will be able to cope with specific problems (like missing RICH info). This information belongs in the Conditions database rather than in the bookkeeping, as the classification of good and bad might change after the stripping has taken place. It might also be required to identify which runs were classified as good at some time in the past, to judge if a past analysis was affected by what was later identified as bad data. When selecting data for an analysis this information should be available, which puts a requirement on the bookkeeping system to be able to interrogate the conditions.
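The sketch below illustrates how time-stamped quality flags allow both questions to be answered: the classification as it stands now, and the classification as it was known when a past analysis was done. It is purely illustrative and does not represent the Conditions database interface; the flag names and times are invented.

<verbatim>
# Sketch only: run-quality flags with the insertion time recorded.
import bisect

class RunQuality(object):
    def __init__(self):
        self._history = {}   # run -> sorted list of (insertion_time, flag)

    def set_flag(self, run, insertion_time, flag):
        bisect.insort(self._history.setdefault(run, []), (insertion_time, flag))

    def flag_at(self, run, when):
        """Classification of 'run' as it was known at time 'when'."""
        entries = [e for e in self._history.get(run, []) if e[0] <= when]
        return entries[-1][1] if entries else "UNKNOWN"

rq = RunQuality()
rq.set_flag(1234, insertion_time=10, flag="GOOD")
rq.set_flag(1234, insertion_time=50, flag="BAD-RICH")   # problem found later
print(rq.flag_at(1234, when=20))   # GOOD: analyses done then used this run
print(rq.flag_at(1234, when=60))   # BAD-RICH: current classification
</verbatim>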

 

Procedure for including selections in the stripping

Changed:
<
<
Strippings are quite painful to organise. Although there exists an LHCb note describing the (somewhat obsolete) guidelines to follow when providing a new selection, and there are released python tools that check these guidelines, only 3 out of 29 preselections were compliant in the DC04 stripping tests.
  • Tools should be provided that automate the subscription of a selection to the stripping.
  • The actual cuts applied in the selections should be considered as the responsibility of the physics WGs
    • We suggest the nomination of stripping coordinators in each WG. They are likely to be the same person as the "standard particles" coordinators.

Timescale

  • An outline proposal for the October s/w week
  • A finalised proposal to be ready in December 2006
>
>
The note LHCb-2004-031 describes the (somewhat obsolete) guidelines to follow when providing a new selection, and there are released Python tools that check these guidelines. However, the experience with organising stripping jobs is poor: for DC04 only 3 out of 29 preselections were compliant in the tests, and for DC06 it has been a long battle to obtain a stripping job with sufficient reduction and with fast enough execution time. To ease the organisation:
  • Tools should be provided that automate the subscription of a selection to the stripping.
  • The actual cuts applied in the selections should be considered as the responsibility of the physics WGs.
  • We suggest the nomination of stripping coordinators in each WG. They are likely to be the same person as the "standard particles" coordinators.
  • If a subscribed selection fails automatic tests for a new round of stripping it is unsubscribed and a notification is sent to the coordinator (a sketch of such a test follows).
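The following is a minimal sketch of what such an automatic test could look like. The thresholds, field names and notification hook are invented for the illustration and are not the agreed limits or the released checking tools mentioned above.

<verbatim>
# Sketch only: check each subscribed selection against retention and CPU
# limits; failing selections are unsubscribed and the coordinator notified.
MAX_RETENTION = 1e-4      # fraction of events kept (illustrative value)
MAX_CPU_PER_EVENT = 5.0   # ms (illustrative value)

def check_subscriptions(subscriptions, notify):
    """Return the selections that stay subscribed for this stripping pass."""
    kept = []
    for sel in subscriptions:
        problems = []
        if sel["retention"] > MAX_RETENTION:
            problems.append("retention %.1e above limit" % sel["retention"])
        if sel["cpu_ms_per_event"] > MAX_CPU_PER_EVENT:
            problems.append("CPU %.1f ms/event above limit"
                            % sel["cpu_ms_per_event"])
        if problems:
            notify(sel["coordinator"],
                   "%s unsubscribed: %s" % (sel["name"], "; ".join(problems)))
        else:
            kept.append(sel["name"])
    return kept

def email_stub(to, msg):
    print("NOTIFY %s: %s" % (to, msg))

subs = [{"name": "Bs2MuMu",  "retention": 5e-5, "cpu_ms_per_event": 2.0,
         "coordinator": "wg-rare@example.org"},
        {"name": "InclJpsi", "retention": 3e-4, "cpu_ms_per_event": 1.5,
         "coordinator": "wg-jpsi@example.org"}]
print(check_subscriptions(subs, notify=email_stub))
</verbatim>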
 
<!-- PDFSTOP -->

Updated:

Changed:
<
<
-- PatrickKoppenburg - 27 Sep 2006 -- PatrickKoppenburg - 28 Sep 2006
>
>
 
