Streaming of data in LHCb


LHCb-2007-001

COMP

XX-Oct-2006



align="centrer" Abstract


Authors

Introduction

When data are collected from the LHCb detector, the raw data will be transferred in quasi-real time to the LHCb associated Tier 1 sites, where the reconstruction produces rDST files. The rDST files are used for stripping jobs in which events are selected for physics analysis. Events selected in this way are written into DST files and distributed in identical copies to all the Tier 1 sites. These files are then accessible for physics analysis by individual collaborators. The stripping stage might be repeated several times a year with refined selection algorithms.

This report examines the needs and requirements of streaming at the data collection level as well as in the stripping process. We also look at how the information for the stripping should be made persistent and what bookkeeping information is required. Several use cases are analyzed for the development of a set of recommendations.

The work leading to this report is based on the streaming task force remit available as an appendix.

More background for the discussions leading to the recommendations in this report can be found in the Streaming Task Force Hypernews.

Definition of terms

  • A stream refers to the collection of events that are stored in the same physical file for a given run period. Not to be confused with I/O streams in a purely computing context (e.g. streaming of objects into a Root file).
  • A selection is the set of events accepted by a given selection algorithm during the stripping. There will be one or more selections in a given stream. It is expected that a selection will have a typical (large) size of 10^6 (10^7) events in 2 fb^-1. This means a reduction factor of 2 x 10^4 (10^3) compared to the 2 kHz input stream, or an equivalent rate of 0.1 (1.0) Hz (a small numerical check is given below this list).
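
As a cross-check of the numbers quoted in the definition above, the following minimal sketch (not part of the original note) assumes a nominal year of 10^7 s of data taking corresponding to 2 fb^-1 and the 2 kHz output rate of the online farm:

HLT_RATE_HZ = 2.0e3   # events per second written by the online farm
LIVE_TIME_S = 1.0e7   # assumed live time corresponding to 2 fb^-1

for n_events in (1e6, 1e7):
    rate_hz = n_events / LIVE_TIME_S      # 0.1 Hz and 1.0 Hz
    reduction = HLT_RATE_HZ / rate_hz     # about 2e4 and 2e3
    print(f"{n_events:.0e} events -> {rate_hz:.1f} Hz, reduction ~{reduction:.0e}")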

Use cases

A set of use cases was analyzed to capture the requirements for the streaming; they are reproduced in the appendix.

The analysis related to the individual use cases is documented in the Wiki pages related to the streaming task force.

Experience from other experiments

Other experiments with large data volumes have valuable experience. Below are two examples of what is done elsewhere.

D0

In D0 the data from the detector has two streams. The first stream is of very low rate and selected in their L3 trigger. It is reconstructed more or less straight away and its use is similar to the tasks we will perform in the monitoring farm. The second stream contains all triggered data (including all of the first stream). Internally the stream is written to 4 files at any given time but there is no difference in the type of events going to each of them. The stream is buffered until the first stream has finished processing the run and updated the conditions. It is also checked that the new conditions have migrated to the remote centers and that they (by manual inspection) look reasonable. When the green light is given (typically in less than 24h) the reconstruction takes place at 4 remote sites (hence the 4 files above).

For analysis jobs there is a stripping procedure which selects events in the DST files but does not make copies of them. So an analysis will read something similar to our ETC files. This aspect is not working well. A huge load is experienced on the data servers due to large overheads in connection with reading sparse data.

Until now reprocessing of a specific type of physics data has not been done but a reprocessing of all B triggers is planned. This will require reading sparse events once from the stream with all the raw data from the detector.

BaBar

In BaBar there are a few different streams from the detector. A few streams for detector calibration, like e+e- → e+e- (Bhabha) events, are prescaled to give the correct rate independent of luminosity. The dominant stream, from which nearly all physics comes, is the hadronic stream. This large stream is not processed until the calibration constants are ready from the processing of the calibration streams for a given run.

BaBar initially operated with a system of rolling calibrations where calibrations for a given run n were used for the reconstruction of run n+1, using the so called 'AllEvents' stream. In this way the full statistics were available for the calibrations and there was no double processing of events, but the conditions were always one run late. A consequence of this setup was that runs had to be processed sequentially, in chronological order, introducing scaling problems. The scaling problems were worsened by the fact that individual runs were processed on large farms of CPUs, and harvesting the calibration data, originating from the large number of jobs running in parallel, introduced a severe limit on the scalability of the processing farm. These limits on scalability were successfully removed by splitting the process of rolling calibrations from the processing of the data. Since the calibration only requires a very small fraction of the events recorded, these events could easily be separated by the trigger. This calibration stream is then processed (in chronological order) as before, producing a rolling calibration. As the event rate is limited, scaling of this 'prompt calibration' pass is not a problem. Once the calibration constants for a given run have been determined in this way and have been propagated into a conditions database, the processing of the 'main stream' for that run is possible. Note that in this system the processing of the main physics data uses the calibration constants obtained from the same run, and the processing of the 'main stream' is not restricted to a strict sequential, chronological order, but can be done for each run independently, on a collection of computing farms. This allows for easy scaling of the processing.
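
As an illustration only (a sketch with invented function and variable names, not BaBar code), the decoupling described above amounts to a strictly sequential prompt-calibration pass and a main pass that can be run per run, in any order, as soon as the constants for that run exist:

conditions_db = {}   # run number -> calibration constants (here just a string)

def prompt_calibration_pass(runs):
    # Must run in chronological order: the constants for run n are derived
    # from the low-rate calibration stream of run n itself.
    for run in sorted(runs):
        conditions_db[run] = f"constants derived from calibration stream of run {run}"

def process_main_stream(run):
    # Can run on any farm, in any order, once the constants for this run exist.
    if run not in conditions_db:
        raise RuntimeError(f"run {run}: constants not ready, try again later")
    return f"run {run} reconstructed with {conditions_db[run]!r}"

prompt_calibration_pass([3, 1, 2])
for run in (2, 3, 1):                # main pass in arbitrary order
    print(process_main_stream(run))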

The reconstructed data is fed into a subsequent stripping job that writes out DST files. On the order of 100 files are written with some of them containing multiple selections. One of the streams contains all hadronic events. If a selection has either low priority or if its rejection rate is too poor an ETC file is written instead with pointers into the stream containing all hadronic events.

Data are stripped multiple times to reflect new and updated selections. Total reprocessing was frequent in the beginning but can now be years apart. It has only ever been done on the full hadronic sample.

Proposal

The recommendations of the task force follow.

Streams from detector

A single bulk stream should be written from the online farm. The advantage of this compared to a solution where several streams are written based on triggers is:
  • Event duplication is in all cases avoided within a single subsequent selection. If a selection involves picking events from more than one detector stream there is no way to avoid duplication of events. To sort this out later in an analysis would be error prone.

The disadvantages are:

  • It becomes harder to reprocess a smaller amount of the dataset according to the HLT selections (it might involve sparse reading). Experience from past experiments shows that this rarely happens.
  • It is not possible to give special priority to a specific high priority analysis with a narrow exclusive trigger. As nearly every analysis will rely on larger selections for their result (normalization to J/ψ signal, flavor tagging calibration) this seems in any case an unlikely scenario.

With more exclusive HLT selections later in the lifetime of LHCb the arguments might change and could at that point force a rethink.

Many experiments use a hot stream for providing calibration and monitoring of the detector as described in the sections on how streams are treated in BaBar and D0. In LHCb this should be completely covered within the monitoring farm. To be able to debug problems with alignment and calibration performed in the monitoring farm a facility should be developed to persist the events used for this task. These events would effectively be a second very low rate stream. The events would only be useful for debugging the behavior of tasks carried out in the monitoring farm.

Processing timing

To avoid a backlog, the time between data collection and reconstruction must be kept to a minimum. As the first stripping will take place at the same time, this means that all calibration required for it has to be done in the monitoring farm. It is advisable to delay the processing for a short period (8 hours?) to allow shifters to give a green light for reconstruction. If problems are discovered, a run will be marked as bad and the reconstruction postponed or abandoned.
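
A minimal sketch of such a green-light gate, assuming an 8-hour hold and a per-run quality flag set by the shifters (the field names are invented):

import time

HOLD_SECONDS = 8 * 3600   # the "short period" suggested above

def ready_for_reconstruction(run, now=None):
    """Return True if the run may be sent for reconstruction."""
    now = now or time.time()
    if now - run["end_of_run_time"] < HOLD_SECONDS:
        return False                           # still within the shifter window
    return run.get("quality", "good") != "bad" # postponed or abandoned if bad

# Example: a run that ended 10 hours ago and was not flagged as bad
run = {"number": 1234, "end_of_run_time": time.time() - 10 * 3600, "quality": "good"}
print(ready_for_reconstruction(run))           # True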

Number of streams in stripping

Considering the low level of overlap between different selections, as documented in the appendix on correlations, it is a clear recommendation that we group selections into a small number of streams. This has some clear advantages compared to a single stream:
  • Limited sparse reading of files. All selections will make up 10% or more of a given file.
  • No need to use ETC files as part of the stripping. This will make data management on the Grid much easier (no need to know the location of files pointed to as well).
  • There are no overheads associated with sparse data access. Currently there are large I/O overheads in reading single events (32kB per TES container), but also large CPU overheads when Root opens a file (reading of dictionaries etc.). This latter problem is being addressed by the ROOT team, with the introduction of a flag to disable reading of the streaming information.

The disadvantages are very limited:

  • An analysis might cover more than one stream, making it harder to deal with double counting of events. Let's take the Bs → μ+μ- analysis as an example. The signal will come from the two-body stream while the BR normalization will come from the J/ψ stream. In this case the double counting doesn't matter, though, so the objection is not real. If the signal itself is extracted from more than one stream there is a design error in the stripping for that analysis.
  • Data will be duplicated. According to the analysis based on the DC04 TDR selections the duplication will be very limited. If we are limited in available disk space we should instead reconsider the mirroring of all stripped data to all Tier 1 sites (making all data available at 5 out of 6 sites saves one copy in six, i.e. about 17% of the disk space).

The appendix on correlations shows that it will be fairly easy to divide the data into streams. The full correlation table can be created automatically followed by a manual grouping based mainly on the correlations but also on analyses that naturally belong together. No given selection should form less than 10% of a stream to avoid too sparse reading.
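
A minimal sketch of how such a grouping could be automated is given below; the selection names, rates and overlap fractions are invented placeholders, and a real implementation would take the measured correlation table from the stripping as input:

from itertools import combinations

rates = {"Bd2DstarPi": 0.5, "Bs2MuMu": 0.05, "B2hh": 0.6, "Jpsi": 1.0}   # Hz, invented
overlap = {("Bd2DstarPi", "B2hh"): 0.3, ("Bs2MuMu", "B2hh"): 0.2}        # fractions, invented

def grouped(selections, threshold=0.1):
    """Greedily merge selections whose pairwise overlap exceeds the threshold."""
    streams = [{s} for s in selections]
    for a, b in combinations(selections, 2):
        if overlap.get((a, b), overlap.get((b, a), 0.0)) > threshold:
            stream_a = next(st for st in streams if a in st)
            stream_b = next(st for st in streams if b in st)
            if stream_a is not stream_b:
                stream_a |= stream_b
                streams.remove(stream_b)
    return streams

for stream in grouped(list(rates)):
    total = sum(rates[s] for s in stream)
    # flag selections that would form less than 10% of their stream
    small = [s for s in stream if rates[s] / total < 0.10]
    print(sorted(stream), "-> selections below 10%:", small)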

In total one might expect around 30 streams from the stripping, each with around 10^7 events in 2 fb^-1 of integrated luminosity. This can be broken down as:

  • Around 20 physics analysis streams of 10^7 events each. There will most likely be significant variation in size between the individual streams.
  • Random events that will be used for developing new selections. To get reasonable statistics for a selection with a reduction factor of 10^5 a sample of 10^7 events will be required. This will make it equivalent to a single large selection.
  • A stream for understanding the trigger. This stream is likely to have a large overlap with the physics streams but for efficient trigger studies this can't be avoided.
  • A few streams for detailed calibration of alignment, tracking and particle identification.
  • A stream with random triggers after L0 to allow for the development of new code in the HLT. As a narrow exclusive HLT trigger might have a rejection factor of 10^5 (corresponding to 10 Hz) a sample of 10^7 is again a reasonable size.

Monte Carlo data

Data from inclusive and "cocktail" simulations will pass through the stripping process as well. To avoid complicating the system it is recommended to process these events in the same way as the data. While this will produce some selections that are irrelevant for the simulation sample being processed, the management overheads involved in doing anything else will be excessive.

Meta data in relation to selection and stripping

As outlined in the use cases every analysis requires additional information about what is analyzed apart from the information in the events themselves.

Bookkeeping information required

From a database with the metadata from the stripping it should be possible to:
  • Get a list of the exact files that went into a given selection. This might not translate directly into runs as a given run will have its rDST data spread across several files and a problem could be present with just one of them.
  • For an arbitrary list of files that went into a selection obtain some B counting numbers that can be used for normalizing branching ratios. This number might be calculated during the stripping phase.
  • Correct the above numbers when a given file turns unreadable (i.e. know exactly which runs contributed to a given file).
  • Recover the exact conditions used during the stripping, given the time at which the stripping was performed.

It is urgent to start a review of exactly what extra information is required for this type of bookkeeping, as well as how the information is accessed from the command line, from Ganga, from within a Gaudi job etc. A working solution for this should be in place for the first data.
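
As a sketch only, with an invented in-memory schema (not a proposal for the actual bookkeeping implementation), the queries listed above could look like this:

stripping_db = {
    "files_per_selection": {"Bs2MuMu": ["dst/0001.dst", "dst/0002.dst"]},
    "runs_per_file": {"dst/0001.dst": [1001, 1002], "dst/0002.dst": [1003]},
    "b_count_per_file": {"dst/0001.dst": 1.2e6, "dst/0002.dst": 0.8e6},   # invented normalisation numbers
}

def files_for_selection(selection):
    """Exact list of files that went into a given selection."""
    return stripping_db["files_per_selection"][selection]

def b_count(files, bad_files=()):
    """Normalisation sum for a list of files, dropping files that turned unreadable."""
    return sum(stripping_db["b_count_per_file"][f] for f in files if f not in bad_files)

files = files_for_selection("Bs2MuMu")
print(files, b_count(files), b_count(files, bad_files={"dst/0002.dst"}))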

Information required in Conditions database

The following information is required from the conditions database during the analysis phase.

Trigger conditions for any event should be stored. Preferably this should be in the form of a simple identifier to a set of trigger conditions. What the identifier corresponds to will be stored in CVS. An identifier should never be re-used in later releases for a different set of trigger conditions to avoid confusion.

Identification of good and bad runs. The definition of bad might need to be more fine-grained, as some analyses will be able to cope with specific problems (like no RICH info). This information belongs in the Conditions database rather than in the bookkeeping, as the classification of good and bad might change after the stripping has taken place. It might also be required to identify which runs were classified as good at some time in the past, to judge whether some past analysis was affected by what was later identified as bad data. When selecting data for an analysis this information should be available, thus putting a requirement on the bookkeeping system to be able to interrogate the conditions.
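
A minimal sketch of such a versioned classification (invented interface, not the Conditions database API): each (re)classification is recorded with the time it was made, so one can ask both for the current status of a run and for its status as it was known at some earlier date:

import bisect

quality_history = {
    # run number -> list of (classification_time, status), increasing in time
    78001: [(1001, "good"), (1500, "bad: no RICH info")],   # illustrative timestamps
}

def run_status(run, as_of):
    """Status of a run as it was known at time as_of (None if unclassified)."""
    history = quality_history.get(run, [])
    times = [t for t, _ in history]
    i = bisect.bisect_right(times, as_of)
    return history[i - 1][1] if i else None

print(run_status(78001, as_of=1200))   # "good" at the time of an old analysis
print(run_status(78001, as_of=2000))   # "bad: no RICH info" today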

Procedure for including selections in the stripping

The note LHCb-2004-031 describes the (somewhat obsolete) guidelines to follow when providing a new selection and there are released Python tools that check these guidelines. However, the experience with organizing stripping jobs is poor: for DC04 only 3 out of 29 preselections were compliant in the tests and for DC06 it is a long battle to obtain a stripping job with sufficient reduction and with fast enough execution time. To ease the organization:

  • Tools should be provided that automate the subscription of a selection to the stripping.
  • The actual cuts applied in the selections should be considered as the responsibility of the physics WGs.
  • We suggest the nomination of stripping coordinators in each WG. They are likely to be the same people as the "standard particles" coordinators.
  • If a subscribed selection fails the automatic tests for a new round of stripping it is unsubscribed and a notification is sent to the coordinator (a sketch of such a test is given below this list).
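
A minimal sketch of an automatic pre-stripping test; the retention and CPU thresholds are invented placeholders, not agreed LHCb numbers:

MAX_RETENTION = 1e-3        # assumed maximum fraction of events a selection may keep
MAX_TIME_PER_EVENT = 0.1    # assumed CPU budget in seconds per event

def selection_ok(n_in, n_out, cpu_seconds):
    """Return (passed, reasons) for one selection run on a test sample."""
    reasons = []
    if n_out / n_in > MAX_RETENTION:
        reasons.append(f"retention {n_out / n_in:.1e} above {MAX_RETENTION:.0e}")
    if cpu_seconds / n_in > MAX_TIME_PER_EVENT:
        reasons.append(f"{cpu_seconds / n_in:.2f} s/event above budget")
    return (not reasons, reasons)

print(selection_ok(n_in=100000, n_out=50, cpu_seconds=2000))     # passes
print(selection_ok(n_in=100000, n_out=5000, cpu_seconds=20000))  # fails both tests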


Appendix


Use case for flavour tagging calibration.

Outline of use case

The calibration of flavour tagging requires the analysis of a large number of channels of data. Some of them are potentially triggered in multiple ways, like the semi-leptonic decays. It also requires knowledge of the phase space used by each particular analysis, i.e. what pT region, what sort of triggered events (TIS, TOS, TOB), etc.

Selection

Many selections are needed for the flavour tagging calibration. Channels like B+->J/Psi K+, Bs->Ds mu nu (X), Bd->D*mu nu (X), B+->D0bar mu nu (X), or B+->D0bar pi, Bs->Ds pi are deemed to be very useful to calibrate the tagging.

Issues

It may be possible that the "tagging group" provides a "calibrated" tool as a function of the relevant phase space variables, in which case only a limited number of people in LHCb would do the calibration. This is the model at the Tevatron. But it could also be that this is not possible, and then a large number of people may need to access a large number of files. Talking to the "tagging people" I got the impression that this is not yet settled.

Number of streams from the detector

Most probably there would be no benefit in having several streams in this case.

Information required on trigger, stripping and luminosity

The calibration of the tagging needs to know how the event was triggered, and in particular, needs to classify the B candidate into TIS, TOS, TOB categories. I don't think information on luminosity is needed.

Number of streams in the stripping

In this analysis a few streams would be better, as a large number of files would most probably need to be accessed anyway.

Information stored for each event

For the large number of control samples used in the flavour tagging there would be an advantage in storing the B candidate that selected the event. In this way the subsequent flavour tagging could be performed in an analysis without having to run all the selection algorithms again.

There could be a need for storing both flavour tag information and some event weight as well.


Use case for sparse analysis

Outline of use case

The main point of the $B_s \rightarrow \mu^+\mu^-$ analysis is to measure the branching ratio. Due to the clean signature of a two-body decay with muons a very tight selection can be written, corresponding to a low rate. For estimating the background, sidebands are required for the dimuon invariant mass distribution as well as separate samples of other 2-body B decays. The signal efficiency will somehow be measured from $B \rightarrow J/\Psi X$ events.

Selection

The selection places requirements on the tracking for accuracy in the vertexing and particle ID for the muon identification. The category for the stream where this selection belongs is physics.

Issues

Number of streams from the detector

The signal and the $B \rightarrow J/\Psi X$ control events will all come from the dimuon trigger. However the $B \rightarrow h^{+} h^{-}$ events that will be used as a control sample for the particle ID are part of the exclusive stream. As there might be a common stripping between the signal and the $B \rightarrow h^{+} h^{-}$ events, individual streams would cause trouble with double counting. This argument seems to favour a single stream from which all subsequent stripping is done.

Information required on trigger, stripping and luminosity

As this is a branching ratio measurement, strict control of the dataset used is required.

The branching ratio will be measured relative to a known branching ratio to avoid the uncertainty from the B cross section. The $B^{+} \rightarrow J/\Psi K^{+}$ channel is the most likely choice. Eventually one might want to normalise to $BR(B_{s} \rightarrow J/\Psi  \varphi)$ to avoid the uncertainty from the Bs production fraction. The relative trigger efficiency for the control and the signal channel will depend on trigger conditions, so these need to be retrieved for every single run in the dataset - even for runs where there might be no events selected. If this information is stored in a kind of run header we need to carry it forward even for empty runs. If the information is external, the information about the dataset selection should be kept separately.

Information on luminosity is only required to be able to state the size of the dataset used for the measurement. It will not be used in the actual measurement.

Number of streams in the stripping

If there is a low number of streams then the signal selection will form only a very small number of events inside a much larger stream. Hence the analysis will require more resources in terms of access to more data (what is the penalty from this?). If the selection can be combined with the other 2-body selections that anyway will be used as a control sample this might be less of an issue.

Information stored for each event

Given the very simple final state there seems to be a limited benefit of storing any extra information in the DST from the stripping.



Use case for a physics analysis with a high branching ratio

Outline of use case

This is a high branching ratio channel, which means that, including sidebands, the final number of selected events is of the same order as the number of events selected in the stripping, i.e. no additional selection is possible in a subsequent analysis. The CP asymmetry is small (measuring gamma), which requires a very good understanding of any CP-faking effect in the background and the tagging.

CP biases

In the following there are many mentions of PID. What actually counts is a good knowledge of CP biases due to the detector. The flavour will be determined from the charge of the slow and fast pions and the charge of the kaon. Biases can be induced by charge-dependent kaon ID, the different interaction rates of positive and negative particles in the detector and left-right asymmetries in the detector efficiency. All these effects can be controlled using the PID-blind D* sample, which is why they are all labelled as "PID".

Expected number of events

The branching ratio for $B^0\rightarrow D^{*\pm}\pi^{\mp}$ with $D^{*+} \rightarrow D^{0} \pi^{+}$ and $D^{0} \rightarrow K^{-} \pi^{+}$ is (0.28%) * (68%) * (3.8%) = 7.2 x 10^-5. Folding in the reconstruction, selection and trigger efficiencies Lisa & UE expect 206'000 signal events per year and B/S < 0.3 (LHCb-2003-126). If one allows for more than only the fast pion the number of events may even double, with B/S < 0.5 (LHCb-2001-153). All the B/S ratios are based on inclusive BB only. If one assumes that BB forms 50% of the HLT output, all these B/S have to be multiplied by about 2.

This is all without sidebands. To achieve a very good understanding of the CP asymmetry in the background a very large B mass side band will be needed (how large? Let's take 10 times the tight window.). Sidebands of about 4 times the window will also be taken for the D* and D0. Taking into account the fraction of real D* and D0 in the background I estimate that the background will be multiplied by 10*2*1.5=30. One could also control CP asymmetries in the background by subtracting events reconstructed using a wrong sign D*. This would get another factor 2. Adding other D decay modes would also increase this number, but I ignore that for the moment.

This leads to a total signal of 200k and 40 (0.3*2*30*2) times as much background for the exclusive case. That's 0.8 Hz with all final cuts applied, a sizable fraction of the approximately 5Hz (=0.1*200Hz/(4 WG)) that can be devoted to the CP studies after the stripping.
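
A small numerical check of the rate quoted above (not from the note itself; the 10^7 s of live time per nominal year is an assumption):

br = 0.28e-2 * 0.68 * 3.8e-2          # visible branching fraction, ~7.2e-5
signal_per_year = 206_000             # from LHCb-2003-126 as quoted above
background_factor = 0.3 * 2 * 30 * 2  # B/S x BB fraction x sidebands x wrong sign, ~36 (quoted as ~40)
total_events = signal_per_year * (1 + background_factor)
rate_hz = total_events / 1.0e7        # roughly 0.8 Hz with all final cuts applied
print(f"BR product {br:.1e}, total {total_events:.2e} events, {rate_hz:.1f} Hz")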

Consequences for stripping

There is little room left for loosening the cuts in the stripping, and hence a requirement from this channel is that the reconstruction and the PID have to be as good as possible already in the stripping. There will be several iterations before the stripping selection is OK, but at the time of the stripping used for the final fit the PID and the tracking have to be properly calibrated.

Selection

The category for the stream where this selection belongs is physics. This selection does not rely on tight PID cuts, yet the effect of the kaon-ID needs to be measured in data. The tagging performance also needs to be measured. Finally the time resolution is needed, but is not as crucial as for Bs studies. Here comes the additional complication that one uses a slow pion that can be an upstream track. I assume it does not contribute significantly to the vertex position.

Issues

Number of streams from the detector

This channel is likely to be selected in the trigger both by the exclusive B trigger and by the inclusive D* trigger. It is thus essential that these two "streams" are stripped together.

Information required on trigger, stripping and luminosity

As this is a CP measurement an absolute normalization of the BR is not necessary. Yet one has to make sure that a proper PID calibration is available for each run period as the quality of PID might change with external conditions.

As this is a self-tagging decay there is no need of external information to get the tagging efficiency.

Number of streams in the stripping

This selection will have a strong overlap with all D*-based physics selections, which are all designed for measuring gamma. It thus makes sense to store them all in the same stream. Other D*-based selections are the tagging calibration D*mu and the inclusive D* PID calibration stream. Although this data will be used as input in the analysis, it does not seem necessary to have it available in the final fit. As the workflow is likely to be different I'd prefer to have them in three streams.

Information stored for each event

Given the large number of events to analyse and the large combinatorics it would be very helpful to already have the stripping candidates stored in the DST. Since hardly any additional cut can be applied offline it makes little sense to repeat the selection process in a user job.

-- UlrikEgede - 14 Jul 2006 -- PatrickKoppenburg - 20 Jul 2006 -- PatrickKoppenburg - 21 Jul 2006

Use case for calibration of the detector

Outline of use case

The RICH needs to be calibrated before RAW data can be processed for physics. The RICH will also be used in HLT exclusive selections. Many of the calibration constants used on-line in EFF will not be the same as in off-line data reconstruction for physics analysis. Here we concentrate on RICH calibrations for off-line physics analysis.

RICH reconstruction depends crucially on the quality of the tracking; thus the calibration of the tracking must be finalized before the RICH calibration is performed.

There are several layers of RICH calibration: image distortions in HPDs due to magnetic field, alignment of mirrors and photo-detectors, refractive index (varies with pressure and gas mixture composition), single photon efficiency and resolution per HPD tube, PDFs for likelihood functions used in particle identification. Since they depend on each other, multiple passes through the data may be needed to calibrate all reconstruction steps.

Selection

Most of the RICH calibrations can be performed on any type of events that populate the RICH detectors with tracks with saturated Cherenkov rings (v/c close to 1). It is desirable to exclude events which were kept via exclusive HLT triggers which use, and therefore also bias, the RICH information.

Calibration/assessment of the high level RICH analysis (particle identification) will require D* data, since they provide a source of kaons and pions identified without use of the RICH.

Issues

Many details of the calibration of the tracking system and of the RICH detectors are fuzzy at this point. For example, how often will various constants need to be changed? Can any of them be fixed from hardware information or special calibration runs? Which calibrations will run in the monitoring farm, and which will require dedicated processing at Tier 1 centers? In any case, a special stream of calibration data will be needed. It could be created already in the monitoring farm, where it could undergo initial processing. DSTs from such processing could be saved and transferred to a Tier 1 center for further calibration steps. Alternatively, all processing of the calibration stream could happen in Tier 1 centers, with only RAW data being transferred off the pit.

It would be beneficial if more than one subdetector shared a calibration stream. This is likely to be the case for tracking and the lower levels of RICH calibration, which can be done on any type of data containing some high momentum tracks. The muon detector will require a muon enriched sample, which would also be fine for tracking and RICH, except that the rate of such events may be insufficient, especially at lower luminosities.

At least two reconstruction passes through the data will be needed. The first pass will serve mainly to calibrate the tracking system. If it happens in the monitoring farm, information from the other detectors could also be reconstructed for monitoring purposes. The second pass would be done with the final tracking constants and saved in DST format, as RICH calibration jobs are likely to reread these data several times.

The amount of data in the calibration stream will be a fraction of the total RAW data output from the experiment and will depend on the requirements of the subdetectors and amount of computing power available for calibration purposes.

The most likely scenario is one where the monitoring farm will be used to check if the alignment and calibration constants in effect are good enough to allow the data to go to reconstruction. This quality check might indeed be performed by redoing the alignment and "subtracting" it from the current one. If any major difference is present the new alignment constants might be applied before the reconstruction. Initially this will always require manual intervention.

To enable debugging of the behaviour in the monitoring farm it should be possible to persist the data used for calibration. This will require a separate low rate stream from the detector.

Timing

Quick processing of calibration stream(s) is essential, since reconstruction of all data awaits the calibration results.

Number of streams from the detector

As outlined above lower levels of RICH calibrations can be performed on any type of RICH unbiased data with some high Pt tracks, while the higher level will require D* events. Two separate selections are likely, since the D* events will be incoming at lower rate.

Information required on trigger, stripping and luminosity

Calibration streams will not require the bookkeeping needed to determine the integrated luminosity. Information on the selections used in the stripping of calibration data will be needed only if different types of selections are mixed together into a common calibration stream.

Information stored for each event

Tracking information will need to be saved once the calibration stream is processed with final tracking constants. It is likely that it will be beneficial to save tracks also from preliminary processing of the calibration data.


Use case describing how to recover from a bug in the stripping

Outline of use case

A given analysis uses as its incoming dataset data from two different processings. Summer conferences are close and there is no time to get a consistent dataset. At the last moment a bug is found in the selection for part of the data, causing a known inefficiency. What is required, in terms of information, to correct for this in the affected part of the data?

We need to consider here what can potentially go wrong in the stripping for a given selection. We should then consider in what cases it is possible to recover from this with a new stripping rather than a total reprocessing of the data.

Selection

Issues

Number of streams from the detector

Pros and cons for this particular analysis for having one or more streams from the detector.

Information required on trigger, stripping and luminosity

Number of streams in the stripping

Pros and cons for this particular analysis of many/few streams.

Information stored for each event

Pros and cons for storing composite candidates and/or other extra data in the DST for the selection.

-- UlrikEgede - 14 Jul 2006

Use case for the development of a new selection

Outline of use case

For a new selection access to unstripped data is a requirement. The sample has to be large enough that the rejection factor for even a sparse selection can be calculated with a reasonable accuracy.

Selection

This is effectively a random selection. It is important that the selection is not just a few runs but to some degree samples the run period in question. This could be achieved by having a truly random selection at a rate which keeps it no larger than other typical selections.
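
A minimal sketch of such a truly random selection; the prescale and input numbers are illustrative assumptions consistent with the rates quoted earlier in the note:

import random

TARGET_EVENTS = 1.0e7       # typical selection size quoted earlier in the note
INPUT_EVENTS = 2.0e10       # assumed events seen by the stripping in 2 fb^-1 (2 kHz x 1e7 s)

prescale = TARGET_EVENTS / INPUT_EVENTS     # keep roughly 1 event in 2000

def keep_event(rng=random):
    """Accept an event at random, independent of any physics content."""
    return rng.random() < prescale

kept = sum(keep_event() for _ in range(100_000))
print(f"prescale {prescale:.1e}, kept {kept} of 100000 test events")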

Issues

It has to be ensured that the selection will run on the rDST and not only the DST. Thus a small sample of rDST data should be made available on disk as well.

We might want to have a selection with random T0 or T0+T1 events as well to allow a new analysis to optimise its HLT selection.

Timing

This is for developing the next generation of selections. Hence the production of this selection should be done in parallel with all the other selections.

Number of streams from the detector

As the selection is random it will have very little overlap with any other selection. It should clearly have its own stream.

Information required on trigger, stripping and luminosity

Information stored for each event

Maybe one should store information about which other selections passed each event. That would help in subsequent decisions about whether the selection should be grouped with other selections into larger streams (a possible encoding is sketched below).
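
A hedged sketch (invented names and layout) of how the per-event selection decisions could be recorded as a bit mask for later overlap studies:

SELECTIONS = ["Bd2DstarPi", "Bs2MuMu", "B2hh", "Jpsi"]   # illustrative names only

def decision_word(passed):
    """Pack the set of passed selection names into an integer bit mask."""
    return sum(1 << i for i, name in enumerate(SELECTIONS) if name in passed)

def passed_selections(word):
    """Unpack a bit mask back into the list of selection names."""
    return [name for i, name in enumerate(SELECTIONS) if word & (1 << i)]

word = decision_word({"Bs2MuMu", "B2hh"})
print(word, passed_selections(word))   # 6 ['Bs2MuMu', 'B2hh']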

Contents:

Use case to perform a detector calibration analysis with data from the 2007 pilot run.

Outline of use case

The 2007 pilot run will be used to calibrate the detector with physics collision data for the first time. It is likely that the number of events will be small and spread over many runs. The raw data will be buggy (mis-cablings, encoding errors etc.) and the code will not have all the required functionality. Many reprocessings will therefore be necessary.

Selection

In the early days it is likely that the HLT will be in 'monitoring mode', classifying events but not necessarily rejecting them. Early datasets will contain events of the following types:
  • Interaction trigger (physics collision or beam gas).
  • Calibration trigger (e.g. calorimeter laser events)
  • Beam crossing trigger (downscaled, for occupancy studies, muon halo etc.)
Some of these triggers will not require propagation to Tier0/Tier1 but can be processed directly in the online monitoring farm. Others will be of interest only for calibrating specific detectors, with sub-detector specific code. Only the interaction triggers will be reconstructed through Brunel - they will be used for detector alignment and hopefully a first look at physics quantities.

Issues

Although the physics collision rate will be small, the trigger rate may well be as large as 2 kHz. The pilot run data will have to be easily accessible by many diverse people. The official production chain (which will need to be commissioned) is only one of many possible applications wanting to access the raw data. Reprocessing will be frequent and should be made possible with minimal overhead.

Number of streams from the detector

Pros and cons for this particular analysis for having one or more streams from the detector.

Information required on trigger, stripping and luminosity

Number of streams in the stripping

Pros and cons for this particular analysis of many/few streams.

Information stored for each event

Pros and cons for storing composite candidates and/or other extra data in the DST for the selection.

-- MarcoCattaneo - 20 Jul 2006


-- UlrikEgede - 26 Sep 2006
