Difference: FileSummaryRecord (1 vs. 9)

Revision 9 (2014-04-14) - RobLambert

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"

LHCb File Summary Record

Line: 81 to 81
 
Added:
>
>

What do I need to do to enable FSRs correctly?

  • FSRs are Gaudi constructs that can only live on ROOT-formatted output files, like SIM/DST etc.
  • Writing FSRs requires special background services to be in place (an extra event store) and an extra writer to be added to each file (to write the FSRs).
  • IOHelper (and IOExample) are helper classes which wrap up all of the various things required to enable FSRs, including the special event store and the special writers.
  • Most configurables with an "OutputFile"-like option/slot use IOHelper correctly to add FSRs when required. These configurables may have a "WriteFSR" option, which by default is usually "True".
  • LHCbApp(), the most basic LHCb application, itself uses IOHelper to set up the underlying services. Since most LHCb applications also configure LHCbApp, you usually don't need to do anything here.
  • If you are not using LHCbApp, or some other existing configurable that produces output automatically, you will need to call the right things from IOHelper yourself (see the sketch after this list).
  • outputAlgs is a method of IOHelper which returns a list of algorithms, one of which will be an OutputStream-like object and another a RecordStream-like object. If you are adding your own writer in the middle of a sequence, you need this method.
  • outStream is another method which does not return anything, but automatically adds the write algorithms to ApplicationMgr().OutStream, a sequence of algorithms run after all others.
  • setupServices() is another method which configures all the correct services; it is used automatically by LHCbApp.
  • For more details use the Doxygen or SVN of IOHelper, or use the built-in Python help.
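A minimal sketch of the manual route, assuming the usual GaudiConf import location; the file name is a placeholder and argument names other than writeFSR are indicative and may vary between releases:

    from GaudiConf import IOHelper

    ioh = IOHelper()
    ioh.setupServices()   # set up the extra FileRecord event store and related services

    # Either let IOHelper append the writers (an OutputStream-like plus a RecordStream-like
    # algorithm) to ApplicationMgr().OutStream ...
    ioh.outStream('PFN:myOutput.dst', writeFSR=True)

    # ... or get the algorithms back to place them in your own sequence.
    algs = ioh.outputAlgs('PFN:myOutput.dst', writeFSR=True)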
 

What's there in your FSR? How to check if everything is OK?

Revision 8 (2014-04-03) - RobLambert

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"

LHCb File Summary Record

Line: 170 to 170
 
  • the contained object must have a non-zero number of members to interface correctly with what ROOT expects from objects within its trees; some of that information can be stored as metadata of the branch, and some is stored in the leaves, once per data member,
  • The EventCountFSR is a very simple FSR which stores two long long ints.
  • If 8 of them are stored to a file as a tree with only 2 bifurcations at each branch, instead of storing 8*2 packed ints we are now storing
Changed:
<
<
    • Size on file: 1+2+4+8 * (string + pointer + container + root-specific-rubbish ) + 8*2 packed ints
    • Size in memory: 1+2+4+8 * (string + pointer + container + root-specific-rubbish + bucket + cache) + 8*2 packed ints
>
>
    • Size on file: (1+2+4+8) * (string + pointer + container + root-specific-rubbish ) + 8*2 packed ints
    • Size in memory: (1+2+4+8) * (string + pointer + container + root-specific-rubbish + bucket + cache) + 8*2 packed ints
    • Mathematically: [Sum(depth = 0 to deepest) branches^depth] * overhead + branches^deepest * 2 packed ints
 
    • Note that the root-specific-rubbish + bucket + cache is likely to be of order 10kB and so is vastly bigger than the size of the two packed ints.
Changed:
<
<
  • The deeper the tree and the broader the tree, the exponentially more memory is required to store these objects.
>
>
  • The deeper the tree, the exponentially more memory is required to store these objects. The broader the tree, the polynomially more memory is required.
 

It sounds silly, but adding one low-level FSR right now is more expensive in memory usage and file size than adding the same information to every event, where it would be compressed away. So FSRs should only be used for the MINIMAL possible use cases which are not addressable in any other way (a back-of-the-envelope sketch of the numbers follows).
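A back-of-the-envelope sketch of the EventCountFSR example above; the ~10 kB per-node overhead is the order-of-magnitude figure quoted above, not a measurement:

    # Toy calculation of the overhead of a deep, broad, sparse FSR tree.
    branches, deepest = 2, 3                                 # 2 bifurcations, 8 FSRs at the deepest level
    nodes = sum(branches ** d for d in range(deepest + 1))   # 1 + 2 + 4 + 8 = 15 nodes
    overhead_per_node = 10 * 1024                            # string + pointer + container + ROOT rubbish + bucket + cache, ~10 kB
    payload = (branches ** deepest) * 2 * 8                  # 8 EventCountFSRs x 2 long long ints x 8 bytes
    print(nodes * overhead_per_node, 'bytes of overhead to hold', payload, 'bytes of counters')

With those numbers roughly 150 kB is spent to hold 128 bytes of actual counters, which is exactly the point being made above.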

Revision 7 (2014-04-01) - RobLambert

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"

LHCb File Summary Record

Line: 69 to 69
 
  • A new "event" store: since Root files have no concept of metadata, and Gaudi did not have a concept of metadata either, we implemented FSRs as a secondary event store, a new tree kept in the same root file, starting with "FileRecords"
  • New DataObjects: since Gaudi can only write DataObjects to Root files using the existing framework, we needed to add every metadata concept we can think of into the event class model, creating dedicated DataObjects
  • New services/algorithms: this datastore needs algorithms which fill it, and services which provide accessors to the data.
Changed:
<
<
  • Automatic Propagation: FSRs are expected to be transparently passed from file to file hand-over-hand by the dedicated services. File metadata only increases and
>
>
  • Automatic Propagation: FSRs are expected to be transparently passed from file to file hand-over-hand by the dedicated services. File metadata only increases, and is preserved in a tree structure using the GUID of the input files as directories.
 

Resources:

Revision 6 (2014-04-01) - RobLambert

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"

LHCb File Summary Record

Line: 172 to 172
 
  • If 8 of them are stored to a file as a tree with only 2 bifurcations at each branch, instead of storing 8*2 packed ints we are now storing
    • Size on file: 1+2+4+8 * (string + pointer + container + root-specific-rubbish ) + 8*2 packed ints
    • Size in memory: 1+2+4+8 * (string + pointer + container + root-specific-rubbish + bucket + cache) + 8*2 packed ints
Added:
>
>
    • Note that the root-specific-rubbish + bucket + cache is likely to be of order 10kB and so is vastly bigger than the size of the two packed ints.
 
  • The deeper the tree and the broader the tree, the exponentially more memory is required to store these objects.

Revision 5 (2014-03-25) - RobLambert

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"

LHCb File Summary Record

Line: 42 to 42
 
  • Gaudi does store some provenance information about files, in terms of links to older file GUIDs
  • The provenance of an opened file is known to Gaudi via a GUID associated to folders in the TES. If a folder location was ever written to a file in the past, it can be resurrected through a local xml catalog which links file GUIDs to physical file names (PFNs).
Changed:
<
<
Metadata? So, in a sense Gaudi does keep some metadata about files. However, the metatadata automatically stored with Gaudi is stored for every event. Events have their own metatdata, which may be compressed away by Root compression, but in principle the even does not care about file information.
>
>
Metadata? So, in a sense Gaudi does keep some metadata about files. However, the metadata automatically stored with Gaudi is stored for every event. Events have their own metadata, which may be compressed away by Root compression, but in principle the event does not care about file information.
  LHCb users: If you are a user familiar with the LHCb usage of the Gaudi framework, you will have noticed even more odd things:
  • LHCb does store file metadata, in the file, for example in the RecHeader, GenHeader, such as:
Line: 96 to 96
 
  • Topology: Since this is iterated over many production applications, the resulting tree is very deep and broad, but also sparse:
    • deep: there are many levels in the tree
    • broad: there are many branches at each level in the tree
Changed:
<
<
    • sparse: each branch will have only a few entries, often only one double or int at the lowest hanging level
>
>
    • sparse: each branch will have only a few entries, often only one double or int at the lowest hanging level, and there is no duplication across multiple "events": each file appears only once.
 
  • Oh: We end up with a bush rather than a tree.

Services and integration into Gaudi

Line: 156 to 156
 
    • ROOT files are efficient at storing shallow, narrow, dense trees.
      • shallow: not very many levels
      • narrow: not many branches per level
Changed:
<
<
      • dense: many hundreds or thousands of entries per event
>
>
      • dense: many hundreds or thousands of entries per event, many thousands of almost identical event structures.
 
    • The further you push ROOT away from this regime, the more file size grows, and the more memory is consumed on reading/writing the files.

  • Packing, compressing, allocating:

Revision 4 (2014-03-24) - RobLambert

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"

LHCb File Summary Record

Changed:
<
<
This page is gathering requirements for a "File Summary Record" as emerged from several discussion during the June '08 Software week. The purpose is to trigger a brainstorming leading to an implementation proposal to be described as well on this page.
>
>
  • FileSummaryRecord:
    • AKA FSR is a section of data within an LHCb data file reserved for metadata about the file content.
  • Frustrating Summary Record:
    • due to the ongoing problems of memory management and lack of maturity of this concept, I have often used this acronym instead.
 
Changed:
<
<

Use cases

>
>
....... Unsustainable

The Event store, Gaudi and Root files

LHCb data formats:

  • LHCb data are stored in MDF files (see RawEvent) or in ROOT files.
  • ROOT format starts with a single "Tree" of data, off which we hang "Branches" of C++ classes and containers, and finally "Leaves" of the member data held by those C++ classes.
  • LHCb ROOT files store data in a tree saved as "Event", which is why all locations start with "/Event/..."

Root Users: If you are a user familiar with ROOT, you will have noticed some strange features of Root files. For example:

  • the number of events stored in a root file is only known once you have read over the entire file
  • there is no file-header within which you are free to add metadata about the file, counters, and other information
  • Storing data within a root file can be very costly or very cheap, depending on how the compression handles it and how many identical copies of the data you are storing.

Metadata? In a sense Root files have no concept of user metadata. Although they store information about which ROOT version was used, and can store entire arbitrary C++ classes, they lack the simple up-front metadata abilities of several other file types, namely a dedicated file header or footer (see, for example, XML, .doc, .pdf, .jpeg ...).

Gaudi users: If you are a user familiar with the Gaudi framework, you will also have noticed some odd things:

  • Gaudi uses the baseclass DataObject to store events to root trees, and most of these classes are created automatically from xml descriptions by the Gaudi Object Description, or GOD.
  • Each opened file is allocated a GUID (grid-unique-identification) independently of whether it ever goes to the grid
  • GUIDs are not-completely-unique-but-probably-pretty-unique hashes of the file content which are used instead of the filename to identify files and objects within Gaudi
  • Gaudi does store some provenance information about files, in terms of links to older file GUIDs
  • The provenance of an opened file is known to Gaudi via a GUID associated to folders in the TES. If a folder location was ever written to a file in the past, it can be resurrected through a local xml catalog which links file GUIDs to physical file names (PFNs).

Metadata? So, in a sense Gaudi does keep some metadata about files. However, the metatadata automatically stored with Gaudi is stored for every event. Events have their own metatdata, which may be compressed away by Root compression, but in principle the even does not care about file information.

LHCb users: If you are a user familiar with the LHCb usage of the Gaudi framework, you will have noticed even more odd things:

  • LHCb does store file metadata, in the file, for example in the RecHeader, GenHeader, such as:
    • the version of the software used,
    • the database tags applied in previous processing steps
    • This metadata is stored for every event

Metadata? In an ideal future-proof metadata system, files would arrive with the full knowledge of how they were produced, such that if you have access to the file, you know everything. In LHCb you require several systems to interrogate that information: the book-keeping, the Dirac production system, the released software, the TWiki pages...

Advantages to storing all metadata per event:

  • Events and their provenance can be uniquely identified
  • Events are self-contained: if an event makes it through your filtering system you are guaranteed to have its metadata.
  • You can forget about the file containing a given event

Disadvantages to storing all metadata per event:

  • If an event does not make it through your filter, its metadata is also lost.
  • If metadata are common to thousands of events this is wasteful in disk space and memory usage.
  • Adding new metadata becomes prohibitive in terms of file size.

The FileSummaryRecord

The concept: The concept of a file-summary-record (FSR) is a relatively new addition to the Gaudi Framework; it is the combination of several things.

  • A new "event" store: since Root files have no concept of metadata, and Gaudi did not have a concept of metadata either, we implemented FSRs as a secondary event store, a new tree kept in the same root file, starting with "FileRecords"
  • New DataObjects: since Gaudi can only write DataObjects to Root files using the existing framework, we needed to add every metadata concept we can think of into the event class model, creating dedicated DataObjects
  • New services/algorithms: this datastore needs algorithms which fill it, and services which provide accessors to the data.
  • Automatic Propagation: FSRs are expected to be transparently passed from file to file hand-over-hand by the dedicated services. File metadata only increases and

Resources:

What's there in your FSR? How to check if everything is OK?

    SetupProject DaVinci
    $APPCONFIGROOT/scripts/CheckFSRs.py <somefilename>

Automatic propagation: A SHRUBBERY

  • TES: FSRs in the current TES, for the current file yet to be written are stored under "/FileRecords"
  • Opened Files: On opening a new file, all its FSRs are duplicated into the tree under "/FileRecords/<GUID>/..."
  • Topology: Since this is iterated over many production applications, the resulting tree is very deep and broad, but also sparse:
    • deep: there are many levels in the tree
    • broad: there are many branches at each level in the tree
    • sparse: each branch will have only a few entries, often only one double or int at the lowest hanging level
  • Oh: We end up with a bush rather than a tree (an example layout is sketched below).
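For illustration, after an application has read two input files the FileRecords store might look something like this (the GUIDs are placeholders and only one FSR type is shown):

    /FileRecords/EventCountFSR                                          <- FSR for the file being written now
    /FileRecords/<GUID-of-input-1>/EventCountFSR                        <- copied from the first input file
    /FileRecords/<GUID-of-input-2>/EventCountFSR                        <- copied from the second input file
    /FileRecords/<GUID-of-input-2>/<GUID-of-its-input>/EventCountFSR    <- provenance one level deeper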

Services and integration into Gaudi

To understand which services are required to be active, and their options, in order to propagate FSRs correctly, one can inspect what is done by IOHelper().setupServices() (see the SVN and Doxygen of IOHelper).
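A quick way to inspect this interactively, assuming the usual GaudiConf import location:

    from GaudiConf import IOHelper
    help(IOHelper().setupServices)   # print the docstring and signature
    help(IOHelper().outputAlgs)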

Interaction with applications/configurables

 
Changed:
<
<

Luminosity recording

>
>
  • IOHelper and IOExtension
    • have the keyword writeFSR embedded into the outputAlgs and outStream methods.
    • Activating this flag triggers the additional writing of the FSRs to your output file
 
Changed:
<
<

Goal

>
>
  • LHCb Applications:
    • DaVinci, Moore, L0App, FileMerger, DSTConf, and many other LHCb configurables have a WriteFSR slot.
    • When the configurable is used directly to create output, this is propagated to the relevant IOHelper method (see the snippet below).
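A minimal example with the DaVinci configurable; the flag is usually True by default, so this is only needed if it was switched off somewhere else:

    from Configurables import DaVinci
    DaVinci().WriteFSR = True   # forwarded to the relevant IOHelper method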
 
Changed:
<
<
During the discussion on Thursday 19th June (see slides here), it appeared that a number of counters could be provided by the HLT, in order to count the occurrence of certain types of events. These counters will later be used by Gaudi in order to provide a relative estimate of the luminosity. An absolute calibration would allow to compute the absolute luminosity corresponding to the analysed dataset.
>
>
  • What happens in Brunel?:
    • In real data, LumiFSRs are initially created from counters produced by the HLT.
 
Changed:
<
<

Counters recording

>
>
  • What happens in DaVinci?:
    • FSRs are propagated from DST to DST
    • FSRs are merged in production when thousands of files are used as input
    • Luminosity FSRs can be combined and compared with the conditions database calibrations to arrive at a measure of the luminosity with uncertainties (see the sketch below).
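A hedged sketch of the usual analyst-level options: a Lumi switch is exposed by the DaVinci configurable in recent releases, but the exact options and the summary printout depend on the version, and the data type below is a placeholder:

    from Configurables import DaVinci
    dv = DaVinci()
    dv.DataType = '2012'   # placeholder
    dv.Lumi = True         # integrate the luminosity FSRs and report the result at finalisation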
 
Deleted:
<
<
Rather than recording this information in interval of time ("cycles") and later apply corrections depending on the fraction of events of a given cycle in the analysed dataset, it was proposed that the information could be stored in the data itself. As this information is computed by the HLT farm nodes, it was proposed that for each HLT-yes event the differential counters (from the preceding HLT-yes event) can be added as a separate bank.
 
Changed:
<
<
Multiple parallel streams of MDF files can be written in parallel and events will not appear in chronological order on the files. Nevertheless when integrating on a full dataset, the sum of all counters represent statistically the total number of events of the given type encountered in the HLT for this dataset. As MDF writing is a simple streaming of banks, it was proposed not to integrate the counters at this stage.
>
>

Applications/use cases

 
Changed:
<
<

Counters integration

>
>

- LumiCounters

 
Changed:
<
<
When the RAW-MDF file is processed to produce an RDST, the counters of all events are summed up. The set of counters thus obtained will be written out in the output stream(s) of the application after the last event has been processed. This special record is called the "File Summary Record" (FSR). It is proposed not to be an event record, but be accessible separately at any time of the processing.
>
>
  • In calculating the luminosity the events you did not select have equal importance to the events you did select.
  • This means that some non-event-wise metadata is 100% necessary, and FSRs are the solution implemented in LHCb.
  • To ensure that information on the total number of events seen by a job is not lost, and to eventually calculate the luminosity, the following metadata are required:
  1. Counters: various counters on event rates from lumi-limited well-calibrated lines
  2. EventCountFSR: a counter of how many events were seen from each file in order to validate that all events were read correctly
  3. Time stamps: the first event time of a file, and the last event time of a file must be known to use the conditions database calibrations
  4. Run numbers: the run numbers present in a certain file must be known.
 
Changed:
<
<
For all subsequent processing (e.g. stripping, file merging, user analysis etc...) the FSRs of all input files of an application are summed up in order to create the FSR of the output file of the application. The job can print out the FSR at finalisation time as well as including it automatically in the output stream(s).
  • The implementation must handle all possible use cases, including reconstruction of stripped events starting from an SETC pointing to RAW-MDF files.
>
>
Each of these FSRs is designed to be mergeable, so that the memory and file-size overhead is kept down (a toy sketch of what merging means follows below).
  • Limits on mergeability:
    1. Brunel cannot process events from different runs into the same DST.
    2. Merging across runs later in processing has had problems in the past, but nominally it should work with the latest production software.
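To illustrate what mergeable means here, a toy sketch (hypothetical classes, not the actual LHCb FSR DataObjects) of folding two records into one:

    # Toy illustration only: real FSRs are C++ DataObjects handled by dedicated services.
    class ToyFSR:
        def __init__(self, events_seen, first_time, last_time, runs):
            self.events_seen = events_seen   # cf. EventCountFSR
            self.first_time = first_time     # cf. time-stamp FSRs
            self.last_time = last_time
            self.runs = set(runs)            # cf. run-number FSRs

        def merge(self, other):
            # One merged record replaces two, keeping the FSR tree shallow and dense.
            return ToyFSR(self.events_seen + other.events_seen,
                          min(self.first_time, other.first_time),
                          max(self.last_time, other.last_time),
                          self.runs | other.runs)

    merged = ToyFSR(1000, 1339000000, 1339003600, [111802]).merge(
             ToyFSR( 500, 1339003600, 1339007200, [111803]))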
 
Deleted:
<
<

Processing summary counters

 
Changed:
<
<

Goal

>
>

Future Applications/use cases:

  • (note that these TWiki pages do not exist yet, because this is still only suggested work at the time of writing)
  • IOFSR: recording metadata about the file provenance direct into the FSR. This could eventually solve the memory explosion problem.
  • GeneratorLevelFSRs: propagate all generator counters for perfect generator-level-cut efficiency evaluation. NB: this will need a lot of merging.
 
Changed:
<
<
Currently in production jobs, the logfile of the application is analysed (by a set of "grep"s) in order to extract summary information. An FSR would allow to get automatically all these counters and additional information without relying on the format of the printout. In addition, reading it from the output file would verify that the file has been properly closed and is readable.
>
>

Comparison in pictures

 
Changed:
<
<

Summary information

>
>
[Images: "A Root Tree" (Slide1.PNG) and "An FSR Bush" (Slide2.PNG)]
 
Changed:
<
<
The processing part of the FSR would contain a set of counters (required events, processed events, output events, skipped events...), as well as provenance information (list of GUIDs successfully processed, application name and version, options file name, specific additional options if needed...). Note that some counters are global, but a few might be stream-dependent (number of events written out); they could be added to the summary by each output stream writer at finalisation time.
>
>

Major issue: Memory and file-size explosion

 
Changed:
<
<
A (possibly complex) requirement would be that (part of ) the FSR be written out after each event in order to be able to easily retrieve information even if the application crashed (overwriting the previous information).
>
>
  • Memory-explosion.
    • As we discussed above, the FSR tree we create is very deep and broad, but also sparse.
    • ROOT files are efficient at storing shallow, narrow, dense trees.
      • shallow: not very many levels
      • narrow: not many branches per level
      • dense: many hundreds or thousands of entries per event
    • The further you push ROOT away from this regime, the more file size grows, and the more memory is consumed on reading/writing the files.
 
Changed:
<
<

Processing FSR usage

>
>
  • Packing, compressing, allocating:
  • ROOT files auto-compress themselves, and auto-adjust their caches, assuming that they are shallow, narrow, dense trees, and this is a very good assumption.
  • LHCb files often hold very much the same information from one event to the next, especially in high-occupancy situations.
  • ROOT optimizes the memory layout for reading and writing files after 10 events have been written/read.
  • FSRs, however, are only written/read once, so the maximum size is always used as a cache and compression is completely pointless; it only costs disk space and CPU time.
 
Changed:
<
<
After finalisation of the application (that should print out the Processing FSR info), production jobs will read the FSR from the output file(s) in order to verify the correct termination of the application as well as to prepare the bookkeeping summary information. This will use a simple dedicated GaudiPython application for example.
>
>
  • Extreme example:
  • in a simplified tree model, each branch is a string of the branch name and a pointer to the keyed container object.
  • the contained object must have a non-zero number of members to interface correctly with what ROOT expects from objects within its trees; some of that information can be stored as metadata of the branch, and some is stored in the leaves, once per data member,
  • The EventCountFSR is a very simple FSR which stores two long long ints.
  • If 8 of them are stored to a file as a tree with only 2 bifurcations at each branch, instead of storing 8*2 packed ints we are now storing
    • Size on file: 1+2+4+8 * (string + pointer + container + root-specific-rubbish ) + 8*2 packed ints
    • Size in memory: 1+2+4+8 * (string + pointer + container + root-specific-rubbish + bucket + cache) + 8*2 packed ints
  • The deeper the tree and the broader the tree, the exponentially more memory is required to store these objects.
 
Deleted:
<
<
When processing a file, the content of the FSR will allow to cross-check the number of events actually processed with the number of events on the input files. The processing FSR of the newly created file(s) will replace that of the original files. Additional counters may integrate the number of original events.
 
Changed:
<
<

User counters

>
>
It sounds silly, but adding one low-level FSR right now is more expensive in memory usage and file size than adding the same information to every event, where it would be compressed away. So FSRs should only be used for the MINIMAL possible use cases which are not addressable in any other way.
 
Deleted:
<
<
Very often processing applications are accumulating a number of counters that are used at finalisation time for printing summary tables (including specific computation of efficiencies with their errors etc...). If however the job is split (e.g. by ganga), the user might not be interested in the individual summary but in the full statistics summary. Parsing the logfiles in order to re-create a logfile "as if" the job had not been split is an impossible task. Saving these counters at finalisation time (in the data or in a separate output stream) would allow an easy merging of the jobs' results. The resulting set of counters can then be printed out by the same code that is used at finalisation time.
 
Changed:
<
<
For it to work, one should be able to run the same application that created the FSR without an event loop: i.e. all components holding counters should be instantiated, initialised from the FSR and finalised.
>
>

Merging!:

 
Changed:
<
<

General requirements

>
>
  • To overcome the memory usage problems, we must translate the deep, broad, sparse trees into narrow, shallow, dense trees.
  • This is accomplished by merging FSRs together into a single top-level FSR.
  • In production this must be done whenever many files are concatenated into one file.
  • For user productions of large sparse file mergings, for example a sub-selection of 1/1000 events, and in particular with microDSTs, this should also be done.
  • Since the growth of the memory usage is exponential in terms of depth, even reducing the depth by one level can have a massive improving effect.
 
Changed:
<
<
In order to be flexible enough, it is desirable that as from the moment the information is in a POOL file, the counters are accessible by name rather than by a frozen structure. This is probably not desirable for MDF files and the decoding algorithm should be in synch with the HLT output format in order to produce named counters. Counters should be very simple objects implementing standard mathematical operators (+,-,*,/) and comparison operators.
>
>

Related talks:

 
Changed:
<
<
The FSR record should be separate from the event stream in order to be accessed independently. A mechanism should be put in place in order to read it from the input file just after opening it, adding the counters part to previous ones and writing it out at the end of processing. It should be automatically included in all output streams (maybe not for user counters?) and one should have the possibility to write it out separately (e.g. user counters).
>
>

Related Pages:

 
Changed:
<
<

Implementation proposals

The framework should offer the following:
  • A specialised DataObject that can be used to store and accumulate the counters
  • A DataService (and associated Transient Store) where algorithms can put/get/accumulate these counters during the job
  • An extension of the event persistency service that:
    • Saves the counters to the event data file just before the file is closed
    • Reads the counters from input data files when the files are opened, and puts them in the new Transient Store
  • New "file open" and "file close" incidents that can be used by algorithms to declare callbacks that would combine the counters from the input files with those being accumulated in the job. It is here that the intelligence of how to combine counters would be put.
>
>

 
Changed:
<
<
The persistency solution should:
  • Provide direct access to the counter records
  • Handle also the use case where the input file is an ETC.
>
>
-- RobLambert - 24 Mar 2014
 
Deleted:
<
<
-- PhilippeCharpentier - 25 Jun 2008
 \ No newline at end of file
Added:
>
>
META FILEATTACHMENT attachment="Slide1.PNG" attr="" comment="A root tree" date="1395676409" name="Slide1.PNG" path="Slide1.PNG" size="227073" user="rlambert" version="1"
META FILEATTACHMENT attachment="Slide2.PNG" attr="" comment="A root bush" date="1395676452" name="Slide2.PNG" path="Slide2.PNG" size="323129" user="rlambert" version="1"

Revision 3 (2008-07-02) - MarcoCattaneo

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"

LHCb File Summary Record

Line: 57 to 57
 The FSR record should be separate from the event stream in order to be accessed independently. A mechanism should be put in place in order to read it from the input file just after opening it, adding the counters part to previous ones and writing it out at the end of processing. It should be automatically included in all output streams (maybe not for user counters?) and one should have the possibility to write it out separately (e.g. user counters).

Implementation proposals

Added:
>
>
The framework should offer the following:
  • A specialised DataObject that can be used to store and accumulate the counters
  • A DataService (and associated Transient Store) where algorithms can put/get/accumulate these counters during the job
  • An extension of the event persistency service that:
    • Saves the counters to the event data file just before the file is closed
    • Reads the counters from input data files when the files are opened, and puts them in the new Transient Store
  • New "file open" and "file close" incidents that can be used by algorithms to declare callbacks that would combine the counters from the input files with those being accumulated in the job. It is here that the intelligence of how to combine counters would be put.

The persistency solution should:

  • Provide direct access to the counter records
  • Handle also the use case where the input file is an ETC.
  -- PhilippeCharpentier - 25 Jun 2008 \ No newline at end of file

Revision 2 (2008-06-27) - MarcoCattaneo

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"

LHCb File Summary Record

Line: 23 to 23
  When the RAW-MDF file is processed to produce an RDST, the counters of all events are summed up. The set of counters thus obtained will be written out in the output stream(s) of the application after the last event has been processed. This special record is called the "File Summary Record" (FSR). It is proposed not to be an event record, but be accessible separately at any time of the processing.
Changed:
<
<
For all subsequent processing (e.g. stripping, fiel merging, user analysis etc...) the FSRs of all input files of an application are summed up in order to create the FSR of the output file of the application. The job can print out the FSR at finalisation time as well as including it automatically in the output stream(s).
>
>
For all subsequent processing (e.g. stripping, file merging, user analysis etc...) the FSRs of all input files of an application are summed up in order to create the FSR of the output file of the application. The job can print out the FSR at finalisation time as well as including it automatically in the output stream(s).
  • The implementation must handle all possible use cases, including reconstruction of stripped events starting from an SETC pointing to RAW-MDF files.
 

Processing summary counters

Revision 1 (2008-06-25) - PhilippeCharpentier

Line: 1 to 1
Added:
>
>
META TOPICPARENT name="LHCbComputing"

LHCb File Summary Record

This page is gathering requirements for a "File Summary Record" as emerged from several discussion during the June '08 Software week. The purpose is to trigger a brainstorming leading to an implementation proposal to be described as well on this page.

Use cases

Luminosity recording

Goal

During the discussion on Thursday 19th June (see slides here), it appeared that a number of counters could be provided by the HLT, in order to count the occurrence of certain types of events. These counters will later be used by Gaudi in order to provide a relative estimate of the luminosity. An absolute calibration would allow to compute the absolute luminosity corresponding to the analysed dataset.

Counters recording

Rather than recording this information in interval of time ("cycles") and later apply corrections depending on the fraction of events of a given cycle in the analysed dataset, it was proposed that the information could be stored in the data itself. As this information is computed by the HLT farm nodes, it was proposed that for each HLT-yes event the differential counters (from the preceding HLT-yes event) can be added as a separate bank.

Multiple parallel streams of MDF files can be written in parallel and events will not appear in chronological order on the files. Nevertheless when integrating on a full dataset, the sum of all counters represent statistically the total number of events of the given type encountered in the HLT for this dataset. As MDF writing is a simple streaming of banks, it was proposed not to integrate the counters at this stage.

Counters integration

When the RAW-MDF file is processed to produce an RDST, the counters of all events are summed up. The set of counters thus obtained will be written out in the output stream(s) of the application after the last event has been processed. This special record is called the "File Summary Record" (FSR). It is proposed not to be an event record, but be accessible separately at any time of the processing.

For all subsequent processing (e.g. stripping, fiel merging, user analysis etc...) the FSRs of all input files of an application are summed up in order to create the FSR of the output file of the application. The job can print out the FSR at finalisation time as well as including it automatically in the output stream(s).

Processing summary counters

Goal

Currently in production jobs, the logfile of the application is analysed (by a set of "grep"s) in order to extract summary information. An FSR would allow to get automatically all these counters and additional information without relying on the format of the printout. In addition, reading it from the output file would verify that the file has been properly closed and is readable.

Summary information

The processing part of the FSR would contain a set of counters (required events, processed events, output events, skipped events...), as well as provenance information (list of GUIDs successfully processed, application name and version, options file name, specific additional options if needed...). Note that some counters are global, but a few might be stream-dependent (number of events written out); they could be added to the summary by each output stream writer at finalisation time.

A (possibly complex) requirement would be that (part of ) the FSR be written out after each event in order to be able to easily retrieve information even if the application crashed (overwriting the previous information).

Processing FSR usage

After finalisation of the application (that should print out the Processing FSR info), production jobs will read the FSR from the output file(s) in order to verify the correct termination of the application as well as to prepare the bookkeeping summary information. This will use a simple dedicated GaudiPython application for example.

When processing a file, the content of the FSR will allow to cross-check the number of events actually processed with the number of events on the input files. The processing FSR of the newly created file(s) will replace that of the original files. Additional counters may integrate the number of original events.

User counters

Very often processing applications are accumulating a number of counters that are used at finalisation time for printing summary tables (including specific computation of efficiencies with their errors etc...). If however the job is split (e.g. by ganga), the user might not be interested in the individual summary but in the full statistics summary. Parsing the logfiles in order to re-create a logfile "as if" the job had not been split is an impossible task. Saving these counters at finalisation time (in the data or in a separate output stream) would allow an easy merging of the jobs' results. The resulting set of counters can then be printed out by the same code that is used at finalisation time.

For it to work, one should be able to run the same application that created the FSR without an event loop: i.e. all components holding counters should be instantiated, initialised from the FSR and finalised.

General requirements

In order to be flexible enough, it is desirable that as from the moment the information is in a POOL file, the counters are accessible by name rather than by a frozen structure. This is probably not desirable for MDF files and the decoding algorithm should be in synch with the HLT output format in order to produce named counters. Counters should be very simple objects implementing standard mathematical operators (+,-,*,/) and comparison operators.

The FSR record should be separate from the event stream in order to be accessed independently. A mechanism should be put in place in order to read it from the input file just after opening it, adding the counters part to previous ones and writing it out at the end of processing. It should be automatically included in all output streams (maybe not for user counters?) and one should have the possibility to write it out separately (e.g. user counters).

Implementation proposals

-- PhilippeCharpentier - 25 Jun 2008

 