Conditions Database Brainstorming Meeting (7/5/2007)

Work in progress, under construction

Participants: Olivier Callot, Marco Cattaneo, Philippe Charpentier, Marco Clemencic, Joel Closier, Gloria Corti, Clara Gaspar, Patrick Koppenburg, Thomas Ruf
 

Discussed Topics


Partitioning

The CondDB software framework allows several COOL databases (partitions) to be combined into a single picture, which is then seen via the entry point in CondDBCnvSvc. The COOL databases can be combined in essentially two ways:
partitioning
one or more sub-trees of the global picture can be found in different databases, while everything else is found in a base database. It is similar to what happens on a Unix filesystem, where the root partition (mounted on '/') corresponds to the base database and the other partitions (mounted on directories like '/usr' or '/home') correspond to the other databases.
layering
two or more COOL databases can be stacked, and a lower layer is accessed only if the information is not found in any of the layers above.
The result of either type of combination can then be combined with other COOL databases as if it were a plain COOL database.
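As a rough illustration (this is not the real CondDBCnvSvc or COOL code, just a sketch of the lookup logic, with folder paths and payloads invented for the example), the two mechanisms can be modelled like this:

# Sketch only: leaf databases are modelled as plain dicts mapping a folder path
# to a payload; the real implementation goes through COOL and CondDBCnvSvc.

class Partitioned:
    """Route a path to the partition whose mount point is the longest prefix."""
    def __init__(self, base, mounts):
        self.base = base      # database "mounted" on '/'
        self.mounts = mounts  # e.g. {"/Conditions": ..., "/Conditions/Online": ...}
    def __getitem__(self, path):
        best = max((m for m in self.mounts if path.startswith(m)), key=len, default=None)
        return (self.mounts[best] if best else self.base)[path]

class Layered:
    """Try each layer from top to bottom; fall through on a missing path."""
    def __init__(self, *layers):
        self.layers = layers
    def __getitem__(self, path):
        for layer in self.layers:
            try:
                return layer[path]
            except KeyError:
                pass
        raise KeyError(path)

# The standard layout described below, with invented folder names:
dddb = {"/dd/Materials": "materials XML"}
lhcbcond = {"/Conditions/Velo/Alignment": "fine alignment"}
online = {"/Conditions/Online/Velo/MotionSystem": "motor readings"}
db = Partitioned(dddb, {"/Conditions": lhcbcond, "/Conditions/Online": online})
print(db["/Conditions/Online/Velo/MotionSystem"])  # served by the ONLINE partition

A Layered combination would be used in the same way when one database has to hide or complete the content of another.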

The standard LHCb Conditions Database will consist of 3 COOL databases combined using the "partitioning" feature:

DDDB
base partition (providing the main entry point). The content is essentially static and corresponds roughly to the former XmlDDDB, with the information that is common to both reconstruction and simulation.
LHCBCOND
partition accessed via the directory "/Conditions". It contains multi-version conditions like alignments and all the conditions that are produced "off-line".
ONLINE
partition accessed via the directory "/Conditions/Online". It is the COOL database which is populated in the pit with only single-version conditions (like reading from probes).

Simulation

The simulation will need the detector description, values for alignments, and run parameters like the magnet polarity. This information (except for the ideal geometry, descriptions of materials, etc.) does not need to be the same as that found in the database used for real events and, in many cases, it has to be different.

To disentangle the conditions that are (or may be) different between real data and simulated events from the objects they have in common, we keep the common XML (geometry, materials, etc.) in the COOL database DDDB, while the structure of LHCBCOND and ONLINE will be copied into a COOL database dedicated to simulation: SIMCOND. The simulation conditions database will contain multi-version conditions also for the part emulating ONLINE.
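In terms of the sketch above, the simulation picture would be built with SIMCOND mounted on "/Conditions" in place of both LHCBCOND and ONLINE (again, the folder names are invented for illustration):

simcond = {"/Conditions/Online/Velo/MotionSystem": "simulated motor readings",
           "/Conditions/Velo/Alignment": "simulated fine alignment"}
sim_db = Partitioned(dddb, {"/Conditions": simcond})  # DDDB is shared with real data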

To control which conditions are used for the simulated events, we will mainly use the event time (chosen to be close to the event time of real data, to avoid possible conflicts with the content of DDDB). If needed, we can also change the tag to be used.
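A minimal sketch of what "using the event time" means, with invented numbers and a simplified interval-of-validity lookup (the real lookup is done by COOL):

def condition_at(iovs, event_time):
    """Return the payload whose interval of validity [since, until) contains event_time."""
    for since, until, payload in iovs:
        if since <= event_time < until:
            return payload
    raise LookupError("no condition valid at %s" % event_time)

# Hypothetical SIMCOND content for the magnet polarity:
iovs = [(0, 1000, "polarity up"), (1000, 2000, "polarity down")]
print(condition_at(iovs, 1500))  # the chosen simulated event time selects "polarity down"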

Test Beam (and other special cases)

The geometry and structure description for test-beam reconstruction will be stored in separate private databases that will replace DDDB (and/or the other databases) in the user job options. The private databases can be stored on any back-end (Oracle, MySQL, SQLite) and will not be replicated to the Tier-1s. If grid-based analysis is needed, the databases could be distributed as SQLite files, as is already done with the package SQLDDDB v2rx (used by Brunel v31r3).
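Conceptually, the test-beam user only remaps a partition to a private connection string; the exact job-options syntax and the connection strings below are not part of this page and are shown purely as an assumption:

# Illustration only: hypothetical connection strings, not the real deployment.
default_partitions = {
    "DDDB": "oracle://lhcb_conddb/DDDB",
    "LHCBCOND": "oracle://lhcb_conddb/LHCBCOND",
    "ONLINE": "oracle://lhcb_conddb/ONLINE",
}
# Replace DDDB with a private SQLite file for the test-beam job:
testbeam_partitions = dict(default_partitions, DDDB="sqlite_file:TestBeam2007.db/DDDB")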

Deployment/Replication

We need a copy of the databases used for reconstruction at Tier-0, the pit and Tier-1s. The replication will be done through Oracle Streams set up by the LCG 3D group.

DDDB and LHCBCOND will be accessible in read-write mode only from CERN and then replicated as read-only copies to the pit and to the Tier-1s. ONLINE will be writable only at the pit, where it will be populated automatically by PVSS, and will be replicated to CERN, from where a replication task will propagate the updates to the Tier-1s.

SIMCOND will be replicated in the same way as DDDB and LHCBCOND to be accessible for user analysis. Simulation jobs, running at Tier-2s, will use a SQLite-based snapshot of SIMCOND.

 

Tag Naming Convention

The name of a tag is important to allow the user as well as the maintainer to know what the information associated with the tag was used for.
Since the main reason for a tag is to use a new version of the database for a "processing" or "re-processing", we should put that string in the tag name. It is also useful to distinguish the tags used in different years, even though the tag that applies to year X will also contain the most recent version of the information used in a previous year Y.

The tag names will look like:

  • Processing-2008-1
  • Reprocessing-2009-2

We can also use something like "Online-2008-1" to identify the tag that has to be used in the on-line farm during data-taking and for the quasi-online first reconstruction.
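As a small sketch, the convention above amounts to names of the form <Use>-<Year>-<Serial>; the regular expression below is only an assumption based on the examples quoted on this page:

import re
TAG_RE = re.compile(r"^(Processing|Reprocessing|Online)-(\d{4})-(\d+)$")
for tag in ("Processing-2008-1", "Reprocessing-2009-2", "Online-2008-1", "HEAD"):
    print(tag, "follows" if TAG_RE.match(tag) else "does not follow", "the convention")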


Tag Usage Cycle

Real Data

The tag to be used in the online farm has to be defined by somebody (librarian? manager?) and communicated to the online system and to the production system.

In the pit, a snapshot of the database corresponding to the chosen tag will be created to be used for the next run (this requires a new configuration of the system). The HLT processes do not need to be notified of the tag name because the snapshot they use contains only one version (the one defined by the chosen tag).

Reconstruction jobs will use the same tag as the Event Filter Farm.
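A sketch of what the snapshot used in the pit conceptually is (data layout and tag names are invented): for each folder, only the version selected by the chosen tag is kept, so the HLT processes never have to know the tag name.

def make_snapshot(db, tag):
    """db maps folder -> {tag: payload}; the snapshot maps folder -> payload only."""
    return {folder: versions[tag] for folder, versions in db.items()}

db = {"/Conditions/Velo/Alignment": {"Online-2008-1": "v1", "Processing-2008-1": "v2"}}
print(make_snapshot(db, "Online-2008-1"))  # {'/Conditions/Velo/Alignment': 'v1'}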

Simulation

For simulation, both the event time and the tag have to be chosen and used to configure the jobs. The production procedure (still to be implemented) should include the creation and deployment of the database snapshot.

Accounting and Book-keeping

It is very important to record the version/tag of the databases that was used to reconstruct/produce events. This information will be recorded both in the process headers and in the bookkeeping database.

To fill the process headers, a tool will be put in place that discovers all the (partition, tag) pairs used by the process. The information will be stored as a vector of pairs of strings.
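For illustration, the content of the process header could look like the following list of (partition, tag) pairs; all values are invented:

# One (partition, tag) pair per database actually used by the job.
used_tags = [
    ("DDDB", "Processing-2008-1"),
    ("LHCBCOND", "Processing-2008-1"),
    ("ONLINE", ""),  # ONLINE is single-version, so no tag; shown only as an assumption
]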

In the bookkeeping database there will be room for only one tag.
For real data, it will be the LHCBCOND tag. This tag should be enough to also define the tag for DDDB, since the latter rarely changes and a change in it will certainly imply a new tag in LHCBCOND too.
For simulation we can use the SIMCOND tag (the same argument as for real data applies). The name and version of the configuration (defining the simulated event time, etc.) needs to be stored too.

The Special Case of VELO Halves Alignment

The alignment of VELO halves will consist of two parts:
  • reading of the motor positions (single-version condition, to be stored in ONLINE)
  • fine alignment (multi-version condition, to be stored in LHCBCOND)
For all other detector elements there will be only one AlignmentCondition, but for VeloLeft and VeloRight this is not possible, because it would mean that, depending on the context, we would have to get the alignment conditions either from ONLINE or from LHCBCOND. The solution is to use the combination of the two alignment conditions, where the fine alignment (in LHCBCOND) is defined as the difference between the actual position (measured off-line) and the one obtained by applying the motor positions (in ONLINE) to the nominal position (in DDDB).
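A one-dimensional toy example (real alignments are full 3-D transformations, and all numbers here are invented) of how the VELO half position is composed from the three databases:

nominal = 0.0         # nominal position, from DDDB
motor_offset = 14.95  # motor reading, from ONLINE (single-version)
measured = 15.02      # actual position measured off-line

# The fine alignment stored in LHCBCOND is the difference between the measured
# position and the one predicted by applying the motors to the nominal position.
fine = measured - (nominal + motor_offset)

# A job combining the three databases recovers the measured position.
assert abs((nominal + motor_offset + fine) - measured) < 1e-9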

Deployment of Conditions Database and Synchronization with Reconstruction

During data-taking the ONLINE database is populated in real time and replicated quasi-instantaneously to Tier-0 and then to the Tier-1s (whether this happens every 30 minutes or in real time still has to be defined). We have to be sure that the reconstruction is using the version of the DB that we want.

For DDDB and LHCBCOND, we can use LCG software tags matching the conditions database tags, which allow us to control at which sites the jobs are actually processed.

We do not have tags in ONLINE, so we cannot use them to tell whether the replicated copy is recent enough to process a given data file.
To check whether the database is recent enough, we can insert a special condition (a marker) periodically during a running period and once at the end of each run. The latest interval of validity of that condition tells when the replica was last updated. Since we only need to know whether the replica has been updated with conditions more recent than the event we are processing, we can check that the currently valid marker condition has an end of validity that is not infinite (the maximum value). If the check fails, the job will fail. (It would be possible to abort the job only if the check fails on the first event, and to wait for some time if it fails after a few events, but unfortunately this is technically very difficult.)
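A sketch of the freshness check described above (types and values are illustrative; the real check would go through the conditions access layer):

INFINITY = float("inf")  # stands in for COOL's "maximum validity" value

def replica_is_recent_enough(marker_iov):
    """marker_iov = (since, until) of the marker valid at the current event time."""
    since, until = marker_iov
    # A finite end of validity means a newer marker has already been replicated,
    # i.e. the replica contains conditions more recent than the event in hand.
    return until != INFINITY

if not replica_is_recent_enough((1000.0, INFINITY)):
    raise RuntimeError("ONLINE replica not up to date for this event")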

  -- MarcoClemencic - 08 May 2007
