Online To Offline (O2O) HOWTO (THIS PAGE IS OBSOLETE)

Complete: 3

Introduction

This page describes the steps required by the subdetector experts to create an object for use in the Online-to-Offline (O2O) transform in CMS. For the actual execution of O2O by the O2O administrator, read this page.

In CMS, online data (that is, data written by processes using the running detector) is stored in subdetector-specific schemas on the online database called OMDS (Online Master Data Storage). Offline calibration and alignment data are stored in databases using POOL-ORA, an object-relational mapping engine that persistently stores C++ objects in a relational database such as Oracle, CMS.MySQL, or sqlite. Because the format of the online and offline data is different it neccessitates the O2O transform process for this kind of data.

The O2O transform procedure is not required for calibration and alignment data that are written directly in the offline databases, such as ORCON for HLT operations and ORCOFF for reconstruction.

CMS will initially use Oracle for both its online and offline databases so these examples will focus on this database technology.

The methods described here are relevant for CMSSW_0_5_0 and later.

Creating an Offline Object

Offline objects are C++ objects that are part of the CMSSW CondFormats subsystem. The object defines the structure of the data, and contains no information about how it should be accessed. There are two kinds of information that identify a data object once it is in the offline database: the IOV and the Metadata. IOV is the "Interval of Validity" and is the time range for which the data object is valid. This time is given either in run-numbers or timestamps. Metadata is a string used to identify the IOV sequence of the data. The format of this string is up to the subdetector group. Because these are the only two ways that the offline object is identified the object must contain the data for the entire subdetector. There is no way to query within the object itself, such as selecting only a subset of channels. (But you are free to create objects that only hold a needed subset of channels)/ The channel selection is an in-memory C++ operation.

Given the previous considerations, there is quite a bit of flexibility in how you design the offline object. But a more complex data structure within the object will result in a more complex mapping to the table structures in the relational database, which may impact performance. For this reason it is advisable to keep the object structure as simple as possible.

For this example we will use the EcalPedestals. In C++, the object looks like this:

!EcalPedestals.h
class EcalPedestals {
 public:
  EcalPedestals();
  ~EcalPedestals();
  struct Item {
    float mean_x12;
    float rms_x12;
    float mean_x6;
    float rms_x6;
    float mean_x1;
    float rms_x1;
  };
  std::map<uint32_t, Item> m_pedestals;
};

This object defines the struct Item which stores mean and RMS of the pedestal at 3 gains, and an std::map of channel number (EBDetID::rawid()) to these Items. For the ECAL barrel we expect this map to contain 61200 entries.

A dictionary library must be created that contains this object following the standard procedure. In CMSSW this object is in the CondFormats/EcalObjects package by convention the dictonary containing the object will be called CondFormatsEcalObjects.

Generating Mapping XML

POOL-ORA can use an XML file to define the mapping of the C++ data structure to a relational database structure. Data primitives (int, float, double) must be mapped to table columns of a compatable type. POOL-ORA handles the type conversions for you, and can generate a mapping XML with some default table and column names.

However, because these default table and column names are machine-generated they are often not human-friendly. Because O2O will involve hand-writing some SQL it may be convenient to customize the default mapping XML a bit to have some friendly names.

Another reason we need to create a custom mapping XML is because the CMS tool used to attach IOV and Metadata information to our stored objects assumes two things about the tables used to store the object:

  1. The top-level table uses a column called IOV_VALUE_ID as its primary key
  2. The top-level table has a column called TIME used to define its IOV. The TIME column has the meaning of Data Taking Time.

Item 1 will be handled here by making a custom mapping XML, item 2 will be dealt with later. We can make a custom mapping XML in a few steps:

1. Generate the default mapping XML

The CMS tool create_default_mapping found in CondTools/Utilities/bin can do this. It uses two arguments: the name of the object and the name of the dictionary.

create_default_mapping EcalPedestals CondFormatsEcalObjects

This generates an XML file called EcalPedestals-mapping-cmsdefault.xml.

!EcalPedestals-mapping-cmsdefault.xml
<?xml version='1.0' encoding="UTF-8"?>
<!DOCTYPE Mapping SYSTEM "InMemory">
<Mapping version="EcalPedestals-1.0" >
  <Class table="ECALPEDESTALS" id_columns="ID" name="EcalPedestals" >
    <Container table="ECALPEDESTALS_M_PE_1" id_columns="ID_ID" name="m_pedestals" position_column="POS" >
      <Object table="ECALPEDESTALS_M_PE_1" id_columns="POS ID_ID" name="EcalPedestals::Item" >
        <Primitive column="M_PEDESTALS_ECALPEDESTALS_IT_1" name="mean_x1" />
        <Primitive column="M_PEDESTALS_ECALPEDESTALS_IT_2" name="mean_x12" />
        <Primitive column="M_PEDESTALS_ECALPEDESTALS_IT_3" name="mean_x6" />
        <Primitive column="M_PEDESTALS_ECALPEDESTALS_IT_4" name="rms_x1" />
        <Primitive column="M_PEDESTALS_ECALPEDESTALS_IT_5" name="rms_x12" />
        <Primitive column="M_PEDESTALS_ECALPEDESTALS_IT_6" name="rms_x6" />
      </Object >
      <Primitive column="M_PEDESTALS_UNSIGNED_INT" name="unsigned int" />
    </Container >
  </Class >
</Mapping >

The analysis of this mapping XML is as follows:

Lines Meaning
1 Standard XML header
2 DTD (type of XML this is)
3 The mapping version. Important for schema evolution.
4 Class definition. The top-level table is defined here as ECALPEDESTALS (table=). The primary key to this table is a column called "ID" (id_columns=). The object it refers to is called EcalPedestals (name=).
5 Another table is defined here called ECALPEDESTALS_M_PE_1 with primary key ID_ID. This is for the std::map which we called m_pedestals (name=). Storing the std::map also requires an ordering column here called POS (pos_column=).
6 The rows of the previous table contain objects (struct) of type EcalPedestals::Item (name=). An instance of this struct is identified in the table by "ID_ID" and "POS" in the table ECALPEDESTALS_M_PE_1
7-13 The mapping of the EcalPedestal::Item's C++ primitives (floats) to columns in the table ECALPEDESTALS_M_PE_1.
15 The rows of the previous table also contain a primitive whose name was (unfortunately) generated to be "unsigned int". This is the channel identifier in our std::map. It will be stored in a colum called "UNSIGNED_INT".
16+ Closing tags.

2. Edit and save the customized mapping XML

There are three goals when creating our custom mapping XML.

  1. Make sure the top-level table uses IOV_VALUE_ID as its primary key.
  2. Change all the table and column names to human-friendly values
  3. We are creating a custom version, so we should note it in the version attribute and prepare for possible schema evolution.

Goal 1 implies we change the element's id_columns attribute to IOV_VALUE_ID. We should also change the id_columns of the other tables to match this, meaning ID_ID also becomes IOV_VALUE_ID.

Goal 2 implies we rename all table and column attributes to something of our liking.

Goal 3 means we should consider the fact that one day this object might change. We should include a number in the version that can be incremented if this happens in the future.

The following is the result of editing EcalPedestals-mapping-cmsdefault.xml:

!EcalPedestals-mapping-custom_1.0.xml
<?xml version='1.0' encoding="UTF-8"?>
<!DOCTYPE Mapping SYSTEM "InMemory">
<Mapping version="custom_1.0" >
  <Class table="ECALPEDESTALS" id_columns="IOV_VALUE_ID" name="EcalPedestals" >
    <Container table="ECALPEDESTALS_ITEM" id_columns="IOV_VALUE_ID" name="m_pedestals" position_column="POS" >
      <Object table="ECALPEDESTALS_ITEM" id_columns="POS IOV_VALUE_ID" name="EcalPedestals::Item" >
        <Primitive column="MEAN_X1" name="mean_x1" />
        <Primitive column="MEAN_X12" name="mean_x12" />
        <Primitive column="MEAN_X6" name="mean_x6" />
        <Primitive column="RMS_X1" name="rms_x1" />
        <Primitive column="RMS_X12" name="rms_x12" />
        <Primitive column="RMS_X6" name="rms_x6" />
      </Object >
      <Primitive column="DET_ID" name="unsigned int" />
    </Container >
  </Class >
</Mapping >

This file has been saved with the name EcalPedestals-mapping-custom_1.0.xml and committed to CMSSW CVS in CondTools/Ecal/src. The file name is following a convention ObjectName -mapping- Version .xml that is used in later programs.

Offline DB Tables

Your offline tables can be created by the person responsible for the offline DB once you can provide them the mapping XML.

Filling Offline Tables (SQL)

Two tables are created from the EcalPedestals-mapping-custom_1.0.xml file. They have the following structure:

ECALPEDESTALS
Column Name ORACLE Type Description
IOV_VALUE_ID NUMBER(10) Unique identifier for an EcalPedestals object
TIME NUMBER(10) Run number at which the data is no longer valid

ECALPEDESTALS_ITEM
Column Name ORACLE Type Description
IOV_VALUE_ID NUMBER(10) Unique identifier for an EcalPedestals object
POS NUMBER(10) Index in the std::map
DET_ID NUMBER(10) Channel identifier for ECAL
MEAN_X1 BINARY_FLOAT Pedestal mean at gain 1
RMS_X1 BINARY_FLOAT Pedestal RMS at gain 1
MEAN_X6 BINARY_FLOAT Pedestal mean at gain 6
RMS_X6 BINARY_FLOAT Pedestal RMS at gain 6
MEAN_X12 BINARY_FLOAT Pedestal mean at gain 12
RMS_X12 BINARY_FLOAT Pedestal RMS at gain 12

The filling of the offline tables is to be done by means of a PL/SQL procedure in Oracle. You must provide the O2O administrator SQL queries on your online tables whose result is this offline object tables.

Because some of the columns required for offline are artificial in the context of the online schema, e.g. the POS column, you may have to make use of some more advanced SQL to generate the values correctly. The following SQL queries on the ECAL online schema result in the above offline tables:

/* query to fill ECALPEDESTALS table */

  SELECT 
    miov.iov_id iov_value_id, 
    riov.run_num time
  FROM 
    location_def ldef, run_type_def rdef, run_tag rtag, run_iov riov,
    /* inline view selects the mon_run_iov with the greatest subrun_num */
    (SELECT iov_id, run_iov_id, 
       MAX(subrun_num) KEEP (DENSE_RANK FIRST ORDER BY subrun_num ASC)
       FROM mon_run_iov GROUP BY iov_id, run_iov_id) miov
  WHERE
      miov.run_iov_id = riov.iov_id
  AND riov.tag_id = rtag.tag_id
  AND rdef.def_id = rtag.run_type_id
  AND ldef.def_id = rtag.location_id
  AND ldef.location='H4'
  AND rdef.run_type='PEDESTAL'
  AND rdef.config_tag='PEDESTAL-STD'
  AND rdef.config_ver=1
;

/* query to fill ECALPEDESTALS_ITEM table */

  SELECT 
    dat.iov_id iov_value_id,
    /* Creates a column of row numbers ordered by logic_id, for std::vec */
    ROW_NUMBER() OVER (PARTITION BY dat.iov_id ORDER BY dat.logic_id ASC) pos,
    cv.id1 det_id,
    dat.ped_mean_g12 mean_x12,
    dat.ped_rms_g12 rms_x12,
    dat.ped_mean_g6 mean_x6,
    dat.ped_rms_g6 rms_x6,
    dat.ped_mean_g1 mean_x1,
    dat.ped_rms_g1 rms_x1
  FROM 
    location_def ldef, run_type_def rdef, run_tag rtag, run_iov riov,
    /* inline view selects the mon_run_iov with the greatest subrun_num */
    (SELECT iov_id, run_iov_id, 
       MAX(subrun_num) KEEP (DENSE_RANK FIRST ORDER BY subrun_num ASC)
       FROM mon_run_iov GROUP BY iov_id, run_iov_id) miov,
    mon_pedestals_dat dat, channelview cv
  WHERE dat.iov_id = miov.iov_id
    AND cv.logic_id = dat.logic_id
    AND cv.name='Offline_det_id'
    AND miov.run_iov_id = riov.iov_id
    AND riov.tag_id = rtag.tag_id
    AND rdef.def_id = rtag.run_type_id
    AND ldef.def_id = rtag.location_id
    AND ldef.location='H4'
    AND rdef.run_type='PEDESTAL'
    AND rdef.config_tag='PEDESTAL-STD'
    AND rdef.config_ver=1
;

A detailed explanation of these queries would go deeply into the structure of the ECAL online DB schema, which will not be done. Please note the following things about these queries:

  1. There are two of them, one for each table.
  2. The data which appears in the SELECT clause is aliased to have the same names as which is required in the offline tables (e.g. IOV_VALUE_ID, TIME)
  3. IOV_VALUE_ID is taken from another unique identifier in the ECAL online schema, iov_id. If you do not have such a unique identifier for your objects already you will have to generate one.
  4. The POS column must be numbers from 1..N where N is the number of channels in the data set. This is generated in the query with the ROW_NUMBER() OVER (PARTITION BY dat.iov_id ORDER BY dat.logic_id ASC) line. This generates numbers 1..N for every dat.iov_id in the result, which is exactly what we need.

The CMS O2O administrator will support you in creating these queries and will create the PL/SQL procedures from them.

Review Status

Reviewer/Editor and date Comments
Main.egeland - 11 Aug 2006 page author
Main.tomalini - 09 Oct 2006 page content last edited
JennyWilliams - 03 Apr 2007 moved page into swguide

Responsible: FrancescaCavallari
Last reviewed by: Reviewer

Edit | Attach | Watch | Print version | History: r16 < r15 < r14 < r13 < r12 | Backlinks | Raw View | WYSIWYG | More topic actions
Topic revision: r16 - 2010-03-19 - unknown



 
    • Cern Search Icon Cern Search
    • TWiki Search Icon TWiki Search
    • Google Search Icon Google Search

    CMSPublic All webs login

This site is powered by the TWiki collaboration platform Powered by PerlCopyright & 2008-2019 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback