TWiki> LHCb Web>BKDev>ProcessinPass (revision 8)EditAttachPDF

A new criterion to make queries on files: the Processing Pass

This is the set of application version and condition DB tags which have been used to produce a file and its ancestors.

The processing pass is common for all the files of the same production.

By consequence, a processing pass corresponds to a set of productions. A complementary table has been introduced in the new schema, with the information relative to each production, its processing pass and, if it is the case, the processing pass of the input productions.

Computation of the Processing Pass during the migration

The computation of the Processing Pass for already existing data has been done on the basis of the information available in the BK.

  • Make a first loop on the DC06 productions (about 600 prods.). For each of them compute the Pass Index. This is defined as the set of application name and version and cond db tag used in the production (and only in that production, regardless of the input data!).
In the current production system, many different workflow can be followed to produce files. See the picture Production workflow diagram for a diagram of the possible workflows and a classification of the production types in there.

  • On the basis of the CONFIGVERSION and of the event type, set the Processing Pass Group, following the indications provided by Philippe (there is no way to get this information from the BK database or from any other source).
  • Store the Pass Index in the Pass_Index table. More in detail, the content of the Pass_Index table is:


the content of the Pass_index table is shown in the attached file pass_index.html below. As the table shows, many different Pass Indeces belong to the same group. This means that they can be mixed at processing stage.

  • Add a new line in the Prod2Pass table with: production number and PassId. This table is simply a dictionary to associate to each production its Pass Index.
Once finished the loop, the Pass_index and the prod2pass tables have been populated. In these tables we have stored the information about the production itself, but not about the input data, so the story of the files is not complete. What we need now is a further table to store the Processing Pass of the production and also of the input production: we will call it the Total Processing Pass and it will be stored in the Processing_Pass table.
  • Make a second loop on the DC06 productions. For each production go to the Prod2Pass table and pick up the corresponding PassId, then go to the Pass_Index table and retrieve the information about the Pass Index and the Group. If the production has any input data, repeat the same for the input production. Compute the Total Processing Pass as a concatanation of the Processing Pass (this is the Group Description field) of the input production and the one of the current production. Of course, if there is no input, the Total Processing Pass is the same as the Group Description. Store all this info in the Processing_Pass table. See the attached file processing_pass.html which displays the content of this table. In the meeting of 2/07/08 we said that it would be useful to add the event type and the simulation conditions. This has to be done (by elisa)
  • Finally, all the available Processing Passes are obtained making a query like: select distinct processing_pass.totalprocpass from processing_pass; see the result in the attached file total_processing_pass.html

Meeting on 2nd July 2008

Present: Philippe, Joel, Zoltan and Elisa

It was discussed how to compute the Processing Pass on the fly when new data are registered in the BK.

During the migration of the data to the new schema, the processing pass has been computed on the basis of the information contained in the BK. But in the future, for simulated and for real data, the processing pass can be computed at the same moment when a new production is started.

What is the information needed to compute the processing pass?

The information necessary to compute the processing pass is:

  • The application name and version of the programs used in a given production, for all the steps (ex. simulation, digitization, reconstruction etc.) and the condition DB tag.
  • The Processing Pass Group of the input productions.

Who can provide this info?

The production manager when he sets the workflow for a new production.

How can this info be sent to the BK manager?

The BK will provide an interface (Zoltan proposes a method of the BK client) to be used by the production manager to send this information to the BK. When starting a new production, the production manager knows the versions of the applications to be used in the production and he knows the Processing Pass Group of the input data. ( Actually, the Processing Pass Group should be used by the production manager to select the input data. See below for more details on this point). Before starting the production, he will send the information to the BK manager. The BK Manager will do the following:

  • Read the application name and version and condition DB tag and compare with the content of the Pass_Index table. If the row already exists, it will pick up the corresponding PassId and associate it to the production. If the row doesn't exist, it will add a new row and generate a new PassId. Then it will insert a new row in the Prod2Pass table, containing the production number and the PassId.
  • If the production has some input data, it will go to the Processing_Pass table and look for the Processing Pass of the input data.
  • Finally, it will add a new row in the Processing_Pass table with: the production number (this is the unique key of the table), the PassId (which links to the Pass_Index table to expand the information about program name and versions), the GroupId (this is the group the PassId belongs to), the Total Processing Pass (this is the concatanation of the processing pass of the input production and the processing pass of the current production), the event type and the DAQId.

In addition to the information necessary to compute the Processing Pass, we decided to add also the Event Type and the DAQId (i.e. the identifier of the Simulation Conditions (for MC) or of the Data Taking conditions (real rata)). This is not necessary for the computation of the processing pass, nevertheless this information is available at this point of the workflow, and it was decided to store also this information in the Processing_Pass table for sake of completeness. In this way, each row of the Processing_Pass table provides a full characterization of a production: the processing pass index (PassId), the Processing Pass Group, the simulation conditions (or data taking conditions) , the event type.

How can the production manager select the input datasets for a new production

It has been proposed to use the BK to select the input datasets for a new production. Currently, the selection of the input data is based on a regular expression. This method is not safe, because it relies on the convention used to name files, which could change in some moment.

The new proposal is that the production manager queries the BK to obtain the input datasets. The BK should provide an interface to make the query. This could be a method of the BK client.
The parameters to be passed in this query are:

  • the Processing Pass Group
  • the DAQId
  • the Event Type
  • the File Type

Meeting on 22nd September 2008

Participants: Philippe, Joel, Zoltan and Elisa

How the processing pass is computed online and then inserted into the BK database

First of all, a description of the tables used to implement the processing pass: the diagram of the schema is displayed in the attached picture ppass_tables.png, at the bottom of this page.

the pass_index table : each row consists of a unique pass ID and a set of application names and versions (see picture pass_index.html attached). Each of the passes listed in the pass_index table belongs to a pass group (the association is defined by the groupid key, as shown in the diagram). Passes of the same pass group are by definition compatible, even if the application version is different.

In the pass_group table each row consists of a group ID and a description, which is a user friendly name reminding the type of data processing (ex. stripping v31. or sim_reco_v31 etc...). See picture PASS_GROUP.html attached to this page.

Finally, the processing_pass table has a row for each production, where we store the corresponding passid (which connects to the pass_index table) and the total processing pass. The total processing pass is a string composed by the processing pass of the input files + the processing pass of the current production (see the attached file below: processin_pass.html). This is the relevant information for users to make queries.

In the meeting it was decided to add a column to the processing_pass table, with the simulation conditions ID. It was also decided to modify the definition of 'total processing pass' , adding all the pass descriptions of the input files, from the simulation. Currently only the previous step is included, so for example the total processing pass of a stripping production is 'DC06-Reco_v30 + DC06-Stripping_v31' while it should be: 'DC06-Sim + DC06-Reco_v30 + DC06-Stripping_v31'.

Moreover, the total processing pass should not be given as a string (ex. DC06-Reco_v30 + DC06-Stripping_v31) but as a sequence of the group IDs separated by some separator: ex. '3<5'.

How it works in practice:

when a new production is set the BK interface does the following operations:

  • Looks into the pass_index table to see if that pass is already present. If yes: it picks the ID of the pass and also the ID of the pass group. If not, it inserts a new row generating a new ID.
  • HERE I HAVE A QUESTION: The group ID associated to this new pass can be known a priori, or not? I think no. If it is not known a priori, it has to be set as unknown. Afterward, it will be added manually.
  • Inserts a new entry in the PROCESSING_PASS table. Here we need: the production number, the simulation conditions (this is known, as it is in the settings of the production), the pass index ID (this is known from the previous operation) and the total processing pass. Here comes the tricky part..
    The total processing pass is by definition the processing pass of the input files + the processing pass of the current production. The processing pass of the input file is known (it is also in the settings of the production), whereas the processing pass of the current production has to be computed. How? We have 2 possibilities:
    1- easy case: the pass index was already in the PASS_INDEX table. So we already have the group ID and we can immediately build the total processing pass as the group ID of the input productions + the group ID of the current production.
    2- the not so easy case: the pass index was NOT in the PASS_INDEX table. In this case, we do NOT know the group ID of the current production, as it was set as unknown. So, the total processing pass will be, for ex., '3 < X', if 3 is the group ID of the input. Later, the group ID of this new pass index will be added manually in the PASS_INDEX table, and a trigger will take care of updating the column 'total processing pass' of the PROCESSING_PASS table. This is feasible because the PROCESSING_PASS table is connected to the PASS_INDEX table with the pass ID key, so any modification of an entry relative to a given pass ID in the PASS_INDEX table can trigger an operation in the PROCESSING_PASS table.

Some remarks

  • Who should update the PASS_INDEX table setting the group ID? I guess someone who has knowledge about the compatibility of application versions (production manager?) Very important to establish who has to do that!

  • When a new production is started, the BK interface immediately inserts the new entry relative to this production into the PROCESSING_PASS table. Then, when the jobs will be registered to the BK, in the xml files we will only provide the information about the production, whereas all the information about processing pass and simulation condition will be redundant.

  • The attributes 'simulation conditions ID' will be removed from the jobs table. In this way we can register a job or file in the BK even if the corresponding simulation conditions ID is still not in the Simulation_conditions table. This will avoid errors at registration time.
The simulation conditions ID, as the processing pass, will only be stored in the PROCESSING_PASS table.

Instructions for the Production Manager (PM)

Here we summarize the actions the PM should follow each time he starts a new production.

  • Define the work flow. This is the set of applications to be used in the new production.
  • Define the pass_index : this is the work flow + the versions of the applications + the pass group ID and insert this new row into the pass_index table of the BK, if it is not there.
  • Define the production. This requires to have a work flow defined and to make a query to the BK in order to get a set of input files. The set of input files depends on the following parameters: 1-Configuration 2-event type, 3-simulation conditions, 4- processing pass, 5- file type.
  • Enter the production in the BK. This is automatically done by the BK interface which is used to define the production. So it is part of the previous step.


To start a new stripping production, a set of input files is needed. Currently these files are retrieved in this way: the PM select a configuration and an event type and on the basis of these parameters he sets a regular expression which matches the names of these files in LFC. The disadvantage of this method is that capturing files with a regular expression is error prone (based on a naming convention, that could change..) and moreover it can happen that a file is in the LFC but it is not stored in the BK. In this case, when the job is sent to be registered in the BK, an error occurs because its input file is not present in the BK! this is an inconsistency which will be avoided if we ask for the list of input files directly in the BK.

According to what proposed in the meeting, the list of input files will be obtained with a query to the BK. The query requires the following parameters (see also above in the item relative to the instructions to the PM): 1-Configuration 2-event type, 3-simulation conditions, 4- processing pass, 5- file type.

The result of the query, together with the work flow previously defined by the PM, will set the new production.

Instructions for the Online people

As soon as they start a new run they should send the corresponding XML file with the following information:

1- Program name and version. The BK will read the XML, check if an entry already exists in the pass_index table with this program name and version. If yes, ok, just picks the pass index ID (it will need it to compute the processing pass, see point 3 below). If not, it adds a row in the pass_index table with this program name and version and with an undetermined group ID. It also will send an alarm to notify that a new row has been added to the pass_index table and that the group ID has to be filled (and possibly the pass description field). This operation can be done a posteriori and has to be done manually. Here there are 2 possibilities: 1-the new pass index belongs to a pass group which already exists in the pass_group table. Then the corresponding group ID has to be inserted into the pass_index table. 2- this pass index is not compatible with any of the groups in the pass_group table. In this case a new row has to be added in the pass_group table and a user friendly name has to be associated. An oracle sequence will generate a unique ID for the pass group. Then, this pass group ID will be inserted in the pass_index table.

2- DAQConditionId a set of strings defining the daq conditions. Also in this case there are 2 possibilities: 1-an entry already exists in the Data Taking Conditions table. 2- it doesn't exist: in this case a new entry has to be added and an alarm will be sent to notify that a user friendly description has to be associated to this new data taking period entry.

3- Processing Pass they should not provide any number for the production ID. This will be computed automatically by the BK. They should provide a pass index ID (see point 1) and the data taking conditions (see point 2). At this point the BK can check if there is any entry in the Processing Pass table. If yes, then the production number associated to that entry is taken as the production ID for that run and it is entered into the jobs table of the BK database. If not, it adds a new row of the Processing Pass table, generating a new production ID with an oracle sequence. This number will be used as production ID for the run. In the case of raw data, the total processing pass is just the processing pass of the current production (no input here!). This will be computed by the BK interface in the same way used for simulated data (see above).

Some remarks about the computation of the processing pass

Special cases:

  • Production 1959

The production 1959: this is the only production which runs Brunel on DIGI files to produce DST files. It is taken into account in the creation of the processing pass (methods: This is not a problem, just a remark

  • Productions 2000 and 2001

these productions are a special case because in the same production different versions of the application have been used. We have changed the production number for those jobs with Brunel v30. This is fixed

  • Productions with only SIM files

There are many productions which only have SIM files. Some of them are: 1329, 1330, 1331, 1332, 1333, 1334, 1335, 1336,1358, 1359, 1360, 1361, 1362, 1363, 1364, 1365, 1368. They are leftovers of buggy productions. To be ignored.

-- ElisaLanciotti - 02 Jul 2008

Topic attachments
I Attachment History Action Size Date Who Comment
HTMLhtm PASS_GROUP.htm r1 manage 0.6 K 2008-09-23 - 16:56 ElisaLanciotti The PASS_GROUP table
HTMLhtml pass_index.html r1 manage 6.4 K 2008-07-02 - 17:19 ElisaLanciotti The PASS_INDEX table content
PNGpng ppass_tables.png r1 manage 10.3 K 2008-09-23 - 14:59 ElisaLanciotti Schema of the Processing Pass tables
HTMLhtml processing_pass.html r2 r1 manage 38.0 K 2008-09-23 - 14:57 ElisaLanciotti PROCESSING_PASS table
PDFpdf prodDiagram.pdf r1 manage 225.3 K 2008-07-09 - 11:18 ElisaLanciotti Production workflow diagram
HTMLhtml total_processing_pass.html r1 manage 0.8 K 2008-07-02 - 17:52 ElisaLanciotti Possible Processing Passes (these the Total Processing Passes) available
Edit | Attach | Watch | Print version | History: r12 | r10 < r9 < r8 < r7 | Backlinks | Raw View | Raw edit | More topic actions...
Topic revision: r8 - 2008-09-23 - ElisaLanciotti
    • Cern Search Icon Cern Search
    • TWiki Search Icon TWiki Search
    • Google Search Icon Google Search

    LHCb All webs login

This site is powered by the TWiki collaboration platform Powered by PerlCopyright & 2008-2021 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
or Ideas, requests, problems regarding TWiki? use Discourse or Send feedback