
Bookkeeping schema and Online data

When looking into how to use the current schema for online data, I came across severe limitations in it (largely due to shortcuts), which could affect any data type if even minor changes are made to some conventions. This should by all means be avoided in such a database, and hence the issue should be addressed as soon as possible.

Job referencing

The current schema holds the following job information in the table /jobs: CONFIGNAME, CONFIGVERSION, JOBDATE and JOBID. An additional table, /jobParams, is related to /jobs via the JOBID and contains many parameters related to the job. One immediately notices that the only parameter that uniquely characterises a job in /jobs is JOBID, which is an automatically generated identifier. A job does not have a "name" that is meant to be unique and on which a query can be made. There is a NAME parameter in /jobParams, though. Is it supposed to be unique?

Looking at the web page, one could imagine such information exists as there is a "Job Lookup" form. Unfortunately this query relies on the file naming convention:

<file_name> = <job_name>_<step>.<extension>
<job_name> = <production>_<job_in_prod>
Similarly, the "production lookup" relies on the above convention. For example, the FETC in stripping jobs cannot be found with these queries. Querying production 2000, for instance, uses the following URL:
http://volhcb05.cern.ch:8080/BkkServlets/DataSets?prev_fname=-&page_number=1&sql_user=%20f.filename%20like%20'%2500002000_%25_%25'&ndata_per_page=10
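
To make visible what this lookup actually asks of the database, the query string of the URL above can be decoded with the Python standard library. This is only an illustrative sketch: the helper name `sql_condition` is hypothetical, and nothing is assumed about the servlet beyond the URL quoted above.

```python
from urllib.parse import parse_qs, urlsplit

# The production lookup URL quoted above, split over several lines for readability.
url = (
    "http://volhcb05.cern.ch:8080/BkkServlets/DataSets"
    "?prev_fname=-&page_number=1"
    "&sql_user=%20f.filename%20like%20'%2500002000_%25_%25'"
    "&ndata_per_page=10"
)


def sql_condition(lookup_url):
    """Return the percent-decoded 'sql_user' parameter of a lookup URL."""
    query = parse_qs(urlsplit(lookup_url).query)
    return query["sql_user"][0].strip()


condition = sql_condition(url)
print(condition)  # f.filename like '%00002000_%_%'
```

The decoded condition is a pattern match on the file name, with no reference to any job or production attribute.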

Even job queries (from the jobId link of a file) are made through the file! For example, for job 12109574 the URL is:

http://lhcbbk.cern.ch:8080/BkkServletsWrite/Select?job=/lhcb/production/DC06/phys-v2-lumi5-BcVegPy/00002015/DIGI/0000/00002015_00000001_4.digi
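
The dependence on file names can be made concrete with a small sketch that recovers the job name from a file path, following the convention given earlier. The names `JobRef` and `parse_file_name` are hypothetical, and the assumption that production, job and step are purely numeric is mine (it matches the examples on this page, but is not stated anywhere).

```python
import posixpath
import re
from typing import NamedTuple


class JobRef(NamedTuple):
    production: str
    job_in_prod: str
    step: str
    extension: str

    @property
    def job_name(self):
        # <job_name> = <production>_<job_in_prod>
        return f"{self.production}_{self.job_in_prod}"


# <file_name> = <production>_<job_in_prod>_<step>.<extension>
# Assumption: the first three fields contain digits only.
_CONVENTION = re.compile(r"^(\d+)_(\d+)_(\d+)\.(\w+)$")


def parse_file_name(path):
    """Return a JobRef if the file name follows the convention, else None."""
    match = _CONVENTION.match(posixpath.basename(path))
    return JobRef(*match.groups()) if match else None


digi = parse_file_name(
    "/lhcb/production/DC06/phys-v2-lumi5-BcVegPy/00002015/DIGI/0000/"
    "00002015_00000001_4.digi"
)
fetc = parse_file_name(
    "/lhcb/production/DC06/v1r0/00002000/FETC/0000/ETC_00002000_00000361.root"
)
print(digi.job_name)  # 00002015_00000001
print(fetc)           # None
```

Under this strict reading, the FETC file name (taken from the XML excerpt later on this page) carries no job reference at all, because of its `ETC_` prefix and missing step.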

The primary cause of this limitation is that a job in /jobs doesn't have a NAME as a key. If we are to identify a "run" with a "job", one has to be able to query a run by its number without having to rely on the naming conventions of the files! As seen above, this is also very bad for production. A job should have a NAME that is unique...
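
As a rough illustration of what a unique job NAME would allow, here is a minimal sketch using an in-memory SQLite database. It is not the actual bookkeeping schema: the NAME column and its UNIQUE constraint are the hypothetical addition, SQLite is used only to keep the example self-contained, and the helpers `insert_job` and `find_job` are invented for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE jobs (
        JOBID         INTEGER PRIMARY KEY AUTOINCREMENT,
        NAME          TEXT NOT NULL UNIQUE,  -- hypothetical: not in the current schema
        CONFIGNAME    TEXT NOT NULL,
        CONFIGVERSION TEXT NOT NULL,
        JOBDATE       TEXT NOT NULL
    )
    """
)


def insert_job(name, config_name, config_version, job_date):
    """Insert a job; sqlite3.IntegrityError is raised if NAME already exists."""
    cur = conn.execute(
        "INSERT INTO jobs (NAME, CONFIGNAME, CONFIGVERSION, JOBDATE) "
        "VALUES (?, ?, ?, ?)",
        (name, config_name, config_version, job_date),
    )
    return cur.lastrowid


def find_job(name):
    """Look a job up by its unique name, without touching any file name."""
    return conn.execute(
        "SELECT JOBID, CONFIGNAME, CONFIGVERSION FROM jobs WHERE NAME = ?",
        (name,),
    ).fetchone()


insert_job("00002000_00000361_3", "DC06", "Stripping-v31-lumi2", "2007-11-28")
print(find_job("00002000_00000361_3"))
```

With such a key, a run could be stored as a job whose NAME is the run number and queried directly, whatever the names of its files.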

File insertion

Currently, although files are the primary source of queries, they are inserted as "OutputFile" elements of jobs, as shown in the following excerpt of an XML bookkeeping file:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE Job SYSTEM "book.dtd">
<Job ConfigName="DC06" ConfigVersion="Stripping-v31-lumi2" Date="2007-11-28" Time="11:21">
  <TypedParameter Name="Production" Value="00002000" Type="Info"/>
  <TypedParameter Name="Job" Value="00000361" Type="Info"/>
  <TypedParameter Name="Name" Value="00002000_00000361_3" Type="Info"/>
  <TypedParameter Name="Location" Value="LCG.CERN.ch" Type="Info"/>
..........
  <TypedParameter Name="ExecTime" Value="17546.2892981" Type="Info"/>
  <InputFile    Name="/lhcb/production/DC06/v1r0/00002000/FETC/0000/ETC_00002000_00000361.root"/>
..........
  <OutputFile   Name="/lhcb/production/DC06/v1r0/00002000/SETC/0000/SETC_00002000_00000361_3.root" TypeName="SETC" TypeVersion="1">
    <Parameter  Name="EventType"     Value="10000000"/>
    <Parameter  Name="EventStat"       Value="1825"/>
    <Parameter  Name="Size"        Value="19939"/>
    <Quality Group="Production Manager" Flag="Not Checked"/>
    <Parameter  Name="MD5SUM"        Value="548417be0c97b535eced95b57bd1f6ce"/>
    <Parameter  Name="GUID"        Value="3A44ED5E-739D-DC11-BEAC-000E0C4D35D9"/>
  </OutputFile>
  <OutputFile   Name="/lhcb/production/DC06/v1r0/00002000/DST/0000/00002000_00000361_3.dst" TypeName="DST" TypeVersion="1">
    <Parameter  Name="EventType"     Value="10000000"/>
    <Parameter  Name="EventStat"       Value="1825"/>
    <Parameter  Name="Size"        Value="1128071349"/>
    <Quality Group="Production Manager" Flag="Not Checked"/>
    <Parameter  Name="MD5SUM"        Value="38fc9853ad1a0530e4871c0763abdad2"/>
    <Parameter  Name="GUID"        Value="6821205D-739D-DC11-BEAC-000E0C4D35D9"/>
  </OutputFile>
...........
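
To show how the job/file relation is carried by this format, the sketch below reads a trimmed copy of the excerpt above with Python's `xml.etree.ElementTree`. The sample keeps only a few of the elements shown, drops the XML declaration and DOCTYPE, and closes the `<Job>` element so that it parses; the function `summarize_job` and the keys of its result are hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the excerpt above (elided parts are simply left out).
SAMPLE = """\
<Job ConfigName="DC06" ConfigVersion="Stripping-v31-lumi2" Date="2007-11-28" Time="11:21">
  <TypedParameter Name="Name" Value="00002000_00000361_3" Type="Info"/>
  <InputFile Name="/lhcb/production/DC06/v1r0/00002000/FETC/0000/ETC_00002000_00000361.root"/>
  <OutputFile Name="/lhcb/production/DC06/v1r0/00002000/DST/0000/00002000_00000361_3.dst" TypeName="DST" TypeVersion="1">
    <Parameter Name="EventStat" Value="1825"/>
    <Parameter Name="Size" Value="1128071349"/>
  </OutputFile>
</Job>
"""


def summarize_job(xml_text):
    """Collect the job name, configuration, input files and output files."""
    job = ET.fromstring(xml_text)
    typed = {p.get("Name"): p.get("Value") for p in job.findall("TypedParameter")}
    outputs = []
    for out in job.findall("OutputFile"):
        params = {p.get("Name"): p.get("Value") for p in out.findall("Parameter")}
        outputs.append(
            {"name": out.get("Name"), "type": out.get("TypeName"), "params": params}
        )
    return {
        "job_name": typed.get("Name"),  # only available as a job parameter
        "config": (job.get("ConfigName"), job.get("ConfigVersion")),
        "inputs": [f.get("Name") for f in job.findall("InputFile")],
        "outputs": outputs,
    }


summary = summarize_job(SAMPLE)
print(summary["job_name"], summary["outputs"][0]["type"])
```

Note that the job name appears only as one `TypedParameter` among others, while every file is nested inside the job that reads or produces it.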

Due to the limitation explained above

-- PhilippeCharpentier - 14 Dec 2007

Topic revision: r1 - 2007-12-14 - PhilippeCharpentier
 