Bookkeeping brainstorming 9th June 2008

Participants: Philippe Charpentier, Zoltan Mathe, Elisa Lanciotti, Stuart Paterson, Andrew Smith, Markus Frank, Olivier Callot, Joel Closier, Beat Jost, Clara Gaspar, Patrick Koppenburg

  • Data Taking Period table

Add a new attribute: year. This is needed so that data from different years are not mixed.
The period start and period end attributes will definitely be removed.
About the 'description' of the data taking period: this field should be left blank. Then, when the run is registered in the BK database, if it corresponds to a new entry in the Data Taking Period table, the system will automatically assign a unique string to the description. This string can later be updated manually to something more meaningful, since it will be exposed to end users when they select datasets.
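The registration rule above can be sketched as follows, using Python's built-in sqlite3 as a stand-in for the Oracle BK database. This is only an illustration under stated assumptions: the table and column names (data_taking_period, period_key, and so on), the functions register_run and rename_period, and the 'DTP-<year>-<id>' format of the placeholder description are all hypothetical. The minutes do not say what identifies a data taking period, so period_key stands in for it; the year is part of the key so that different years are never mixed.

```python
import sqlite3

# Hypothetical schema: the real BK table and column names are not given in the minutes.
SCHEMA = """
CREATE TABLE data_taking_period (
    period_id   INTEGER PRIMARY KEY,
    year        INTEGER NOT NULL,
    period_key  TEXT    NOT NULL,   -- hypothetical: whatever identifies a period
    description TEXT    UNIQUE,     -- left blank on insert, filled automatically
    UNIQUE (year, period_key)
);
"""


def register_run(conn, year, period_key):
    """Return the period_id for (year, period_key), creating the period if needed.

    A newly created period gets an automatically generated, unique placeholder
    description (hypothetical format 'DTP-<year>-<id>') that an expert can
    later replace with a meaningful string.
    """
    row = conn.execute(
        "SELECT period_id FROM data_taking_period WHERE year = ? AND period_key = ?",
        (year, period_key),
    ).fetchone()
    if row is not None:
        return row[0]  # existing period: keep its current description
    cur = conn.execute(
        "INSERT INTO data_taking_period (year, period_key) VALUES (?, ?)",
        (year, period_key),
    )
    period_id = cur.lastrowid
    # The primary key is unique, so a description derived from it is unique too.
    conn.execute(
        "UPDATE data_taking_period SET description = ? WHERE period_id = ?",
        (f"DTP-{year}-{period_id:05d}", period_id),
    )
    return period_id


def rename_period(conn, period_id, description):
    """Manual step: replace the placeholder with a user-facing description."""
    conn.execute(
        "UPDATE data_taking_period SET description = ? WHERE period_id = ?",
        (description, period_id),
    )
```

Registering a run twice for the same (year, period_key) returns the same period, while the same key in a different year creates a new one.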
What should this description contain? It was recalled once more that the partition and the run type have to be stored in jobs:ConfigName and jobs:ConfigVersion, not in the Data Taking Period. For example, for real data the ConfigName-ConfigVersion will be 'LHCb-Physics'. The description of the Data Taking Period should be set offline by an expert (the Physics analysis coordinator?). It is difficult to say at this stage.
The general impression is that many details of this table are difficult to define now, while there is no data yet.

  • Processing Pass

The latest implementation looks OK.

  • Quality parameters

The quality attributes should be related to the files on a file-by-file basis (so why not add them to the files table?).
Patrick suggests attending the Data Quality Workshop (Tuesday 17 June at 9:00).
In the end, it was not decided which attributes to include in this table. For the time being, it is not implemented.

  • Queries on the Oracle tables or on the views

Zoltan showed the results of comparisons between queries on the Oracle tables and queries on the materialized views. As expected, the queries on the views are faster (see the slides for the numbers). Nevertheless, it was pointed out that a system of views like the current one may not be necessary. It could be feasible to have only some intermediate views, that is, to build only the roottree table. The roottree table consists of all the possible combinations of the attributes that can be queried. This would speed up the first phase of the query, when the user selects in cascade the possible values of the attributes to query. With this solution, the second phase of the query (from when the user clicks on submit until the list of datasets is returned) would be as slow as it is now, because the file names are retrieved from the Oracle tables (not from the views!). A somewhat slower second phase is not worrying (Zoltan showed times of around 1.7-2 seconds, which can vary with the size of the query). The important point is to provide a system with a fast interface for the first phase of the query!
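The two query phases with a roottree-only design can be sketched in Python over in-memory lists instead of Oracle SQL. Everything here is hypothetical and for illustration only: the attribute names in ATTRIBUTES, the sample rows in FILES (standing in for the Oracle base tables) and the functions build_roottree, next_choices and files_for. Phase one only reads the small roottree of distinct attribute combinations; phase two scans the per-file rows, as the file names would still come from the base tables.

```python
# Hypothetical queryable attributes, in the order the user selects them in the cascade.
ATTRIBUTES = ("config_name", "config_version", "processing_pass", "file_type")

# Hypothetical per-file rows standing in for the Oracle base tables.
FILES = [
    {"config_name": "LHCb", "config_version": "Physics", "processing_pass": "pass-A",
     "file_type": "DST", "lfn": "/hypothetical/f1.dst"},
    {"config_name": "LHCb", "config_version": "Physics", "processing_pass": "pass-A",
     "file_type": "DST", "lfn": "/hypothetical/f2.dst"},
    {"config_name": "LHCb", "config_version": "Physics", "processing_pass": "pass-A",
     "file_type": "RAW", "lfn": "/hypothetical/f3.raw"},
    {"config_name": "LHCb", "config_version": "Physics", "processing_pass": "pass-B",
     "file_type": "DST", "lfn": "/hypothetical/f4.dst"},
    {"config_name": "MC", "config_version": "2008", "processing_pass": "pass-A",
     "file_type": "DST", "lfn": "/hypothetical/f5.dst"},
]


def build_roottree(files):
    """Every distinct combination of queryable attribute values (one row per combination)."""
    return sorted({tuple(f[a] for a in ATTRIBUTES) for f in files})


def next_choices(roottree, selected):
    """Phase one: distinct values of the next attribute, given the values already selected."""
    depth = len(selected)
    if depth >= len(ATTRIBUTES):
        raise ValueError("all attributes are already selected")
    prefix = tuple(selected)
    return sorted({row[depth] for row in roottree if row[:depth] == prefix})


def files_for(files, selection):
    """Phase two: scan the per-file rows for the files matching a complete selection."""
    key = tuple(selection)
    return [f["lfn"] for f in files if tuple(f[a] for a in ATTRIBUTES) == key]
```

Several files can share one combination, so the roottree stays much smaller than the file list; only the final, submitted selection pays the cost of scanning the files.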
In the current implementation we have both the roottree table and, for each row of the roottree, a jobfileinfo table containing the result of that query (file names and other relevant attributes). This makes queries extremely fast, but it has a disadvantage: a new file is not visible until the views are refreshed. So, if the intermediate solution of implementing only the roottree is fast enough, we could adopt it.
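The visibility problem of precomputed results can be illustrated with a small sqlite3 sketch. SQLite has no materialized views, so a table rebuilt by refresh_snapshot stands in for the precomputed jobfileinfo data; the table names, file names and helper functions are hypothetical, and this is not how the Oracle refresh is actually performed.

```python
import sqlite3


def make_db():
    """In-memory database with a hypothetical 'files' base table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE files (lfn TEXT PRIMARY KEY, file_type TEXT NOT NULL)")
    return conn


def refresh_snapshot(conn):
    """Rebuild the precomputed table (stand-in for a materialized view refresh)."""
    conn.execute("DROP TABLE IF EXISTS jobfileinfo_snapshot")
    conn.execute("CREATE TABLE jobfileinfo_snapshot AS SELECT lfn, file_type FROM files")


def query_snapshot(conn, file_type):
    """Fast path: reads only the precomputed table, so it may be stale."""
    rows = conn.execute(
        "SELECT lfn FROM jobfileinfo_snapshot WHERE file_type = ? ORDER BY lfn",
        (file_type,),
    )
    return [r[0] for r in rows]


def query_live(conn, file_type):
    """Slower path: reads the base table, so new files are visible immediately."""
    rows = conn.execute(
        "SELECT lfn FROM files WHERE file_type = ? ORDER BY lfn",
        (file_type,),
    )
    return [r[0] for r in rows]
```

A file inserted after the last refresh is returned by query_live but not by query_snapshot until refresh_snapshot is called again.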
Zoltan will implement the roottree and measure the performance of this solution. We may also add the production number to the files table in order to make queries faster.

-- ElisaLanciotti - 10 Jun 2008

Topic revision: r1 - 2008-06-10 - ElisaLanciotti