PandaDB

This section describes the organization and schema of the Panda Oracle database. Contact Gancho Dimitrov at CERN for access information.

PandaDB (ORACLE) schema (created May 2009)

As of May 2009 PandaDB was migrated from BNL/MySQL to CERN/ORACLE; the PandaDBMySQL page describes the original MySQL PandaDB schema.

PandaDB consists of 4 job tables and 4 extra tables (files, job parameters, metadata, cloud tasks). pandamon.cern.ch/describe?doc=*&table=^

The 4 job tables JOBSDEFINED4, JOBSACTIVE4, JOBSARCHIVED4 and JOBSWAITING4 have the following columns and types:

| *column name* | *column type* | *notes* |
| PandaID | NUMBER(11) | sequential ID generated from the Oracle sequence object JOBSDEFINED4_PANDAID_SEQ in jobsDefined4 |
| jobDefinitionID | NUMBER(11) | comes from the CERN ProdDB for managed production jobs; defines job set IDs for analysis jobs |
| schedulerID | VARCHAR2(128) | ID identifying the pilot scheduler |
| pilotID | VARCHAR2(128) | ID assigned to the pilot by the scheduler |
| creationTime | DATE | generated by Oracle's SYSDATE function |
| creationHost | VARCHAR2(128) | set by the Panda server |
| modificationTime | DATE | set by the Panda server when the job state changes or a pilot update is received |
| modificationHost | VARCHAR2(128) | set by the Panda server |
| AtlasRelease | VARCHAR2(64) | release required to run the job, e.g. Atlas-10.0.1 |
| transformation | VARCHAR2(250) | payload job script, e.g. share/rome.1001.reco.MuonDigit.OverrideRE.trf |
| homepackage | VARCHAR2(64) | e.g. JobTransforms-10.0.1.5 |
| prodSeriesLabel | VARCHAR2(64) | constrained to "DC2, Rome, DC3, pandatest" |
| prodSourceLabel | VARCHAR2(64) | possible values: "managed, regional, user, panda, test" |
| prodUserID | VARCHAR2(250) | ID of the user who defined the job (the user's certificate DN or e-mail address) |
| assignedPriority | NUMBER(5) | comes from the CERN ProdDB for managed production jobs (assigned by the production manager) |
| currentPriority | NUMBER(5) | indexed; assigned by the production manager, usually the same as assignedPriority, can be modified by the Panda server |
| attemptNr | NUMBER(2) | number of job attempts ("0" in the jobsDefined table for new jobs, "1" when the job is moved to the jobsActive table) |
| maxAttempt | NUMBER(2) | comes from the CERN ProdDB for managed production jobs (used to limit the number of failures; assigned by the production manager) |
| jobStatus | VARCHAR2(15) | status of the job, indexed (possible values: "defined, assigned, activated, sent, waiting, running, holding, transferring, finished, failed") |
| jobName | VARCHAR2(128) | comes from the CERN ProdDB for managed production jobs |
| maxCpuCount | NUMBER(9) | comes from the CERN ProdDB for managed production jobs |
| maxCpuUnit | VARCHAR2(32) | comes from the CERN ProdDB for managed production jobs |
| maxDiskCount | NUMBER(9) | comes from the CERN ProdDB for managed production jobs |
| maxDiskUnit | CHAR(2) | comes from the CERN ProdDB for managed production jobs |
| ipConnectivity | CHAR(3) | comes from the CERN ProdDB for managed production jobs |
| minRamCount | NUMBER(9) | comes from the CERN ProdDB for managed production jobs |
| minRamUnit | CHAR(2) | comes from the CERN ProdDB for managed production jobs |
| startTime | DATE | filled by the Panda server |
| endTime | DATE | filled by the Panda server |
| cpuConsumptionTime | NUMBER(20) | filled after the job finishes |
| cpuConsumptionUnit | VARCHAR2(128) | filled after the job finishes |
| commandToPilot | VARCHAR2(250) | used by the Panda server to send commands to the pilot, e.g. to kill the job |
| transExitCode | VARCHAR2(128) | transformation exit code |
| pilotErrorCode | NUMBER(6) | pilot error code |
| pilotErrorDiag | VARCHAR2(250) | pilot error diagnostics |
| exeErrorCode | NUMBER(6) | executor error code (number); error codes and diagnostics follow https://twiki.cern.ch/twiki/bin/view/Atlas/ProdSysErrorCodes |
| exeErrorDiag | VARCHAR2(250) | executor error diagnostics |
| supErrorCode | NUMBER(6) | supervisor error code |
| supErrorDiag | VARCHAR2(250) | supervisor error diagnostics |
| ddmErrorCode | NUMBER(6) | DDM error code |
| ddmErrorDiag | VARCHAR2(250) | DDM error diagnostics |
| brokerageErrorCode | NUMBER(6) | brokerage error code |
| brokerageErrorDiag | VARCHAR2(250) | brokerage error diagnostics |
| jobDispatcherErrorCode | NUMBER(6) | jobDispatcher error code |
| jobDispatcherErrorDiag | VARCHAR2(250) | jobDispatcher error diagnostics |
| taskBufferErrorCode | NUMBER(6) | taskBuffer error code |
| taskBufferErrorDiag | VARCHAR2(250) | taskBuffer error diagnostics |
| computingSite | VARCHAR2(128) | e.g. BU_ATLAS_Tier2 |
| computingElement | VARCHAR2(128) | e.g. atlas.bu.edu |
| jobParameters | CLOB | parsed from the jobDef XML and stored as a string |
| metadata | CLOB | metadata XML of the output file (POOL format) |
| prodDBlock | VARCHAR2(250) | name of the dataset containing the job's input files |
| dispatchDBlock | VARCHAR2(250) | name of the job's dispatch dataset; a prodDBlock may be broken down into smaller blocks for dispatch to sites |
| destinationDBlock | VARCHAR2(250) | name of the job's destination dataset; used to register the outputs of an associated set of jobs as belonging to one block to be saved at an archival destination |
| destinationSE | VARCHAR2(250) | destination storage element (archival destination) of the job's output file(s) |
| nEvents | VARCHAR2(9) | number of events done/produced by the job |
| grid | VARCHAR2(32) | grid the job is submitted to (OSG, LCG, etc.) |
| cloud | VARCHAR2(32) | cloud (associated with a Tier 1) the job is submitted to (US, CA, etc.) |
| cpuConversion | NUMBER(9,4) | CPU conversion factor |
| sourceSite | VARCHAR2(36) | source site (usually a CE) for file transfer |
| destinationSite | VARCHAR2(36) | destination site (usually an SE) for file transfer |
| transferType | VARCHAR2(10) | type of file transfer |
| taskID | NUMBER(9) | task ID |
| cmtConfig | VARCHAR2(250) | CMT configuration |
| stateChangeTime | DATE | timestamp of the last state change |
| prodDBUpdateTime | DATE | timestamp of the last update in the Oracle ProdDB |
| lockedby | VARCHAR2(128) | name of the current writer of the record in ProdDB |
| relocationFlag | NUMBER(1) | flag for submitting special jobs (M5reco) to a given cluster |
| jobExecutionID | NUMBER(11) | unsigned job execution ID coming from the Oracle ProdDB |
| VO | VARCHAR2(16) | Virtual Organization (for non-ATLAS support) |
| pilotTiming | VARCHAR2(100) | total time consumption of the pilot |
| workingGroup | VARCHAR2(20) | working group name, for jobs owned by working groups |
| processingType | VARCHAR2(64) | type of the job (evgen, simul, reco, reprocessing, etc.) |
| prodUserName | VARCHAR2(60) | the user's name, e.g. 'Joe Smith', extracted from the DN, so that '=' rather than LIKE (on prodUserID) can be used to look up jobs by username |
| nInputFiles | NUMBER(5) | number of input files needed for analysis jobs |
| countryGroup | VARCHAR2(20) | country of the job submitter, for regional queue restrictions/controls |
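As a hedged illustration of how these columns are typically queried (the site name and status value below are examples taken from the table above, not a prescribed query from the PanDA code):

```sql
-- Illustrative only: recent activated jobs at one site,
-- ordered by priority (column names as documented above).
SELECT PandaID, jobStatus, currentPriority, creationTime
  FROM JOBSACTIVE4
 WHERE jobStatus     = 'activated'
   AND computingSite = 'BU_ATLAS_Tier2'
 ORDER BY currentPriority DESC;
```

Both jobStatus and currentPriority are indexed, so this kind of lookup stays cheap even on the larger job tables.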

File-table FILESTABLE4 structure:

| *column name* | *column type* | *notes* |
| rowID | NUMBER(11) | auto-incremented row ID generated from the FILESTABLE4_ROW_ID_SEQ sequence |
| PandaID | NUMBER(11) | unsigned PandaID of the job, indexed in this table |
| GUID | VARCHAR2(64) | GUID of the file (may be indexed) |
| lfn | VARCHAR2(256) | logical file name of input/output files |
| type | VARCHAR2(64) | constrained to "input, output, log" |
| dataset | VARCHAR2(128) | name of the dataset the input/output files belong to |
| status | VARCHAR2(64) | status of input/output files: whether the job's input is at computingSite, or the job's output is available in the destination datablock; indexed |
| prodDBlock | VARCHAR2(250) | name of the datablock the job's input file(s) are part of |
| prodDBlockToken | VARCHAR2(250) | token of the job's prodDBlock |
| dispatchDBlock | VARCHAR2(250) | name of the job's dispatch datablock; a prodDBlock may be broken down into smaller blocks for dispatch to sites; indexed |
| dispatchDBlockToken | VARCHAR2(250) | token of the job's dispatchDBlock |
| destinationDBlock | VARCHAR2(250) | name of the job's destination datablock; used to register the outputs of an associated set of jobs as belonging to one block to be saved at an archival destination; indexed |
| destinationDBlockToken | VARCHAR2(250) | token of the job's destinationDBlock |
| destinationSE | VARCHAR2(250) | destination storage element (archival destination) of the job's output file(s) |
| fsize | NUMBER(10) | file size in bytes |
| md5sum | CHAR(36) | MD5 checksum of the file |
| checksum | CHAR(36) | checksum of the file (either adler32 or md5sum) |
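Since FILESTABLE4 is keyed on the (indexed) PandaID, the files of a job are fetched with a simple lookup. A hedged sketch (the PandaID value is a placeholder):

```sql
-- Illustrative only: list the output files of a given job.
-- FILESTABLE4.PandaID is indexed, so this lookup is cheap.
SELECT lfn, fsize, checksum, destinationDBlock
  FROM FILESTABLE4
 WHERE PandaID = 1234567890   -- placeholder PandaID
   AND type    = 'output';
```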

JOBPARAMSTABLE structure.

| *column name* | *column type* | *notes* |
| PandaID | NUMBER(11) | unsigned PandaID of the job, indexed |
| jobParameters | CLOB | the parameters of the job |

METATABLE structure.

| *column name* | *column type* | *notes* |
| PandaID | NUMBER(11) | unsigned PandaID of the job, indexed |
| metaData | CLOB | metadata produced by the job, in XML format |

CLOUDTASKS table structure.

| *column name* | *column type* | *notes* |
| id | NUMBER(9) NOT NULL, auto-increment | internal row counter, primary key |
| taskname | VARCHAR2(128) NOT NULL, default '' | task name; unique key (taskname + taskid) |
| taskid | NUMBER(9), default NULL | task ID |
| cloud | VARCHAR2(20) NOT NULL, default '' | cloud name |
| status | VARCHAR2(20) NOT NULL, default '' | status of the task |
| tmod | DATE NOT NULL, default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP | timestamp of modification |
| tenter | DATE NOT NULL, default '01-01-1970 00:00:00' | timestamp of creation |

PandaDB ARCH (ORACLE) schema (created May 2009)

The Panda ARCH DB consists of 4 tables with the same structure as the ones in the ATLAS_PANDA schema, with the only difference that the MODIFICATIONTIME column exists in all 4 tables. This is deliberate, so that all ARCH tables can be partitioned on this column.

JOBSARCHIVED, FILESTABLE_ARCH, JOBPARAMSTABLE_ARCH, METATABLE_ARCH

Note: the CLOB columns "JOBPARAMETERS" and "METADATA" in the JOBSARCHIVED table are present only for backward compatibility with the PANDA software and are no longer populated (used). Their data is stored in the JOBPARAMSTABLE_ARCH and METATABLE_ARCH tables instead.
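A hedged sketch of the consequence of this split (the PandaID value is a placeholder): to retrieve the parameters of an archived job, the archive job table is joined to JOBPARAMSTABLE_ARCH on PandaID rather than reading the unpopulated CLOB.

```sql
-- Illustrative only: fetch an archived job together with its parameters,
-- which live in JOBPARAMSTABLE_ARCH rather than in JOBSARCHIVED itself.
SELECT j.PandaID, j.jobStatus, p.jobParameters
  FROM ATLAS_PANDAARCH.JOBSARCHIVED j
  JOIN ATLAS_PANDAARCH.JOBPARAMSTABLE_ARCH p
    ON p.PandaID = j.PandaID
 WHERE j.PandaID = 1234567890;  -- placeholder PandaID
```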

Design Notes

Oracle partitioning on the large ARCH tables

The Panda ARCH tables are partitioned with the RANGE partitioning method on the MODIFICATIONTIME column. This breaks each table into smaller, more manageable pieces, which helps performance, backup, maintenance and archiving. Moreover, each table and index partition resides in a dedicated Oracle tablespace (a separate DB file) on a per-year basis:

  • ATLAS_PANDAARCH_DATA2006
  • ATLAS_PANDAARCH_DATA2007
  • ATLAS_PANDAARCH_DATA2008
  • ATLAS_PANDAARCH_DATA2009
  • ATLAS_PANDAARCH_DATA2010
  • ATLAS_PANDAARCH_DATA2011

More will be added over time.
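A minimal sketch of this scheme (partition names, date bounds, and the elided column list are illustrative; the actual DDL is in the attached Panda_table_partitioning_TWIKI.rtf script):

```sql
-- Illustrative only: yearly RANGE partitions on MODIFICATIONTIME,
-- each stored in its own per-year tablespace.
CREATE TABLE ATLAS_PANDAARCH.JOBSARCHIVED ( ... )  -- columns as in ATLAS_PANDA
PARTITION BY RANGE (MODIFICATIONTIME) (
  PARTITION DATA2008 VALUES LESS THAN (TO_DATE('01-01-2009','DD-MM-YYYY'))
    TABLESPACE ATLAS_PANDAARCH_DATA2008,
  PARTITION DATA2009 VALUES LESS THAN (TO_DATE('01-01-2010','DD-MM-YYYY'))
    TABLESPACE ATLAS_PANDAARCH_DATA2009
);
```

With LOCAL indexes (see below), queries constrained on MODIFICATIONTIME touch only the partitions for the requested years.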

Applied indexing

For maximum flexibility we decided to partition all indexes on the ARCH tables as well (each table partition has a corresponding index partition). Depending on the nature of the data in the columns on which filtering conditions will be applied (the SQL WHERE clause), a different type of indexing is used: for columns with few distinct values we create Bitmap indexes, and for the rest normal B*tree indexes (some with the COMPRESS option included).

For example, JOBSARCHIVED has 11 indexed columns, 6 of which are B*tree and 5 Bitmap.

6 B*Tree indexes

CREATE INDEX ATLAS_PANDAARCH.JOBS_PANDAID_IDX ON ATLAS_PANDAARCH.JOBSARCHIVED(PANDAID) LOCAL;
CREATE INDEX ATLAS_PANDAARCH.JOBS_MODTIME_IDX ON ATLAS_PANDAARCH.JOBSARCHIVED(MODIFICATIONTIME) LOCAL;
CREATE INDEX ATLAS_PANDAARCH.JOBS_TASKID_IDX ON ATLAS_PANDAARCH.JOBSARCHIVED(TASKID) LOCAL COMPRESS 1;
CREATE INDEX ATLAS_PANDAARCH.JOBS_JOBDEFID_IDX ON ATLAS_PANDAARCH.JOBSARCHIVED(JOBDEFINITIONID) LOCAL COMPRESS 1;
CREATE INDEX ATLAS_PANDAARCH.JOBS_PRODUSERNAME_IDX ON ATLAS_PANDAARCH.JOBSARCHIVED(PRODUSERNAME) LOCAL COMPRESS 1;
CREATE INDEX ATLAS_PANDAARCH.JOBS_JOBNAME_IDX ON ATLAS_PANDAARCH.JOBSARCHIVED(JOBNAME) LOCAL COMPRESS 1;

5 Bitmap indexes

CREATE BITMAP INDEX ATLAS_PANDAARCH.JOBS_COMPSITE_IDX ON ATLAS_PANDAARCH.JOBSARCHIVED(COMPUTINGSITE) LOCAL;
CREATE BITMAP INDEX ATLAS_PANDAARCH.JOBS_STATUS_IDX ON ATLAS_PANDAARCH.JOBSARCHIVED(JOBSTATUS) LOCAL;
CREATE BITMAP INDEX ATLAS_PANDAARCH.JOBS_PRODDBLOCK_IDX ON ATLAS_PANDAARCH.JOBSARCHIVED(PRODDBLOCK) LOCAL;
CREATE BITMAP INDEX ATLAS_PANDAARCH.JOBS_PRODSOURCELABEL_IDX ON ATLAS_PANDAARCH.JOBSARCHIVED(PRODSOURCELABEL) LOCAL;
CREATE BITMAP INDEX ATLAS_PANDAARCH.JOBS_DESTINATIONDBLOCK_IDX ON ATLAS_PANDAARCH.JOBSARCHIVED(DESTINATIONDBLOCK) LOCAL;

Tuning of the queries against the PANDA 'live' and archive data (Oracle hints in place)

We came across the challenge of finding a way to search quickly deep in the PANDA archive. This is of great importance, as the user requests come from a web application where a short query elapsed time is a must. Practice showed that relying on the MODIFICATIONTIME index alone is not enough, as it was causing huge sequential I/O (index to table). Even adding more indexes on the columns of interest did not make the Oracle optimizer use them. The best approach, in my opinion, was to use the INDEX_COMBINE hint: Oracle does a range scan (if B*tree) on the indexes specified in the hint, gets the relevant ROWIDs from them and performs a BITAND operation on them. In the end, with the resulting ROWIDs, it fetches all relevant rows from the table. In this way the number of I/Os is substantially reduced.

List of some of the hints put in place, depending on which column the user filters on:

  • 'pandaid' : ' INDEX_COMBINE(tab JOBS_MODTIME_IDX JOBS_PANDAID_IDX) '
  • 'taskid' : ' INDEX_COMBINE (tab JOBS_TASKID_IDX JOBS_MODTIME_IDX) '
  • 'jobdefinitionid' : ' INDEX_COMBINE(tab JOBS_MODTIME_IDX JOBS_JOBDEFID_IDX) '
  • 'jobname' : ' INDEX_COMBINE(tab JOBS_MODTIME_IDX JOBS_JOBNAME_IDX) '
  • 'produsername' : ' INDEX_COMBINE(tab JOBS_MODTIME_IDX JOBS_PRODUSERNAME_IDX) '
  • 'computingsite' : ' INDEX_COMBINE(tab JOBS_MODTIME_IDX JOBS_COMPSITE_IDX) '

Example of a query with conditions (a 90-day time window and a certain TASKID) that fetches about 10000 rows:

SELECT /*+ INDEX_COMBINE(tab JOBS_TASKID_IDX JOBS_MODTIME_IDX) */ *
  FROM ATLAS_PANDAARCH.JOBSARCHIVED tab
 WHERE SYSDATE - 90 <= modificationTime
   AND taskID = '40434';
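The column-to-hint mapping above can be applied mechanically when building the monitor's query text. A sketch in Python (not the actual PanDA monitor code; the function name and bind-variable convention are assumptions for illustration):

```python
# Sketch: look up the INDEX_COMBINE hint for the filter column listed
# above and assemble the hinted query against JOBSARCHIVED.
HINTS = {
    'pandaid':         'INDEX_COMBINE(tab JOBS_MODTIME_IDX JOBS_PANDAID_IDX)',
    'taskid':          'INDEX_COMBINE(tab JOBS_TASKID_IDX JOBS_MODTIME_IDX)',
    'jobdefinitionid': 'INDEX_COMBINE(tab JOBS_MODTIME_IDX JOBS_JOBDEFID_IDX)',
    'jobname':         'INDEX_COMBINE(tab JOBS_MODTIME_IDX JOBS_JOBNAME_IDX)',
    'produsername':    'INDEX_COMBINE(tab JOBS_MODTIME_IDX JOBS_PRODUSERNAME_IDX)',
    'computingsite':   'INDEX_COMBINE(tab JOBS_MODTIME_IDX JOBS_COMPSITE_IDX)',
}

def build_archive_query(column, days=90):
    """Return a hinted SELECT against JOBSARCHIVED, filtering on `column`
    within a `days`-long modificationTime window (value bound as :val)."""
    hint = HINTS[column.lower()]           # KeyError for unhinted columns
    return (f"SELECT /*+ {hint} */ * "
            f"FROM ATLAS_PANDAARCH.JOBSARCHIVED tab "
            f"WHERE SYSDATE - {days} <= modificationTime "
            f"AND {column} = :val")
```

For instance, `build_archive_query('taskID')` produces the hinted TASKID query shown above, with the literal value replaced by a bind variable.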

Code depot

Development Team

Main.GanchoDimitrov, Main.YuriSmirnov


Major updates:

-- Main.GanchoDimitrov - 06 July 2009
-- Main.YuriSmirnov - 18 June 2009



Responsible: Main.GanchoDimitrov, Main.YuriSmirnov, Main.NurcanOzturk

Topic attachments
  • JOBSARCH_partitions.tiff (TIFF, 347.8 K, 2009-07-03, GanchoDimitrov): snapshot of the Oracle partitions of one of the ARCH tables
  • Panda_table_partitioning_TWIKI.rtf (RTF, 65.7 K, 2009-07-03, GanchoDimitrov): script for creating the ATLAS_PANDAARCH partitioned tables
  • part_def.tiff (TIFF, 75.9 K, 2009-07-03, GanchoDimitrov): SQL statement of the CREATE for the RANGE-partitioned table
Topic revision: r64 - 2013-02-14 - ValeriFine