Contact Gancho Dimitrov at CERN for access information.

PandaDB schema

The PandaDB schema consists of 4 jobs tables and 3 extra tables (files, meta, cloudtasks). Per-table descriptions can be browsed at pandamon.cern.ch/describe?doc=*&table=^
Jobs table structure (shared by the 4 jobs tables).

| column name | column type | notes |
| PandaID | NUMBER(11) | sequential ID generated from the Oracle sequence object JOBSDEFINED4_PANDAID_SEQ, jobsDefined4 |
| jobDefinitionID | NUMBER(11) | comes from the CERN DB for managed production jobs, and defines job set IDs for analysis jobs |
| schedulerID | VARCHAR2(128) | ID identifying the pilot scheduler |
| pilotID | VARCHAR2(128) | ID assigned to the pilot by the scheduler |
| creationTime | DATE | generated by Oracle's SYSDATE function |
| creationHost | VARCHAR2(128) | set by the Panda server |
| modificationTime | DATE | set by the Panda server when the job state changes or a pilot update is received |
| modificationHost | VARCHAR2(128) | set by the Panda server |
| AtlasRelease | VARCHAR2(64) | release required to run the job. Example: Atlas-10.0.1 |
| transformation | VARCHAR2(250) | payload job script. Example: share/rome.1001.reco.MuonDigit.OverrideRE.trf |
| homepackage | VARCHAR2(64) | example: JobTransforms-10.0.1.5 |
| prodSeriesLabel | VARCHAR2(64) | constrained to "DC2, Rome, DC3, pandatest" |
| prodSourceLabel | VARCHAR2(64) | possible values: "managed, regional, user, panda, test" |
| prodUserID | VARCHAR2(250) | ID of the user who defined the job (the user's certificate DN or e-mail address) |
| assignedPriority | NUMBER(5) | comes from the CERN ProdDB for managed production jobs (assigned by the production manager) |
| currentPriority | NUMBER(5) | indexed; assigned by the production manager, usually the same as assignedPriority; can be modified by the Panda server |
| attemptNr | NUMBER(2) | number of job attempts ("0" in the jobsDefined table for new jobs, "1" when the job is moved to the jobsActive table) |
| maxAttempt | NUMBER(2) | comes from the CERN ProdDB for managed production jobs (assigned by the production manager; used to limit the number of failures) |
| jobStatus | VARCHAR2(15) | status of the job, indexed (possible values: "defined, assigned, activated, sent, waiting, running, holding, transferring, finished, failed") |
| jobName | VARCHAR2(128) | comes from the CERN ProdDB for managed production jobs |
| maxCpuCount | NUMBER(9) | comes from the CERN ProdDB for managed production jobs |
| maxCpuUnit | VARCHAR2(32) | comes from the CERN ProdDB for managed production jobs |
| maxDiskCount | NUMBER(9) | comes from the CERN ProdDB for managed production jobs |
| maxDiskUnit | CHAR(2) | comes from the CERN ProdDB for managed production jobs |
| ipConnectivity | CHAR(3) | comes from the CERN ProdDB for managed production jobs |
| minRamCount | NUMBER(9) | comes from the CERN ProdDB for managed production jobs |
| minRamUnit | CHAR(2) | comes from the CERN ProdDB for managed production jobs |
| startTime | DATE | filled by the Panda server |
| endTime | DATE | filled by the Panda server |
| cpuConsumptionTime | NUMBER(20) | filled after the job finishes |
| cpuConsumptionUnit | VARCHAR2(128) | filled after the job finishes |
| commandToPilot | VARCHAR2(250) | used by the Panda server to send commands to the pilot, for instance to kill the job |
| transExitCode | VARCHAR2(128) | transformation exit code |
| pilotErrorCode | NUMBER(6) | pilot error code |
| pilotErrorDiag | VARCHAR2(250) | pilot error diagnostics |
| exeErrorCode | NUMBER(6) | executor error code (number). Error codes and diagnostics use the ones from https://twiki.cern.ch/twiki/bin/view/Atlas/ProdSysErrorCodes |
| exeErrorDiag | VARCHAR2(250) | executor error diagnostics |
| supErrorCode | NUMBER(6) | supervisor error code |
| supErrorDiag | VARCHAR2(250) | supervisor error diagnostics |
| ddmErrorCode | NUMBER(6) | DDM error code |
| ddmErrorDiag | VARCHAR2(250) | DDM error diagnostics |
| brokerageErrorCode | NUMBER(6) | brokerage error code |
| brokerageErrorDiag | VARCHAR2(250) | brokerage error diagnostics |
| jobDispatcherErrorCode | NUMBER(6) | jobDispatcher error code |
| jobDispatcherErrorDiag | VARCHAR2(250) | jobDispatcher error diagnostics |
| taskBufferErrorCode | NUMBER(6) | taskBuffer error code |
| taskBufferErrorDiag | VARCHAR2(250) | taskBuffer error diagnostics |
| computingSite | VARCHAR2(128) | example: BU_ATLAS_Tier2 |
| computingElement | VARCHAR2(128) | example: atlas.bu.edu |
| jobParameters | CLOB | parsed from the jobDef XML and stored as a string |
| metadata | CLOB | metadata XML of the output file (POOL format) |
| prodDBlock | VARCHAR2(250) | name of the dataset containing the job's input files |
| dispatchDBlock | VARCHAR2(250) | name of the job's dispatch dataset; a prodDBlock may be broken down into smaller blocks for dispatch to sites |
| destinationDBlock | VARCHAR2(250) | name of the job's destination dataset; used to register the outputs of an associated set of jobs as belonging to one block to be saved at an archival destination |
| destinationSE | VARCHAR2(250) | destination storage element (archival destination) of the job's output file(s) |
| nEvents | VARCHAR2(9) | number of events done/produced by the job |
| grid | VARCHAR2(32) | grid the job is submitted to (OSG, LCG, etc.) |
| cloud | VARCHAR2(32) | cloud (associated with a Tier 1) the job is submitted to (US, CA, etc.) |
| cpuConversion | NUMBER(9,4) | CPU conversion factor |
| sourceSite | VARCHAR2(36) | source site (usually a CE) for file transfer |
| destinationSite | VARCHAR2(36) | destination site (usually an SE) for file transfer |
| transferType | VARCHAR2(10) | type of file transfer |
| taskID | NUMBER(9) | task ID |
| cmtConfig | VARCHAR2(250) | cmt config |
| stateChangeTime | DATE | timestamp of the last state change |
| prodDBUpdateTime | DATE | timestamp of the last update in the Oracle ProdDB |
| lockedby | VARCHAR2(128) | name of the current writer of the record in ProdDB |
| relocationFlag | NUMBER(1) | flag for submitting special jobs (M5reco) to a given cluster |
| jobExecutionID | NUMBER(11) unsigned | job execution ID coming from the Oracle ProdDB |
| VO | VARCHAR2(16) | Virtual Organization (for non-ATLAS support) |
| pilotTiming | VARCHAR2(100) | total time consumption of the pilot |
| workingGroup | VARCHAR2(20) | working group name, for jobs owned by working groups |
| processingType | VARCHAR2(64) | type of the job (evgen, simul, reco, reprocessing, etc.) |
| prodUserName | VARCHAR2(60) | the name, e.g. 'Joe Smith', extracted from the DN, so jobs can be looked up by username with '=' rather than LIKE (on prodUserID); see the example query after this table |
| nInputFiles | NUMBER(5) | number of input files needed for analysis jobs |
| countryGroup | VARCHAR2(20) | country of the job submitter, for regional queue restrictions/controls |
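For illustration, a minimal query sketch against one of the jobs tables, using the prodUserName column as described above. The table name jobsActive4 is an assumption inferred from the JOBSDEFINED4 sequence name, and the user name and status value are placeholders:

SELECT PandaID, jobStatus, computingSite, currentPriority
  FROM ATLAS_PANDA.jobsActive4
 WHERE prodUserName = 'Joe Smith'   -- exact '=' match; no LIKE on prodUserID needed
   AND jobStatus = 'running'
 ORDER BY currentPriority DESC;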
Files table structure.

| column name | column type | notes |
| rowID | NUMBER(11) | auto-incremented row ID generated from the FILESTABLE4_ROW_ID_SEQ sequence |
| PandaID | NUMBER(11) unsigned | PandaID of the job, indexed in this table |
| GUID | VARCHAR2(64) | GUID of the file, will be indexed? |
| lfn | VARCHAR2(256) | logical file name of input/output files |
| type | VARCHAR2(64) | constrained to "input, output, log" |
| dataset | VARCHAR2(128) | name of the dataset the input/output files belong to |
| status | VARCHAR2(64) | status of input/output files: whether the job's input is at the computingSite, or the job's output is available in the destination datablock; indexed |
| prodDBlock | VARCHAR2(250) | name of the datablock the job's input file(s) are part of |
| prodDBlockToken | VARCHAR2(250) | token of the job's prodDBlock |
| dispatchDBlock | VARCHAR2(250) | name of the job's dispatch datablock; a prodDBlock may be broken down into smaller blocks for dispatch to sites; indexed |
| dispatchDBlockToken | VARCHAR2(250) | token of the job's dispatchDBlock |
| destinationDBlock | VARCHAR2(250) | name of the job's destination datablock; used to register the outputs of an associated set of jobs as belonging to one block to be saved at an archival destination; indexed |
| destinationDBlockToken | VARCHAR2(250) | token of the job's destinationDBlock |
| destinationSE | VARCHAR2(250) | destination storage element (archival destination) of the job's output file(s) |
| fsize | NUMBER(10) | file size in bytes |
| md5sum | CHAR(36) | md5sum of the file |
| checksum | CHAR(36) | any checksum of the file (either adler32 or md5sum) |
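As a sketch, listing the files of a given job via the indexed PandaID column. The table name filesTable4 is an assumption inferred from the FILESTABLE4_ROW_ID_SEQ sequence name, and the job ID is a placeholder:

SELECT lfn, type, status, fsize, checksum
  FROM ATLAS_PANDA.filesTable4
 WHERE PandaID = 123456789           -- hypothetical job ID
   AND type IN ('output', 'log');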
The Panda ARCH DB consists of 4 tables with the same structure as the ones in the ATLAS_PANDA schema, with the one difference that the MODIFICATIONTIME column exists in all 4 tables. This is done deliberately so that all ARCH tables can be partitioned on this column.
Note: the CLOB columns "JOBPARAMETERS" and "METADATA" in the JOBSARCHIVED table are present only for backward compatibility with the PANDA software and are no longer populated (used). Their data is stored in the JOBPARAMSTABLE_ARCH and METATABLE_ARCH tables.
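A sketch of fetching the parameters of an archived job from the separate table. The column names are assumptions (a PANDAID/MODIFICATIONTIME pair plus a CLOB payload), and the job ID and dates are placeholders; filtering on MODIFICATIONTIME lets Oracle prune partitions, as described below:

SELECT jobParameters
  FROM ATLAS_PANDAARCH.JOBPARAMSTABLE_ARCH
 WHERE PandaID = 123456789                       -- hypothetical job ID
   AND modificationTime >= DATE '2008-01-01'     -- bounding the partitioning column
   AND modificationTime <  DATE '2009-01-01';    -- enables partition pruning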
The Panda ARCH tables are partitioned using the RANGE partitioning method on the MODIFICATIONTIME column. This breaks each table into smaller, more manageable pieces, which helps with performance, backup, maintenance and archiving.
Moreover, each table and index partition resides in a dedicated Oracle tablespace (a separate DB file) on a per-year basis; more will be added over time ...
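A minimal DDL sketch of this layout; the column list is abbreviated, and the partition and tablespace names are illustrative assumptions, not the production definition:

CREATE TABLE ATLAS_PANDAARCH.JOBSARCHIVED (
    PandaID          NUMBER(11),
    jobStatus        VARCHAR2(15),
    modificationTime DATE NOT NULL
    -- ... remaining columns as in the ATLAS_PANDA jobs tables ...
)
PARTITION BY RANGE (modificationTime) (
    -- one partition and tablespace per year, added over time
    PARTITION part_2007 VALUES LESS THAN (DATE '2008-01-01') TABLESPACE arch_data_2007,
    PARTITION part_2008 VALUES LESS THAN (DATE '2009-01-01') TABLESPACE arch_data_2008
);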
For maximum flexibility we decided to have all indexes on the ARCH tables partitioned as well (each table partition has a corresponding index partition).
Depending on the nature of the data in the columns to which the filtering conditions are applied (the SQL WHERE clause), a different type of indexing is used: for columns with few distinct values we create bitmap indexes, and for the rest normal B*tree indexes (some of them with the COMPRESS option).
For example, the JOBSARCHIVED table has 11 indexed columns, 6 of which are B*tree and 5 are bitmap:
CREATE INDEX ATLAS_PANDAARCH.JOBS_PANDAID_IDX ON ATLAS_PANDAARCH.JOBSARCHIVED(PANDAID) LOCAL
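A bitmap counterpart on a low-cardinality column might look like the following; the index name and column choice are illustrative assumptions (bitmap indexes on a partitioned table must be LOCAL, matching the per-partition index layout above):

CREATE BITMAP INDEX ATLAS_PANDAARCH.JOBS_JOBSTATUS_BIDX
    ON ATLAS_PANDAARCH.JOBSARCHIVED(JOBSTATUS) LOCAL;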