-- JuliaAndreeva - 03-Jun-2010

We need to get information for 4 job monitoring data categories:

Below I've tried to find the correspondence between the Dashboard DB and Panda DB structures, taking into account only the data needed for Dashboard job monitoring purposes and, correspondingly, for building the SQL queries run by the Dashboard collector.

Task table and other tables to which the Task table points via foreign keys

| DashboardField | DashboardTable | Panda field | Panda table | Comments/questions |
| TaskMonitorId | TASK | taskname+taskid for production, produsername+jobdefinitionid for analysis | cloudtasks for production, jobtable for analysis | For production the task description is kept in the production table and the taskid comes from there. The name allocated to the task in Dashboard would have a structure panda_ (see the sketch below this table) |
| UserId (foreign key to USERS table) | TASK | produsername | jobtable | We take from Panda produsername and produserid, which contains the certificate subject. They define a unique record in the Dashboard USERS table; the TASK table contains a foreign key to it |
| TaskCreatedTimeStamp | TASK | - | - | Internal to Dashboard; set when the record is created in the Dashboard DB |
| TaskTypeId (foreign key to TASK_TYPE table) | TASK | processingtype | jobtable | In case of pathena or Ganga the TaskType in Dashboard is 'analysis'; in other cases it is the same value as recorded in processingtype. The value itself is recorded in the TASK_TYPE table; TASK keeps a foreign key to it |
| NeventsPerJob | TASK | - | - | Not known in Panda |
| ApplicationId (foreign key to APPLICATION table) | TASK | atlasrelease | jobtable | |
| ApplExecId (foreign key to APPL_EXEC table) | TASK | transformation | jobtable | |
| InputCollectionId (foreign key to InputCollection table) | TASK | prodDBlock | jobtable | |
| OutputCollectionId (foreign key to OutputCollection table) | TASK | destinationblock | jobtable | |
| DefaultSchedulerId (foreign key to SCHEDULER table) | TASK | - | - | For Panda jobs always 'panda' |
| SubmissionToolId (foreign key to SUBMISSION_TOOL table) | TASK | processingtype | jobtable | pathena or Ganga in case of analysis; have to clarify whether there are any different possibilities for production |
| SubToolVerId (foreign key to SUBMISSION_TOOL_VER table) | TASK | ? | ? | I think I forgot to ask about this one; have to clarify |
| SubmissionUIId (foreign key to SUBMISSION_UI table) | TASK | creationhost | jobtable | Where the task submission was initiated from |
| JobProcessingTypeId | TASK | - | - | Not really used in Dashboard for the time being; the idea was that for different types of jobs there could be an additional set of parameters to follow |
| TargetCE | TASK | destinationse | jobtable | Does not make much sense for PANDA since the site is resolved by PANDA before submission; in Dashboard it was used to resolve the site as early as possible, even before the job is scheduled at the site |
| SeId (foreign key to SE table) | TASK | destinationse | jobtable | |
| SubmissionType (foreign key to SUBMISSION_TYPE table) | TASK | - | - | For Panda will always be 'panda_server' |
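
As an illustration of the TaskMonitorId row above, here is a minimal SQL sketch of how the collector might build a candidate identifier for analysis tasks. The 'panda_' prefix and the processingtype values 'pathena'/'ganga' used in the filter are assumptions that still have to be confirmed; this is not an agreed convention.

-- Minimal sketch only: candidate TaskMonitorId for analysis jobs.
-- The 'panda_' prefix and the processingtype values are assumptions, not agreed yet.
select PANDAID,
       'panda_' || PRODUSERNAME || '_' || JOBDEFINITIONID as TASKMONITORID
from atlas_panda.JOBSACTIVE4
where PROCESSINGTYPE in ('pathena', 'ganga');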

Job table and other tables to which the Job table points via foreign keys

| DashboardField | DashboardTable | Panda field | Panda table | Comments/questions |
| SchedulerJobId | JOB | pandaid | jobtable | Panda developers told us that pandaid is unique and not recyclable, so panda_ should be good enough for identifying a Panda GRID job |
| TaskJobId | JOB | ? | ? | See comments below; it is a bit complex for the moment and not yet clear how to solve |
| TaskId (foreign key to the TASK table) | JOB | | | |
| LocalBatchJobId | JOB | ? | ? | Did not clarify with Tadashi whether it is available in PANDA; probably not |
| VOJobId | JOB | pandaid | jobtable | |
| NextJobId | JOB | ? | ? | See comments below; it is currently unclear how to resolve the job resubmission history |
| RBId (foreign key to the RB table) | JOB | - | - | Always 'panda' for Panda jobs |
| EventRange | JOB | ? | ? | See comments below; again it relates to the job resubmission history |
| SubNodeIp | JOB | schedulerid | jobtable | Location of the Panda server |
| LongCEId | JOB | ? | ? | Looks like we only have SHORT_CE and SITE in Panda |
| ShortCEId (foreign key to SHORT_CE table) | JOB | computingelement | jobtable | |
| SiteId (foreign key to the SITE table) | JOB | computingsite | jobtable | |
| DboardStatusId | JOB | calculated from jobstatus | jobtable | |
| DboardJobEndId | JOB | calculated from jobstatus and transexitcode | jobtable | Means the application exit code; only the transformation exit code should be taken into account to understand whether the application ran properly |
| DboardGridEndId | JOB | calculated from jobstatus and all possible exit codes | jobtable | Means the GRID exit code; for Panda it would mean job processing success, i.e. success of the Panda workload management system, not related to application success as such (see the sketch below this table) |
| DboardStatusEnterTimeStamp | JOB | modificationtime, only when it triggered a status change | jobtable | |
| DboardFirtsInfoTimeStamp | JOB | creationtime | jobtable | |
| DboardLatestInfoTimeStamp | JOB | modificationtime | jobtable | |
| GridStatusId | JOB | jobstatus | jobtable | Redundant in the case of Panda, unless we instrument pilot jobs |
| GridStatusReasonId | JOB | - | - | Might make no sense here |
| GridEndStatusTimeStamp | JOB | modificationtime when the job is turned into 'finished' | jobtable | Redundant unless we instrument pilots |
| JobExecExitCode | JOB | transexitcode | jobtable | Exit code of the user executable |
| JobExecExitReasonId | JOB | transexitdiag | jobtable | Reason of the user executable failure |
| JobExecExitTimeStamp | JOB | endtime | jobtable | |
| CreatedTimeStamp | JOB | - | - | Internal to Dashboard; set when the record is created in the Dashboard DB |
| SubmittedTimeStamp | JOB | creationtime | jobtable | |
| ScheduledTimeStamp | JOB | - | - | Does not make sense for PANDA; the job is not pending at the site |
| StartedRunningTimeStamp | JOB | starttime | jobtable | |
| FinishedTimeStamp | JOB | modificationtime when the job is turned into 'finished' | jobtable | |
| SchedulerId (foreign key to the SCHEDULER table) | JOB | - | - | Always 'panda' for Panda jobs |
| TimeOutFlag | JOB | ? | ? | Flag marking jobs which never reported back after they started running and are therefore timed out by Dashboard; unclear whether it makes sense for Panda jobs |
| ExeTime | JOB | ? | ? | Wall clock time of the user executable; not clear whether it is known in Panda, have to clarify |
| ExeCpu | JOB | cpuconsumptiontime? | jobtable | Need to clarify whether it relates to the executable only or includes the time spent on saving the output of the job |
| WrapWC | JOB | endtime - starttime | jobtable | Wall clock time of the user job |
| WrapCPU | JOB | cpuconsumptiontime? | jobtable | Need to clarify whether it relates to the executable only or includes the time spent on saving the output of the job |
| StOutWC | JOB | ? | ? | Whether the time spent on saving the output is known in Panda needs to be clarified |
| JobType | JOB | - | - | Does not make sense for Panda |
| StageOutSE | JOB | destinationse | jobtable | |
| PilotFlag | JOB | - | - | Always the same for all jobs processed by Panda pilots |
| InputSE | JOB | ? | ? | In Panda it looks like the input SE and the stage-out SE are always the same; to clarify |
| ParentPilotId | JOB | pilotid | jobtable | Currently Panda stores the name of the parent pilot; we might need to create a PILOT table and keep in the JOB table a foreign key to the record in that table |
| WNHostName | JOB | modificationhost | jobtable | |
| NEvProc | JOB | nevents | jobtable | Number of processed events |
| NEvReq | JOB | - | - | Number of events which had to be processed by the job; does not exist in Panda |
| NFilesProc | JOB | ninputfiles | jobtable | |
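
As an illustration of the calculated fields above (DboardJobEndId, DboardGridEndId), here is a rough sketch of a possible mapping. The jobstatus values 'finished'/'failed' and the convention that TRANSEXITCODE = '0' or NULL means the application ran properly are assumptions to be confirmed with the Panda developers, and the real DboardGridEndId rule should also take the various *ERRORCODE columns into account.

-- Rough sketch only; status values and TRANSEXITCODE handling are assumptions.
select PANDAID, JOBSTATUS,
       case
         when JOBSTATUS = 'finished'
              and (TRANSEXITCODE is null or TRANSEXITCODE = '0')
           then 'app-succeeded'                              -- candidate DboardJobEndId
         when JOBSTATUS in ('finished', 'failed')
           then 'app-failed'
         else 'app-unknown'                                  -- job not in a terminal state yet
       end as DBOARD_JOB_END,
       case
         when JOBSTATUS = 'finished' then 'grid-succeeded'   -- candidate DboardGridEndId
         when JOBSTATUS = 'failed'   then 'grid-failed'
         else 'grid-unknown'
       end as DBOARD_GRID_END
from atlas_panda.JOBSARCHIVED4;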

Open questions

Resubmissions

Dashboard keeps track of the resubmission history of every particular job. This feature is considered useful by analysis users and the analysis support team, since it allows them to debug various problems. We would like to provide this functionality for Panda jobs as well, if possible.

As far as we understand from Tadashi, Panda has a slightly different approach to resubmissions compared to Dashboard and the workload management systems in CMS. In the Dashboard scenario, in case of resubmission the task is always kept the same and just a new job is created. Dashboard keeps a reference to the unique portion of work (whether in terms of input files, input events, random seeds or a combination of those) which has to be processed by a given job of the task. If the same input is processed by more than one job of the same task, it means that all those jobs are just resubmission attempts of some primary job.

As we understood from Tadashi, in ATLAS, when jobs are submitted via PANDA, there are two possible scenarios.

1) The resubmitted jobs go to the same site as their predecessors. In this case PANDA would create a new task but would keep a reference to the previous task in the executionid field of the jobtable and would increase the attemptnr of all jobs of this task by one. The problem for Dashboard in recreating the resubmission history is that Panda does not keep a reference to the predecessor of every single job in the job record. To find out that job B is the next attempt of job A, one needs to make sure that A and B have exactly the same set of input files and, in some cases, also exactly the same values in jobparams, which is a CLOB and pretty expensive to compare. It is possible to retrieve the info needed for recreating the resubmission history, but it would require pretty heavy queries on the DB and looks like a big overhead. Related to this we have a request. If we understand how Panda works correctly, it might just clone the jobs which need to be resubmitted, combining them in a single task. At that point PANDA should know the pandaid of the previous attempt of every job in the set. Would it be possible to record the pandaid of the previous attempt in the jobtable for analysis jobs? It would require a small change of the Panda schema, adding one more field to the job table. We found out that the jobparameters field in the job table is always 'null' for analysis and suggested that this one could be used for the pandaid of the job's predecessor, but Tadashi thinks that this is messy and it is better to create a new field (a sketch of what this could look like is given after the second scenario below).

2) In the second scenario the jobs are resubmitted to a different site, and the resubmission as such happens outside the Panda scope, at the UI level (Ganga or Pathena). In this case, as far as we understood, there is no way to recreate the resubmission history using just the info in the Panda DB.
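
To make the request from the first scenario concrete, here is a purely illustrative sketch with a hypothetical column name PARENTPANDAID (the actual name and type, if such a field were ever added, are of course up to the Panda developers; it would be needed in each of the job tables). With such a field the resubmission chain could be rebuilt with a simple hierarchical query.

-- Purely illustrative: PARENTPANDAID is a hypothetical column name.
alter table atlas_panda.JOBSDEFINED4 add (PARENTPANDAID number(11));

-- Walking from the first attempt of a job to all of its later attempts:
select PANDAID, PARENTPANDAID, ATTEMPTNR
from atlas_panda.JOBSARCHIVED4
start with PANDAID = :first_attempt_pandaid
connect by prior PANDAID = PARENTPANDAID;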

Can updates be delayed?

Is the modification time stamp the time when the data is changed in the PANDA DB? If we decide to query the PANDA DB every 5 minutes, can it happen that we miss some status changes? Should we ask every 5 minutes for the last 5-minute interval, or rather every 5 minutes for the last 10-minute interval?
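
If the modification time stamp is indeed set when the row changes, one possible way to protect against delayed updates is to poll every 5 minutes but look back over a 10-minute window, and let the collector skip rows it has already stored (e.g. by remembering the PANDAID/MODIFICATIONTIME pairs of the previous run). A sketch of such an overlapping-window condition (the 10-minute lookback is an assumption, to be tuned):

-- Sketch of an overlapping-window poll; lookback length is an assumption.
select PANDAID, JOBSTATUS,
       to_char(MODIFICATIONTIME, 'YYYY-MM-DD HH24:MI:SS') MODIFICATIONTIME
from atlas_panda.JOBSACTIVE4
where MODIFICATIONTIME >= to_date(:end_time, 'YYYY-MM-DD HH24:MI:SS') - interval '10' minute
  and MODIFICATIONTIME <  to_date(:end_time, 'YYYY-MM-DD HH24:MI:SS');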

Query details

Below is an example of the query we might need to run every few minutes. For a 5-minute interval on the integration instance it takes ~1.5 seconds, returning 8-12K jobs.

select PANDAID, GRID, PRODSOURCELABEL, to_char(MODIFICATIONTIME, 'YYYY-MM-DD HH24:MI:SS') MODIFICATIONTIME,
       JOBDEFINITIONID, SCHEDULERID, to_char(CREATIONTIME, 'YYYY-MM-DD HH24:MI:SS') CREATIONTIME, CREATIONHOST,
       MODIFICATIONHOST, ATLASRELEASE, TRANSFORMATION, PRODUSERID, ATTEMPTNR, JOBSTATUS,
       to_char(STARTTIME, 'YYYY-MM-DD HH24:MI:SS') STARTTIME, to_char(ENDTIME, 'YYYY-MM-DD HH24:MI:SS') ENDTIME,
       CPUCONSUMPTIONTIME, CPUCONSUMPTIONUNIT, TRANSEXITCODE, PILOTERRORCODE, EXEERRORCODE, EXEERRORDIAG,
       SUPERRORCODE, SUPERRORDIAG, DDMERRORCODE, DDMERRORDIAG, BROKERAGEERRORCODE, BROKERAGEERRORDIAG,
       JOBDISPATCHERERRORCODE, JOBDISPATCHERERRORDIAG, COMPUTINGSITE, COMPUTINGELEMENT, PRODDBLOCK,
       DESTINATIONDBLOCK, DESTINATIONSE, NEVENTS, CPUCONVERSION, JOBEXECUTIONID, PROCESSINGTYPE, PRODUSERNAME, NINPUTFILES
from atlas_panda.JOBSACTIVE4
where MODIFICATIONTIME >= to_date(:start_time, 'YYYY-MM-DD HH24:MI:SS')
  and MODIFICATIONTIME <  to_date(:end_time, 'YYYY-MM-DD HH24:MI:SS')

union all

select PANDAID, GRID, PRODSOURCELABEL, to_char(MODIFICATIONTIME, 'YYYY-MM-DD HH24:MI:SS') MODIFICATIONTIME,
       JOBDEFINITIONID, SCHEDULERID, to_char(CREATIONTIME, 'YYYY-MM-DD HH24:MI:SS') CREATIONTIME, CREATIONHOST,
       MODIFICATIONHOST, ATLASRELEASE, TRANSFORMATION, PRODUSERID, ATTEMPTNR, JOBSTATUS,
       to_char(STARTTIME, 'YYYY-MM-DD HH24:MI:SS') STARTTIME, to_char(ENDTIME, 'YYYY-MM-DD HH24:MI:SS') ENDTIME,
       CPUCONSUMPTIONTIME, CPUCONSUMPTIONUNIT, TRANSEXITCODE, PILOTERRORCODE, EXEERRORCODE, EXEERRORDIAG,
       SUPERRORCODE, SUPERRORDIAG, DDMERRORCODE, DDMERRORDIAG, BROKERAGEERRORCODE, BROKERAGEERRORDIAG,
       JOBDISPATCHERERRORCODE, JOBDISPATCHERERRORDIAG, COMPUTINGSITE, COMPUTINGELEMENT, PRODDBLOCK,
       DESTINATIONDBLOCK, DESTINATIONSE, NEVENTS, CPUCONVERSION, JOBEXECUTIONID, PROCESSINGTYPE, PRODUSERNAME, NINPUTFILES
from atlas_panda.JOBSARCHIVED4
where MODIFICATIONTIME >= to_date(:start_time, 'YYYY-MM-DD HH24:MI:SS')
  and MODIFICATIONTIME <  to_date(:end_time, 'YYYY-MM-DD HH24:MI:SS')

union all

select PANDAID, GRID, PRODSOURCELABEL, to_char(MODIFICATIONTIME, 'YYYY-MM-DD HH24:MI:SS') MODIFICATIONTIME,
       JOBDEFINITIONID, SCHEDULERID, to_char(CREATIONTIME, 'YYYY-MM-DD HH24:MI:SS') CREATIONTIME, CREATIONHOST,
       MODIFICATIONHOST, ATLASRELEASE, TRANSFORMATION, PRODUSERID, ATTEMPTNR, JOBSTATUS,
       to_char(STARTTIME, 'YYYY-MM-DD HH24:MI:SS') STARTTIME, to_char(ENDTIME, 'YYYY-MM-DD HH24:MI:SS') ENDTIME,
       CPUCONSUMPTIONTIME, CPUCONSUMPTIONUNIT, TRANSEXITCODE, PILOTERRORCODE, EXEERRORCODE, EXEERRORDIAG,
       SUPERRORCODE, SUPERRORDIAG, DDMERRORCODE, DDMERRORDIAG, BROKERAGEERRORCODE, BROKERAGEERRORDIAG,
       JOBDISPATCHERERRORCODE, JOBDISPATCHERERRORDIAG, COMPUTINGSITE, COMPUTINGELEMENT, PRODDBLOCK,
       DESTINATIONDBLOCK, DESTINATIONSE, NEVENTS, CPUCONVERSION, JOBEXECUTIONID, PROCESSINGTYPE, PRODUSERNAME, NINPUTFILES
from atlas_panda.JOBSDEFINED4
where MODIFICATIONTIME >= to_date(:start_time, 'YYYY-MM-DD HH24:MI:SS')
  and MODIFICATIONTIME <  to_date(:end_time, 'YYYY-MM-DD HH24:MI:SS')

union all

select PANDAID, GRID, PRODSOURCELABEL, to_char(MODIFICATIONTIME, 'YYYY-MM-DD HH24:MI:SS') MODIFICATIONTIME,
       JOBDEFINITIONID, SCHEDULERID, to_char(CREATIONTIME, 'YYYY-MM-DD HH24:MI:SS') CREATIONTIME, CREATIONHOST,
       MODIFICATIONHOST, ATLASRELEASE, TRANSFORMATION, PRODUSERID, ATTEMPTNR, JOBSTATUS,
       to_char(STARTTIME, 'YYYY-MM-DD HH24:MI:SS') STARTTIME, to_char(ENDTIME, 'YYYY-MM-DD HH24:MI:SS') ENDTIME,
       CPUCONSUMPTIONTIME, CPUCONSUMPTIONUNIT, TRANSEXITCODE, PILOTERRORCODE, EXEERRORCODE, EXEERRORDIAG,
       SUPERRORCODE, SUPERRORDIAG, DDMERRORCODE, DDMERRORDIAG, BROKERAGEERRORCODE, BROKERAGEERRORDIAG,
       JOBDISPATCHERERRORCODE, JOBDISPATCHERERRORDIAG, COMPUTINGSITE, COMPUTINGELEMENT, PRODDBLOCK,
       DESTINATIONDBLOCK, DESTINATIONSE, NEVENTS, CPUCONVERSION, JOBEXECUTIONID, PROCESSINGTYPE, PRODUSERNAME, NINPUTFILES
from atlas_panda.JOBSWAITING4
where MODIFICATIONTIME >= to_date(:start_time, 'YYYY-MM-DD HH24:MI:SS')
  and MODIFICATIONTIME <  to_date(:end_time, 'YYYY-MM-DD HH24:MI:SS')

Some more useful info for the record

The task for production is defined in the production DB; for analysis there is no description of the task as such, and the task is defined by the combination of userid and jobdefinitionid.

When a user submits a new job, a record is created in JOBSDEFINED4. For production, the record can be moved from JOBSDEFINED4 either to JOBSACTIVE4, if the input already exists, or to JOBSWAITING4, if the input does not exist yet. For analysis, PATHENA has already resolved the name of the site where the job should go, so the job is moved directly to JOBSACTIVE4. For historical reasons the job record can, for ~5 minutes, exist both in JOBSDEFINED4 and in the table the job was moved to. The job record stays in JOBSACTIVE4 until the job is finished, whatever the result of the processing is; when it is finished, it is moved to JOBSARCHIVED4. Every new resubmission results in the creation of a new record with a new pandaid in JOBSDEFINED4.
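
Because of this ~5-minute overlap, the UNION ALL query above can return the same pandaid from both JOBSDEFINED4 and the table the job was moved to. Below is a sketch of one way the collector could keep only the most recent row per job, assuming the later modificationtime is the authoritative one (JOBSWAITING4 and JOBSARCHIVED4 would be added to the inner UNION ALL in the same way):

-- Sketch of deduplication by pandaid, keeping the latest modificationtime.
select *
from (
  select j.*,
         row_number() over (partition by PANDAID order by MODIFICATIONTIME desc) rn
  from (
    select PANDAID, JOBSTATUS, MODIFICATIONTIME from atlas_panda.JOBSDEFINED4
    union all
    select PANDAID, JOBSTATUS, MODIFICATIONTIME from atlas_panda.JOBSACTIVE4
  ) j
)
where rn = 1;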

Are there standard APIs available to retrieve data from the PANDA DB? There is an API to PANDA which we need to look into:

https://svnweb.cern.ch/trac/panda/browser/panda-server/current/pandaserver/userinterface/Client.py

Regarding error codes: according to Tadashi a job normally has one single error code, so we can keep it in the two fields we currently have in Dashboard, one for the GRID and another one for the application.
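
For illustration, a sketch of collapsing the Panda error codes into those two fields, assuming that unset codes are NULL or 0 (to be confirmed): TRANSEXITCODE feeds the application field and the first non-zero system-level code feeds the GRID field. Where EXEERRORCODE belongs (application or GRID) still has to be clarified.

-- Sketch only; assumes unset error codes are NULL or 0.
select PANDAID,
       TRANSEXITCODE as APP_EXIT_CODE,
       coalesce(nullif(PILOTERRORCODE, 0),
                nullif(DDMERRORCODE, 0),
                nullif(BROKERAGEERRORCODE, 0),
                nullif(JOBDISPATCHERERRORCODE, 0),
                nullif(SUPERRORCODE, 0)) as GRID_EXIT_CODE
from atlas_panda.JOBSARCHIVED4;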

Dashboard PanDA collector

The collector is intended to retrieve job/task related information from the PANDA DB and store it in the ATLAS Dashboard job monitoring DB. http://svnweb.cern.ch/world/wsvn/dashboard/trunk/arda.dashboard.service-collector-panda/#path_trunk_arda.dashboard.service-collector-panda_stout