Task/Job monitoring

Description of Dashboard API

List of requirements

Development server

The task monitoring dashboard development server runs at http://pcadc01.cern.ch/

Front page: List of users

List of users who ran jobs during the last month.

URL: http://pcadc01.cern.ch/client/index.html

Action name: gangataskmonitoring

URL request:

http://pcadc01.cern.ch/dashboard/request.py/gangataskmonitoring

Output JSON object format:

{"basicData": [{"GridName": "UserName1"}, ..., {"GridName": "UserNameN"}]}
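A minimal Python sketch of a client for this action, assuming "basicData" holds a list of {"GridName": ...} objects as shown above. Parsing is kept separate from the HTTP call because the development server may not be reachable; the function names are hypothetical.

```python
import json
from urllib.request import urlopen

# Base URL of the request shown above.
DASHBOARD_URL = "http://pcadc01.cern.ch/dashboard/request.py"


def parse_user_list(body: str) -> list:
    """Return the GridName of every user in a gangataskmonitoring response.

    Assumes "basicData" holds a list of {"GridName": ...} objects.
    """
    data = json.loads(body)
    return [entry["GridName"] for entry in data["basicData"]]


def fetch_user_list() -> list:
    """Fetch the users who ran jobs during the last month (network call)."""
    with urlopen(f"{DASHBOARD_URL}/gangataskmonitoring") as response:
        return parse_user_list(response.read().decode("utf-8"))
```

For example, parse_user_list('{"basicData": [{"GridName": "UserName1"}, {"GridName": "UserNameN"}]}') returns ['UserName1', 'UserNameN'].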

List of tasks

List of tasks for a given user, either within a given time period (from/to) OR within a time range (timerange).

Action name: gangataskstable

Parameters:

    • usergridname;
    • timerange OR from, to (time period; dates in the format YYYY-MM-DD HH:MM, as in the example below);
    • typeofrequest=A (currently mandatory)

Example:

  • gangataskstable?usergridname=%22KonstantinosKousouris%22&from=2010-04-02%2018:20&to=2010-04-07%2012:30&typeofrequest=A
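The request above can be assembled with the standard library. This is a sketch under the assumptions visible in the example: usergridname is sent wrapped in double quotes (%22), spaces are encoded as %20, and either timerange or both from and to are supplied. The function name is hypothetical, and the accepted timerange values are not documented here.

```python
from urllib.parse import quote, urlencode


def gangataskstable_query(usergridname, *, timerange=None, date_from=None,
                          date_to=None, typeofrequest="A"):
    """Build the query string for the gangataskstable action.

    Either timerange or both date_from and date_to ("YYYY-MM-DD HH:MM")
    must be given. date_from/date_to map to the "from"/"to" parameters
    ("from" is a Python keyword).
    """
    # The user name is wrapped in double quotes, as in the example (%22).
    params = {"usergridname": f'"{usergridname}"'}
    if timerange is not None:
        if date_from is not None or date_to is not None:
            raise ValueError("use either timerange or from/to, not both")
        params["timerange"] = timerange
    elif date_from is not None and date_to is not None:
        params["from"] = date_from
        params["to"] = date_to
    else:
        raise ValueError("timerange or both from and to are required")
    params["typeofrequest"] = typeofrequest  # currently mandatory
    # quote (not quote_plus) encodes spaces as %20; ':' is left readable.
    return "gangataskstable?" + urlencode(params, safe=":", quote_via=quote)
```

Called with "KonstantinosKousouris", date_from="2010-04-02 18:20" and date_to="2010-04-07 12:30", this reproduces the example request above.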

JSON output example:

{"user_taskstable": [{"Executable": "cmsRun", "UNKNOWN": 239, "SubmissionType": "direct", "Application": "CMSSW",  "NUMOFJOBS": 264,  "TargetCE": "15_Selected_SE", "SubmissionTool": "crab", "SubmissionUI": "T1_US_FNAL",   "PENDING": 0, "TASKMONID": "kkousour_crab_0_100411_030027_1mv54a", "TaskType": "analysis",   "ApplicationVersion": "CMSSW_3_5_6", "TaskId": 3338662, "SUCCESS": 0, "TaskCreatedTimeStamp": "2010-04-11 10:08",   "SchedulerName": "LOCALFNAL", "TaskMonitorId": "kkousour_crab_0_100411_030027_1mv54a",   "FAILED": 25, "RUNNING": 0, "NEventsPerJob": 35761, "InputCollection": "/MinimumBias/Commissioning10-PromptReco-v8/RECO", "SubToolVersion": "2.7.1"}]}
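Each task row carries one job count per status plus the total NUMOFJOBS; in the example above, 239 UNKNOWN + 25 FAILED = 264 jobs. A hedged sketch that collects these counts per TASKMONID, assuming (not confirmed by the API) that only SUCCESS and FAILED are final states:

```python
import json

# Per-status count keys seen in the gangataskstable example.
STATUS_KEYS = ("PENDING", "RUNNING", "SUCCESS", "FAILED", "UNKNOWN")


def summarize_tasks(body: str) -> dict:
    """Map each TASKMONID to its per-status job counts.

    Adds a "TERMINATED" entry, assuming SUCCESS and FAILED are the
    only final states (hypothetical interpretation).
    """
    summary = {}
    for task in json.loads(body)["user_taskstable"]:
        counts = {key: task.get(key, 0) for key in STATUS_KEYS}
        counts["NUMOFJOBS"] = task["NUMOFJOBS"]
        counts["TERMINATED"] = counts["SUCCESS"] + counts["FAILED"]
        summary[task["TASKMONID"]] = counts
    return summary
```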

Jobs of a chosen task

Action name: gangataskjobs

URL request:

Parameters:

  • taskmonid;
  • what, one of:
    • all - all job states;
    • P - pending;
    • R - running;
    • U - unknown;
    • S - successful;
    • F - failed;

Example:

gangataskjobs?taskmonid=kkousour_crab_0_100411_030027_1mv54a&what=all
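A sketch of building this request with a check on the "what" values listed above. The function name is hypothetical; note that the server echoes what=all back as "ALL" (see the output below), so it may accept either case, which this sketch does not assume.

```python
from urllib.parse import urlencode

# Values of the "what" parameter, as listed above.
WHAT_VALUES = {
    "all": "all job states",
    "P": "pending",
    "R": "running",
    "U": "unknown",
    "S": "successful",
    "F": "failed",
}


def gangataskjobs_query(taskmonid: str, what: str = "all") -> str:
    """Build the query string for the gangataskjobs action."""
    if what not in WHAT_VALUES:
        raise ValueError(f"unsupported what={what!r}; "
                         f"expected one of {sorted(WHAT_VALUES)}")
    return "gangataskjobs?" + urlencode({"taskmonid": taskmonid, "what": what})
```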

JSON output example for what=all:

{"taskjobs": [[{"STATUS": "U", "resubmissions": 1, "EventRange": "1", "started": "2010-04-11 08:03", "GridEndId": "U", "AppGenericStatusReasonValue": "Error return without specification", "finished": "2010-04-12 08:03", "submitted": "2010-04-11 08:03", "Site": "T3_US_FNALLPC", "TaskJobId": 153301646, "JobExecExitCode": null, "SchedulerJobId": "https://cmslpc16.fnal.gov/be0adc69f8cfa9f7a184ad7ce27dd2b2c81c68fa/1", "GridEndReason": "unknown"}], {"username": "\"KonstantinosKousouris\"", "what": "ALL", "taskmonid": "kkousour_crab_0_100411_030027_1mv54a"}]}
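"taskjobs" is a two-element array: the list of job rows, followed by an object echoing the request parameters. A minimal sketch that separates the two and counts jobs per (Site, STATUS) pair, the grouping used by the site plots on this page; the function names are hypothetical.

```python
import json
from collections import Counter


def split_taskjobs(body: str):
    """Return (jobs, request_echo) from a gangataskjobs response."""
    jobs, echo = json.loads(body)["taskjobs"]
    return jobs, echo


def jobs_by_site_and_status(jobs) -> Counter:
    """Count job rows per (Site, STATUS) pair."""
    return Counter((job["Site"], job["STATUS"]) for job in jobs)
```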

Page details when what=all

Plots

  • Terminated Jobs by Site (grouping by "Site" and "STATUS" values);
  • Graphical Overview (uses the status counts (#running, #pending, etc.) from Page 2 for the selected task);
  • Successful Jobs Distributed by Site (grouping by "Site" for jobs with "success" status);
  • Processed Events Cumulative Plot.
    • Action name: proceventscumulativeAlt
    • URL request:
    • Parameters:
      • taskmonid
    • JSON output:

{"totaljobs": [[{"TOTAL": #jobs}], {"taskmonid": "taskmonid"}],

"procevents": [[{"NEventsPerJob": #events}], {"taskmonid": "taskmonid"}],

"succjobs": [[{"TOTAL": #jobs, "TOTALEVENTS": #total events}], {"taskmonid": "andersj_pip_e_1_10_5o7w2r"}],

"meta": { ... },

"allfinished": [[{"finished": Timestamp, "Events": #events}, ...]],

"lastfinished": [[{"finished": Timestamp}], {"taskmonid": "taskmonid"}],

"firststarted": [[{"started": Timestamp}], {"taskmonid": "taskmonid"}]}
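The cumulative plot can be derived from the "allfinished" rows. A sketch, assuming the first element of "allfinished" is the list of {"finished", "Events"} rows (like the other keys) and that timestamps use the "YYYY-MM-DD HH:MM" form, which sorts correctly as text; the function name is hypothetical.

```python
from itertools import accumulate


def cumulative_events(rows):
    """Return (finished, cumulative_events) points, oldest first."""
    ordered = sorted(rows, key=lambda row: row["finished"])
    totals = accumulate(row["Events"] for row in ordered)
    return [(row["finished"], total) for row, total in zip(ordered, totals)]
```

For example, rows finishing at 09:00 with 10 events and at 08:26 with 5 events give [("... 08:26", 5), ("... 09:00", 15)].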

Jobs table

Column name | Key
SchedulerJobId | SchedulerJobId
Id in Task | EventRange
Status | STATUS: P - pending; R - running; U - unknown; S - successful; F - failed
Appl Exit Code | If STATUS is "P" or "R", display "Not yet"; otherwise display JobExecExitCode if JobExecExitCode > -1, else "Unknown". Tooltip text: AppGenericStatusReasonValue
Grid End Status | GridEndId (tooltip text: GridEndReason)
Retries | resubmissions
Site | Site
Submitted | submitted
Started | started
Finished | finished
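The Appl Exit Code rule above can be written as a small function. This sketch assumes a JSON null exit code (as in the what=all example) should fall through to "Unknown", since null is not greater than -1; the function and dictionary names are hypothetical.

```python
# Display names for the STATUS letters.
STATUS_NAMES = {"P": "pending", "R": "running", "U": "unknown",
                "S": "successful", "F": "failed"}


def appl_exit_code_cell(job: dict) -> str:
    """Text for the "Appl Exit Code" column of the jobs table."""
    if job.get("STATUS") in ("P", "R"):
        return "Not yet"
    code = job.get("JobExecExitCode")
    # JSON null arrives as None; treat it as not > -1.
    if code is not None and code > -1:
        return str(code)
    return "Unknown"


def appl_exit_code_tooltip(job: dict) -> str:
    """Tooltip text for the same cell."""
    return job.get("AppGenericStatusReasonValue", "")
```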

JSON output example for what=F:

{"taskjobs": [[{"JobExitReason": "  Output file(s) not found", "resubmissions": 1, "EventRange": "23",  "started": "2010-04-11 08:03", "GridEndId": "U",  "finished": "2010-04-11 08:26", "submitted": "2010-04-11 08:03", "Site": "T3_US_FNALLPC",  "TaskJobId": 153301734,  "AppStatusReason": "unknown", "JobExitCode": 60302, "SchedulerJobId": "https://cmslpc16.fnal.gov/be0adc69f8cfa9f7a184ad7ce27dd2b2c81c68fa/23",   "GridEndReason": "unknown"}],  {"username": "\"KonstantinosKousouris\"", "what": "F", "taskmonid": "kkousour_crab_0_100411_030027_1mv54a"}]}

Resubmitted jobs

Action name: resubmittedjobsAtl

URL request:

Parameters (as used in the example below):

  • what;
  • taskjobid;
  • taskmonid.

Table columns:

Column name | Key
Submitted | submitted
Started | started
Site | Site
JobExitReason | JobExitReason
Id in Task | EventRange
Grid End Status | GridEndId
Finished | finished
Appl Exit Reason | AppStatusReason
Appl Exit Code | JobExitCode

Example request: resubmittedjobsAtl?what=ALL&taskjobid=152060487&taskmonid=aproskur_crab_0_100406_145411_tl7n64

JSON output example:

{"rsJobs": [{"JobExitReason": "CMS exception (CMSSW)", "EventRange": "1", "started": "2010-04-06 13:00:05", "GridEndId": "D", "Site": "T2_UK_London_Brunel", "submitted": "2010-04-06 12:55:10", "finished": "2010-04-06 13:10:06", "AppStatusReason": "unknown", "JobExitCode": 8001, "SchedulerJobId": "https://wms218.cern.ch:9000/p4J3bxUEvPlLICjIXb1olg", "GridEndReason": "unknown"}, ...]}
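A sketch of a client for this action: it rebuilds the example request (parameter order as in the example) and lists the attempts of a job oldest first. Chronological ordering is an illustrative choice, not documented server behaviour, and it assumes "submitted" uses the "YYYY-MM-DD HH:MM:SS" form, which sorts correctly as text. The function names are hypothetical.

```python
import json
from urllib.parse import urlencode


def resubmittedjobs_query(taskmonid: str, taskjobid: int, what: str = "ALL") -> str:
    """Build the query string for the resubmittedjobsAtl action."""
    return "resubmittedjobsAtl?" + urlencode(
        {"what": what, "taskjobid": taskjobid, "taskmonid": taskmonid})


def attempts_in_order(body: str) -> list:
    """Return the rsJobs rows ordered by submission time, oldest first."""
    return sorted(json.loads(body)["rsJobs"], key=lambda row: row["submitted"])
```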

-- LauraSargsyan - 27-Apr-2010

Topic revision: r12 - 2011-11-09 - LauraSargsyan