BigPanDA Monitoring

Introduction

BigPanDA Monitoring Requirements

Draft of monitoring needs for LSST VO

  • Summary of activity per PanDA instance (multi-VO summary view)
  • VO production summary
  • physics group/single user activity summary page
  • job properties
  • job history
  • accounting?
  • adjacent activities such as SW installation and data distribution
  • logging: server activity, monitor activity
  • VO-specific resource availability: sites, slots, SW, data, ...
All of these components can be modularized and implemented VO-independently. When a VO-specific change is needed, the models can be extended.
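A minimal sketch of what "VO-independent models that can be extended" could look like, assuming Python dataclasses. The class names, the field set (loosely mirroring the job properties in (01-01)) and `render_row` are all hypothetical illustrations, not part of any existing PanDA code.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class JobSummary:
    """VO-independent core of a job record (hypothetical field set)."""
    panda_id: int
    owner: str
    status: str
    site: Optional[str] = None


@dataclass
class AtlasJobSummary(JobSummary):
    """Hypothetical VO-specific extension that adds fields only one VO needs."""
    working_group: Optional[str] = None
    trf_tags: list = field(default_factory=list)


def render_row(job: JobSummary) -> dict:
    # Generic views depend only on the base fields, so they accept any
    # VO-specific subclass; the extra fields are passed through as-is.
    return asdict(job)
```

The generic views never need to know which VO a record belongs to; a new VO only adds a subclass.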

PanDA-Condor interface

Survey of the existing monitor capabilities

Monitoring pages

[1] A single job details page

  • (01-01) Status summary table: PanDAID, Owner, Working group; job name, job attempt number, job input/output dataset; job status; timestamp Created, interval "time to start", duration of the job, timestamp Ended/Modified; cloud, siteID; job priority.
  • (01-02) File input/output table: filename, type, status, dataset and dispatch block.
  • (01-03) Find and view log files.
  • (01-04) Log records for job.
  • (01-05) Job details table: column name, column value, listed in the order in which the columns appear in the table definition (e.g. jobsdefined4).
  • (01-06) Production job: link to "show associated task".
  • (01-07) Production job: link to "show recent jobs for task".
  • (01-08) Production job: link to "Show script to re-create job for offline debugging".
  • (01-09) Production job: show log extract.
  • (01-10) Production job: Transformation tags: list of trf tags. Link to "Interpret tags and show transformation configuration".
  • (01-11) Associated build job info (if applicable).
  • (01-12) Job history: list PanDAID, attempt nr., timestamp Ended/Modified, job name, cloud/site, error summary.

[2] User's activity summary page

  • (02-01) Activity statistics: nr. of jobs in the past week, job types with the nr. of jobs per job type, 24h quota (in CPU hours), link to documentation on priority degradation, list of groups; usage in CPU hours for personal analysis/group production in the last 1 day and in the last 7 days.
  • (02-02) Filter summary of jobs: selector of job type/all jobs, text field timespan "last N days", text field jobsetID, selector of job status/all jobs, show/hide summary plot, button Retrieve All.
  • (02-03) Summary wall: number of selected jobs, grouped by state with the number of jobs in each state (each count links to a filtered page); followed by several lines, one per quantity, each giving the number of distinct categories of that quantity and the list of categories with the number of jobs per category. Quantities are: Users, Processing types, Job types, Transformations, Working groups, Creation hosts, Sites, Regions, Clouds, Jobsets. Link to filtered pages "show jobs for last 7/15/30 days".
  • (02-04) Button to show/hide table with datasets used by selected jobs: User datasets, type, last used, used in N jobs.
  • (02-05) Button to show/hide job sets: user:jobsetID linking to jobset summary page, timestamp Created, timestamp Latest [Q-001], #jobs total (build job included), #jobs in Pre-run, Running, Holding, Finished, Failed, Cancelled, Merging, buildJob status, Site, Input/Output dataset, link to ATLAS Analysis Task Monitor for the input/output dataset.
  • (02-06) Information about "showing N jobs from ... to ..." (N <= 1000).
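The summary wall described in (02-03) is essentially a set of group-by counts. A minimal sketch, assuming job records are plain dicts; the keys `status`, `user`, `site` and `cloud` are hypothetical stand-ins for the real column names, and only a subset of the listed quantities is shown.

```python
from collections import Counter

# Hypothetical subset of the (02-03) quantities, as job-record keys.
QUANTITIES = ["user", "site", "cloud"]


def summary_wall(jobs: list, quantities=QUANTITIES) -> dict:
    """Build the (02-03) summary wall from a list of job records."""
    wall = {
        "total": len(jobs),
        # number of jobs in each state
        "states": Counter(job["status"] for job in jobs),
    }
    for q in quantities:
        per_category = Counter(job.get(q) for job in jobs)
        wall[q] = {
            "distinct": len(per_category),  # number of distinct categories
            "jobs_per_category": dict(per_category),
        }
    return wall
```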

[3] User's jobset summary page

  • (03-01) Filter: same filter as in (02-02).
  • (03-02) Summary wall as in (02-03).
  • (03-03) Jobset summary table containing only one jobset, with the same fields as (02-05).
  • (03-04) Information about how many jobs are shown, see (02-06).
  • (03-05) Table with jobs summary, one row per job, same info as in (01-01) per job.

[4] Production

[5] Clouds

  • http://panda.cern.ch/server/pandamon/query?dash=cloud
  • title: Clouds and tasks
  • Cloud status:
    • hint: TLo, THi are transfer timeouts for low and high priority jobs
    • table1: columns: Cloud, Tier1, Status, Comment, Tasks, TLo days, THi days, Storage used by PanDA, Free space (GB), as of, Workload SI2Kdays, Weight, Nprestage
    • table1: one line per cloud
  • Fast track
    • line with text: Fast track clouds used for high priority tasks: CA:true DE:true ES:true FR:true IT:true ND:true NL:true RU:true TW:true UK:true US:true
    • line with text: Clouds used for validation tasks: CA:true CERN:true DE:true ES:true FR:true IT:true ND:true NL:true RU:true TW:true UK:true US:true
    • table2: hint: Cloud/site details: nqueue is the queue depth (number of pilots in pre-running states) maintained by the scheduler; npilots is the current queue depth, which should roughly match nqueue.
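Since npilots should "roughly match" nqueue, a monitor could flag sites where the two diverge. A minimal sketch, assuming the values are available as a mapping of site name to `(nqueue, npilots)`; the relative tolerance of 0.5 is a hypothetical threshold, as the page does not define what "roughly" means.

```python
def queue_mismatches(sites: dict, rel_tol: float = 0.5) -> list:
    """Return sites whose npilots deviates from nqueue by more than rel_tol.

    `sites` maps site name -> (nqueue, npilots); rel_tol is hypothetical.
    """
    flagged = []
    for name, (nqueue, npilots) in sorted(sites.items()):
        if nqueue == 0:
            # No target depth: any pilots at all count as a mismatch.
            if npilots > 0:
                flagged.append(name)
            continue
        if abs(npilots - nqueue) / nqueue > rel_tol:
            flagged.append(name)
    return flagged
```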

[6] Incidents

[7] DDM

[8] PandaMover

[9] AutoPilot

[10] Sites

  • http://panda.cern.ch/server/pandamon/query?dash=site
  • title: Panda Sites
  • 196 analysis sites -- click to see content in the table
  • 11 installation sites -- click to see content in the table
  • 1 pandamover site -- click to see content in the table
  • 109 production sites -- click to see content in the table
  • 103 production multicloud sites -- click to see content in the table
  • 70 test sites -- click to see content in the table
  • 4 test multicloud sites -- click to see content in the table
  • table:

[11] Releases

[12] Analysis

[13] Stats

[14] Users

[15] Physics data

[16] Left Panel: Jobs

"Search" link

http://panda.cern.ch/server/pandamon/query?mode=dbquery

Leads to a Web form with the following fields:

  • Job ID
  • Panda ID
  • SW Release
  • Transformation
  • Job Status
  • Project
  • Scheduler ID
  • Creation Time (mm/dd/yyyy), start
  • Creation Time (mm/dd/yyyy), end

Summary subpanel

Just like in most blocks of the left panel, the links serve as a quick way to query a URL with slightly varying parameters. In this case, the tabulated data has the following columns: Cloud, Pilots, %fail, Latest, defined, pending, waiting, assigned, activated, sent, starting, running, holding, transferring, finished, failed, cancelled, unassigned

The queries appear to pass the 72-hour limit as an explicit parameter (hours=72). The links go to the "platform" monitor page.

All Jobs http://pandamon.cern.ch/jobinfo?limit=2 summary stats on top, by user, processing type, job type and working group; user:jobset tabulated data at the bottom, with the usual breakdown by job state
CMS Jobs http://pandamon.cern.ch/jobinfo?VO=cms&limit=2 link to the "platform" mon page, currently empty table
analysis http://pandamon.cern.ch/cloudsummary?jobtype=analysis link to the "platform" mon page, breakdown by cloud
production,test http://pandamon.cern.ch/cloudsummary?jobtype=production,test
jedi http://pandamon.cern.ch/cloudsummary?jobtype=jedi&hours=72 entries grouped by cloud; breakdown by a variety of state conditions
all http://pandamon.cern.ch/cloudsummary?jobtype=&hours=72  
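Because these links are the same `cloudsummary` query with slightly different parameters, they can be generated rather than hand-maintained. A minimal sketch using Python's `urllib.parse.urlencode`; the function name is hypothetical, and the optional `region=yes` flag is taken from the Regions' Summary links of the platform panel.

```python
from typing import Optional
from urllib.parse import urlencode

BASE = "http://pandamon.cern.ch/cloudsummary"


def cloudsummary_url(jobtype: str, hours: Optional[int] = None, region: bool = False) -> str:
    """Rebuild a cloud-summary link from its varying parameters."""
    params = {"jobtype": jobtype}
    if hours is not None:
        params["hours"] = hours
    if region:
        params["region"] = "yes"
    # safe="," keeps "production,test" readable instead of "%2C".
    return f"{BASE}?{urlencode(params, safe=',')}"
```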

States subpanel

pending http://pandamon.cern.ch/jobinfo?jobStatus=pending&plot=no
defined http://pandamon.cern.ch/jobinfo?jobStatus=defined&plot=no
waiting http://pandamon.cern.ch/jobinfo?jobStatus=waiting&plot=no
assigned http://pandamon.cern.ch/jobinfo?jobStatus=assigned&plot=no
activated http://pandamon.cern.ch/jobinfo?jobStatus=activated&plot=no
sent http://pandamon.cern.ch/jobinfo?jobStatus=sent&plot=no
starting http://pandamon.cern.ch/jobinfo?jobStatus=starting&plot=no
running http://pandamon.cern.ch/jobinfo?jobStatus=running&plot=no
holding http://pandamon.cern.ch/jobinfo?jobStatus=holding&plot=no
transferring http://pandamon.cern.ch/jobinfo?jobStatus=transferring&plot=no
finished http://pandamon.cern.ch/jobinfo?jobStatus=finished&plot=no
failed http://pandamon.cern.ch/jobinfo?jobStatus=failed&plot=no
cancelled http://pandamon.cern.ch/jobinfo?jobStatus=cancelled&plot=no
unassigned http://pandamon.cern.ch/jobinfo?jobStatus=cancelled&computingSite=NULL&limit=200&plot=no
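The state links above follow one pattern, with "unassigned" as the exception: it does not query a status of that name but cancelled jobs with no computing site, capped by a row limit (200 here, 150 on the platform panel). A minimal sketch that regenerates the list; the function name is hypothetical.

```python
from urllib.parse import urlencode

BASE = "http://pandamon.cern.ch/jobinfo"
STATES = ["pending", "defined", "waiting", "assigned", "activated", "sent",
          "starting", "running", "holding", "transferring", "finished",
          "failed", "cancelled"]


def state_links(unassigned_limit: int = 200) -> dict:
    """Map each label in the States subpanel to its jobinfo URL."""
    links = {s: f"{BASE}?{urlencode({'jobStatus': s, 'plot': 'no'})}" for s in STATES}
    # "unassigned" = cancelled jobs with computingSite=NULL, row-limited.
    links["unassigned"] = BASE + "?" + urlencode({
        "jobStatus": "cancelled", "computingSite": "NULL",
        "limit": unassigned_limit, "plot": "no"})
    return links
```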

[17] Left Panel: Jobs Types

These appear to be customized search links that expedite queries. The time cutoff appears to be included in the query, i.e. it is not coded in the service. Most links contain queries of the form http://pandamon.cern.ch/jobinfo?jobtype=XXX&hours=3&plot=no, where XXX takes one of the following values: analysis, production, prod_test, install, test, jedi. The retried, merging and all links deviate from this pattern (see the links summary).

Links take the user to the "platform" pages. These pages contain a summary panel on top, which breaks the information down by users, releases, processing types, job types, transformations, Working Groups, creation hosts, sites, clouds, jobsets.

The job info table that follows the top section has the following columns: User:jobsetID, Created, Latest, Jobs, Pre-run, Running, Holding, Finished, Failed, Cancelled, Merging, buildJob, Site. Interestingly, the Site column contains site names that link to the old Panda Monitor (i.e. they lead back to the original pages).

Links summary:

analysis http://pandamon.cern.ch/jobinfo?jobtype=analysis&hours=3&plot=no
production http://pandamon.cern.ch/jobinfo?jobtype=production&hours=3&plot=no
prod_test http://pandamon.cern.ch/jobinfo?jobtype=prod_test&hours=3&plot=no
install http://pandamon.cern.ch/jobinfo?jobtype=install&hours=3&plot=no
test http://pandamon.cern.ch/jobinfo?jobtype=test&hours=3&plot=no
retried http://pandamon.cern.ch/jobinfo?jobtype=&jobStatus=retried&plot=no
jedi http://pandamon.cern.ch/jobinfo?jobtype=jedi&hours=3&plot=no
merging http://pandamon.cern.ch/jobinfo?jobtype=&processingType=usermerge&hours=3&plot=no
all http://pandamon.cern.ch/jobinfo?jobtype=&hours=3&plot=no
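A minimal sketch of how these job-type links could be generated: the common `jobtype=XXX&hours=3&plot=no` pattern plus a table of the three exceptions, copied verbatim from the links summary. The function and table names are hypothetical; for the special cases the `hours` argument is ignored and the recorded parameters are used as-is.

```python
from urllib.parse import urlencode

BASE = "http://pandamon.cern.ch/jobinfo"

# Job types whose link deviates from jobtype=XXX&hours=3&plot=no.
SPECIAL = {
    "retried": {"jobtype": "", "jobStatus": "retried", "plot": "no"},
    "merging": {"jobtype": "", "processingType": "usermerge", "hours": 3, "plot": "no"},
    "all": {"jobtype": "", "hours": 3, "plot": "no"},
}


def jobtype_link(name: str, hours: int = 3) -> str:
    """Return the jobinfo URL behind a Job Types link."""
    params = SPECIAL.get(name, {"jobtype": name, "hours": hours, "plot": "no"})
    return f"{BASE}?{urlencode(params)}"
```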

[18] Left Panel: Jobs Timing

This is another interface to a query engine, with the following breakdown (the links present histograms for the past 3 hours):
analysis http://pandamon.cern.ch/jobs/jobtiming?jobtype=analysis&hours=3
production http://pandamon.cern.ch/jobs/jobtiming?jobtype=production&hours=3
prod_test http://pandamon.cern.ch/jobs/jobtiming?jobtype=prod_test&hours=3
test http://pandamon.cern.ch/jobs/jobtiming?jobtype=test&hours=3
jedi http://pandamon.cern.ch/jobs/jobtiming?jobtype=jedi&hours=3
all http://pandamon.cern.ch/jobs/jobtiming?jobtype=&hours=3

The pages contain histograms of job execution time distributions: execute, cpu, setup, stagein, stageout. As the links above show, the queries are essentially the same and differ only in the selected job type.
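The observation that these queries differ only in the job type can be checked mechanically by parsing the query strings and comparing parameter values. A minimal sketch with the standard `urllib.parse` module; the helper name is hypothetical. `keep_blank_values=True` is needed so that the empty `jobtype=` of the "all" link is not dropped.

```python
from urllib.parse import parse_qsl, urlsplit


def differing_params(urls: list) -> set:
    """Names of query parameters whose values differ across the URLs."""
    parsed = [dict(parse_qsl(urlsplit(u).query, keep_blank_values=True)) for u in urls]
    keys = set().union(*parsed)
    return {k for k in keys if len({p.get(k) for p in parsed}) > 1}
```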

[19] Left Panel: Jobs Metrics

"VM metrics" on top is a link to Wiki: https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/PandaPilot#QaJobMetrics

The other links in this subpanel are listed below. As in many other cases, the links specify a 3-hour window (hours=3).

analysis http://pandamon.cern.ch/jobs/jobram?jobtype=analysis&hours=3
production http://pandamon.cern.ch/jobs/jobram?jobtype=production&hours=3
prod_test http://pandamon.cern.ch/jobs/jobram?jobtype=prod_test&hours=3
jedi http://pandamon.cern.ch/jobs/jobram?jobtype=jedi&hours=3
all http://pandamon.cern.ch/jobs/jobram?jobtype=&hours=3

[20] Left Panel: Jedi tasks

Top Subpanel - Summary

These links present tabulated data with columns: JediTaskID, status, username, Files, Used, Failed, transUses, transPath, modificationTime, taskname

All Tasks http://pandamon.cern.ch/jedi/taskinfo?tasktype=*&limit=2000
My Tasks https://pandamon.cern.ch/jedi/taskinfo?tasktype=analysis&username=auto&hours=24

The latter requires a certificate for authentication and, as its tasktype=analysis parameter shows, covers only analysis tasks.

Bottom Subpanel - States

Note that the 3-day time cutoff is built into the service rather than passed in the URL query.
registered http://pandamon.cern.ch/jedi/taskinfo?status=registered
defined http://pandamon.cern.ch/jedi/taskinfo?status=defined
waiting http://pandamon.cern.ch/jedi/taskinfo?status=waiting
ready http://pandamon.cern.ch/jedi/taskinfo?status=ready
pending http://pandamon.cern.ch/jedi/taskinfo?status=pending
scouting http://pandamon.cern.ch/jedi/taskinfo?status=scouting
running http://pandamon.cern.ch/jedi/taskinfo?status=running
holding http://pandamon.cern.ch/jedi/taskinfo?status=holding
merging http://pandamon.cern.ch/jedi/taskinfo?status=merging
prepared http://pandamon.cern.ch/jedi/taskinfo?status=prepared
finished http://pandamon.cern.ch/jedi/taskinfo?status=finished
aborting http://pandamon.cern.ch/jedi/taskinfo?status=aborting
aborted http://pandamon.cern.ch/jedi/taskinfo?status=aborted
finishing http://pandamon.cern.ch/jedi/taskinfo?status=finishing
broken http://pandamon.cern.ch/jedi/taskinfo?status=broken
failed http://pandamon.cern.ch/jedi/taskinfo?status=failed

[21] Left Panel: FAX

There is a single subpanel labeled "Summary", which contains:
Fail over: http://pandamon.cern.ch/fax/failover

It presents a table titled "Panda report on jobs failovers to FAX over last 6 hours", with the following columns: Site, Jobs, WithFAX [files], WithoutFAX [files], WithFAX [GB], WithoutFAX [GB]
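A derived metric that this table makes easy to compute is the per-site share of files obtained through FAX failover. A minimal sketch, assuming each row is a dict keyed by the table's column headers; reading "WithoutFAX [files]" as the non-FAX complement of "WithFAX [files]" is an assumption, and `fax_share` is a hypothetical name.

```python
def fax_share(rows: list) -> dict:
    """Per-site fraction of files that were read through FAX failover.

    Keys follow the failover table's column headers; treating
    "WithoutFAX [files]" as the non-FAX complement is an assumption.
    """
    shares = {}
    for row in rows:
        with_fax = row["WithFAX [files]"]
        total = with_fax + row["WithoutFAX [files]"]
        # Avoid division by zero for sites with no files in the window.
        shares[row["Site"]] = with_fax / total if total else 0.0
    return shares
```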

Configuration: http://pandamon.cern.ch/fax/map shows an elaborate graphic: a world map with a superimposed graph representing the FAX network in action (directed arrows, etc.).

[22] Left Panel: Quick Search

http://panda.cern.ch:25980/server/pandamon/query?tp=main#

There are forms that enable searching by the following parameters:

  • Panda job ID
  • Panda id by LFN
  • Batch ID
  • Dataset
  • Task request
  • Task status
  • File

Example of the form code:

<form id="formdsid" action="http://panda.cern.ch/server/pandamon/query" style="margin-top:0; margin:0; border:0; font-size:8pt;" method="GET">

     Dataset 

    <input id="informdsid" type="text" border="0" maxlength="200" size="6" name="name" style="margin-top:0; margin:0; border:0; font-size:8pt;" title="Please fill out this field">
    <input type="hidden" value="dset" name="mode">

</form>
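Submitting this GET form produces an ordinary query URL: the browser serializes the controls in document order as `application/x-www-form-urlencoded`, so the text field (`name`) comes first and the hidden `mode=dset` second. A minimal sketch of the resulting URL; the helper name and the sample dataset name are hypothetical.

```python
from urllib.parse import urlencode

ACTION = "http://panda.cern.ch/server/pandamon/query"


def dataset_search_url(dataset_name: str) -> str:
    """URL produced by submitting the Dataset quick-search form."""
    # Controls in document order: text field "name", then hidden "mode".
    return f"{ACTION}?{urlencode([('name', dataset_name), ('mode', 'dset')])}"
```

This also means the quick search can be bookmarked or scripted without the form.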

[23] Left Panel: Errors

Contains links like: http://pandamon.cern.ch/jobs/joberror?jobtype=production&item=taskID. All links go to the page "jobs/joberror". Note that the time periods in the queries are not uniform: some links set hours=48, others hours=72, and some set no period at all.

Production Tasks http://pandamon.cern.ch/jobs/joberror?jobtype=production&item=taskID
Production Sites http://pandamon.cern.ch/jobs/joberror?jobtype=production&item=computingSite&hours=48
Production Users http://pandamon.cern.ch/jobs/joberror?jobtype=production&item=prodUserName&hours=48
Analysis Site http://pandamon.cern.ch/jobs/joberror?jobtype=analysis&item=computingSite&hours=48
Analysis Users http://pandamon.cern.ch/jobs/joberror?jobtype=analysis&item=prodUserName&hours=48
Production Cloud http://pandamon.cern.ch/jobs/joberror?jobtype=production&item=cloud&hours=48
Analysis Cloud http://pandamon.cern.ch/jobs/joberror?jobtype=analysis&item=cloud&hours=48
JEDI Task http://pandamon.cern.ch/jobs/joberror?jobtype=&item=JediTaskID
CMS jobs http://pandamon.cern.ch/jobs/joberror?jobtype=analysis&item=computingSite&VO=cms&hours=72
My Jobs https://pandamon.cern.ch/jobs/joberror?jobtype=analysis&item=computingSite&prodUserName=auto&hours=72

Analysis Users: note that the distributions on this page are bar graphs, displaying the time evolution of various errors per user.
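The `item=` parameter of these links selects the field by which errors are grouped (taskID, computingSite, prodUserName, cloud, JediTaskID, ...). A minimal sketch of that grouping, assuming job records are dicts whose keys match the URL parameter names; the record format and function name are hypothetical.

```python
from collections import Counter


def errors_by_item(jobs: list, item: str) -> Counter:
    """Count failed jobs per value of `item` (e.g. "computingSite",
    "prodUserName", "cloud"), mirroring the item= parameter of jobs/joberror.
    """
    return Counter(j.get(item) for j in jobs if j.get("jobStatus") == "failed")
```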

[24] Left Panel: Summaries

Contains a few forms for queries as detailed below:
  • Blocks: days (entry field)
    • Displays a screen for N days, titled "Production job blocks (destinationDBlock) active in the last N*24 hours. Managed production jobs only"; columns: Destination Block (as a link labeled by the destination block name), subscript - number of jobs in various states, total number of jobs, ATLAS release.
  • Errors: days (entry field)
  • Nodes: days (entry field). Note: this is potentially a problematic query, with long query times and large volumes of data; it does not scale.
    • Displays a screen with a table listing nodes per site over the last N days; columns: All Site, Job, Nodes, Jobs, Latest, defined, assigned, waiting, activated, sent, running, holding, transferring, finished, failed, tot/trf/other
  • Usage 1, 3 days:
    • Displays a breakdown of usage by cloud as a pie chart. Metrics: wall time, jobs, nodes. The same information is given for sites; further, for each site there is a breakdown of wall time per node and wall time per job.

This is sample HTML code for one of the forms:

<form action="http://panda.cern.ch/server/pandamon/query" style="margin:1px; border:0; font-size:8pt;" method="GET">

     Blocks: 

    <input type="text" border="0" maxlength="3" size="3" name="days" style="margin-top:0; margin:0; border:0; font-size:8pt;" title="Please fill out this field">
    <input type="hidden" value="prodlist" name="overview">

     days

</form>

[25] Left Panel: Tasks

Top link: "Search": http://panda.cern.ch/server/pandamon/query?mode=taskquery.
It is a generic task and tag search facility containing a form whose table exposes the following fields:
  • Project
  • Input Dataset
  • Transformation
  • Transformation Version
  • Grid Flavour (sic!)
  • Submitted By
  • Output Task Name
  • Configuration Tag
  • Requested by - selector based on "Group" and "Regional" identity
  • Status
  • Request ID
  • More (?)
  • Query Tasks/Tags (selector)

Depending on the type of "Task Req", these links lead to pages formatted accordingly.
Generic Task Req http://panda.cern.ch/server/pandamon/query?mode=reqtask1
EvGen Task Req http://panda.cern.ch/server/pandamon/query?mode=reqtask2&type=evgen
HLT Task Req http://panda.cern.ch/server/pandamon/query?mode=reqtask3
Task list https://pandamon.cern.ch/tasks/listtasks1 WARNING: slow
New Tag http://panda.cern.ch/server/pandamon/query?mode=defNewTag
Bug Report http://panda.cern.ch/server/pandamon/query?mode=listBugReport
Task overview query http://panda.cern.ch/server/pandamon/query?mode=tinfoSearch

Comments on the above links:

  • New Tag generates tags based on the following user input
    • transformation name
    • software version
    • production branch
    • userid
    • function

  • Bug Report - quick links to Savannah pages covering Athena, Data access on the Grid, and "Any other problem", which goes to ADC operators.
  • Task overview query - a quick query of tasks based on the tag name and a regex in the task name

At the bottom there is an entry field, "Clone Task", which presumably takes the ID of the task to be cloned. This function is not robust: entering, for example, "1" makes it fail and display diagnostics that include some of the server code. This is the code for the form that supports this function:

<form action="https://pandamon.cern.ch/tasks/clonetask" style="margin-top:0; margin:0; border:0; font-size:8pt;" method="GET">
     Clone Task 
    <input type="text" border="0" maxlength="7" size="6" name="tid" style="margin-top:0; margin:0; border:1; font-size:8pt;" title="Please,provide the TaskID to clone">

</form>
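Since the clone function fails with server diagnostics on input such as "1", a more robust handler would validate the ID and return a clean error instead. A minimal sketch of such handler-side logic; the function name, the `known_tasks` lookup and the status codes are hypothetical, since the actual server code is not documented here. The 7-digit cap mirrors the form's `maxlength="7"`.

```python
import re


def clone_task_response(tid: str, known_tasks: dict) -> tuple:
    """Hypothetical handler logic: validate `tid` and report errors cleanly
    instead of exposing a traceback."""
    tid = tid.strip()
    # ASCII digits only, at most 7 characters (the form's maxlength).
    if not re.fullmatch(r"[0-9]{1,7}", tid):
        return 400, "TaskID must be a number of at most 7 digits"
    if int(tid) not in known_tasks:
        return 404, f"Task {tid} not found"
    return 200, f"Cloning task {tid}"
```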

[26] Left Panel: Datasets

"Search": http://panda.cern.ch/server/pandamon/query?mode=dbquery - generic form-based search of datasets on the following attributes:
  • Name
  • DataFormat
  • Tag
  • StartTime (days ago)
  • option to apply these selectors to a Container or a Dataset
  • choice of output format: summary, object list, object list + file size

Also, the subpanel contains:

DQ2 Share Search - Bourricot/DQ2 link (outside the PanDA monitor) https://bourricot.cern.ch/dq2/share/results/
DQ2 Popularity - seems to link to a custom Python script, which powers a form to query DQ2: http://popularity.cern.ch/
Aborted datasets - presents a table with the following columns: dataset, task id, status, timestamp http://panda.cern.ch/server/pandamon/query?mode=listAbortedDatasetsState

[27] Left Panel: Datasets Distribution

The large section at the top of the subpanel is DaTRI. Its links appear to hit the same base query with two varying parameters (mode and action).
Data_Transfer_Request http://panda.cern.ch/server/pandamon/query?mode=ddm_req
List_User_Requests http://panda.cern.ch/server/pandamon/query?mode=ddm_req&action=List
List_Pathena_Requests http://panda.cern.ch/server/pandamon/query?mode=ddm_pathenareq&action=List
List_Ganga_Requests http://panda.cern.ch/server/pandamon/query?mode=ddm_gangareq&action=List
Group_Production http://panda.cern.ch/server/pandamon/query?mode=ddm_groupreq&action=List

Lower part:

PD2P http://panda.cern.ch/server/pandamon/query?mode=pd2p link to the "PD2P monitor" page, which presents a table with the following columns: cloud, country, atlas site, ddm endpoint, num ds; there is a selector on top for T1/T2 and the number of days; site names are links to tables of datasets
AODs http://panda.cern.ch/server/pandamon/query?mode=listAODReplications link to the AOD dataset replication page, where the last 500 replications are listed and additional filters are available to narrow down the search
EVNTs http://panda.cern.ch/server/pandamon/query?mode=listEVNTReplications returns HTTP Error 503
Conditions DS http://panda.cern.ch/server/pandamon/query?mode=listConditionsDB distribution and status of the conditions datasets, by cloud
DB Releases http://panda.cern.ch/server/pandamon/query?mode=listDBRelease
SIT pacballs http://panda.cern.ch/server/pandamon/query?mode=listPacballs
Validation Samples http://panda.cern.ch/server/pandamon/query?mode=listValidationReplications
Functional Tests http://panda.cern.ch/server/pandamon/query?mode=listFunctionalTests the top portion contains a summary table of test dataset statuses by cloud (number of files, time metrics for transfers); the bottom contains a large table of datasets. Columns: Tier, Total Datasets, Total Files in datasets, Total CpFiles in datasets, Subscribed, Transfer, Done, Suspect, Deleted, Last Subscription, Last Transfer, Last FC Checked, Average datasets Transfer time
ATLAS Data http://panda.cern.ch/server/pandamon/query?mode=listCR same as above, but for the data
Reprocessed_Datasets http://panda.cern.ch/server/pandamon/query?mode=listReproDSReplications yet another way to query dataset info; the table is empty by default, but the top portion contains selection fields (cloud and a regex for the dataset name) as well as links to the ARDA and DQ2 browsers

[28] Left Panel: Logging monitor

Single link "Incidents": http://panda.cern.ch/server/pandamon/query?mode=mon&hours=6.

Opens a table of the tabulated incidents data, with columns:

  • Category (can be blank, but can also be bamboo or prod)
  • Type
  • Level (e.g. debug, info, warning)
  • Count
  • First
  • Latest

[29] Left Panel: Analytics

Popularity http://panda.cern.ch/server/pandamon/query?mode=mon&hours=6 Reprocessed Dataset Replication Status: plots with histograms of jobs per dataset, ordered by popularity; time slices of the same information; the same information in tabulated form
Analysis Timing http://panda.cern.ch/server/pandamon/query?mode=mon&hours=6 a few histograms showing the distribution of execution time for prun, pathena and ganga

[30] Platform Left Panel: Jobs

Search
http://panda.cern.ch:25880/server/pandamon/query?mode=jobquery

Summary

All Jobs http://pandamon.cern.ch/jobinfo?jobtype=*&limit=150
CMS Jobs http://pandamon.cern.ch/jobinfo?jobtype=analysis&VO=cms&limit=150
My Jobs https://pandamon.cern.ch/jobinfo?jobtype=analysis&prodUserName=auto&hours=24
ATLAS Jobs http://pandamon.cern.ch/jobinfo?jobtype=*&VO=atlas&limit=150

Clouds' Summary

analysis http://pandamon.cern.ch/cloudsummary?jobtype=analysis
production,test http://pandamon.cern.ch/cloudsummary?jobtype=production,test
jedi http://pandamon.cern.ch/cloudsummary?jobtype=jedi&hours=72
all http://pandamon.cern.ch/cloudsummary?jobtype=&hours=72

Regions' Summary

analysis http://pandamon.cern.ch/cloudsummary?jobtype=analysis&region=yes
production,test http://pandamon.cern.ch/cloudsummary?jobtype=production,test&region=yes
jedi http://pandamon.cern.ch/cloudsummary?jobtype=jedi&hours=72&region=yes
all http://pandamon.cern.ch/cloudsummary?jobtype=&hours=72&region=yes

States

defined http://pandamon.cern.ch/jobinfo?jobStatus=defined&plot=no
pending http://pandamon.cern.ch/jobinfo?jobStatus=pending&plot=no
waiting http://pandamon.cern.ch/jobinfo?jobStatus=waiting&plot=no
assigned http://pandamon.cern.ch/jobinfo?jobStatus=assigned&plot=no
activated http://pandamon.cern.ch/jobinfo?jobStatus=activated&plot=no
sent http://pandamon.cern.ch/jobinfo?jobStatus=sent&plot=no
starting http://pandamon.cern.ch/jobinfo?jobStatus=starting&plot=no
running http://pandamon.cern.ch/jobinfo?jobStatus=running&plot=no
holding http://pandamon.cern.ch/jobinfo?jobStatus=holding&plot=no
transferring http://pandamon.cern.ch/jobinfo?jobStatus=transferring&plot=no
finished http://pandamon.cern.ch/jobinfo?jobStatus=finished&plot=no
failed http://pandamon.cern.ch/jobinfo?jobStatus=failed&plot=no
cancelled http://pandamon.cern.ch/jobinfo?jobStatus=cancelled&plot=no
unassigned http://pandamon.cern.ch/jobinfo?jobStatus=cancelled&computingSite=NULL&limit=150&plot=no

[31] Platform Left Panel: Job Types

Types

analysis http://pandamon.cern.ch/jobinfo?jobtype=analysis&hours=3&plot=no
production http://pandamon.cern.ch/jobinfo?jobtype=production&hours=3&plot=no
prod_test http://pandamon.cern.ch/jobinfo?jobtype=prod_test&hours=3&plot=no
install http://pandamon.cern.ch/jobinfo?jobtype=install&hours=3&plot=no
test http://pandamon.cern.ch/jobinfo?jobtype=test&hours=3&plot=no
retried http://pandamon.cern.ch/jobinfo?jobtype=&jobStatus=retried&plot=no
jedi http://pandamon.cern.ch/jobinfo?jobtype=jedi&hours=3&plot=no
merging http://pandamon.cern.ch/jobinfo?jobtype=&processingType=usermerge&hours=3&plot=no
all http://pandamon.cern.ch/jobinfo?jobtype=&hours=3&plot=no

[32] Platform Left Panel: Job Timing

Timing: https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/PandaPilot#QaPilotTiming

analysis http://pandamon.cern.ch/jobs/jobtiming?jobtype=analysis&hours=3
production http://pandamon.cern.ch/jobs/jobtiming?jobtype=production&hours=3
prod_test http://pandamon.cern.ch/jobs/jobtiming?jobtype=prod_test&hours=3
test http://pandamon.cern.ch/jobs/jobtiming?jobtype=test&hours=3
jedi http://pandamon.cern.ch/jobs/jobtiming?jobtype=jedi&hours=3
all http://pandamon.cern.ch/jobs/jobtiming?jobtype=&hours=3

[33] Platform Left Panel: Jobs' Metrics

VM Metrics: https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/PandaPilot#QaJobMetrics

analysis http://pandamon.cern.ch/jobs/jobram?jobtype=analysis&hours=3
production http://pandamon.cern.ch/jobs/jobram?jobtype=production&hours=3
prod_test http://pandamon.cern.ch/jobs/jobram?jobtype=prod_test&hours=3
jedi http://pandamon.cern.ch/jobs/jobram?jobtype=jedi&hours=3
all http://pandamon.cern.ch/jobs/jobram?jobtype=&hours=3

[34] Platform Left Panel: Jedi Tasks

Summary

All Tasks http://pandamon.cern.ch/jedi/taskinfo?tasktype=*&limit=2000&days=20
My Tasks https://pandamon.cern.ch/jedi/taskinfo?tasktype=analysis&username=auto&hours=24

States

registered http://pandamon.cern.ch/jedi/taskinfo?status=registered&days=20
waiting http://pandamon.cern.ch/jedi/taskinfo?status=waiting&days=20
defined http://pandamon.cern.ch/jedi/taskinfo?status=defined&days=20
pending http://pandamon.cern.ch/jedi/taskinfo?status=pending&days=20
assigning http://pandamon.cern.ch/jedi/taskinfo?status=assigning&days=20
ready http://pandamon.cern.ch/jedi/taskinfo?status=ready&days=20
scouting http://pandamon.cern.ch/jedi/taskinfo?status=scouting&days=20
running http://pandamon.cern.ch/jedi/taskinfo?status=running&days=20
holding http://pandamon.cern.ch/jedi/taskinfo?status=holding&days=20
merging http://pandamon.cern.ch/jedi/taskinfo?status=merging&days=20
prepared http://pandamon.cern.ch/jedi/taskinfo?status=prepared&days=20
finished http://pandamon.cern.ch/jedi/taskinfo?status=finished&days=20
aborting http://pandamon.cern.ch/jedi/taskinfo?status=aborting&days=20
aborted http://pandamon.cern.ch/jedi/taskinfo?status=aborted&days=20
finishing http://pandamon.cern.ch/jedi/taskinfo?status=finishing&days=20
broken http://pandamon.cern.ch/jedi/taskinfo?status=broken&days=20
failed http://pandamon.cern.ch/jedi/taskinfo?status=failed&days=20
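Unlike the legacy panel, whose JEDI state links rely on the service's built-in 3-day cutoff, the platform links pass the window explicitly (days=20), and the platform list also includes "assigning". A minimal sketch that regenerates either form of link; the function name is hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode

BASE = "http://pandamon.cern.ch/jedi/taskinfo"

# State list as it appears on the platform panel.
JEDI_STATES = ["registered", "waiting", "defined", "pending", "assigning",
               "ready", "scouting", "running", "holding", "merging", "prepared",
               "finished", "aborting", "aborted", "finishing", "broken", "failed"]


def taskinfo_url(status: str, days: Optional[int] = 20) -> str:
    """Platform-style link (explicit days) or, with days=None, legacy-style."""
    params = {"status": status}
    if days is not None:
        params["days"] = days
    return f"{BASE}?{urlencode(params)}"
```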

[35] Platform Left Panel: FAX

Summary

Fail over http://pandamon.cern.ch/fax/failover?hours=20
Configuration http://pandamon.cern.ch/fax/map

[36] Platform Left Panel: Quick Search

A form for queries, sample code below:

<form action="/jobinfo" style="margin-top:0; margin:0; border:0; font-size:8pt; font-weight:normal;" method="GET">

     Panda jobID 

    <input type="text" border="0" maxlength="100" size="5" name="job" style="margin-top:0; margin:0; border:1; font-size:8pt;" title="Please fill out this field">

</form>

Forms exist for the following attributes: Panda jobID, Batch ID, Dataset, Task request, Task status, File

[37] Platform Left Panel: Errors

Task Errors: http://pandamon.cern.ch/jobs/joberror#

Production Tasks http://pandamon.cern.ch/jobs/joberror?jobtype=production&item=taskID
Production Sites http://pandamon.cern.ch/jobs/joberror?jobtype=production&item=computingSite&hours=48
Production Users http://pandamon.cern.ch/jobs/joberror?jobtype=production&item=prodUserName&hours=48
Analysis Site http://pandamon.cern.ch/jobs/joberror?jobtype=analysis&item=computingSite&hours=48
Analysis Users http://pandamon.cern.ch/jobs/joberror?jobtype=analysis&item=prodUserName&hours=48
Production Cloud http://pandamon.cern.ch/jobs/joberror?jobtype=production&item=cloud&hours=48
Analysis Cloud http://pandamon.cern.ch/jobs/joberror?jobtype=analysis&item=cloud&hours=48
JEDI Tasks http://pandamon.cern.ch/jobs/joberror?jobtype=&item=JediTaskID
CMS jobs http://pandamon.cern.ch/jobs/joberror?jobtype=analysis&item=computingSite&VO=cms&hours=72
Job Types http://pandamon.cern.ch/jobs/joberror?jobtype=all&item=prodSourceLabel&hours=48&opt=sum+item+time
My Jobs https://pandamon.cern.ch/jobs/joberror?jobtype=analysis&item=jobsetID&prodUserName=auto&hours=72

[38] Platform Left Panel: Operations

Worker Nodes http://pandamon.cern.ch/wnlist?jobtype=&jobStatus=wn
Worker Nodes (prod) http://pandamon.cern.ch/wnlist?jobtype=production&jobStatus=wn
Worker Nodes (analy) http://pandamon.cern.ch/wnlist?jobtype=analysis&jobStatus=wn
Factories http://pandamon.cern.ch/wnlist?jobtype=&jobStatus=factory
Factories (prod) http://pandamon.cern.ch/wnlist?jobtype=production&jobStatus=factory
Factories (analy) http://pandamon.cern.ch/wnlist?jobtype=analysis&jobStatus=factory
Production (region) http://panda.cern.ch/?dash=prod
Production (cloud) http://panda.cern.ch/?dash=prod&view=cloud
Analysis http://panda.cern.ch/?dash=analysis
Analysis (CMS) http://pandamon.cern.ch/cloudsummary?jobtype=analysis&VO=cms
Analysis (ATLAS only) http://pandamon.cern.ch/cloudsummary?jobtype=analysis&VO=atlas
Clouds http://panda.cern.ch/?dash=clouds
Clouds' Spec http://pandamon.cern.ch/taskBuffer?method=getCloudList
Clouds' Config http://pandamon.cern.ch/taskBuffer?method=getCloudConfig
Sites http://panda.cern.ch/?dash=site
Sites' Spec http://pandamon.cern.ch/taskBuffer?method=getSiteInfo
Sites' Activities http://pandamon.cern.ch/wnlist
Releases http://pandamon.cern.ch/releaseinfo
AutoPilot http://panda.cern.ch/?tp=main
DDM http://panda.cern.ch/?dash=ddm
Blacklisted Sites http://bourricot.cern.ch/blacklisted_production.html
DQ2 DS Mover http://panda.cern.ch/?ddm=dash
Users http://pandamon.cern.ch/listusers
20 Most Active Users http://pandamon.cern.ch/listusers?topsize=20
Panda statistics http://panda.cern.ch/?mode=sitestats

[39] Platform Left Panel: Summaries

Three forms of this kind:

<input type="text" border="0" maxlength="3" size="3" name="days" style="margin-top:0; margin:0; border:0; font-size:8pt;" title="Please fill out this field">

The forms are:

  • Blocks: days
  • Errors: days
  • Nodes: days

Links: Usage

1 days http://panda.cern.ch/?overview=usage
3 days http://panda.cern.ch/?overview=daysusage&days=3

[40] Platform Left Panel: Tasks

Top link: Search - http://pandamon.cern.ch/jobinfo?jobtype=&jobStatus=retried&plot=no#

Generic Task Req http://panda.cern.ch/?mode=reqtask1
EvGen Task Req http://panda.cern.ch/?mode=reqtask2&type=evgen
HLT Task Req http://panda.cern.ch/?mode=reqtask3
Task List http://pandamon.cern.ch/tasks/listtasks1
All Tasks List http://pandamon.cern.ch/tasks/listtasks1?failed=yes
New Tag http://panda.cern.ch/?mode=defNewTag
Bug Report http://panda.cern.ch/?mode=listBugReport
Task overview query http://panda.cern.ch/?mode=tinfoSearch

Form: Clone Task

<input type="text" border="0" maxlength="7" size="6" name="tid" style="margin-top:0; margin:0; border:1; font-size:8pt;" title="Please,provide the TaskID to clone"></input>

[41] Platform Left Panel: Datasets

Top link: Search - http://pandamon.cern.ch/jobinfo?jobtype=&jobStatus=retried&plot=no#

DQ2 Share Search https://bourricot.cern.ch/dq2/share/results/
DQ2 Popularity http://popularity.cern.ch/
Aborted datasets http://panda.cern.ch/?mode=listAbortedDatasetsState

[42] Platform Left Panel: Datasets Distribution

DaTRI:

Data_Transfer_Request http://panda.cern.ch/?mode=ddm_req
List_User_Requests http://panda.cern.ch/?mode=ddm_req&action=List
List_Pathena_Requests http://panda.cern.ch/?mode=ddm_pathenareq&action=List
List_Ganga_Requests http://panda.cern.ch/?mode=ddm_gangareq&action=List
Group_Production http://panda.cern.ch/?mode=ddm_groupreq&action=List

PD2P http://panda.cern.ch/?mode=pd2p
AODs http://panda.cern.ch/?mode=listAODReplications
EVNTs http://panda.cern.ch/?mode=listEVNTReplications
Conditions DS http://panda.cern.ch/?mode=listConditionsDB
DB Releases http://panda.cern.ch/?mode=listDBRelease
SIT pacballs http://panda.cern.ch/?mode=listPacballs
Validation Samples http://panda.cern.ch/?mode=listValidationReplications
Functional Tests http://panda.cern.ch/?mode=listFunctionalTests
ATLAS Data http://panda.cern.ch/?mode=listCR
Reprocessed_Datasets http://panda.cern.ch/?mode=listReproDSReplications

[43] Platform Left Panel: User

Summary http://pandamon.cern.ch/jobinfo?jobtype=&jobStatus=retried&plot=no#
20 Most Active Users http://pandamon.cern.ch/listusers?topsize=20

[44] Platform Left Panel: Stats

Stats http://pandamon.cern.ch/jobinfo?jobtype=&jobStatus=retried&plot=no#

[45] Platform Left Panel: Dashboards

Physics data http://panda.cern.ch/server/pandamon/query?dash=physics
Task Production http://dashb-atlas-task-prod.cern.ch/templates/task-prod
DDM Dashboard http://dashb-atlas-data.cern.ch/dashboard/ddm2/
SSB http://dashb-atlas-ssb.cern.ch/dashboard/request.py/siteview
AGLT2 http://www.aglt2.org/csum.php

[46] Platform Left Panel: Logging Monitor

Summary http://pandamon.cern.ch/jobinfo?jobtype=&jobStatus=retried&plot=no#
Queue Control http://panda.cern.ch/?overview=incidents&typekey=queuecontrol
Configuration http://panda.cern.ch/?overview=incidents&typekey=monconfig

[47] Platform Left Panel: Analytics

Popularity http://pandamon.atlascloud.org/ppop
Analysis Timing http://pandamon.atlascloud.org/ptimes

[48] Platform Left Panel: New Panda

My Jobs https://pandamon.cern.ch/jobinfo?prodUserName=auto&hours=24
Introduction http://pandamon.cern.ch/system/home
Job Information http://pandamon.cern.ch/jobinfo
Job Errors http://pandamon.cern.ch/jobs/joberror?item=cloud
List Monitor Modules http://pandamon.cern.ch/alist
Incidents http://pandamon.cern.ch/logsummary
Users http://pandamon.cern.ch/listusers
Production http://pandamon.cern.ch/cloudsummary
Analysis http://pandamon.cern.ch/cloudsummary?jobtype=analysis
Releases http://pandamon.cern.ch/releaseinfo
Jobs/Releases http://pandamon.cern.ch/taskBuffer?method=countReleases
Sites http://pandamon.cern.ch/taskBuffer?method=getSiteInfo
Clouds http://pandamon.cern.ch/taskBuffer?method=getCloudList
Clouds' Config http://pandamon.cern.ch/taskBuffer?method=getCloudConfig
MC Shares http://pandamon.cern.ch/taskBuffer?method=getMCShares
JEDI Tasks http://pandamon.cern.ch/taskBuffer?method=getJediTaskAtt&db=jmt
Sites' MaxTime http://pandamon.cern.ch/schedcfg
Multi Core Tasks http://pandamon.cern.ch/mcore
Worker Nodes http://pandamon.cern.ch/wnlist
List Jobs' IDs and DataSets http://pandamon.cern.ch/joblfn
Plain List of the Selected Jobs' IDs http://pandamon.cern.ch/taskBuffer?method=getJobIds(status=%22failed%22,site=None,username=%22Jiri*%22,jobtype=%22analysis%22)
Get Prod Job Script http://pandamon.cern.ch/taskBuffer?method=getScriptOfflineRunning(1673414844)
Task Request http://pandamon.cern.ch/reqtask1
Task List http://pandamon.cern.ch/tasks/listtasks1
List Panda Error Codes http://pandamon.cern.ch/errorcodes
FAX viewer http://pandamon.cern.ch/fax/map
Classic PanDA Pages http://pandamon.cern.ch/old
Analysis Timing http://pandamon.atlascloud.org/ptimes
Popularity http://pandamon.atlascloud.org/ppop

[Q] Questions

[C] Comments

Query/Mode

Consider http://panda.cern.ch/server/pandamon/query?mode=listAbortedDatasetsState. Many queries look similar to this one: "query" most likely maps onto a Python class or method, and "mode" drives a selector (perhaps through a chain of "if" statements). The "query" URL is heavily overloaded and not in the least mnemonic. Looking further, this pattern is pervasive and arguably bad design. Consider a completely different domain that goes through the same endpoint: http://panda.cern.ch/server/pandamon/query?mode=dbquery, which actually leads to a form. Compare it to http://panda.cern.ch/server/pandamon/query?mode=reqtask3.
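To make the contrast concrete, here is a minimal Python sketch of the two dispatch styles. It assumes, as guessed above, that the old monitor picks a handler from the "mode" parameter. The handler functions and the path-style routes (/datasets/aborted, /db/query, /tasks/request/hlt) are hypothetical and are not taken from the PanDA code base.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical handlers standing in for the real monitor pages.
def list_aborted_datasets(params):
    return "aborted datasets"

def db_query_form(params):
    return "db query form"

def hlt_task_request(params):
    return "HLT task request form"

# Style 1 (assumed old-monitor pattern): one overloaded "query" endpoint,
# with the "mode" parameter selecting the handler.
MODE_HANDLERS = {
    "listAbortedDatasetsState": list_aborted_datasets,
    "dbquery": db_query_form,
    "reqtask3": hlt_task_request,
}

def dispatch_by_mode(url):
    params = parse_qs(urlsplit(url).query)
    mode = params.get("mode", [""])[0]
    handler = MODE_HANDLERS.get(mode)
    return handler(params) if handler else "unknown mode"

# Style 2 (hypothetical alternative): the URL path names the resource,
# so the address alone tells the reader what the page does.
PATH_HANDLERS = {
    "/datasets/aborted": list_aborted_datasets,
    "/db/query": db_query_form,
    "/tasks/request/hlt": hlt_task_request,
}

def dispatch_by_path(url):
    parts = urlsplit(url)
    handler = PATH_HANDLERS.get(parts.path)
    return handler(parse_qs(parts.query)) if handler else "not found"

print(dispatch_by_mode("http://panda.cern.ch/server/pandamon/query?mode=dbquery"))
print(dispatch_by_path("http://monitor.example/db/query"))
```

Both styles reach the same handlers; the difference is that in the second one the URL is self-describing and unrelated pages no longer share a single entry point. Many of the New Panda links in [48], such as /listusers and /errorcodes, already use paths of this kind, while /taskBuffer?method=... keeps a selector parameter.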

Factoring of the query parameters into URLs

The left panel contains a number of subpanels that are often built as shortcuts, each selecting a particular parameter value in an otherwise identical URL (e.g. jobtype=production vs jobtype=analysis in the Factories links). Curiously, this design does not reduce the total number of clicks compared to a solution where the user navigates to a purpose-built query page and chooses the option there. So this design pattern in the old/platform monitor only clutters the left panel without offering any distinct advantage.
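The redundancy can be checked mechanically. The Python sketch below takes the three Factories shortcuts from the left panel, reports which query parameter actually varies between them, and rebuilds all three from one base URL plus that parameter, which is roughly what a single query page with a jobtype selector would do. The function names varying_params and build_url are illustrative only.

```python
from urllib.parse import urlsplit, parse_qsl, urlencode

FACTORY_LINKS = [
    "http://pandamon.cern.ch/wnlist?jobtype=&jobStatus=factory",
    "http://pandamon.cern.ch/wnlist?jobtype=production&jobStatus=factory",
    "http://pandamon.cern.ch/wnlist?jobtype=analysis&jobStatus=factory",
]

def varying_params(urls):
    """Return (base URL, set of query keys whose values differ across urls)."""
    splits = [urlsplit(u) for u in urls]
    bases = {f"{s.scheme}://{s.netloc}{s.path}" for s in splits}
    if len(bases) != 1:
        raise ValueError("URLs do not share a common base")
    # keep_blank_values keeps "jobtype=" (empty value) instead of dropping it
    queries = [dict(parse_qsl(s.query, keep_blank_values=True)) for s in splits]
    keys = set().union(*queries)
    varying = {k for k in keys if len({q.get(k) for q in queries}) > 1}
    return bases.pop(), varying

def build_url(base, **params):
    """Build one query URL from a base and keyword parameters (order preserved)."""
    return base + "?" + urlencode(params)

base, varying = varying_params(FACTORY_LINKS)
print(base, sorted(varying))
for jobtype in ("", "production", "analysis"):
    print(build_url(base, jobtype=jobtype, jobStatus="factory"))
```

Only jobtype differs; jobStatus=factory is constant. In the left panel each jobtype value costs a separate link, whereas on a query page it is one field.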

Development matters



Major updates:

-- MaximPotekhin - 02 May 2014 - finished the "platform mon" section
-- MaximPotekhin - 21 Apr 2014 - reformatted most entries for the Left Panel as tables, and added a considerable number of URLs
-- MaximPotekhin - 12 Apr 2014
-- MaximPotekhin - 16 Mar 2014 - added material according to plan
-- JaroslavaSchovancova - 04 Oct 2013



Responsible: TorreWenaus


Topic revision: r26 - 2014-05-09 - MaximPotekhin
 
    • Cern Search Icon Cern Search
    • TWiki Search Icon TWiki Search
    • Google Search Icon Google Search

    PanDA All webs login

This site is powered by the TWiki collaboration platform Powered by PerlCopyright & 2008-2018 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback