Panda in Hadoop

This page describes the Panda data collected in Hadoop, the collection process, and the related analysis code.

Panda Job status

Each time a Panda job changes status, the change is recorded in the PandaJobStatus Oracle table; consequently most jobs have multiple entries. The data are essential for understanding how much time jobs spend in different states. When joined with the PandaJobArchive table, the data can be further sliced by job type, user, memory consumption, etc.

Importing

The Oracle table is ATLAS_PANDA.JOBS_STATUSLOG. The data are sqooped once a day to the HDFS directory /atlas/analytics/panda/jobs_status/ on the lxhadoop cluster. The import code is in ATLAS-Hadoop/import/JobStatusImport.sh. Keep in mind that up to 10 million rows are added to this table every day.

Column descriptions:

Column name | Type | Description
PANDAID | long | regular PANDAID
MODIFICATIONTIME | chararray | in the format 'yyyy-MM-dd HH:mm:ss.0'
JOBSTATUS | chararray | one of: defined, waiting, assigned, throttled, activated, sent, starting, running, holding, transferring, finished, failed, cancelled
PRODSOURCELABEL | chararray | one of: test, prod_test, rucio_test, user, install, ptest, managed, panda, rc_test
CLOUD | chararray | cloud
COMPUTINGSITE | chararray | site
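
A row from this table can be handled in Python as follows. This is a minimal sketch with invented sample values; only the column names and the timestamp format come from the table above.

```python
from datetime import datetime

# Hypothetical example row from JOBS_STATUSLOG; the values are invented.
row = {
    "PANDAID": 1234567890,
    "MODIFICATIONTIME": "2015-03-01 12:34:56.0",  # 'yyyy-MM-dd HH:mm:ss.0'
    "JOBSTATUS": "running",
    "PRODSOURCELABEL": "managed",
    "CLOUD": "US",
    "COMPUTINGSITE": "MWT2",
}

def parse_modtime(s):
    """Convert the Oracle-style timestamp string to a Python datetime."""
    return datetime.strptime(s, "%Y-%m-%d %H:%M:%S.%f")

ts = parse_modtime(row["MODIFICATIONTIME"])
```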

Analyses

  • Times jobs spent in waiting and transferring states, per day, per cloud, per site, per job type
The code to analyze this data is in ATLAS-Hadoop/pigCodes/Panda/PandaJobStatusAnalysis/.
Name | Type | Description
Reshuffle.pig | pig | Run this first. It collapses the original table, in which each job status change is one row, into a much smaller table with one row per job. The output is saved on lxhadoop in the directory /user/ivukotic/PandaJobStatus/Reshuffle.
RucioInfluence.pig | pig | Uses the reshuffled data to calculate waiting and transferring times, averaged per day, cloud, site, and prodsourcelabel.
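
The reshuffle step amounts to a group-by on PANDAID. A minimal sketch in plain Python (not the actual Pig script; the input tuples are invented examples):

```python
from collections import defaultdict

# One-row-per-status-change input, as in JOBS_STATUSLOG: (PANDAID, time, status).
rows = [
    (101, "2015-03-01 10:00:00.0", "activated"),
    (101, "2015-03-01 10:05:00.0", "running"),
    (102, "2015-03-01 10:01:00.0", "activated"),
    (101, "2015-03-01 11:00:00.0", "finished"),
]

# Collapse to one record per job: PANDAID -> bag of (status, time) tuples.
per_job = defaultdict(list)
for pandaid, modtime, status in rows:
    per_job[pandaid].append((status, modtime))
```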

The Pig scripts use Jython UDFs from the file myudfs.py:

UDF | Return type | Input parameters | Description
BagToBag | bag | bag of raw panda_jobs_statuslog data | collects the individual status changes of each job into one bag per job
BagToTimes | ntup(skip,assigned,transferring) | bag of ntups(status,time) | calculates the times spent in the waiting/transferring states
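
The core of the BagToTimes idea can be sketched as follows. This is a standalone illustration of the time arithmetic, not the production Jython UDF in myudfs.py; the bag contents are invented.

```python
from datetime import datetime

def parse(t):
    """Parse the 'yyyy-MM-dd HH:mm:ss.0' timestamp format."""
    return datetime.strptime(t, "%Y-%m-%d %H:%M:%S.%f")

def time_in_state(bag, state):
    """Seconds a job spent in `state`: from entering it until the next change."""
    ordered = sorted(bag, key=lambda st: parse(st[1]))
    total = 0.0
    for (status, t1), (_, t2) in zip(ordered, ordered[1:]):
        if status == state:
            total += (parse(t2) - parse(t1)).total_seconds()
    return total

# Invented bag of (status, time) tuples for one job.
bag = [
    ("waiting", "2015-03-01 10:00:00.0"),
    ("activated", "2015-03-01 10:10:00.0"),
    ("transferring", "2015-03-01 11:00:00.0"),
    ("finished", "2015-03-01 11:30:00.0"),
]
```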

Panda job archive table

The main Panda job table is incrementally copied once a day from the production Oracle DB to ATLAS_PANDAARCH.JOBSARCHIVED, with a 3-day delay. From there it is sqooped once a day to lxhadoop, into the directory /atlas/analytics/panda/jobs/, as Avro files. The import code is in ATLAS-Hadoop/import/JobImport.sh. Two CLOB fields of the original table are dropped. The data start from 2015-01-01. Data from 2015-10-20 onward contain additional fields with a job's memory measurements, so the schemas differ before and after that date.
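
When processing records across the schema split, the memory fields are simply absent in the older data, so they should be read with a default. A minimal sketch; the field name MAXRSS is an assumption for illustration only, so check the actual Avro schema of the files in /atlas/analytics/panda/jobs/:

```python
# Invented records illustrating the two schemas around 2015-10-20.
old_record = {"PANDAID": 1, "JOBSTATUS": "finished"}                 # pre 2015-10-20
new_record = {"PANDAID": 2, "JOBSTATUS": "finished", "MAXRSS": 2048000}

def maxrss_kb(rec):
    """Return the (assumed) memory field, or None for pre-2015-10-20 records."""
    return rec.get("MAXRSS")
```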

Analyses

The code to analyze this data is in ATLAS-data-analytics/pigCodes/Panda/.

  • Overflow analysis. Code is in Panda/OverflowTimes. It was used to calculate the most important performance metrics of overflow and local Panda jobs. It is now mostly obsolete, since the data in ElasticSearch provide the same information much faster and more conveniently through a few dedicated dashboards.

  • Overflow flows. Code is in Panda/OverflowMatrix. It selects all overflow jobs and, for each combination of source and destination, sums up the number of jobs, input files, and input data size, aggregated by job status. This information is stored in HDFS, from where a Python script loads it and sends it to an FSB REST interface. In the future the data should go to ES.

  • Influence of the "over-the-pledge" resources on wait times of US users' jobs. Code is in Panda/US_users_priorities; it compares the wait times that US users' jobs experience with those experienced by the jobs of all other users. Results are shown per computing site and cloud.

  • Task duration analysis. Code is in Panda/TaskDurations. A GoogleDoc explains the analysis in detail.

PanDA logger

OBSOLETE - should be updated by Shaojun Sun. He should be importing it into ES.

PanDA has a logging server that receives logging messages from the pilot, the server, etc. The last week of logs is accessible in the bigpanda monitor, which presents the logs from Oracle, where they are originally stored; only the last week is kept there. In February 2015 Ilija Vukotic implemented loading of the logger records into Hadoop, so the full history since then is available on lxhadoop at /atlas/analytics/panda/PandaLog. The data are imported once a day and grow at ~450 MB/day. Ilija can help with writing Pig scripts to analyze them.

Use case

The logger records warnings and errors from the server and other sources; detailed histories of the actions of complex components such as job brokerage, reassignments, and JEDI task processing; pilot activity; PD2P activity; etc. It is therefore a rich source of information on the detailed dynamic activity of the PanDA system.

Importing

Sqoop incremental import cannot be used because the table has no primary key. For this reason, a cron job imports the logs day by day in a single stream:
sqoop import --connect "jdbc:oracle:thin:@//itrac5101-v.cern.ch:10121/adcr.cern.ch" --table ATLAS_PANDA.PANDALOG --username ATLAS_PANDABIGMON --P --as-avrodatafile --target-dir /atlas/analytics/panda/PandaLog --columns NAME,MODULE,LOGUSER,TYPE,PID,LOGLEVEL,LEVELNAME,TIME,FILENAME,LINE,MESSAGE,BINTIME --where "BINTIME>'08-Feb-2015' and BINTIME<='09-Feb-2015'" -m 1 --append
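
The day-by-day BINTIME window in the --where clause above could be generated as follows. This is a sketch of the idea only; the actual script in ATLAS-Hadoop/import is not reproduced here.

```python
from datetime import date, timedelta

def bintime_window(day):
    """Return (start, end) strings in the 'dd-Mon-yyyy' style used above."""
    fmt = "%d-%b-%Y"
    return day.strftime(fmt), (day + timedelta(days=1)).strftime(fmt)

# Build the --where clause for one day's import.
start, end = bintime_window(date(2015, 2, 8))
where = "BINTIME>'{}' and BINTIME<='{}'".format(start, end)
```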

Column descriptions:

COLUMN_NAME | DATA_TYPE | NULLABLE | DATA_DEFAULT
NAME | VARCHAR2(30 BYTE) | Yes |
MODULE | VARCHAR2(30 BYTE) | Yes |
LOGUSER | VARCHAR2(80 BYTE) | Yes |
TYPE | VARCHAR2(20 BYTE) | Yes |
PID | NUMBER(11,0) | No | '0'
LOGLEVEL | NUMBER(9,0) | No | '0'
LEVELNAME | VARCHAR2(30 BYTE) | Yes |
TIME | VARCHAR2(30 BYTE) | Yes |
FILENAME | VARCHAR2(100 BYTE) | Yes |
LINE | NUMBER(9,0) | No | '0'
MESSAGE | VARCHAR2(4000 BYTE) | Yes |
BINTIME | DATE | No | to_date('01-JAN-70 00:00:00', 'dd-MON-yy hh24:mi:ss')

Analyses



Major updates:

-- IlijaVukotic - 2014-11-19

Responsible: IlijaVukotic
Last reviewed by: Never reviewed

Topic attachments

civais-panda.png (77.0 K, 2014-12-09, RobertGardner)
Topic revision: r22 - 2015-11-10 - IlijaVukotic
 