Activity Reports 2016 - Wen Guan


PanDA Pilot

  • Fixed and updated the pilot to commission the Event Service. Many small updates were applied to fix the jobstatus, the job accounting information, error reporting and so on.
  • Event Service validation: checked whether jobstatus is reported correctly, whether cputime and walltime are correct, how the service performs, how failed and preempted jobs are distributed, how job length (walltime) is distributed, and so on.
  • Objectstore checks: the failure rate of stage-in/stage-out and the reasons for the failures.
  • Tar multiple event outputs into one file (working).
  • Worked with the dashboard team to debug why the accounting summary calculated from the PanDA DB differs from the dashboard summary: the dashboard failed to collect the final state for many 'merging' jobs, and 'merging' is not counted in the summary. The problem took a long time to find.
  • Validated other BigPanDA, dashboard and Elasticsearch metrics.
  • Commissioned ES tasks.
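
The tar item in the list above can be illustrated with a minimal sketch. This is not the pilot's actual code: the function names, file names and archive layout are assumptions made for the example. It only shows how several event output files could be bundled into one tar file with Python's standard tarfile module.

```python
import os
import tarfile
import tempfile


def tar_event_outputs(output_paths, tar_path):
    """Bundle several event output files into one uncompressed tar archive.

    Hypothetical helper, not part of the pilot.
    """
    with tarfile.open(tar_path, "w") as tar:
        for path in output_paths:
            # Store only the base name so the archive carries no local directories.
            tar.add(path, arcname=os.path.basename(path))
    return tar_path


def list_members(tar_path):
    """Return the sorted member names of a tar archive."""
    with tarfile.open(tar_path, "r") as tar:
        return sorted(tar.getnames())


def demo():
    """Create three dummy event outputs, tar them, and list the archive."""
    with tempfile.TemporaryDirectory() as workdir:
        paths = []
        for i in range(3):
            path = os.path.join(workdir, "event_%d.out" % i)
            with open(path, "w") as handle:
                handle.write("payload %d\n" % i)
            paths.append(path)
        archive = tar_event_outputs(paths, os.path.join(workdir, "events.tar"))
        return list_members(archive)
```

Shipping one archive instead of many small files reduces the number of stage-out transfers, which is presumably the motivation for the item; the real implementation may differ in naming and compression.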


PanDA Pilot

  • Worked with PanDA to organize the event status and job status. Updated the pilot to report the new event status. Organized and updated the pilot to report job status and job substatus so that errors can be distinguished.
  • Tested pilots at P1 and fixed some HTTP proxy issues.
  • Worked with BigPanDA and Elasticsearch to improve the ES monitor: checked the monitor information, collected new requirements from different people, and validated the numbers against the PanDA DB to decide which information is important to display.
  • Worked on pilot traces, external stage-out and so on.
  • Worked on the new movers structure. Updated ES and ES merge to support the new movers structure. Updated the new movers to support event file stage-out to and stage-in from the Objectstore.


PanDA Pilot

  • Tested the tar/zip function on the Grid.
  • Fixed nEvents and cputime in the pilot for preemptable jobs.
  • Validated cputime on preemptable queues.
  • Discussed improvements to the Event Service monitor with the BigPanDA and dashboard developers.
  • Created an ES monitor on the Elasticsearch dashboard.
  • Defined new ES errors and organized the pilot to report them.
  • Fixed the pilot to report errors in the new priority error handler.
  • Fixed Yoda to schedule a job to the same rank again when the first try fails.
  • Tested the new objectstore sitemover with the rucio CLI.
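
The Yoda item in the list above (retrying a failed job on the same rank) can be sketched as follows. This is an illustration only, not Yoda's code: the class name, method names and the retry limit of two attempts are all assumptions.

```python
class RankScheduler:
    """Toy scheduler that pins a failed job to the rank that ran it.

    Hypothetical sketch; names and the default retry limit are assumptions.
    """

    def __init__(self, max_attempts=2):
        self.max_attempts = max_attempts
        self.pending = []    # jobs that have not been tried yet
        self.retry = {}      # rank -> job ids waiting to be retried on that rank
        self.attempts = {}   # job id -> number of attempts so far

    def add_job(self, job_id):
        self.pending.append(job_id)

    def next_job(self, rank):
        """Return the next job for this rank, preferring its own retries."""
        if self.retry.get(rank):
            job_id = self.retry[rank].pop(0)
        elif self.pending:
            job_id = self.pending.pop(0)
        else:
            return None
        self.attempts[job_id] = self.attempts.get(job_id, 0) + 1
        return job_id

    def report_failure(self, rank, job_id):
        """Queue the job for a retry on the same rank if attempts remain."""
        if self.attempts.get(job_id, 0) < self.max_attempts:
            self.retry.setdefault(rank, []).append(job_id)
            return True
        return False
```

In this sketch a failed job is never handed to a different rank: it is either retried where it first ran or dropped once the attempt limit is reached.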

Pilot 2.0

  • pilot2: architectural discussions with the Harvester team.
  • Worked on the draft DB model design and development.
  • Discussed a DB-based pilot structure with the pilot developers.

Other Activities

  • ..

-- PaulNilsson - 2016-10-10

Topic revision: r4 - 2016-12-06 - WenGuan