EventServiceOperations

EventService introduction

  • An Event Service (ES) job is similar to a normal PanDA job; the difference is in the job payload. When the pilot gets a PanDA job whose payload includes "('eventService', 'True')", it starts an ES process to handle it.
  • For an ES job, the input files can be staged in to the local working directory or read directly (the same as for a normal PanDA job).
  • The event objects and logs are sent to an object store while the ES job is running.
  • When the pilot finishes all work, it transfers its logs to a grid storage element (as with a normal PanDA job, these log files can be accessed from the PanDA monitor).
  • A detailed introduction is given in DetailEventServiceIntroduction. Please read it first if you are not familiar with Event Service.
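
As a rough illustration of the payload check described above, the following minimal sketch assumes the job payload is available as a sequence of (key, value) string pairs. The function name is_event_service_job and the pair-based representation are hypothetical and are not taken from the pilot code.

```python
def is_event_service_job(payload_items):
    """Return True if the payload marks the job as an Event Service job.

    payload_items is assumed (for this sketch only) to be an iterable of
    (key, value) pairs such as ("eventService", "True").
    """
    params = dict(payload_items)
    # A missing key is treated as a normal (non-ES) job.
    value = str(params.get("eventService", "False"))
    return value.strip().lower() == "true"


# Hypothetical payloads; "PandaID" is used here only as an example key.
es_payload = [("PandaID", "123"), ("eventService", "True")]
normal_payload = [("PandaID", "124")]
print(is_event_service_job(es_payload))      # True
print(is_event_service_job(normal_payload))  # False
```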

Links

ES General

  • There are two types of ES tasks:
    • Pure ES task: For this task, the only ES jobs PanDA generates are jobs with EventService=1.
      • Panda generates ES jobs with EventService=1
      • Pilot gets ES jobs to run
      • When a batch of events has finished (for example, 1000 events), PanDA generates ESMerge jobs with EventService=2.
      • Pilot gets ESMerge jobs to run. When an ESMerge job finishes, the corresponding batch of events is complete.
    • JumboEnabled ES task: For this task, PanDA generates jobs with both EventService=4 and EventService=5.
      • Normally JumboEnabled ES tasks have a lot of events. At first, PanDA generates ESJumbo jobs with EventService=4. These jobs are scheduled to PanDA queues that have useJumboJobs defined in the AGIS catchall. For HPC, Harvester is installed to run these jumbo jobs.
      • At the same time, PanDA generates many cojumbo ES jobs and schedules them to the Grid. These cojumbo jobs may share input events with the jumbo jobs. If the events have already been prefetched by the HPC, the cojumbo jobs are put in waiting status. A cojumbo job is similar to an EventService=1 job, except that its input events are shared with jumbo jobs, so from the pilot's point of view they are the same.
      • When a jumbo job finishes, Harvester updates the status of its events. Any remaining events can be handled by cojumbo jobs.
      • When a task has only a few events left, it is not efficient to use HPC to process them, so PanDA automatically disables Jumbo. After that, the task generates only cojumbo jobs to process the remaining events.
  • Panda Queues:
    • useJumboJobs in catchall (AGIS): PanDA schedules jumbo jobs with EventService=4 to these queues.
    • jobseed=es or jobseed=all: PanDA schedules normal ES jobs with EventService=1 and cojumbo jobs with EventService=5 to these queues.
    • jobseed=eshigh: At the end of a task, PanDA increases its priority to speed up the processing of the tail. These high-priority jobs are scheduled to eshigh PanDA queues. We added jobseed=eshigh to many MCore queues. Normally these queues run only simu, reco and other non-ES jobs, but when there are high-priority tails of ES tasks, PanDA schedules those tails to these queues.
  • ESMerge:
    • Jobs with EventService=1, EventService=4 and EventService=5 generate many event-level files, or a tar file containing many event-level files. These files are stored in an objectstore for the Grid and on a datadisk for HPC. They need to be merged into ROOT files.
    • Currently ESMerge jobs are generated by PanDA from a batch of contiguous events. For example, if events 1 to 999 and 1001 to 3000 are finished and 1000 events per job are defined, two ESMerge jobs are generated, for events "1001 to 2000" and "2001 to 3000". For events "1 to 999", PanDA waits until event 1000 is finished.
    • ESMerge failures are currently an issue for ES, especially when some ES premerge files are on a datadisk (an objectstore is more scalable and gives fewer errors for remote reading). So for an ESMerge job, PanDA checks where the input premerge files are stored. If any are on datadisks, PanDA schedules the ESMerge job to PanDA queues associated with one of those datadisks. If none are on datadisks, PanDA schedules the ESMerge job close to the input objectstore.
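
The ESMerge range example above (events 1 to 999 and 1001 to 3000 finished, 1000 events per job) can be reproduced with a short sketch. It is not the PanDA implementation: the function name complete_merge_chunks is hypothetical, and the sketch assumes that events are numbered from 1 and that merge ranges are aligned to fixed boundaries (1 to N, N+1 to 2N, and so on), which is inferred from the example rather than stated by the source.

```python
def complete_merge_chunks(finished_ranges, events_per_job):
    """Return the aligned (first, last) event ranges that are fully finished.

    finished_ranges: iterable of inclusive (first, last) event ranges.
    events_per_job:  number of events per ESMerge job (assumed chunk size).
    """
    if events_per_job <= 0:
        raise ValueError("events_per_job must be positive")

    # Coalesce overlapping or adjacent finished ranges.
    merged = []
    for first, last in sorted(finished_ranges):
        if merged and first <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], last)
        else:
            merged.append([first, last])

    chunks = []
    for first, last in merged:
        # First aligned chunk boundary at or after `first` (events start at 1).
        index = (first - 1 + events_per_job - 1) // events_per_job
        start = index * events_per_job + 1
        # Emit every aligned chunk that fits entirely inside this range.
        while start + events_per_job - 1 <= last:
            chunks.append((start, start + events_per_job - 1))
            start += events_per_job
    return chunks


print(complete_merge_chunks([(1, 999), (1001, 3000)], 1000))
# [(1001, 2000), (2001, 3000)]
```

Once event 1000 is also reported as finished, the same call returns the range (1, 1000) as well, which matches the waiting behaviour described for events "1 to 999".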

Operations

Common Errors


-- WenGuan - 2019-03
-- WenGuan - 2015-07-08

Topic revision: r31 - 2019-05-08 - WenGuan
 