The PanDA Production and Distributed Analysis System

Introduction

The PanDA Production ANd Distributed Analysis system has been developed by ATLAS since summer 2005 to meet ATLAS requirements for a data-driven workload management system (WMS) for production and distributed analysis processing, capable of operating at LHC data processing scale. ATLAS processing and analysis place challenging requirements on throughput, scalability, robustness, efficient resource utilization, minimal operations manpower, and tight integration of data management with processing workflow. PanDA was initially developed for US-based ATLAS production and analysis, and assumed that role in late 2005. In October 2007 PanDA was adopted by the ATLAS Collaboration as the sole system for distributed production processing across the Collaboration. It proved to be a scalable and reliable system capable of handling very large workloads. In 2008 it was adopted by ATLAS for user analysis processing as well.

PanDA throughput has been rising continuously over the years. In 2009, a typical PanDA processing rate was 50k jobs/day and 14k CPU wall-time hours/day for production at 100 sites around the world, and 3-5k jobs/day for analysis. In 2014, PanDA processes about a million jobs per day, with about 150,000 jobs running at any given time. The PanDA analysis user community numbers over 1400.

PanDA was designed to have the flexibility to adapt to emerging computing technologies in processing, storage, networking and distributed computing middleware. This flexibility has been demonstrated over the past six years as the computing centers in ATLAS, which span many continents, adopted evolving technologies. This proven scalability and flexibility make PanDA well suited for adoption by a wide variety of exabyte-scale sciences. Interest in PanDA from other big data sciences was the primary motivation for generalizing PanDA in the BigPanDA project, to provide location transparency of processing and data management for the High Energy Physics community, other data-intensive sciences, and a wider exascale community. The BigPanDA project has been active since September 2012.

Design and implementation

Principal design features

The principal features of PanDA's design are as follows.

  • Support for both managed production and individual analysis users so as to benefit from a common WMS infrastructure and to allow analysis to leverage production operations support, thereby minimizing overall operations workload.
  • A coherent, homogeneous processing system layered over diverse and heterogeneous processing resources. This helps insulate production operators and analysis users from the complexity of the underlying processing infrastructure. It also maximizes the amount of PanDA systems code that is independent of the underlying middleware and facilities actually used for processing in any given environment.
  • Use of pilot jobs for acquisition of processing resources. Workload jobs are assigned to successfully activated and validated pilots based on PanDA-managed brokerage criteria. This 'late binding' of workload jobs to processing slots prevents latencies and failure modes in slot acquisition from impacting the jobs, and maximizes the flexibility of job allocation to resources based on the dynamic status of processing facilities and job priorities. The pilot is also a principal 'insulation layer' for PanDA, encapsulating the complex heterogeneous environments and interfaces of the grids and facilities on which PanDA operates.
  • Extensive direct use of Condor (particularly CondorG), as a job submission infrastructure of proven capability and reliability. While other job submission approaches are also supported such as local batch, CondorG is the backbone submission system of PanDA and has been very successful as such.
  • Coherent and comprehensible system view afforded to users, and to PanDA's own job brokerage system, through a system-wide job database that records comprehensive static and dynamic information on all jobs in the system. To users and to PanDA itself, the job database appears essentially as a single attribute-rich queue feeding a worldwide processing resource.
  • A comprehensive monitoring system supporting production and analysis operations; user analysis interfaces with personalized user level views; detailed drill-down into job, site and data management information for problem diagnostics; usage and quota accounting; and health and performance monitoring of PanDA subsystems and the computing facilities being utilized.
  • System-wide site/queue information database recording static and dynamic information used throughout PanDA to configure and control system behavior from the regional level down to the individual queue level. The database is an information cache, gathering data from grid information systems, site experts, the data management system and other sources. It supports information access and limited remote control functions via an http interface. It is used by pilots to configure themselves appropriately for the queue they land on; by PanDA brokerage for decisions based on cloud and site attributes and status; and by the pilot scheduler to configure pilot job submission appropriately for the target queue.
  • Integrated data management based on dataset (file collection) based organization of data files, with datasets consisting of input or output files for associated job sets. Movement of datasets as required for processing and archiving is integrated directly into PanDA's workflow. This design matches the data-driven, dataset-based nature of the overall ATLAS computing organization and workflow.
  • Automated pre-staging of input data (either to a processing site or out of mass storage, or both) and immediate return of outputs, all asynchronously, minimizing data transport latencies and delivering (for analysis) the earliest possible first results. Pre-staging of input data prior to initiation of jobs is an essential efficiency and robustness measure, such that jobs themselves are not subject to data staging/transfer latencies and failure modes, improving efficiencies for job completion and resource utilization.
  • Support for running arbitrary user code (job scripts), as in conventional batch submission. The design imposes no ATLAS-specific requirements or workload restrictions.
  • Easy integration of local resources. Minimum site requirements are a grid computing element or local batch queue to receive pilots, outbound http support, and remote data copy support using distributed data movement tools (e.g. xrootd).
  • Simple client interface allows easy integration with diverse front ends for job submission to PanDA.
  • Authentication and authorization are based on grid certificates, with the job submitter required to hold a grid proxy and VOMS role that is authorized for PanDA usage. User identity (DN) is recorded at the job level and used to track and control usage in PanDA's monitoring, accounting and brokerage systems. The user proxy itself can optionally be recorded in MyProxy for use by pilots processing the user job, when pilot identity switching/logging (via gLExec) is in use.
  • Support for usage regulation at user and group levels based on quota allocations, job priorities, usage history, and user-level rights and restrictions.
  • Security in PanDA employs standard grid security mechanisms -- see PandaSecurity.
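
As a rough illustration of the "single attribute-rich queue" and late-binding ideas listed above, the sketch below models a job table in an in-memory SQLite database and lets an already-running pilot claim the highest-priority job waiting for its site. The table layout, column names, status values and the helpers `make_queue`, `submit` and `claim_job` are all hypothetical and chosen for illustration; they are not PanDA's actual schema or API.

```python
import sqlite3


def make_queue():
    """Create an in-memory job table (hypothetical schema, illustration only)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE jobs ("
        " job_id INTEGER PRIMARY KEY,"
        " owner_dn TEXT, site TEXT, priority INTEGER, status TEXT)"
    )
    return db


def submit(db, owner_dn, site, priority):
    """Record a job that is ready to run at the given site; return its id."""
    cur = db.execute(
        "INSERT INTO jobs (owner_dn, site, priority, status)"
        " VALUES (?, ?, ?, 'activated')",
        (owner_dn, site, priority),
    )
    db.commit()
    return cur.lastrowid


def claim_job(db, site):
    """Late binding: give the best waiting job to a pilot that is already running.

    Returns the job id, or None when nothing is waiting for this site.
    """
    row = db.execute(
        "SELECT job_id FROM jobs WHERE site = ? AND status = 'activated'"
        " ORDER BY priority DESC, job_id ASC LIMIT 1",
        (site,),
    ).fetchone()
    if row is None:
        return None
    db.execute("UPDATE jobs SET status = 'running' WHERE job_id = ?", (row[0],))
    db.commit()
    return row[0]
```

Because the job is chosen only when a pilot asks for one, a change in priority or site status between submission and execution is reflected in which job is picked, without resubmitting anything to the batch system.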

Architecture and workflow

Jobs are submitted to PanDA via a simple python client interface by which users define job sets, their associated datasets and the input/output files within them. Job specifications are transmitted to the PanDA server via secure http (authenticated via a grid certificate proxy), with submission information returned to the client. This client interface has been used to implement PanDA front ends for ATLAS production, distributed analysis (e.g. pathena), and other experiments. The PanDA server receives work from these front ends into a global job queue, upon which a brokerage module operates to prioritize and assign work on the basis of job type, priority, input data and its locality, available CPU resources and other brokerage criteria. Allocation of job sets to sites is followed by the dispatch of corresponding input datasets to those sites, handled by a data service interacting with the distributed data management system. Data pre-placement is a soft (formerly strict) precondition for job execution: jobs are not released for processing until their input data is accessible from the processing site. When data dispatch completes, jobs are made available to a job dispatcher. Strict data colocation for analysis is being replaced by a softer requirement that data be accessible via an adequately performant network path, as determined by a networking site-to-site 'cost matrix'.
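
The brokerage and data pre-placement steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not PanDA's brokerage code: the `Site` and `Job` classes, the status names `defined`, `assigned` and `activated` as used here, the `cost` dictionary standing in for the site-to-site cost matrix, and the functions `broker` and `on_transfer_done` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Site:
    name: str
    free_slots: int
    datasets: set = field(default_factory=set)


@dataclass
class Job:
    job_id: int
    priority: int
    input_dataset: str
    site: Optional[str] = None
    status: str = "defined"


def broker(jobs, sites, cost):
    """Assign jobs, highest priority first, to the site with the cheapest data access.

    `cost` maps (source_site, destination_site) to a transfer cost (assumed form).
    Returns the list of (dataset, destination_site) transfers to request.
    """
    transfers = []
    for job in sorted(jobs, key=lambda j: -j.priority):
        candidates = [s for s in sites if s.free_slots > 0]
        if not candidates:
            break  # no CPU available anywhere; remaining jobs stay 'defined'

        def access_cost(site):
            if job.input_dataset in site.datasets:
                return 0.0  # data already local
            holders = [h for h in sites if job.input_dataset in h.datasets]
            return min(
                (cost.get((h.name, site.name), float("inf")) for h in holders),
                default=float("inf"),
            )

        best = min(candidates, key=access_cost)
        best.free_slots -= 1
        job.site = best.name
        if job.input_dataset in best.datasets:
            job.status = "activated"  # released to the dispatcher immediately
        else:
            job.status = "assigned"  # held until the input data arrives
            transfer = (job.input_dataset, best.name)
            if transfer not in transfers:
                transfers.append(transfer)
    return transfers


def on_transfer_done(jobs, sites, dataset, site_name):
    """Data dispatch finished: release the jobs that were waiting for it."""
    for site in sites:
        if site.name == site_name:
            site.datasets.add(dataset)
    for job in jobs:
        if (job.status, job.site, job.input_dataset) == ("assigned", site_name, dataset):
            job.status = "activated"
```

Holding a job in an intermediate state until `on_transfer_done` fires is what keeps transfer latencies and failures out of the job itself; a softer locality rule would amount to treating a low enough `cost` entry as "accessible" without a transfer.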

An independent subsystem manages the delivery of pilot jobs to worker nodes, using a number of remote (or local) job submission methods as defined in its configuration. A pilot once launched on a worker node contacts the dispatcher and receives an available job appropriate to the site. If no appropriate job is available, the pilot may immediately exit or may pause and ask again later, depending on its configuration (standard behavior is for it to exit). An important attribute of this scheme for interactive analysis, where minimal latency from job submission to launch is important, is that the pilot dispatch mechanism bypasses any latencies in the scheduling system for submitting and launching the pilot itself. The pilot job mechanism isolates workload jobs from grid and batch system failure modes (a workload job is assigned only once the pilot successfully launches on a worker node). The pilot also isolates the PanDA system proper from grid heterogeneities, which are encapsulated in the pilot, so that at the PanDA level the grids used by PanDA appear homogeneous. Pilots generally carry a generic 'production' grid proxy, with an additional VOMS attribute 'pilot' indicating a pilot job. Analysis pilots may use gLExec to switch their identity on the worker node to that of the job submitter (see PandaSecurity).
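
The pilot behavior described above (ask the dispatcher for work, exit or pause when none is available, run the payload and report status) can be sketched as below. This is a simplified illustration, not the PanDA pilot: `run_pilot` and its parameters are hypothetical, and the dispatcher, payload runner and status reporter are passed in as plain callables so the sketch stays independent of any real server interface.

```python
import time


def run_pilot(get_job, run_payload, report, site,
              retries=0, pause=60.0, sleep=time.sleep):
    """Minimal pilot loop (illustrative only).

    get_job(site)      -> job dict with a 'job_id' key, or None if nothing matches
    run_payload(job)   -> integer exit code of the workload job
    report(job_id, st) -> send a status update back to the server
    retries=0 reproduces the standard behavior: exit at once when no job is available.
    """
    attempts = 0
    while True:
        job = get_job(site)
        if job is not None:
            break
        if attempts >= retries:
            return None  # no work for this site: the pilot simply exits
        attempts += 1
        sleep(pause)  # configured to wait and ask again later

    report(job["job_id"], "running")
    try:
        exit_code = run_payload(job)
    except Exception:
        exit_code = 1  # treat a crashing payload as a failed job
    status = "finished" if exit_code == 0 else "failed"
    report(job["job_id"], status)
    return status
```

The workload job is fetched only after the pilot itself is running on the worker node, so a pilot that never starts costs the system nothing more than an unused batch slot request.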

The isolation of the pilot system in the architecture allows for many different pilot system implementations to be smoothly integrated with PanDA, making it possible to integrate widely diverse processing resources with different requirements for a pilot service. These include supercomputers, cloud resources, and self-contained computing environments like NorduGrid.

[Figure: The PanDA architecture]

[Figure: PanDA job flow]

[Figure: Pilot workflow. For details see GenericPanDAPilot]

Implementation

The implementation of PanDA (original but still largely accurate) is shown below. Supported front ends are the ATLAS production system, pathena (a distributed analysis interface to the ATLAS offline software framework, Athena), prun (a more generic front end for ATLAS analysis), and a generic submission client used by other experiments. The PanDA server containing the central components of the system is implemented in python (as are all components of PanDA) and runs under Apache as a web service (in the REST sense; communication is based on HTTP GET/POST with the messaging contained in the URL and optionally a message payload of various formats). Oracle RDBMS is used to implement the job queue and all metadata and monitoring repositories. Condor-G is the distributed scheduler used by the pilot scheduling subsystem, the AutoPyFactory. A monitoring server works with the Oracle DBs to provide web browser based monitoring and browsing of the system and its jobs and data. These DBs include a logging DB, populated by system components that record incidents via a simple web service behind the standard python logging module. Since 2013, PanDA has also been (re)implemented against a MySQL back end as an alternative to Oracle.
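
The idea of recording incidents "via a simple web service behind the standard python logging module" can be illustrated with a custom `logging.Handler`. The sketch below is an assumption about the general pattern, not PandaLogger's code: the class `WebServiceHandler`, the helper `get_component_logger`, the payload fields and the injected `send` callable are hypothetical. In a real deployment `send` would issue an HTTP request to the logging service; the standard library also ships `logging.handlers.HTTPHandler`, which sends records to a web server directly.

```python
import logging


class WebServiceHandler(logging.Handler):
    """Forward each log record to a collector through an injected `send` callable."""

    def __init__(self, send, component):
        super().__init__()
        self.send = send            # hypothetical transport, e.g. an HTTP POST wrapper
        self.component = component  # name of the reporting component

    def emit(self, record):
        payload = {
            "component": self.component,
            "level": record.levelname,
            "message": record.getMessage(),
            "created": record.created,
        }
        try:
            self.send(payload)
        except Exception:
            self.handleError(record)  # never let logging break the caller


def get_component_logger(component, send):
    """Return a logger whose INFO-and-above records go to the collector."""
    logger = logging.getLogger("panda_sketch." + component)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.addHandler(WebServiceHandler(send, component))
    return logger
```

Components then log in the usual way, for example `log = get_component_logger("brokerage", send)` followed by `log.info("job %d assigned to %s", 42, "SITE_A")`, and remain unaware of how or where the record is stored.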

[Figure: Original PanDA implementation]

PanDA Components

  • PandaServer - central PanDA hub composed of several components that make up the core of PanDA. Implemented as a stateless REST web service over Apache mod_python, with an Oracle back end (MySQL is supported as an alternative).
    • PandaTaskBuffer - the PanDA job queue manager, keeps track of all active jobs in the system
    • PandaBrokerage - matches job attributes with site and pilot attributes. Manages the dispatch of input data to processing sites, and implements PanDA's data pre-placement requirement
    • PandaJobDispatcher - receives requests for jobs from pilots and dispatches job payloads. Jobs are assigned which match the capabilities of the site and worker node (data availability, disk space, memory, etc.). It also manages heartbeat and other status information coming from pilots.
    • PandaDataService - data management services required by the PanDA server for dataset management, data dispatch to and retrieval from sites, etc. Implemented with the ATLAS DDM system.
  • PandaDB - Database system serving PanDA
  • PandaClient - PanDA job submission and interaction client
  • AutoPyFactory - Pilot submission, management and monitoring system. Supersedes the first generation PandaJobScheduler, as well as the second generation system, the AutoPilot.
  • Harvester - a resource-facing service between the PanDA server and a collection of pilots, for resource provisioning and workload shaping. It is a lightweight service running on a VObox or an edge node of an HPC center to provide a uniform view of various resources. See also Central Harvester instances.
  • PandaPilot - the execution environment (effectively a wrapper) for PanDA jobs. Pilots request and receive job payloads from the dispatcher, perform setup and cleanup work surrounding the job, and run the jobs themselves, regularly reporting status to PanDA during execution. Pilot development history is maintained in the pilot blog.
  • Panda Schedconfig - Database table to configure resources and guide the submission of pilot jobs
  • PandaMonitor - web based monitoring and browsing system that provides an interface to PanDA for users and operators.
  • PandaLogger - logging system allowing PanDA components to log incidents in a database via the Python logging module and http
  • Bamboo - interface between PanDA and the ATLAS production database. Superseded in the second generation ATLAS production system Prodsys2 by the JEDI extensions of PanDA.
  • PanDA extensions - Extending PanDA for general use (Open Science Grid WMS program)
  • PD2P - PanDA dynamic data placement model
  • JEDI - An extension of core PanDA to add capability for task-level workload management
  • Event Service - Event-level master/slave event processing in PanDA/JEDI
  • BigPanDAmonitoring
  • BigPanDAinstanceInstallation
  • BigPanDARPMbuildSystem
  • BigPanDAforORNL
  • Intelligent Networking
  • PanDA share distribution and monitoring
  • PanDA error table and retrial module

Security

Security in PanDA employs security mechanisms that are standard in web and grid circles, augmented by built-in capabilities to monitor, track and control usage. See PandaSecurity for information. For operational PanDA security advice, see here.

ATLAS Production System

  • Development of the ATLAS production system, now in its second generation Prodsys2, is closely allied with PanDA and utilizes the JEDI extensions of PanDA.

Information for users

System Operations and Facilities

PanDA Project

PanDA development

Source code repository: github.com/PanDAWMS

PanDA beyond ATLAS

Mailing list

ATLAS PanDA development discussions take place on the CERN e-group atlas-adc-panda mailing list. A CERN e-groups forum is available for discussions on usage of PanDA for analysis.

Documents, Talks

PanDA project logo

  • PanDA logo, 1800x1731 pixels:
    PanDA-rev-logo.jpg

  • PanDA logo, 600x577 pixels:
    PanDA-rev-logo-midsize-600px.jpg

  • PanDA logo, 260x250 pixels:
    PanDA-rev-logo-midsmall-250px.jpg

  • PanDA logo, 200x192 pixels:
    PanDA-rev-logo-small-200px.jpg

2013-2014

2010-2012

  • Networking and Workload Management (PDF), Kaushik's talk in Dec'12 at the LHCONE Meeting
  • Recent Improvements in the ATLAS PanDA Pilot (poster, paper), P. Nilsson, CHEP 2012, United States, May 2012
  • PD2P : PanDA Dynamic Data Placement for ATLAS (slides, paper), T. Maeno, CHEP 2012, United States, May 2012
  • Evolution of the ATLAS PanDA Production and Distributed Analysis System (slides, paper), T. Maeno, CHEP 2012, United States, May 2012
  • The ATLAS PanDA Pilot in Operation (poster, paper), P. Nilsson, CHEP 2010, Taiwan, October 2010
  • Overview of ATLAS PanDA Workload Management (slides, paper), T. Maeno, CHEP 2010, Taiwan, October 2010

2007-2009

  • Distributed Data Analysis in ATLAS (slides, letter), P. Nilsson, ICCMSE 2009, Greece, September 2009

2004-2006

Meetings

PanDA project meetings currently take the form of BigPanDA project meetings. ATLAS PanDA issues are primarily discussed in ATLAS Distributed Computing development meetings.

Archive

Pre-2010 meetings, workshops, reviews

Previous generation systems and R&D

Misc presentations

Previous Contributors


Major updates:

-- TorreWenaus - 22-May-2014
-- MaximPotekhin - 13-Dec-2012
-- MaximPotekhin - 24-Oct-2012
-- MaximPotekhin - 14 Sep 2012
-- MaximPotekhin - 11 Oct 2010
-- AldenStradling - 31 Jul 2009
-- TorreWenaus - 25 Jun 2008
-- TorreWenaus - 05 Nov 2007
-- TorreWenaus - 06 Oct 2006
-- TorreWenaus - 06 Mar 2006
-- RobertGardner - 10 Aug 2005
-- KaushikDe - 01 Aug 2005



Responsible: TorreWenaus

Topic attachments
Each entry lists: file (revision, size, upload date and time, uploader): comment.

  • 12sep06_da.pdf (r1, 46.4 K, 2006-10-07 17:10, TorreWenaus): pathena update Sep 2006
  • 200609-ddmbnl-panda-ddm.pdf (r1, 3843.5 K, 2006-10-07 15:55, TorreWenaus): Panda/DDM integration, T. Wenaus Sep 2006
  • 200609-panda-osg.pdf (r1, 4223.7 K, 2006-10-07 17:13, TorreWenaus): Panda overview, Sep 2006
  • 200804-ISGC-Wenaus-PanDA.pdf (r1, 3676.0 K, 2008-04-10 23:02, TorreWenaus): PanDA talk, ISGC conference, Taiwan, April 2008
  • CHEP07_-_Paul_Nilsson_-_Experience_from_a_pilot_based_system_for_ATLAS.ppt (r1, 585.0 K, 2007-11-05 20:49, PaulNilsson): CHEP2007 presentation by Paul Nilsson
  • CHEP10_panda_wm.ppt (r1, 854.0 K, 2011-11-02 09:41, TadashiMaeno)
  • CHEP12_evo_paper.pdf (r1, 283.1 K, 2012-06-01 11:18, TadashiMaeno)
  • CHEP12_evo_poster.pdf (r1, 96.9 K, 2012-06-01 11:18, TadashiMaeno)
  • CHEP12_pd2p_paper.pdf (r1, 312.9 K, 2012-06-01 11:18, TadashiMaeno)
  • CHEP12_pd2p_slides.pdf (r1, 275.8 K, 2012-06-01 11:18, TadashiMaeno)
  • CHEP2012_-_Improvements_in_the_ATLAS_PanDA_Pilot.pdf (r1, 902.0 K, 2012-05-24 17:33, PaulNilsson): CHEP 2012 Poster by Paul Nilsson
  • CHEP2012_-_Improvements_in_the_ATLAS_PanDA_Pilot_paper_draft_3.pdf (r1, 697.7 K, 2012-05-24 17:37, PaulNilsson): CHEP 2012 Paper by Paul Nilsson
  • DButils_upgrade_by_connection_pooling.ppt (r1, 1498.0 K, 2009-04-28 16:07, UnknownUser): DButils upgrade using Connection Pool
  • JobFlow.jpg (r1, 59.2 K, 2006-10-07 13:54, TorreWenaus): Panda job flow
  • PNilsson-DistAnaFinal.pdf (r1, 1846.2 K, 2010-08-05 15:43, PaulNilsson): ICCMSE 2009 (slides)
  • P_Nilsson_-_PanDA.pdf (r1, 2505.4 K, 2009-03-24 20:53, PaulNilsson): ACAT2008 Slides
  • PanDA-rev-logo-midsize-600px.jpg (r1, 121.1 K, 2014-06-04 21:01, JaroslavaSchovancova): PanDA logo, 600x577 pixels
  • PanDA-rev-logo-midsmall-250px.jpg (r1, 30.7 K, 2014-06-04 21:02, JaroslavaSchovancova): PanDA logo, 260x250 pixels
  • PanDA-rev-logo-small-200px.jpg (r1, 25.9 K, 2014-06-04 21:02, JaroslavaSchovancova): PanDA logo, 200x192 pixels
  • PanDA-rev-logo.jpg (r1, 583.0 K, 2014-06-04 21:00, JaroslavaSchovancova): PanDA logo, 1800x1731 pixels
  • PaulNilsson-v2.pdf (r1, 535.4 K, 2010-08-05 15:44, PaulNilsson): ICCMSE 2009 (letter)
  • PaulNilsson.pdf (r1, 269.9 K, 2009-03-24 20:51, PaulNilsson): ACAT2008 Paper
  • Paul_Nilsson-CHEP2007.pdf (r1, 240.4 K, 2007-11-05 20:48, PaulNilsson): CHEP2007 proceedings by Paul Nilsson
  • The_ATLAS_PanDA_Pilot_in_Operation.ppt (r1, 661.0 K, 2011-11-02 10:21, PaulNilsson): The ATLAS PanDA Pilot in Operation (CHEP'10, poster)
  • The_ATLAS_PanDA_Pilot_in_Operation_final.pdf (r1, 1640.0 K, 2011-11-02 10:16, PaulNilsson): The ATLAS PanDA Pilot in Operation (CHEP'10)
  • WMSnetworking-dec12.pdf (r1, 7658.9 K, 2012-12-13 19:17, MaximPotekhin): Networking and Workload Management, Kaushik's talk in Dec'12
  • bamboo_workflow.ppt (r1, 129.5 K, 2008-08-06 19:25, TorreWenaus): Bamboo workflow, Tadashi Maeno, Feb 2008
  • chep2010pandaWM.pdf (r1, 297.8 K, 2011-11-02 09:40, TadashiMaeno)
  • chepPanda.ppt (r1, 410.5 K, 2007-11-05 20:08, TadashiMaeno)
  • panda-arch.jpg (r1, 276.7 K, 2006-03-06 22:46, TorreWenaus): Updated architecture jpg
  • panda-arch.pdf (r1, 49.7 K, 2006-03-06 22:45, TorreWenaus): Updated Panda architecture diagram
  • panda-implementation.jpg (r2, 301.2 K, 2006-10-07 13:44, TorreWenaus): Updated implementation jpg
  • panda-implementation.pdf (r2, 54.8 K, 2006-10-07 13:43, TorreWenaus): Updated Panda implementation diagram
  • panda-osg-extensions.doc (r1, 1049.0 K, 2006-10-07 15:39, TorreWenaus): Preliminary plans for Panda extension work in the OSG

Topic revision: r175 - 2018-05-09 - FernandoHaraldBarreiroMegino
 