Data Service for PanDA

Functions of the central Panda data service (part of the PandaServer)

  • Request/select group of jobs from the PandaTaskBuffer.
  • Request list of sites for these jobs from PandaBrokerage.
  • Request DDM to reserve and move the block of input files.
  • Check status of group of jobs in PandaTaskBuffer. Receive notification from DDM on transfer completion to trigger downstream actions.
  • Request DDM to move and archive output files.

Implementation of Panda data handling with ATLAS DDM

The figures below show Panda's dataset-based automated data handling, implemented using the ATLAS DDM system. All data handling is at the dataset level (datasets are file collections, and a data block is an immutable dataset). Sites are subscribed to datasets to trigger automated dataflow. Distributed (HTTP URL) callbacks provide notification of transfer completion and are used to trigger job release on data arrival and other chained operations. This automated dataflow, together with enforced data pre-placement as a precondition to job dispatch, has been key to minimizing operational manpower and maximizing robustness against transfer failures and SE problems.
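
As a rough illustration of callback-driven job release, the sketch below activates waiting jobs when a transfer-completion callback arrives for their dispatch dataset. The class names, the status strings "assigned" and "activated", and the dataset names are assumptions made for the example; they are not taken from the Panda server code.

```python
from dataclasses import dataclass


@dataclass
class Job:
    job_id: int
    dispatch_dataset: str      # dataset whose arrival the job is waiting for
    status: str = "assigned"   # assumed label for "waiting for input data"


class CallbackHandler:
    """Hypothetical handler for transfer-completion callbacks."""

    def __init__(self, jobs):
        self.jobs = list(jobs)

    def on_transfer_complete(self, dataset_name):
        """Release every waiting job that depends on dataset_name.

        Returns the IDs of the jobs released by this callback.
        """
        released = []
        for job in self.jobs:
            if job.dispatch_dataset == dataset_name and job.status == "assigned":
                job.status = "activated"
                released.append(job.job_id)
        return released


jobs = [Job(1, "panda.dis.A"), Job(2, "panda.dis.B"), Job(3, "panda.dis.A")]
handler = CallbackHandler(jobs)
released_ids = handler.on_transfer_complete("panda.dis.A")
```

Because a job is released only when it is still waiting, a repeated callback for the same dataset releases nothing the second time, which keeps the handler safe against duplicate notifications.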

Dataset-based data flow in Panda is as follows:

diagram

Implementation of dataset-based data flow with the ATLAS DQ2 DDM system and its subscription and callback mechanisms is shown here:

diagram

Datasets used by the system can be examined via the dataset browser.

See UsingDQ2 for information on accessing Panda-produced (and other ATLAS DQ2-managed) data.


Interactions with DDM

  • registers empty output datasets when production/user tasks are submitted
  • registers empty dataset containers when user tasks are submitted
  • sets metadata of datasets, e.g., owner, origin, lifetime, pin_lifetime
  • retrieves lists of files in input datasets when tasks are submitted (including some metadata such as LFNs, fsize, checksum, numOfEvents, etc.) to insert them into the database
  • for open datasets, retrieves lists of files periodically. An API call that returns the files added or removed after a given timestamp would be useful
  • gets the list of constituent datasets in dataset containers when user tasks are submitted with containers
  • takes N files in input datasets and gets the number of available files at each site. Currently this is done by scanning LFC
  • transfers missing input files to T2 and pins existing files at T2 using dispatch datasets
  • activates jobs when receiving callbacks for dispatch datasets
  • registers subscriptions for sub datasets when the first files are added
  • transfers output files to T1 using sub datasets
  • adds files to sub datasets when jobs are finished at T2
  • adds files to tid datasets per job when receiving callbacks for the sub datasets that the job uses, or when finding in LFC that all output files are available at T1
  • could add files to sub/tid datasets per sub/tid dataset rather than per job, since adding files per job puts considerable load on the panda server machines. However, this would delay the availability of files at T1 in tid datasets
  • deletes dispatch datasets when all jobs which use the dispatch datasets are successfully finished
  • reduces the lifetime of dispatch datasets when all jobs which use the dispatch datasets are finished but some of them have failed
  • deletes sub datasets when files are transferred to T1 and they are added
  • gets the number of available files at each site per dataset for the task brokerage, PD2P, analysis brokerage
  • makes subscriptions to T1 for input datasets when only T2 has them
  • makes subscriptions to free sites for PD2P
  • creates transient datasets at T2 to merge files there and then deletes them
  • gets free and total disk space at each site
  • pins input datasets when jobs are submitted
  • gets dataset names and LFNs with GUIDs
  • would re-open datasets when lost files are re-generated
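
For open datasets, the periodic file-list retrieval mentioned above implies comparing each new listing with the previous one. A minimal sketch of that comparison follows, assuming each listing is simply a collection of LFNs; the function name and the file names are hypothetical, and no DDM API call is shown.

```python
def diff_file_lists(previous_lfns, current_lfns):
    """Return (added, removed) LFNs between two snapshots of a dataset."""
    previous = set(previous_lfns)
    current = set(current_lfns)
    added = sorted(current - previous)      # files new in this snapshot
    removed = sorted(previous - current)    # files that disappeared
    return added, removed


added, removed = diff_file_lists(
    ["a.root", "b.root"],
    ["b.root", "c.root"],
)
```

A server-side call returning changes since a timestamp would avoid transferring the full listing each time; the set difference above is the client-side fallback.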


Creation and deletion policies for _dis and _sub datasets

  • one _dis dataset is created per ~20 input files with the replica lifetime of 7 days
  • one _sub dataset is created per ~50 output files for each data type with the replica lifetime of 14 days
  • _dis and _sub datasets are hidden
  • multiple _dis datasets may contribute to one _sub dataset
  • _dis dataset is created only at T2 when input files already exist at the site
  • _dis dataset is erased from EGEE/EGI when the associated _sub dataset is frozen and all jobs contributing to the _dis dataset successfully finished
  • _dis dataset is not erased when some jobs contributing to the _dis dataset failed, so that input files in the _dis dataset can be re-used by the next attempt
  • _sub dataset is deleted from EGEE/EGI T2 when all files in the _sub dataset are transferred to T1
  • _sub dataset is frozen when all jobs contributing to the _sub dataset finished/failed/cancelled/reassigned
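
The grouping and erasure rules above can be sketched as follows. This is an illustration only: the function names, the exact block size of 20 (the text says approximately 20), and the status strings are assumptions, not the production implementation.

```python
def split_into_dispatch_blocks(lfns, block_size=20):
    """Group input files into blocks, one per _dis dataset."""
    return [lfns[i:i + block_size] for i in range(0, len(lfns), block_size)]


def dis_dataset_action(job_statuses, sub_frozen):
    """Decide what to do with a _dis dataset.

    Returns "erase" only when the associated _sub dataset is frozen and
    every contributing job finished successfully; otherwise "keep", so
    that a retry can re-use the input files.
    """
    all_succeeded = bool(job_statuses) and all(
        status == "finished" for status in job_statuses
    )
    return "erase" if sub_frozen and all_succeeded else "keep"


blocks = split_into_dispatch_blocks([f"file{i}" for i in range(45)])
block_sizes = [len(block) for block in blocks]
```

With 45 input files this yields two full blocks and one partial block.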

Pinning datasets

Dataset replicas are pinned for the following reasons. The expiration date for pinning is 7 days.

  • When production jobs are submitted, T1 replicas are pinned. If replicas are available only at T2s, subscriptions to T1 are made and T2 replicas are pinned.
  • When the task brokerage assigns a task to a cloud, replicas at T1 are pinned. If replicas are available only at T2s, subscriptions to T1 are made and T2 replicas are pinned.
  • When analysis jobs are submitted, PD2P pins replicas at the site.
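
The pinning rule for production jobs can be expressed as a small decision function. The sketch below assumes a dataset's replica locations are known as a set of site names and that the T1 of the cloud is given; the function name, the returned dictionary layout, and the site names in the usage lines are illustrative only.

```python
PIN_LIFETIME_DAYS = 7  # pin expiration stated in the text above


def plan_production_pinning(replica_sites, t1_site):
    """Return which replicas to pin and where to subscribe the dataset."""
    if not replica_sites:
        # No replica anywhere: nothing to pin or subscribe in this sketch.
        return {"pin": [], "subscribe": [], "pin_days": PIN_LIFETIME_DAYS}
    if t1_site in replica_sites:
        # A T1 replica exists: pin it, no transfer needed.
        return {"pin": [t1_site], "subscribe": [], "pin_days": PIN_LIFETIME_DAYS}
    # Replicas exist only at T2s: subscribe T1 and pin the T2 replicas.
    return {
        "pin": sorted(replica_sites),
        "subscribe": [t1_site],
        "pin_days": PIN_LIFETIME_DAYS,
    }


plan_t1 = plan_production_pinning({"BNL", "MWT2"}, "BNL")
plan_t2_only = plan_production_pinning({"MWT2", "AGLT2"}, "BNL")
```

The same shape applies to the task brokerage case, since the rule stated for it is identical.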

Development Team

Data service component of central Panda server: TadashiMaeno

DDM services at Tier 1 and Tier 2 sites: WenshengDeng, HorstSeverini, PatrickMcGuigan, MarkSosebee

DDM operations coordination in US ATLAS: AlexeiKlimentov


Major updates:
-- TorreWenaus - 06 Mar 2006 -- KaushikDe - 02 Aug 2005



Responsible: KaushikDe

Topic attachments
Attachment          Revision   Size      Date               Who           Comment
PandaDataflow.jpg   r1         207.7 K   2006-03-06 23:17   TorreWenaus   Panda dataflow with DQ2 (jpg)
PandaDataflow.pdf   r1         26.4 K    2006-03-06 23:17   TorreWenaus   Panda dataflow with DQ2
dataset-flow.jpg    r1         55.6 K    2006-10-07 14:57   TorreWenaus   Dataset-based data flow
Topic revision: r60 - 2013-03-11 - TadashiMaeno
 