PandaDynamicDataPlacement

Introduction

This page describes PanDA Dynamic Data Placement (PD2P).

General Policies

The following general policies apply to the PD2P algorithm when making replicas of datasets at Tier 1 and Tier 2 sites (a code sketch of these checks follows the list).
  • The dataset name prefix is mc or data
  • The data type is not RAW, RDO, HITS, ESD, or EVNT
  • The hidden metadata is not True
  • The destination of all PD2P datasets is DATADISK
  • GangaRobot, HammerCloud, _dis, _sub, etc. datasets are skipped
  • Site must have sufficient free space
  • The site must be an analysis site, not a test or production site, etc.
  • Replicas at non-online sites are ignored
  • Only online sites are used
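For illustration, a minimal sketch of these eligibility checks is given below, assuming simple dataset and site attributes; the function and field names are hypothetical and are not taken from the PanDA code.

SKIPPED_TYPES = {"RAW", "RDO", "HITS", "ESD", "EVNT"}
SKIPPED_PATTERNS = ("GangaRobot", "HammerCloud", "_dis", "_sub")

def is_pd2p_candidate(dataset_name, data_type, is_hidden):
    """Return True if a dataset passes the general PD2P dataset policies above."""
    if not (dataset_name.startswith("mc") or dataset_name.startswith("data")):
        return False                                    # prefix must be mc or data
    if data_type in SKIPPED_TYPES:
        return False                                    # RAW/RDO/HITS/ESD/EVNT are not replicated
    if is_hidden:
        return False                                    # hidden metadata must not be True
    if any(p in dataset_name for p in SKIPPED_PATTERNS):
        return False                                    # skip GangaRobot, HammerCloud, _dis, _sub, etc.
    return True

def is_pd2p_candidate_site(site):
    """Site-level policies: analysis, online, sufficient free space (illustrative fields)."""
    return (site.get("type") == "analysis"
            and site.get("status") == "online"
            and site.get("free_space_tb", 0) > 0)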

Tier 1 Algorithm

  • Primary copies of ATLAS data will be placed at Tier 1 sites based on CREM policies
  • PD2P will make an additional copy at a Tier 1 if:
    • no PD2P replica was made at a T1 during the past week
    • int(log10(nused)) > nSecondaryT1Replicas, where nused is the number of times the dataset has been used, counted per jobset
    • nused == 10**int(log10(nused)) (i.e. another replica is created when nused = 10, 100, 1000, ... and not for nused = 1); see the sketch after this list
  • All T1s are used even if no T2 in the cloud has schedconfig.cachedSE=True
  • The location of the additional 'secondary' copy is based strictly on MoU share
  • If the dataset is a container, each constituent dataset is placed using MoU share
  • If cloud=IT, PD2P immediately makes an additional copy at a Tier 2 within the IT cloud. In this case, the Tier 2 Algorithm described in the next section is not triggered
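As a hedged illustration of the trigger above, the sketch below evaluates the nused-based condition; the function and argument names are hypothetical.

def needs_extra_t1_replica(nused, n_secondary_t1_replicas, made_t1_replica_last_week):
    """Illustrative check of the Tier 1 trigger described above (names are hypothetical)."""
    if made_t1_replica_last_week:
        return False                        # at most one PD2P T1 replica per week
    order = len(str(nused)) - 1             # integer-safe int(log10(nused)) for nused >= 1
    if nused != 10 ** order:
        return False                        # trigger only at nused = 10, 100, 1000, ...
    return order > n_secondary_t1_replicas  # and only while secondary replicas lag behind usage

For example, a dataset reaching nused = 100 with a single existing secondary T1 replica would get one more copy.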

Tier 2 Algorithm

The Tier 2 Algorithm is executed independently of the Tier 1 Algorithm. If one or more sites in a cloud have schedconfig.cachedSE=True, the cloud is regarded as a PD2P cloud. When an analysis job is submitted to PanDA, PD2P is triggered. Datasets which match the following conditions are distributed.
  • No replica within PD2P T2 sites (subject to maxSitesHaveDS). If nWaitingJobsets > 2, the number of replicas must instead be less than maxSitesHaveDS = int(log10(nWaitingJobs/200)). nWaitingJobsets is the number of jobsets which have at least one waiting job. Both nWaitingJobsets and nWaitingJobs are computed per dataset when the analysis job is submitted. (See the sketch after this list.)
  • maxSitesHaveDS, the maximum number of Tier 2 copies, is constrained to 0 < maxSitesHaveDS < 5
  • The number of active DQ2 subscriptions per dataset made by PD2P is not greater than 2
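A minimal sketch of the replica-count condition is shown below. The default limit of one (i.e. replicate only when no T2 copy exists yet) and the exact clamping to the (0, 5) range are assumptions drawn from the bullets above; the names are illustrative.

import math

def allows_new_t2_replica(n_replicas_at_pd2p_t2, n_waiting_jobsets, n_waiting_jobs):
    """Illustrative check of the replica-count condition described above."""
    if n_waiting_jobsets > 2:
        # more waiting work allows more copies, but always 0 < maxSitesHaveDS < 5
        max_sites_have_ds = int(math.log10(max(n_waiting_jobs, 200) / 200.0))
        max_sites_have_ds = min(max(max_sites_have_ds, 1), 4)
    else:
        max_sites_have_ds = 1               # assumption: replicate only if no T2 copy exists yet
    return n_replicas_at_pd2p_t2 < max_sites_have_ds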
PD2P finds a candidate site with schedconfig.cachedSE=True from any PD2P cloud, using the weight

  Weight = W / (S × R)

where

  • W … The weight used for analysis brokerage, described in this link
  • S … The number of subscriptions made by PD2P to the site in the last 24 hours, plus 1
  • R … The number of replicas of the dataset at T2 sites in the same cloud, plus 1

If a T2 site in a cloud already has a copy, the R factor in the denominator pushes the new copy toward another cloud. If the input is a dataset container, one candidate is found for each constituent dataset.
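Putting the pieces together, candidate selection could be sketched as below. The per-site fields and the choice of the highest-weight site are assumptions; the page only defines the weight itself.

def pd2p_weight(w_brokerage, n_subscriptions_24h, n_replicas_in_cloud):
    """Weight = W / (S * R) as defined above; S and R include the +1 offsets."""
    s = n_subscriptions_24h + 1
    r = n_replicas_in_cloud + 1
    return w_brokerage / float(s * r)

def choose_candidate_site(sites):
    """Pick a candidate among sites with schedconfig.cachedSE=True.
    'sites' is a list of dicts with illustrative fields; taking the
    highest-weight site is an assumption, not the documented behaviour."""
    cached = [s for s in sites if s.get("cachedSE")]
    if not cached:
        return None
    return max(cached, key=lambda s: pd2p_weight(s["W"], s["subs_24h"], s["replicas_in_cloud"]))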

Logger

PD2P Logger

  • action=UNSELECTEDT2 : Selection according to the brokering algorithm
  • action=SELECTEDT2_T2MOU : Selection according to the a/b/c/d classification

Monitor

All monitoring associated with PD2P and extracted from the logger information is documented in PD2P Monitor.

Additional Information


Major updates:
-- TadashiMaeno - 15-Nov-2010 -- Revised KaushikDe - 13-Apr-2011



Responsible: TadashiMaeno

Never reviewed
