Panda/AGIS integration of ND cloud sites

Aim: to rationalise the topology (one PandaSiteID <-> one ATLAS site <-> one GOCDB site) and to enable ARC CE deployment outside of the ND cloud.

Panda integration is a prerequisite, after which AGIS integration follows.

Site re-definition

  • AGIS is live now ( http://atlas-agis.cern.ch/agis/ ) and all sites need to be defined there. Information is collected from GOCDB and the top-BDII, and can also be added manually. All ATLAS-internal information (e.g. schedconfig) is added manually. AGIS satisfies all the requirements previously identified as prerequisites for the ND cloud integration with Panda (bullet points below in this chapter). Entries are to be filled in partly by ADC ops and partly by a cloud operator (e.g. assigning queues to CEs). The information to be provided flows hierarchically as follows:
 
                   SITE (a dictionary - better if name==GOCDB name)
                       |
                       ATLASSITE (there can be more than one per SITE e.g. GRIF)
                                 |
                                 PANDASITE (there can be more than one, we'll likely not need multiples)
                                     |
     - Panda brokers here -->         PANDARESOURCE (differentiate here between ANALY and PROD)
                                             |
                                             PANDAQUEUE (each queue should match a set of homogeneous resources)
                                              | | | |
                                              Many CEs (e.g. for redundancy)
  • The suggested way of configuring entries is SITE==ATLASSITE==PANDASITE, then split between ANALY and PROD, and further between resources by convenient criteria, e.g. PRODSHORT, PRODLONG, PRODHIMEM etc. (see the sketch at the end of this list).

  • Sites need to be visible in the top-BDIIs; with AGIS, non-published resources can also be integrated. The following has, however, been agreed:
    • NDGF-T1 (GOCDB site) will maintain its distributed internal structure (multiple sites and CEs in multiple countries). This is an exception, but similar to what is done for GRIF (5 physical sites and ATLASSites). The NDGF-T1 site-BDII will be re-configured in order to expose the sub-structure properly, so that services (e.g. CVMFS auto-setup) work properly. This is confirmed to be OK (Andrej). Accounting/pledge complexity is already taken care of within NDGF.
    • T2 sites not part of NDGF-T1, e.g. SE-SNIC-T2 (and UNIBE-LHEP in the near future), will be visible through their own site-BDII.
    • New Grid sites will either be registered in GOCDB and exposed through their own site-BDII or the new ARC2 infosys, or be incorporated within NDGF-T1. A site with separate pledges etc. will be required to be a GOCDB site (with its own site-BDII or ARC2 infosys).

  • Individual sites (ATLASSites) within NDGF-T1 will not advertise their own SRM endpoint, as their storage will remain part of the NDGF-T1 SRM. Separate GOCDB sites may or may not have an associated SRM endpoint.

  • Some of these sites are T1, some T2, some T3: this can be taken care of within AGIS

  • No problems are expected with mixed sites (ARC / CREAM).

  • The distributed T1 model will still exist: AGIS must be able to handle more than one ATLASSite labeled as T1 in the cloud: this will be possible (Ale)

  • The relation between a resource (CE+queue) and a PanDA queue will no longer be mandatory in AGIS. This means it is down to the site admin/cloud to create the relation. The relation will be many-to-many (many CE+queue pairs to many PanDA queues), which gives full flexibility.

  • The new ARC2 infosys will be able to register directly to the top-BDIIs. This requires the top-BDIIs to fully understand GLUE2 (timescale: due end of summer, not delivered yet). In order for ARC to get pilots, some attributes are needed from the infosys (typically gatekeeper, grid-manager, queue, etc.). Currently ARC gets these via a number of queries to the ARC infosys; the ARC clients seem to be more efficient with these queries. The following is agreed:
    • Andrej will try the existing clients on AFS to see whether the needed queries can already be done.
    • If not, he will investigate how to access the EMI ARC clients publicly.
    • Failing all the above, a flat file on a webserver including the necessary attributes can be used (until the ARC2 infosys/top-BDII integration is in place) - NOT discussed, NOT clear whether this is still an issue.
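
To make the hierarchy and the many-to-many CE relation above concrete, here is a minimal sketch in Python of one entry following the suggested SITE==ATLASSITE==PANDASITE convention; all site, queue and CE names are hypothetical:

    # Minimal sketch of the suggested topology; all names are hypothetical.
    # Convention: SITE == ATLASSITE == PANDASITE, split into ANALY and PROD,
    # and PROD further split into homogeneous resources (SHORT/HIMEM etc.).
    topology = {
        "EXAMPLE-SITE": {                          # SITE (GOCDB name)
            "atlassite": "EXAMPLE-SITE",           # ATLASSITE
            "pandasite": "EXAMPLE-SITE",           # PANDASITE
            "pandaresources": {
                "EXAMPLE-SITE_PROD": {             # PanDA brokers at this level
                    # Each PANDAQUEUE matches a set of homogeneous resources;
                    # several CEs per queue are allowed (redundancy), and the
                    # CE+queue <-> PanDA queue relation is many-to-many.
                    "EXAMPLE-SITE_PRODSHORT": ["ce1.example.org/short",
                                               "ce2.example.org/short"],
                    "EXAMPLE-SITE_PRODHIMEM": ["ce1.example.org/himem"],
                },
                "ANALY_EXAMPLE-SITE": {
                    "ANALY_EXAMPLE-SITE": ["ce1.example.org/analysis",
                                           "ce2.example.org/analysis"],
                },
            },
        },
    }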

Panda job brokering

SW

* Central installation and validation to be handled by Panda (including CVMFS). Ongoing: the current system is being integrated with AGIS first, then moved to PanDA (Alessandro De Salvo). As of now, SW tags and the 'is_cvmfs' flag are in AGIS (see the sketch at the end of this section)

* Will drop SW tags from the site-BDII; these will only live in AGIS

* Until the integration is complete, assume all releases are there (as now?) - the ND manager makes sure that releases are installed and validated (also on CVMFS)
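
As an illustration of the 'is_cvmfs' flag, here is a minimal sketch of how a client could list the CVMFS-enabled queues from an AGIS JSON export; the endpoint URL and field names below are assumptions for illustration, not the documented AGIS API:

    # Sketch: list PanDA queues that declare CVMFS, from an AGIS JSON export.
    # The URL and the field names are assumptions for illustration only.
    import json
    import urllib.request

    AGIS_URL = "http://atlas-agis-api.cern.ch/request/pandaqueue/query/list/?json"

    def cvmfs_queues():
        with urllib.request.urlopen(AGIS_URL) as resp:
            queues = json.load(resp)
        # Queues with 'is_cvmfs' set need no software-area setup: releases
        # are assumed installed and validated via CVMFS.
        return [q["name"] for q in queues if q.get("is_cvmfs")]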

Data

  • DDM knows which data exist within the ND SEs, and Panda knows via schedconfig which DDM endpoints are associated to each site: OK

  • Data are moved within the cloud (this needs WAN performance, OK for now) and each site has a local cache that is NOT part of the SE (it is confirmed that, at this stage, this cannot be accounted as site storage to ATLAS)

  • Currently data are in the NDGF-T1 dCache (with pools at many sites) and in SE-SNIC-T2 (UNIBE-LHEP will be added soon). There are dCache pools for T1 storage and for T2 storage, and there is a replica area caching a second copy of data (roughly one month's worth): this topology will be maintained (the replica area is not part of the pledges)

  • AGIS/schedconfig must be able to associate each site to any storage endpoint (primary and secondary): this is implemented in AGIS (see the sketch at the end of this section). [previous discussion: this will allow, e.g., brokering analysis jobs to the UNIBE-LHEP T2 (for data stored in their SE), but also T1 jobs running on NDGF-T1 data (moved through the WAN to the site-local cache). Eventually, it should be handled in the same way as the xrootd federation. Right now one can map many clusters to one SE but not one cluster to many SEs. The workaround will be multiple SiteIDs per physical resource, each associated to one single SE (example: Bern to the NDGF SE; Bern to the Bern SE). In the long term, the full matrix will be the solution.]

  • Assuming DDM knows how/where to send T1 and T2 data: this is confirmed
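
A minimal sketch of the site-to-storage association described above, showing the one-cluster-to-many-SEs matrix that the AGIS implementation enables; the endpoint names are illustrative:

    # Sketch of the site -> storage-endpoint association (names illustrative).
    # 'primary' is the site's own SE; 'secondary' lists further endpoints the
    # site can be brokered against, with data pulled over the WAN into the
    # local (non-pledged) cache.
    storage_association = {
        "UNIBE-LHEP": {
            "primary": "UNIBE-LHEP_SE",        # local T2 analysis data
            "secondary": ["NDGF-T1_SE"],       # T1 jobs on NDGF-T1 data
        },
        "NDGF-T1": {
            "primary": "NDGF-T1_SE",
            "secondary": [],
        },
    }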

Pilot job assignment

  • Currently there is no pilot factory for ARC. The Control Tower (CT) emulates pilots running at the virtual sites ARC (for T1) and ARC-T2, and calls Panda to get T1 and T2 jobs accordingly. The CT then re-brokers the jobs to the underlying sites, creates the actual jobs and submits them directly to the individual CEs. It is likely that a new version of the CT will be maintained, able to work with real pilots and to submit to the newly defined PanDA queues. The ARC and ARC-T2 queues are to be obsoleted. Work on the modifications to the CT is still pending.

  • In order to ensure that ARC CE deployment can also occur at sites NOT part of the ND cloud, the auto-pilot factory must know how to submit to the ARC CE (with job priority properly set), and modifications to the pilot are also likely to be needed. Submission can then be performed to an ARC CE that is associated to a PanDA queue (in turn associated to one ARC service at one ATLASSite); a sketch of such a submission is given at the end of this section. Andrej/David to look at the modifications. It is recommended to interact with the APF2 developers: John Hover and Jose Caballero.

  • T1 jobs will go to multiple T1 ATLASSites, and T2 jobs will go to multiple T2 sites. What about T3s? In the current scheme, sites labeled as T3 in ATLAS can run T2 and even T1 workloads (as set in the current CT). These sites are likely to be served by the new version of the CT (to be clarified)

  • Mixed ARC / CREAM sites: no problems expected (see the Site re-definition section at the top)
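
As referenced above, here is a minimal sketch of what a pilot-factory submission to an ARC CE could look like, using the standard 'arcsub' client with an xRSL job description; the CE host, queue names and pilot wrapper are hypothetical:

    # Sketch: submit a pilot wrapper to an ARC CE mapped to a PanDA queue.
    # The CE host, the queues and 'pilot.sh' are hypothetical; 'arcsub' is
    # the standard ARC client submission command.
    import subprocess
    import tempfile

    def submit_pilot(ce_host, arc_queue, panda_queue):
        # Minimal xRSL job description; a real factory would also set the
        # job priority, runtime environments, input/output files, etc.
        xrsl = ('&(executable="pilot.sh")'
                '(jobName="panda-pilot-%s")'
                '(queue="%s")' % (panda_queue, arc_queue))
        with tempfile.NamedTemporaryFile("w", suffix=".xrsl",
                                         delete=False) as f:
            f.write(xrsl)
            jobdesc = f.name
        subprocess.check_call(["arcsub", "-c", ce_host, jobdesc])

    submit_pilot("arc-ce1.example.org", "atlas", "EXAMPLE-SITE_PRODSHORT")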

Progress

  • Define one/more sites in AGIS dev for Andrej to test

  • Next iteration: undefined as we ran out of time

-- GianfrancoSciacca - 15-Mar-2012
