May 2017 GDB notes

Agenda

https://indico.cern.ch/event/578986/

Introduction (Ian Collier)

Slides

  • Latchezar: if the March GDB is again held alongside ISGC, we need to inform potential participants early

  • Ian: OSG have a new security officer, Susan Sons from Indiana University

HEPiX Report (Mattias Wadenstein for Helge Meinhard)

Slides

  • Romain: regarding security incidents, coordination and collaboration are OK in WLCG,
    but incidents do not stop there: they involve university campuses etc.

  • Latchezar: what is the general view on Knights Landing (KNL) performance?
  • Mattias:
    • standard SW running OK on Xeon is slow on KNL
    • certain ATLAS code was sped up 2x by compiling with special options
    • a recent gcc can be used for that
    • KNL shines for highly optimized HPC code
    • the integer unit is based on Atom and therefore slow
  • Mattias:
    • as Omni-Path uses the CPU, in the BNL setup some CPUs had to be sacrificed to allow the links to be filled

Data Management Steering Group update (Oliver Keeble)

Slides

  • Oxana:
    • shouldn't data and storage be presented as the top priority?
    • we might even consider renaming WLCG to Worldwide LHC Data Grid!
  • Oliver:
    • we will try to capture that notion in a vision statement
    • we intend to have a prioritized agenda by the time of the workshop
  • Ian:
    • the workshop offers a good opportunity to benefit from wider participation

Canadian Tier 1 relocation & reorganisation (Reda Tafirout)

Slides

  • no comments

WLCG Federated Storage demo (Andrey Kirianov)

Slides

The initial goal was to federate several Russian sites and CERN; later DESY joined with dCache.

EOS was the technology originally chosen (the xroot protocol in particular); dCache (2.16, also via xroot) was tested as well. Several load tests were performed and the network was monitored with perfSONAR. Authentication is via X509. Servers run mostly in VMs. For the tests, the maximum bandwidth was 1 Gbps, often less. ALICE works much faster over xroot, ATLAS over FUSE, but the tests are not equivalent and a direct comparison between ALICE and ATLAS is impossible. Some issues were discovered and reported to the developers.
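
As an illustration of the kind of load test described (a minimal sketch, not the actual harness; the federation endpoint, file path and 1 GB file size are assumptions), one can time an xrdcp transfer over the xroot protocol and derive the throughput:

    # Minimal sketch of a single-transfer load test over the xroot protocol.
    # The endpoint, path and file size are hypothetical; xrdcp ships with XRootD.
    import subprocess, time

    SRC = "root://eos-fed.example.org//eos/fed/test/file-1g"  # hypothetical federation path
    DST = "/tmp/file-1g"
    SIZE_BYTES = 1024**3  # assumed 1 GB test file

    start = time.time()
    subprocess.run(["xrdcp", "--force", SRC, DST], check=True)  # overwrite destination if present
    elapsed = time.time() - start
    print(f"read throughput: {SIZE_BYTES * 8 / elapsed / 1e6:.1f} Mbit/s")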

Data I/O performance does not depend on the client-server link speed, but metadata I/O obviously does. Several data placement scenarios were tested.

Clearly, the best write performance is achieved when data are placed at the site with the best connectivity; reads are less dependent on placement and can be equally fast with federated access.

For dCache, FUSE mounts had to be implemented, and several other issues were discovered; these will be addressed by dCache and the tests will be redone. So far there is no notable difference between dCache and EOS in terms of read/write speeds, with some exceptions.

All in all, both EOS and dCache perform well. Another technology yet to be tested is HTTP-based federation.

Deployment-wise, for a smaller site it is easier to just deploy disk servers as part of a federation than to run a complete storage service.

EOS Federation across 300ms (Luca Mascetti)

Slides

Deployment across CERN and Wigner sites, 3x100 Gbps links, all saturated.

Ran tests involving additional storage nodes in Taiwan and Australia, with either dual or triple replicas per file (1GB files).

Network paths were not dedicated, constantly changing, and sometimes not even symmetric. However, the main limitation is the TCP window. Authentication for writes is most affected by latency when the namespace server is far away, so a local namespace is desirable. New Zealand is expected to be added in the future, and a US site is being sought as well.
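
To illustrate why the TCP window dominates at these latencies (a back-of-the-envelope calculation, not a figure from the talk): a single TCP stream can move at most one window per round trip, so at 300 ms RTT a small window caps throughput far below the link capacity.

    # Single-stream TCP throughput is bounded by window / RTT (the bandwidth-delay product).
    def max_throughput_mbps(window_bytes: int, rtt_s: float) -> float:
        return window_bytes * 8 / rtt_s / 1e6

    RTT = 0.300  # ~300 ms round trip, the order of magnitude in the talk title
    print(max_throughput_mbps(64 * 1024, RTT))     # ~1.7 Mbit/s with a 64 KiB window
    print(max_throughput_mbps(16 * 1024**2, RTT))  # ~450 Mbit/s with a 16 MiB window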

Overall, scalability is good, even with high latencies. A comment similar to the one on the previous talk was made: one should distinguish between federations and distributed storage; this arrangement is rather a distributed multi-site storage system than a federation, since it involves distributed pools with one central namespace manager.

Oliver is tasked with sorting out nomenclature.

Regional Data Federations (Fabrizio Furano)

Slides

Feasibility study of local HTTP federations.

Requirements:

  • non-intrusiveness
  • low cost of joining

Another goal is to test how cloud resources can be exploited.

Three teams are involved: ATLAS-Canada, ATLAS-Italy and Belle-II; some sites are shared between teams. Dynafed and DPM are being evaluated, and they can be mixed into one federation. Authentication is via X509; authenticated users can also be redirected to signed URLs.
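
As a sketch of the signed-URL mechanism mentioned above (the endpoint, paths and proxy location are hypothetical, not taken from the talk): the client authenticates to the federation front-end with its X509 proxy and is redirected to a pre-signed URL on the backing storage, which can then be fetched without presenting credentials again.

    # Hypothetical example of following a federation redirect to a pre-signed URL.
    import requests

    FED_URL = "https://dynafed.example.org/fed/atlas/data/some.file"  # hypothetical endpoint
    PROXY = "/tmp/x509up_u1000"  # X509 proxy file, used here as both client cert and key

    # Authenticate with X509; do not follow the redirect yet, so the signed URL can be inspected.
    r = requests.get(FED_URL, cert=(PROXY, PROXY),
                     verify="/etc/grid-security/certificates",
                     allow_redirects=False)
    signed_url = r.headers["Location"]  # e.g. a cloud-storage URL carrying signature parameters

    # The signature embedded in the URL authorizes the download, so no certificate is needed.
    data = requests.get(signed_url).content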

For ATLAS, testing involves complete Rucio and PanDA integration. Belle-II tested a number of different technologies via Dynafed, a true federation.

Things look very promising so far, news to come soon.

Softdrive - CVMFS made easier (Dennis Van Dok)

Slides

Nikhef and SURFsara set up a single repository for e-science users.

Users can ssh in and upload software, which is then rsync-ed to the sites. The system uses the same path as CVMFS (it is not a CVMFS mount, though).

Strong focus on user-friendliness. All started well but scaled poorly: a new revision every 5 minutes turned out to be too heavy. After moving to a nested catalog per user, it scales much better.

Users don't complain (there are about 30 of them, with 120 GB in 1.8 million files). Active development is still going on, with more functionality to be added. There are no special ways to prevent users from sharing datasets this way, except for imposing quotas.
