Neutrino Platform EHN1 Cluster Computing Model For Use By ProtoDUNEs

Definitions

  • Data Taking Operations – during beam time (includes taking beam-generated and cosmic-ray-generated data at the same time)
  • Cosmic Ray Operations – sustained running taking cosmic rays
  • Commissioning – taking nominally “ready” software and hardware and getting them ready for Operations
  • Analysis Operations Phase – after data taking and cosmic ray operations

Offline processing, operations, and analysis activities take place in each of these phases.

Validation and Metrics

Validation procedures and metrics for use of the Cluster need to be defined and collected:

  • Nagios and Ganglia plots are presented on the web regularly.
  • ProtoDUNE-DP runs benchmarks after any change in the software (one way to automate this is sketched after this list).
  • ProtoDUNE-SP depends on the centralized DUNE Continuous Integration system. It will be useful if the EHN1 Cluster is included as a test site for this.
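
How the benchmark results feed into these plots is not yet pinned down. As one possibility, the minimal Python sketch below times a benchmark run and publishes the result to Ganglia with the standard gmetric injector; the benchmark command and metric name are placeholders, not the actual ProtoDUNE-DP suite.

    #!/usr/bin/env python
    # Minimal sketch: time a benchmark run and publish the result to
    # Ganglia via gmetric so it appears alongside the regular web plots.
    # BENCHMARK_CMD and METRIC_NAME are placeholders, not the actual
    # ProtoDUNE-DP benchmark suite.
    import subprocess
    import time

    BENCHMARK_CMD = ["./run_dp_benchmark.sh"]   # hypothetical benchmark
    METRIC_NAME = "pdune_dp_benchmark_seconds"  # hypothetical metric name

    start = time.time()
    result = subprocess.run(BENCHMARK_CMD)
    elapsed = time.time() - start
    if result.returncode != 0:
        raise SystemExit("benchmark failed; not publishing a metric")

    # gmetric is Ganglia's command-line metric injector.
    subprocess.run(["gmetric",
                    "--name", METRIC_NAME,
                    "--value", "%.1f" % elapsed,
                    "--type", "float",
                    "--units", "seconds"], check=True)
    print("benchmark took %.1f s" % elapsed)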

Overview of EHN1 Cluster

Goal:

                   # of Racks   # of Nodes/Rack   # of Cores/Rack
    Odd Numbered        6             23                184
    Even Numbered       5             24                192
    Total              11         258 (nodes)       2064 (cores)

Dec 2018:

  • CPUs Total: 1848; Hosts up: 232; Hosts down: 20; some trays in Rack08 are missing nodes.
  • Total (final) setup: 232 + 20 = 252 hosts, 2016 cores.
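
As a quick cross-check of the counts above, a minimal arithmetic sketch; it assumes 8 cores per node, which is what the quoted node and core totals imply (2064 / 258 = 8):

    # Cross-check of the cluster counts quoted above.
    # Assumes 8 cores per node (2064 cores / 258 nodes = 8).
    CORES_PER_NODE = 8

    odd_nodes = 6 * 23    # 6 odd-numbered racks x 23 nodes/rack = 138
    even_nodes = 5 * 24   # 5 even-numbered racks x 24 nodes/rack = 120
    assert odd_nodes + even_nodes == 258                      # goal: nodes
    assert (odd_nodes + even_nodes) * CORES_PER_NODE == 2064  # goal: cores

    hosts = 232 + 20      # hosts up + hosts down, Dec 2018
    assert hosts == 252
    assert hosts * CORES_PER_NODE == 2016                     # final setup cores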

EHN1 Cluster Configuration and Usage Model

Accounts

During Beam Time/Cosmic operations, pDUNE-DP will have an np04-dataprod service account (and e-group) as a privileged account with only a few administrative users. Each account will have an associated description of its use.

Outside of these times, some job queues can be added to allow additional users, mapped through the DUNE VO, to use the resources (a possible queue setup is sketched below).
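
If the additional queues are set up in Torque (the batch system ProtoDUNE-DP currently prefers; see the note under the Single Phase section), the configuration could look like the sketch below. The queue name, the Unix group that the DUNE VO users map to, and the walltime limit are all hypothetical.

    #!/usr/bin/env python
    # Hypothetical sketch: open a Torque queue to additional users mapped
    # through the DUNE VO. QUEUE, GROUP and the walltime limit are
    # placeholders; real values would come from the NP cluster admins.
    import subprocess

    QUEUE = "dune_vo"   # hypothetical queue name
    GROUP = "dune"      # hypothetical Unix group for DUNE VO users

    def qmgr(directive):
        """Run one Torque qmgr directive."""
        subprocess.run(["qmgr", "-c", directive], check=True)

    qmgr("create queue %s" % QUEUE)
    qmgr("set queue %s queue_type = Execution" % QUEUE)
    qmgr("set queue %s acl_group_enable = True" % QUEUE)
    qmgr("set queue %s acl_groups = %s" % (QUEUE, GROUP))
    qmgr("set queue %s resources_max.walltime = 24:00:00" % QUEUE)
    qmgr("set queue %s enabled = True" % QUEUE)
    qmgr("set queue %s started = True" % QUEUE)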

ProtoDUNE Dual Phase Model of Use

ProtoDUNE Single Phase Model of Use

NP04 will let NP02 use its share of the NP EHN1 Cluster during NP02 Commissioning, Beam, and Cosmic Data Taking.

(An agreement is in progress between the two experiments for an equivalent number of slots on the Tier-0 to be made available from the NP02 share for use by NP04.)

Given current input from ProtoDUNE-DP, the dates of full use by NP02 will be from June 1 to December 1, 2018.

Outside of these dates, NP04 has asked the DUNE Software and Computing group to include the EHN1 Computing Cluster as part of the transparently usable distributed offline facility. ProtoDUNE-SP will work with NP and DUNE S&C on how best to accomplish this and make use of the EHN1 NP Cluster resource. (PLEASE CHECK if this is true: at the moment the Torque job management system preferred by ProtoDUNE-DP is not supported as part of the DUNE S&C distributed offline facility.)
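
For concreteness, while the cluster runs Torque a user job submission would look roughly like the sketch below; the queue name, resource requests, and job payload are placeholders, not an agreed ProtoDUNE workflow.

    #!/usr/bin/env python
    # Hypothetical sketch: submitting a job to the EHN1 cluster while it
    # runs Torque. Queue name, resources and payload are placeholders.
    import subprocess
    import tempfile

    JOB_SCRIPT = """\
    #!/bin/bash
    #PBS -N pdune_test_job
    #PBS -q dune_vo
    #PBS -l nodes=1:ppn=8
    #PBS -l walltime=04:00:00
    echo "running on $(hostname)"
    # ... run reconstruction or analysis here ...
    """

    with tempfile.NamedTemporaryFile("w", suffix=".pbs", delete=False) as f:
        f.write(JOB_SCRIPT)
        script_path = f.name

    # qsub prints the new job identifier on success.
    job_id = subprocess.check_output(["qsub", script_path]).decode().strip()
    print("submitted", job_id)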

Operations and Training - initial thoughts

The Joint Data Challenge (currently scheduled for April 9, 2018) includes a component for Operations. The EHN1 NP Computing Cluster is expected to be part of that team, which will work to define and then exercise a model for support and operations for data taking. The team is currently coordinated by the DUNE S&C Coordinators, Andrew Norman and Heidi Schellman.

DUNE S&C provides training materials and sessions at Collaboration meetings. Input for the EHN1 NP Computing Cluster needs to be provided to them. As input:

Training is needed on how to use the overall distributed computing infrastructure, including all clusters. This group and task will be responsible for release preparation and for support of data preparation and physics analysis; e.g., some extra cores and/or nodes would be useful for analysis. It must be decided which release users want to use and how to include it in the input path.

Individual items that need to be in working order:

  • Where is the s/w infrastructure - where are the releases?
  • Where are the validations?
  • Everything needs to be automated, runnable at the press of a button.

-- RuthPordes - 2017-12-12
