Neutrino Platform EHN1 Cluster Computing Model For Use By ProtoDUNEs

Overview

Definitions

  • Data Taking Operations – during beam time (includes taking beam-generated and cosmic-ray-generated data simultaneously)
  • Cosmic Ray Operations – sustained running taking cosmic rays
  • Commissioning – taking “ready” software and hardware and preparing them for Operations
  • Analysis Operations Phase – after data taking and cosmic-ray operations

Offline processing, operations, and analysis activities take place in each of these phases.

Introduction to the EHN1 Neutrino Platform Cluster

Compute resources

For more information please follow this link.

Goal:

                               # of Racks   # of Nodes/Rack   # of Nodes
  Odd-numbered racks                6             23              138
  Even-numbered racks (full)        4             24               96
  Rack08                            1             18               18
  Total                            11                             252   (2016 cores)
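The totals in the rack table can be cross-checked with a few lines of Python. This is a minimal sketch that re-derives the node count as racks x nodes/rack for each group; the figure of 8 cores per node is inferred here as 2016 / 252 and is not stated explicitly on this page.

```python
# Rack layout from the table: (rack group, number of racks, nodes per rack).
RACK_GROUPS = [
    ("Odd-numbered racks", 6, 23),
    ("Even-numbered racks (full)", 4, 24),
    ("Rack08", 1, 18),
]

TOTAL_CORES = 2016  # target core count quoted in the table


def node_count(groups):
    """Total number of nodes: sum over groups of racks x nodes/rack."""
    return sum(racks * nodes_per_rack for _, racks, nodes_per_rack in groups)


total_racks = sum(racks for _, racks, _ in RACK_GROUPS)
total_nodes = node_count(RACK_GROUPS)
# Inferred, not stated on the page: cores per node = total cores / total nodes.
cores_per_node = TOTAL_CORES / total_nodes

for name, racks, per_rack in RACK_GROUPS:
    print(f"{name}: {racks * per_rack} nodes")
print(f"Total: {total_racks} racks, {total_nodes} nodes, {cores_per_node:g} cores/node")
```

Running it prints 138, 96 and 18 nodes for the three groups and a total of 11 racks, 252 nodes and 8 cores/node.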

Dec 2018:

  • Cores currently available: 1847 (hosts up: 232, hosts down: 20)
  • Final core count: 2016; some trays in Rack08 are missing nodes.
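As a rough availability check, the status figures can be compared against the final targets. This is a minimal sketch, assuming the 20 down hosts belong to the same 252-node total as the rack table and that 2016 cores is the target capacity; the function name is illustrative only.

```python
TOTAL_NODES = 252     # 11 racks, from the rack table
TARGET_CORES = 2016   # final core count once all nodes are installed
hosts_up, hosts_down = 232, 20
cores_now = 1847


def availability(current, target):
    """Fraction of the target capacity that is currently usable."""
    return current / target


# Every host should be accounted for as either up or down.
assert hosts_up + hosts_down == TOTAL_NODES

print(f"Hosts up:  {availability(hosts_up, TOTAL_NODES):.1%}")
print(f"Cores now: {availability(cores_now, TARGET_CORES):.1%}")
```

With these numbers about 92.1% of hosts are up and about 91.6% of the final core count is available.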

Validation and Metrics

Validation procedures and metrics for use of the cluster need to be defined and collected:

  • Nagios and Ganglia plots are presented on the web regularly.
  • ProtoDUNE-DP/NP02 runs benchmarks after any change in the software.
  • ProtoDUNE-SP/NP04 depends on the centralized DUNE Continuous Integration system. It would be useful to include the EHN1 Cluster as a test site for this system.

EHN1 Cluster Configuration and Usage Model

Accounts

During beam-time and cosmic-ray operations there will be an np04-dataprod service account (and associated e-group) as a privileged account, with only a few administrative users. Each account will have an associated description of its use.

Outside of these periods, additional job queues can be added to allow other users, mapped through the DUNE VO, to use the resources.
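The account and queue rules above can be summarised as a small access policy. This is a minimal sketch only: the period names follow the Definitions section and `np04-dataprod` comes from the text, while the `dune-vo` label and the function names are hypothetical, and nothing here corresponds to an actual batch-system configuration on the cluster.

```python
from enum import Enum


class Period(Enum):
    """Operational periods, following the Definitions section."""
    DATA_TAKING = "beam"            # beam time (beam + cosmic data)
    COSMIC = "cosmic"               # sustained cosmic-ray running
    COMMISSIONING = "commissioning"
    ANALYSIS = "analysis"


# Periods in which only the privileged service account may run jobs.
PRIVILEGED_PERIODS = {Period.DATA_TAKING, Period.COSMIC}


def allowed_groups(period):
    """Groups allowed to submit jobs in a given period.

    "np04-dataprod" is from the text; "dune-vo" is a hypothetical label
    for users mapped through the DUNE VO.
    """
    if period in PRIVILEGED_PERIODS:
        return {"np04-dataprod"}
    # Outside beam/cosmic operations, extra queues open the cluster to DUNE VO users.
    return {"np04-dataprod", "dune-vo"}


def may_submit(group, period):
    """True if `group` may submit jobs during `period`."""
    return group in allowed_groups(period)
```

For example, `may_submit("dune-vo", Period.COSMIC)` is `False`, while `may_submit("dune-vo", Period.ANALYSIS)` is `True`.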

NP02 Model of Use

NP04 Model of Use

NP04 will let NP02 use its share of the NP EHN1 Cluster during NP02 Commissioning, Beam and Cosmic Data Taking.

(An agreement is being negotiated between the two experiments for an equivalent number of slots on the Tier-0 to be made available from the NP02 share for use by NP04.)

Based on current input from NP02, NP04 will relinquish use of the EHN1 NP cluster from June 1 to Dec 1 2018. (PLEASE CHECK/ADD here.) Based on current HEPSPEC benchmarks, we expect to discuss a ratio of about 1 Tier-0 core-day to 3 EHN1 NP cluster core-days, provided as an equivalent allocation from the DP share on the Tier-0.
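To make the proposed exchange concrete, the ratio can be turned into core-days. This is a minimal sketch under stated assumptions: the size of the NP04 share is not given on this page, so the 1008 cores below (half of the 2016-core target) is purely hypothetical, and the period is counted from June 1 up to, but not including, Dec 1 2018.

```python
from datetime import date

# Ratio quoted in the text: about 1 Tier-0 core-day per 3 EHN1 NP cluster core-days.
EHN1_PER_TIER0 = 3


def tier0_equivalent(ehn1_cores, start, end, ratio=EHN1_PER_TIER0):
    """Return (EHN1 core-days relinquished, equivalent Tier-0 core-days).

    Days are counted as end - start, i.e. `end` itself is excluded.
    """
    days = (end - start).days
    ehn1_core_days = ehn1_cores * days
    return ehn1_core_days, ehn1_core_days / ratio


# Hypothetical NP04 share: 1008 cores (half of the 2016-core target).
ehn1_cd, tier0_cd = tier0_equivalent(1008, date(2018, 6, 1), date(2018, 12, 1))
print(f"EHN1 core-days: {ehn1_cd}, Tier-0 core-days: {tier0_cd:.0f}")
```

The period spans 183 days, so this hypothetical share corresponds to 184464 EHN1 core-days, or about 61488 Tier-0 core-days at the 1:3 ratio. Substituting the agreed share size and dates gives the actual figure.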

Outside of these dates NP04 has asked the DUNE Software and Computing group to include the EHN1 Computing Cluster as part of the transparently usable distributed offline facility. NP04 will work with NP and DUNE S&C on how best to accomplish this and make use of the EHN1 NP Cluster resource. (PLEASE CHECK whether this is true: the Torque job management system currently preferred by NP02 is not supported as part of the DUNE S&C distributed offline facility.)

Operations and Training - initial thoughts

The Joint Data Challenge (currently scheduled for April 9 2018) includes a component for Operations. The EHN1 NP Computing Cluster is expected to be part of that team, which will work to define and then exercise a model for support and operations during data taking. The team is currently coordinated by the DUNE S&C Coordinators, Andrew Norman and Heidi Schellman.

DUNE S&C provides training materials and sessions at Collaboration meetings. Input for the EHN1 NP Computing Cluster needs to be provided to them. Initial input:

Training is needed on how to use the overall distributed computing infrastructure, including all clusters. This group and task will be responsible for release preparation and for support of data preparation and physics analysis (e.g. users would like to have some extra cores and/or nodes for analysis). Open questions include which release users want to use and how to include it in the input path.

Individual items to get into working order, for example:

  • Where the software infrastructure and the releases are located.
  • Where the validations are.
  • Everything needs to be automated so that it can be done at the press of a button.

Topic revision: r4 - 2018-03-06 - NectarB

This site is powered by the TWiki collaboration platform. Copyright © 2008-2020 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.