HEP workloads


Purpose

  • Can a set of HEP workloads become the future benchmarking suite? Probably yes, but it needs to be proven
    • Short term goal
      • Build standalone benchmarking applications based on the HEP workloads
      • Study the features of the experiments’ workloads
    • Long term goal
      • Build, distribute, support the suite

Requirements

  • Must be adopted not only by individuals inside the experiments but also by external performance experts, site procurement teams and hardware vendors
    • Usability: simple instructions (insert disk, run shell script, wait, then read and report the score), for example:
      docker run --rm -v /tmp/results/:/results $IMAGE
    • Accessibility: no remote data access from the vendor's office
      • With containers, the benchmark can be distributed as a full tarball on a drive
    • Free license: this follows the experiment code license
    • Maintenance: long-term commitment from the experiments, in order to support the software for several years
      • Simplified by the code distribution in cvmfs. Data preservation initiatives can support it
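
The tarball distribution mentioned under Accessibility could look roughly like the sketch below, which relies on the standard `docker save` and `docker load` commands. The tarball path, the default image and the `DRY_RUN` switch are assumptions made for illustration; they are not part of the existing workflow.

```shell
#!/bin/sh
# Sketch only: copy a benchmark image to a machine without network access.
# IMAGE defaults to one of the images named on this page; TARBALL is a
# hypothetical location on a removable drive.
IMAGE="${IMAGE:-gitlab-registry.cern.ch/giordano/hep-workloads/lhcb-gen-sim:latest}"
TARBALL="${TARBALL:-/media/usb/hep-workload.tar}"

# Print the command instead of executing it when DRY_RUN=1 (illustrative helper).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# On a machine with network access: fetch the image and write it to the drive.
export_image() {
    run docker pull "$IMAGE"
    run docker save -o "$TARBALL" "$IMAGE"
}

# On the offline machine: load the image from the drive and run the benchmark.
import_and_run() {
    run docker load -i "$TARBALL"
    run docker run --rm -v /tmp/results/:/results "$IMAGE"
}
```

Call `export_image` where the registry is reachable, move the drive, then call `import_and_run` on the target machine. Setting `DRY_RUN=1` first prints the commands without executing them.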

Documentation

Task list

Legend:

  • Yes / Done: done
  • Work in progress, under construction: work in progress
  • Blank box: to be done, looking for volunteers
  • Warning, important: issue that needs fixing
  • Stop: showstopper

Infrastructure


| # | Description | Status |
| 7 | Migrate gitlab repository to https://gitlab.cern.ch/hep-benchmarks | Work in progress, under construction |
| 1 | Implement a fully automated procedure to build a standalone container image for each HEP reference workload | Yes / Done |
| 2 | Create containers starting from the experiments' recipes | Yes / Done: GEN-SIM; Work in progress, under construction: DIGI-RECO |
| 3 | Implement a GitLab Continuous Integration approach for long-term maintainability (see https://gitlab.cern.ch/giordano/hep-workloads/pipelines) | Yes / Done |
| 4 | Consolidate the CI approach | Work in progress, under construction |
| 5 | Integration in the benchmarking suite | Work in progress, under construction |
| 6 | Test migration to singularity containers | Yes / Done |

Workloads

Feedback from the experiments is expected for these activities.

Status of inclusion of HEP workloads

To run a HEP standalone container, set IMAGE to one of the image names listed in the table below and run:

 docker run --rm --network=host -v /tmp/results/:/results $IMAGE
 

| Experiment | Stage | Release | Orchestrator | Run w/ cvmfs | Standalone container | Validation of standalone container ((rel.) spread / robustness) |
| LHCb | Full Sim | | Yes / Done | Yes / Done | Yes / Done, IMAGE=gitlab-registry.cern.ch/giordano/hep-workloads/lhcb-gen-sim:latest | Yes / Done / Yes / Done |
| ALICE | Full Sim | v5-09-XX-15, v5-09-09-01-1 | Yes / Done | Yes / Done | Yes / Done, IMAGE=gitlab-registry.cern.ch/giordano/hep-workloads/alice-gen-sim:latest | Warning, important [1] / Warning, important [2] |
| ATLAS | Gen | 19.2.5.5 | Yes / Done | Yes / Done | Yes / Done, IMAGE=gitlab-registry.cern.ch/giordano/hep-workloads/atlas-gen-bmk:latest | Warning, important / Yes / Done |
| ATLAS | Sim | 21.0.15 | Yes / Done | Yes / Done | Yes / Done, IMAGE=gitlab-registry.cern.ch/giordano/hep-workloads/atlas-sim-bmk:latest | Yes / Done / Work in progress, under construction [3] |
| ATLAS | Digi-Reco | 21.0.23 | Work in progress, under construction | Blank box | Blank box | Blank box |
| CMS | Gen-Sim | 10_2_9 | Yes / Done | Yes / Done | Yes / Done, IMAGE=gitlab-registry.cern.ch/giordano/hep-workloads/cms-gen-sim:latest | Work in progress, under construction / Warning, important [4] |
| CMS | Digi | | Work in progress, under construction | Blank box | Blank box | Blank box |
| CMS | Reco | | Work in progress, under construction | Blank box | Blank box | Blank box |
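
To exercise every standalone image listed above in one go, a loop along the following lines may be convenient. It is a sketch under assumptions: the per-workload results subdirectories and the `RESULTS_ROOT` variable are hypothetical, and only the image names and the `docker run` options come from this page.

```shell
#!/bin/sh
# Sketch only: run each standalone workload image in sequence.
REGISTRY="gitlab-registry.cern.ch/giordano/hep-workloads"
WORKLOADS="lhcb-gen-sim alice-gen-sim atlas-gen-bmk atlas-sim-bmk cms-gen-sim"
# Hypothetical layout: one results subdirectory per workload.
RESULTS_ROOT="${RESULTS_ROOT:-/tmp/results}"

# Build the full image name for a workload, e.g. image_for cms-gen-sim
image_for() {
    echo "$REGISTRY/$1:latest"
}

# Run all workloads; report a failure on stderr and continue with the next one.
run_all() {
    for wl in $WORKLOADS; do
        mkdir -p "$RESULTS_ROOT/$wl"
        docker run --rm --network=host \
            -v "$RESULTS_ROOT/$wl":/results "$(image_for "$wl")" \
            || echo "workload $wl failed" >&2
    done
}
```

Calling `run_all` starts the five containers one after another. Keeping the results of each workload in its own directory is a design choice of this sketch, intended to avoid one run overwriting the output of another.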

Issue List
  • [1] Fluctuations between runs are too wide
  • [2] Around 2% of threads crash, resulting in an error rate of 5...20% depending on the number of copies
  • [3] Bug in the results parser or in the top-level script: parsing sometimes starts before the last output file is complete (already fixed?)
  • [4] A small number of threads crash, end up in an endless loop, or enter an infinite wait state

Check list


| # | Description | ALICE | ATLAS (gen) | ATLAS (sim) | CMS | LHCb |
| 7 | Can the container image size be reduced with smaller input data? | Blank box | Blank box | Blank box | Blank box | Blank box |
| 1 | Confirm that the running workload is the desired/updated one for benchmarking | Yes / Done | Yes / Done | Blank box | Yes / Done | Yes / Done |
| 2 | Can it run without any assumption other than /cvmfs and local input files? Can it run without internet WAN access? | Blank box | Blank box | Blank box | Blank box | Blank box |
| 3 | Identification of the benchmarking metrics from log files | Blank box | Blank box | Blank box | Blank box | Blank box |
| 4 | Define running conditions (#threads, #events), "equalize" job duration, normalize scores | Blank box | Blank box | Blank box | Blank box | Blank box |
| 5 | Study reproducibility of results | Warning, important | Warning, important | Yes / Done | Work in progress, under construction | Yes / Done |
| 6 | Error detection, robustness | Warning, important | Yes / Done | Work in progress, under construction | Warning, important | Yes / Done |
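
For the reproducibility study (item 5), one possible way to quantify the relative spread of repeated runs is sketched below. The definition used, (max - min) / mean, is an assumption made for illustration; this page does not define the metric, and the `rel_spread` helper is hypothetical.

```shell
#!/bin/sh
# Sketch only: read one score per line on stdin and print (max - min) / mean.
# Exits with status 1 if there are no scores or the mean is zero.
rel_spread() {
    awk 'NF { x = $1 + 0
              if (n == 0 || x < min) min = x
              if (n == 0 || x > max) max = x
              sum += x; n++ }
         END { if (n == 0 || sum == 0) exit 1
               printf "%.4f\n", (max - min) / (sum / n) }'
}
```

For example, `printf '%s\n' 10 12 11 | rel_spread` prints `0.1818`, since the scores span 2 units around a mean of 11.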
-- DomenicoGiordano - 2019-02-01

Experiment contact person

| Experiment | Name | Email |
| LHCb | Andrea Valassi | andrea.valassi@cernNOSPAMPLEASE.ch |
| ALICE | Costin Grigoras | grigoras@cernNOSPAMPLEASE.ch |
| ATLAS | Lorenzo Rinaldi | Lorenzo.Rinaldi@boNOSPAMPLEASE.infn.it |
| ATLAS | Martina Pagacova | Martina.Pagacova@cernNOSPAMPLEASE.ch |
| CMS | David Lange | David.Lange@cernNOSPAMPLEASE.ch |
| CMS | Andrea Sciabà | Andrea.Sciaba@cernNOSPAMPLEASE.ch |
Topic revision: r10 - 2019-04-15 - ManfredAlefExternal
 