DIANE Introduction for Students...

Application Patterns

  • iterative decomposition (looping) typical for many applications
    • data analysis (ATLAS/Athena, ntuples)
    • simulation (Geant4, Autodock)
    • others: legacy Fortran applications (ITU)
  • task farming or master/worker
    • set of independent tasks
  • other patterns:
    • geometric decomposition: SPMD
    • recursive decomposition: divide and conquer
    • functional decomposition: pipelining

Everyday life in the Grid: the truth will set you free

  • efficiency:
    • submission time: 10s
      • cannot submit large numbers of jobs in short time
    • submitted->running: minutes/hours/days (large variation)
      • it is hard to predict when the jobs will start executing
    • latency of the information system: ~2 minutes
  • failures:
    • system errors:
      • VOs misconfigured
      • sites having strange configurations
      • temporary service unavailability (RBs, CEs, WNs, cooling)
      • example: GoodGeant4Sites
    • application errors
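
To see why the submission time alone rules out bulk submission, the cost can be estimated with simple arithmetic. The sketch below assumes that the 10 s figure applies per job and that jobs are submitted one after another; both are assumptions made for illustration, not measurements reported on this page.

```python
def total_submission_time(n_jobs: int, seconds_per_job: float = 10.0) -> float:
    """Seconds needed to submit n_jobs sequentially.

    seconds_per_job defaults to the 10 s quoted above (assumed to be per job).
    """
    return n_jobs * seconds_per_job


if __name__ == "__main__":
    for n_jobs in (10, 100, 1000):
        minutes = total_submission_time(n_jobs) / 60
        print(f"{n_jobs:5d} jobs: {minutes:7.1f} min spent on submission alone")
```

Under these assumptions 1000 jobs need 10 000 s, close to 2.8 hours, before the queueing delay (submitted->running) is even counted.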

Middleware Metascheduling

  • Resource Broker may resubmit failed jobs automatically
    • RetryCount in the JDL
  • Some CEs can do it too (for example Condor)
  • Problems with metascheduling:
    • the job must go via a long resubmission cycle

Master-Worker Metascheduling in the User Space (DIANE)

  • overlay "virtual" cluster
  • keep direct connections open for fast turnaround time
  • possible to distinguish application/system errors
    • customize error recovery
  • pulling tasks from master queue
    • automatic load balancing
  • examples from HEP:
    • DIANE - a generic framework for master/worker processing
    • DIRAC - agent-based production system of LHCb - 'permanent' overlay
    • PROOF - parallel analysis extension to ROOT
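
The pull model and the application/system error distinction listed above can be sketched in a few lines of Python. This is a minimal illustration using threads in one process, not DIANE's actual API: `run_master`, `ApplicationError`, `WorkerSystemError` and `max_retries` are hypothetical names introduced here. Workers pull from a shared queue, so faster workers naturally take more tasks; a system error puts the task back in the queue, while an application error is recorded without retrying.

```python
import queue
import threading


class ApplicationError(Exception):
    """Hypothetical: the task itself is faulty, so retrying will not help."""


class WorkerSystemError(Exception):
    """Hypothetical: the worker's environment failed, so the task may be retried."""


def run_master(tasks, execute, n_workers=4, max_retries=3):
    """Run execute(task) for every task using workers that pull from one queue.

    Tasks must be hashable. Returns (results, failed): results maps each
    successful task to its value, failed maps each abandoned task to a reason.
    """
    todo = queue.Queue()
    for task in tasks:
        todo.put((task, 0))  # (task, number of attempts so far)
    results, failed = {}, {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                task, attempts = todo.get_nowait()  # pull: the worker asks for work
            except queue.Empty:
                return  # nothing left to do, the worker exits
            try:
                value = execute(task)
            except ApplicationError as err:
                with lock:
                    failed[task] = f"application error: {err}"
            except WorkerSystemError as err:
                if attempts + 1 < max_retries:
                    todo.put((task, attempts + 1))  # requeue for another attempt
                else:
                    with lock:
                        failed[task] = f"system error, gave up: {err}"
            else:
                with lock:
                    results[task] = value

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results, failed


if __name__ == "__main__":
    done, failed = run_master(range(10), lambda n: n * n)
    print(len(done), "tasks done,", len(failed), "failed")
```

Because a requeued task re-enters the same queue, recovery costs one extra pull rather than a full resubmission cycle through the middleware.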

Typical usage scenario of DIANE

Conserving resources without abusing the time shared with other users

  • run master on your local server and activate the job
  • use Ganga to submit a number of worker agents to the distributed system (LSF, Grid, ...)
  • as soon as a worker agent starts executing, it contacts the master and asks for work
  • workers which cannot contact the master die after a few seconds
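
The last two points describe how a worker agent starts up. A minimal sketch of that behaviour, assuming a plain TCP connection to the master, is given below; `contact_master`, `run_agent` and the 5 s default are hypothetical and do not reflect DIANE's real protocol. Only the standard `socket.create_connection` call is used.

```python
import socket


def contact_master(host, port, timeout_s=5.0):
    """Return a connected socket, or None if the master is unreachable in time."""
    try:
        return socket.create_connection((host, port), timeout=timeout_s)
    except OSError:  # covers refused connections, timeouts and DNS failures
        return None


def run_agent(host, port, timeout_s=5.0):
    """Return a process exit code: 0 if the master was reached, 1 otherwise."""
    connection = contact_master(host, port, timeout_s)
    if connection is None:
        return 1  # no master: give up quickly and free the worker node
    with connection:
        # A real agent would now ask the master for work over this connection.
        pass
    return 0
```

Exiting quickly when no master answers is what keeps idle agents from holding batch slots that other users could use.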

Greedy resource "booking", very nasty to other users

  • run master in the inactive mode
  • submit worker agents
  • when a desired number of workers is available, activate the job
  • e.g. ITU scenarios
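
The difference between the two scenarios comes down to when the master starts handing out tasks. The sketch below illustrates this with a hypothetical `Master` class (not DIANE's API): with `min_workers=0` the job is active from the start, as in the first scenario, while a positive `min_workers` keeps the job inactive until that many agents have registered.

```python
from collections import deque


class Master:
    """Hypothetical master that can hold tasks back until enough workers exist."""

    def __init__(self, tasks, min_workers=0):
        self._tasks = deque(tasks)
        self._min_workers = min_workers
        self._workers = set()
        self.active = min_workers == 0  # active immediately if no threshold is set

    def register(self, worker_id):
        """Record a worker agent; activate the job once the threshold is reached."""
        self._workers.add(worker_id)
        if len(self._workers) >= self._min_workers:
            self.active = True

    def next_task(self):
        """Return the next task, or None while inactive or when no tasks remain."""
        if not self.active or not self._tasks:
            return None
        return self._tasks.popleft()


if __name__ == "__main__":
    master = Master(["t1", "t2", "t3"], min_workers=2)
    master.register("w1")
    print(master.next_task())  # still inactive, so no task is handed out
    master.register("w2")
    print(master.next_task())  # threshold reached, tasks start flowing
```

While the job is inactive, registered workers occupy their slots without doing any work, which is why this mode is unfriendly to other users.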

Example

  • look at DIANE main page

-- JakubMoscicki - 07 Jul 2006

Topic revision: r2 - 2007-03-08 - unknown
 