User Level Scheduling in the Grid: an outline of the technology potential and the research directions


User Level Scheduling (ULS) techniques have been applied very successfully
in a number of application areas such as bioinformatics,
image processing, telecommunications and physics simulation. ULS
has helped to improve Quality of Service (QoS) in the Grid, which has been
experienced as reduced job turnaround time, more efficient usage of
resources, and more predictable, reliable and stable application
execution. To be universally adopted, however, ULS techniques must
be shown to be compatible with the fundamental assumptions of the
Grid computing model, such as respect for resource usage policies,
fair-share towards other users, traceability of users' activities and
so on. In this talk we will outline the main benefits and
possible pitfalls of ULS techniques. We will also introduce
initial research ideas for modeling and measuring Quality of
Service on the Grid and for analyzing the impact of ULS on other
users (fair-share). Finally, we will present ideas for enhanced support
for certain applications, such as iterative algorithms or
parameter sweeps.

This presentation builds on the talk "The Quality of Service on the Grid with
user-level scheduling" presented at UvA on September 1st, 2006.

Recall of the current activities

Placeholders and late binding

  • the technology is also called: placeholder, late binding, pilot jobs
  • you do not send a specific job to the resource: you acquire the resource, assign the job at runtime, and free the resource when you are done
  • some examples:
    • HEP production systems (centralized task queue, server acts on behalf of the user): AliEn (ALICE), DIRAC (LHCb), PanDA (ATLAS)
    • Condor glide-ins (build a virtual Condor pool from Globus resources)
    • BOINC (CPU cycle scavenging)
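
The acquire / assign at runtime / free cycle described above can be sketched as a small simulation. This is only an illustration under stated assumptions: the function name `simulate_late_binding`, the pilot names and the speed figures are hypothetical and are not taken from any of the systems listed.

```python
import heapq
from collections import deque


def simulate_late_binding(task_costs, pilot_speeds):
    """Give each task to whichever pilot becomes free first.

    task_costs:   work units per task (hypothetical numbers)
    pilot_speeds: pilot name -> work units per second (hypothetical)
    Returns (assignment, makespan); assignment maps pilot -> task indices.
    """
    pending = deque(enumerate(task_costs))
    # Heap of (time_when_free, pilot_name); every pilot is idle at t = 0.
    free_at = [(0.0, name) for name in sorted(pilot_speeds)]
    heapq.heapify(free_at)
    assignment = {name: [] for name in pilot_speeds}
    makespan = 0.0
    while pending:
        now, name = heapq.heappop(free_at)    # next pilot to become free
        idx, cost = pending.popleft()         # task is bound only now
        finish = now + cost / pilot_speeds[name]
        assignment[name].append(idx)
        makespan = max(makespan, finish)
        heapq.heappush(free_at, (finish, name))
    # The queue is empty: each pilot would now release its resource.
    return assignment, makespan


assignment, makespan = simulate_late_binding(
    [4] * 6, {"fast": 2.0, "slow": 1.0}
)
print(assignment, makespan)
```

In this toy run the fast pilot ends up with four of the six tasks and everything finishes after 8 seconds. A static half-and-half split decided at submission time would leave the slow pilot with 12 seconds of work.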

User Level Scheduling

  • it is a late binding technology
  • the scheduler has the application knowledge (so it can make better error recovery or load balancing decisions)
  • it runs in the user space (resource usage is accounted for and traceability is not compromised)
  • it is capable of creating transient/volatile overlays on top of the regular infrastructure ("virtual clusters")
  • DIANE implementation:
    • not specific to any particular technology or infrastructure (Grid, LSF/PBS, explicit IP host list + mixing of the resources)
    • portable (Python and CORBA)
    • self-contained and small distribution with fully automatic installation
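
As an illustration of what application-level error recovery could look like, here is a minimal sketch of a master loop that re-queues failed tasks. It is not the DIANE API: `run_with_recovery`, `flaky_execute` and the retry limit are hypothetical names and values chosen for this example.

```python
from collections import deque


def run_with_recovery(tasks, execute, max_attempts=3):
    """Run tasks in order, re-queueing failures up to max_attempts.

    execute(task, attempt) returns a result or raises RuntimeError.
    Returns (results, abandoned).
    """
    pending = deque((task, 1) for task in tasks)
    results, abandoned = {}, []
    while pending:
        task, attempt = pending.popleft()
        try:
            results[task] = execute(task, attempt)
        except RuntimeError:
            if attempt < max_attempts:
                # Application policy: this failure is worth another try.
                pending.append((task, attempt + 1))
            else:
                abandoned.append(task)
    return results, abandoned


def flaky_execute(task, attempt):
    # Hypothetical failure pattern: task 2 fails once, task 5 always fails.
    if task == 5 or (task == 2 and attempt == 1):
        raise RuntimeError(f"task {task} failed on attempt {attempt}")
    return task * 10


results, abandoned = run_with_recovery([1, 2, 3, 4, 5], flaky_execute)
print(results, abandoned)
```

A generic batch system sees only a failed job. A scheduler with application knowledge can decide per task whether a retry is worthwhile, which is the kind of fine tuning meant above.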

Outstanding issues of User Level Scheduling

  • Improvement of QoS characteristics
    • extra reliability (fail-safety and application-specific fine tuning)
    • reduction of stretch (aka makespan, turnaround time)
    • stabilization of the output inter-arrival rate (which is also more predictable)
  • Potential flaws
    • effect on fair-share: would other users be penalized by ULS jobs?
    • potential harmfulness of redundant batch requests: the level of redundancy needs to be estimated
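
The QoS characteristics listed above can be computed from task completion timestamps. The sketch below is one possible way to do it, not an established definition: `qos_summary` and the sample timestamps are hypothetical, and the coefficient of variation of the gaps is used here as an assumed measure of how stable the output rate is.

```python
from statistics import mean, pstdev


def qos_summary(submit_time, completion_times):
    """Summarise turnaround and output stability for one run.

    Needs at least two completion times that are not all identical.
    """
    done = sorted(completion_times)
    # Gaps between consecutive outputs (inter-arrival times).
    gaps = [later - earlier for earlier, later in zip(done, done[1:])]
    avg_gap = mean(gaps)
    return {
        "turnaround": done[-1] - submit_time,  # time until the last output
        "mean_gap": avg_gap,
        "gap_cv": pstdev(gaps) / avg_gap,      # 0 means perfectly regular
    }


steady = qos_summary(0, [10, 20, 30, 40])  # hypothetical regular output
bursty = qos_summary(0, [5, 6, 7, 40])     # same turnaround, one straggler
print(steady)
print(bursty)
```

Both runs have the same turnaround time, but the second has a much higher gap variability, which is the kind of difference a turnaround figure alone hides.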

Area of applicability

Research Directions

  • effect on fair-share: would other users be penalized by ULS jobs?
    • fair-share can be measured (find the paper)
    • can be modeled and simulated
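
One candidate metric for such a measurement is Jain's fairness index, J = (sum x)^2 / (n * sum x^2), applied to the resource share each user receives. This is an assumption for illustration and not necessarily the metric used in the paper referred to above. The per-user CPU-hour figures are hypothetical.

```python
def jain_index(shares):
    """Jain's fairness index: 1.0 for equal shares, 1/n if one user gets all."""
    n = len(shares)
    total = sum(shares)
    squares = sum(x * x for x in shares)
    if n == 0 or squares == 0:
        raise ValueError("need at least one non-zero share")
    return (total * total) / (n * squares)


# Hypothetical CPU hours received by four users in the same time window.
equal_shares = jain_index([25.0, 25.0, 25.0, 25.0])
one_user_dominates = jain_index([4.0, 0.0, 0.0, 0.0])
print(equal_shares, one_user_dominates)
```

Comparing the index for the same workload with and without ULS jobs present would be one way to quantify whether other users are penalized.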

Engineering Directions

Use-Cases

-- JakubMoscicki - 06 Dec 2006

Topic revision: r3 - 2006-12-06 - JakubMoscicki
 