I/O-bound and CPU-bound tagging


Many sites have also expressed interest in scheduling based on resource constraints, and we would like to give sites more options.

General explanations and context

The I/O throughput of a site's storage system has an optimum load, and global throughput can fall once the load rises above a certain threshold. Excessive storage load can also lead to storage system instability.


  • LRMS or Local Resource Management Systems
An LRMS is a system that schedules the execution of jobs on a cluster; this covers both cloud and batch queues.
  • VO or Virtual Organisation
A VO is a collection of scientists working in a related field doing similar processing.
  • SE or Storage Element
An SE is a storage resource usable by one or more VOs.


  • Sites may support more than one VO.
  • Jobs presented to the LRMS may carry no scheduling hints other than their VO.
  • Sites are aware of load on various site resources and would like to schedule jobs based on this knowledge.
  • Some jobs use fewer I/O resources than others (Monte-Carlo jobs vs. analysis jobs).
  • Sites can contribute to higher job efficiency by optimising scheduling of jobs from multiple VO communities.
  • Site storage systems have an optimum throughput; above this threshold of requests, their aggregate throughput may fall.
  • DESY have found it beneficial to schedule user jobs per node, and fill the remaining slots with Monte-Carlo jobs.
  • Grid worker nodes can support more CPU bound jobs than I/O bound jobs.
  • LRMS have rich configuration for scheduling.
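
The DESY observation in the list above can be read as a per-node packing rule: place a limited number of I/O-bound user jobs on each worker node, then fill the remaining slots with CPU-bound Monte-Carlo jobs. The sketch below illustrates that reading only; the function name `fill_node`, the `max_io_per_node` parameter and the job names are hypothetical and do not describe DESY's actual configuration or any LRMS API.

```python
def fill_node(slots, io_jobs, cpu_jobs, max_io_per_node=1):
    """Choose jobs for one worker node (illustrative policy, not a real LRMS API).

    Up to max_io_per_node I/O-bound jobs are placed first; any slots left
    over are filled with CPU-bound jobs.
    """
    # Take at most max_io_per_node I/O-bound jobs, never more than the slots.
    chosen = io_jobs[:min(max_io_per_node, slots)]
    # Fill whatever is left with CPU-bound jobs.
    chosen += cpu_jobs[:slots - len(chosen)]
    return chosen


# Hypothetical queues: analysis jobs are I/O-bound, Monte-Carlo jobs CPU-bound.
node = fill_node(4, ["ana-1", "ana-2"], ["mc-1", "mc-2", "mc-3", "mc-4"])
print(node)  # ['ana-1', 'mc-1', 'mc-2', 'mc-3']
```

A real site would express such a rule in its LRMS configuration rather than in a script; the point is only that a per-job hint is enough to drive this kind of packing.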

Use cases

  • A site wants to schedule only CPU-bound jobs because the I/O load on its SE is already at its optimum.
  • A site wants to avoid having all jobs access the same file.
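
The first use case can be sketched as a simple admission check: while the SE is below its optimum load any job may start, and once the optimum is reached only CPU-bound jobs are admitted. This is a minimal illustration under assumptions: the `Job` class, the `io_weight` field, the `admissible` function, the load figures and the 0.5 cutoff are all hypothetical and not part of any agreed interface.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    vo: str
    io_weight: float  # hypothetical tag: 0.0 = CPU-bound, 1.0 = I/O-bound


def admissible(job, se_load, se_optimum, cpu_bound_cutoff=0.5):
    """Return True if the job may start given the current SE load."""
    if se_load < se_optimum:
        # Storage still has headroom: admit any job.
        return True
    # Storage is at or above its optimum: admit CPU-bound jobs only.
    return job.io_weight < cpu_bound_cutoff


queue = [
    Job("mc-001", "atlas", 0.0),
    Job("ana-001", "atlas", 1.0),
    Job("mc-002", "cms", 0.0),
]
runnable = [j.name for j in queue if admissible(j, se_load=0.9, se_optimum=0.8)]
print(runnable)  # ['mc-001', 'mc-002']
```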

The question was raised whether cpusets or affinities (see the relevant part of the WM TEG wiki) could solve this issue at the site level. It turns out that these are quota tools, which prevent one job from starving another of resources, and each works at the level of a single worker node. We believe these techniques are complementary to job tagging and do not remove the use case for it.


The original proposal was to tag jobs with a boolean list of constraints, i.e. a job could be flagged as either CPU-bound or I/O-bound. The WLMTEG decided that the proposed changes to the JDL should have extra flexibility: a weighted value between 0 and 1 (with 0 being CPU-bound and 1 I/O-bound). Initially, just the values 0 and 1 will be used to gain experience with the idea. Further tags may be added at a later date.
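
As an illustration of the weighted value described above, the following sketch validates a tag: it must lie between 0 and 1, and during the initial phase only the two end values are accepted. The function name `parse_io_weight` and the `initial_phase` switch are hypothetical; the proposal does not fix an attribute name or syntax for the JDL.

```python
def parse_io_weight(raw, initial_phase=True):
    """Validate a constraint weight: 0 means CPU-bound, 1 means I/O-bound."""
    value = float(raw)
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"weight {value} is outside the range 0 to 1")
    if initial_phase and value not in (0.0, 1.0):
        # Only the end values are used while experience is being gained.
        raise ValueError(f"only 0 and 1 are accepted initially, got {value}")
    return value


print(parse_io_weight("1"))                         # 1.0
print(parse_io_weight("0.3", initial_phase=False))  # 0.3
```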

As knowledge of job characteristics improves, a VO can specify different values; at a later stage we may decide to define an exact metric. This parameter does not translate into a specific resource to be allocated to the job, but it may help sites distribute jobs in a convenient way. It is up to the experiments to decide which of their jobs will be tagged one way or the other.

There was a long discussion on the accuracy of constraint tagging. A first conclusion was that tagging by individual users is to be avoided, since users are commonly unaware of such details and inconsistent in their actions. In general, accuracy of the scalar value was not seen as important, and sites must expect that user communities will not be consistent in the weights they assign. On the other hand, experiments do know that some jobs will be CPU-bound or I/O-bound, and tagging these jobs as such will help some sites schedule them more efficiently. It is recognized that jobs often change constraints during execution, and precise measurement is not seen as easy.

We therefore propose that Resource Constraint Tags should be agreed and added to the JDL as a set of weighted values between 0 and 1; we expect Resource Constraint Tags may not be limited to I/O and CPU bound constraints.

These Constraint Tags should be honored by the CE and passed to the LRMS in an agreed way, allowing sites to customize scheduling if they so desire. Sites that do not want to make use of Resource Constraint Tags will not need to use them. Pilot frameworks should honor the Resource Constraints between the pilot and the jobs the pilot executes.
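
One possible interpretation of the pilot requirement is that a pilot submitted with a given weight should only pull payloads whose weight matches it. The sketch below assumes that interpretation; `pick_payload`, the `tolerance` parameter and the payload names are hypothetical and do not correspond to any existing pilot framework.

```python
def pick_payload(pilot_weight, payloads, tolerance=0.0):
    """Return the name of the first payload matching the pilot's weight.

    payloads is a list of (name, weight) pairs; None is returned when
    no payload lies within tolerance of the pilot's own weight.
    """
    for name, weight in payloads:
        if abs(weight - pilot_weight) <= tolerance:
            return name
    return None


# A pilot that was scheduled as CPU-bound (weight 0) skips the analysis job.
task_queue = [("ana-001", 1.0), ("mc-001", 0.0)]
print(pick_payload(0.0, task_queue))  # mc-001
```

With only the values 0 and 1 in use, a tolerance of 0 amounts to an exact match; a non-zero tolerance would only become relevant if intermediate weights are adopted later.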

-- DavideSalomoni - 03-Feb-2012

Topic revision: r4 - 2012-03-02 - OwenMillingtonSyngeExCern