-- AdriaCasajus - 16 Feb 2009

DIRAC Task Queue Matching

This page describes how DIRAC handles matching using Task Queues.

Scenario

Matching is one of the most CPU-consuming tasks any workload management system has to handle. Every time a resource becomes available, the workload management system has to decide which job will use it. To do so, the DIRAC WMS has to select a job that not only matches the resource attributes but also follows the priority rules, and it has to be able to make that choice among up to a million waiting jobs.
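To show why this is expensive, here is a minimal sketch of a naive matcher that checks every waiting job each time a resource asks for work. It is only an illustration, not DIRAC code: the Job and Resource classes, the reduced set of attributes (allowed sites and CPU time) and the single integer priority are all hypothetical stand-ins, since the priority rules are not described on this page.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    job_id: int
    priority: int            # hypothetical: a higher value is served first
    sites: frozenset[str]    # sites allowed to run the job; empty means any site
    cpu_time: int            # CPU time the job requires


@dataclass(frozen=True)
class Resource:
    site: str
    cpu_time: int            # CPU time the resource can offer


def job_fits(job: Job, res: Resource) -> bool:
    """Check the (simplified) resource attributes against the job requirements."""
    site_ok = not job.sites or res.site in job.sites
    return site_ok and job.cpu_time <= res.cpu_time


def naive_match(jobs: list[Job], res: Resource) -> Job | None:
    """Return the highest-priority job that fits, scanning every waiting job."""
    best: Job | None = None
    for job in jobs:  # O(N) work for every resource request
        if job_fits(job, res) and (best is None or job.priority > best.priority):
            best = job
    return best


jobs = [
    Job(1, priority=5, sites=frozenset({"Site.A"}), cpu_time=1000),
    Job(2, priority=9, sites=frozenset(), cpu_time=100000),
    Job(3, priority=7, sites=frozenset(), cpu_time=200),
]
print(naive_match(jobs, Resource("Site.A", cpu_time=5000)).job_id)  # 3
```

With a million waiting jobs, every request for work repeats the full scan, which is the cost the grouping described below is meant to reduce.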

Matching phases

Before matching takes place, DIRAC groups jobs together to make matching easier. All jobs with identical requirements are placed in the same TaskQueue before they can be matched. This narrows down the number of possible choices: the matcher has to choose among TaskQueues rather than among individual jobs.
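A minimal sketch of the grouping idea, assuming each job's requirements can be expressed as a hashable tuple: jobs with identical requirements land in the same queue, and the matcher then evaluates the resource once per queue instead of once per job. The function names, the tuple layout and the "take the oldest job in the queue" rule are hypothetical, and priorities are not modelled here.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Callable, Hashable, Iterable


def group_into_queues(jobs: Iterable[tuple[int, Hashable]]) -> dict[Hashable, list[int]]:
    """Put every job with identical requirements into the same queue."""
    queues: defaultdict[Hashable, list[int]] = defaultdict(list)
    for job_id, requirements in jobs:
        queues[requirements].append(job_id)
    return dict(queues)


def match_queue(
    queues: dict[Hashable, list[int]],
    fits: Callable[[Hashable], bool],
) -> tuple[Hashable, int] | None:
    """Test the resource once per queue instead of once per job."""
    for requirements, job_ids in queues.items():
        if job_ids and fits(requirements):
            return requirements, job_ids[0]  # hypothetical: oldest job first
    return None


# Hypothetical requirements: (owner, group, CPU time)
jobs = [
    (1, ("alice", "user", 500)),
    (2, ("alice", "user", 500)),
    (3, ("bob", "production", 5000)),
]
queues = group_into_queues(jobs)
print(len(queues))                                  # 2
print(match_queue(queues, lambda r: r[2] >= 5000))  # (('bob', 'production', 5000), 3)
```

Three jobs collapse into two queues here; with many jobs sharing the same requirements, the number of candidates the matcher has to inspect drops accordingly.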

Task Queue creation

Each job will be placed into a TaskQueue that has the same requirements as the job.

Task Queue Requirements

The job requirements used to group jobs into TaskQueues are:

  • Owner DN : DN of the user that submitted the job
  • Owner Group : Group used to submit the job by the user
  • Setup : Setup to which this job was submitted
  • CPU Time : Required CPU time for the job to run
  • Submit Pools : Pools to be used to submit pilots for this job
  • Pilot Types : Pilot types that can run this job (right now the only possible choices are private or none)
  • Sites : Sites that can run this job
  • GridCEs : Grid CEs that can run this job
  • GridMiddlewares : Grid middleware required to run this job
  • BannedSites : Sites that cannot run this job
  • LHCbPlatforms : Operating system and architecture required

Task Queue Implementation

TaskQueues are implemented as a database schema. The main TaskQueues table contains:

  • TaskQueue id
  • Owner DN
  • Owner Group
  • Setup
  • CPU time
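As a rough illustration, the main table could look like the SQLite sketch below. The table and column names, the in-memory database and the candidate query are all assumptions made for this example; the page only lists the fields. The query assumes a resource can serve any TaskQueue of its setup whose CPU time does not exceed what the resource offers. How the list-valued requirements (sites, CEs, platforms, and so on) are stored is not described here, so they are left out.

```python
from __future__ import annotations

import sqlite3

# Hypothetical table and column names; only the main table is modelled.
SCHEMA = """
CREATE TABLE TaskQueues (
    TQId       INTEGER PRIMARY KEY AUTOINCREMENT,
    OwnerDN    TEXT    NOT NULL,
    OwnerGroup TEXT    NOT NULL,
    Setup      TEXT    NOT NULL,
    CPUTime    INTEGER NOT NULL
)
"""


def create_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def add_task_queue(conn: sqlite3.Connection, owner_dn: str, owner_group: str,
                   setup: str, cpu_time: int) -> int:
    """Insert a TaskQueue row and return its id."""
    cur = conn.execute(
        "INSERT INTO TaskQueues (OwnerDN, OwnerGroup, Setup, CPUTime) VALUES (?, ?, ?, ?)",
        (owner_dn, owner_group, setup, cpu_time),
    )
    return cur.lastrowid


def candidate_task_queues(conn: sqlite3.Connection, setup: str,
                          resource_cpu_time: int) -> list[int]:
    """Ids of TaskQueues in the given setup whose CPU time fits the resource."""
    rows = conn.execute(
        "SELECT TQId FROM TaskQueues WHERE Setup = ? AND CPUTime <= ? ORDER BY TQId",
        (setup, resource_cpu_time),
    )
    return [tq_id for (tq_id,) in rows]


conn = create_db()
a = add_task_queue(conn, "/DC=org/CN=alice", "user", "Production", 500)
b = add_task_queue(conn, "/DC=org/CN=alice", "user", "Production", 50000)
c = add_task_queue(conn, "/DC=org/CN=bob", "user", "Development", 500)
print(candidate_task_queues(conn, "Production", 5000))    # [1]
print(candidate_task_queues(conn, "Production", 300000))  # [1, 2]
```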

Since CPU time can take an unlimited number of values, it is restricted to four possible values (500, 5000, 50000, 300000): the job's CPU time is rounded up to the next of these values. For instance, if a job requires a CPU time of 10, its TaskQueue will have 500. If the job's CPU time exceeds 300000, its TaskQueue will have 300000, the maximum.
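The rounding rule can be sketched with a binary search over the four allowed values. The function name is hypothetical, and the sketch assumes a CPU time exactly equal to one of the values stays at that value.

```python
from bisect import bisect_left

CPU_TIME_BUCKETS = (500, 5000, 50000, 300000)


def tq_cpu_time(job_cpu_time: float) -> int:
    """Round a job's CPU time up to the next bucket, capping at the largest one."""
    idx = bisect_left(CPU_TIME_BUCKETS, job_cpu_time)
    # Anything above 300000 gives idx == 4, which is clamped to the last bucket.
    return CPU_TIME_BUCKETS[min(idx, len(CPU_TIME_BUCKETS) - 1)]


print(tq_cpu_time(10))       # 500
print(tq_cpu_time(6000))     # 50000
print(tq_cpu_time(10**6))    # 300000
```

Below the cap, the TaskQueue value is an upper bound on the CPU time of every job in it, so a resource offering at least that much can run any of them. Jobs in the 300000 TaskQueue may need more than 300000, because of the cap.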

Resource matching

Topic revision: r2 - 2009-02-17 - AdriaCasajus
 