A proposal for a pool thread based implementation for status polling in the JobTracking component

Contact Carlos Kavka

Introduction

An implementation of a threaded status monitor is proposed, based on a pool scheduler module, a pool of threads and two queues for communication. The pool scheduler module assigns the workload to the threads, which perform status polling in parallel. A token-passing model, in which tokens identify subsets of jobs, guarantees synchronization between the threads and the pool scheduler, allowing dynamic assignment and re-assignment of jobs.

The model

The model is shown in figure 1. The pool scheduler partitions the set of jobs to be watched into a number of subsets, assigning an identifier (id) to each of them. It then inserts the identifiers into the input queue. All threads in the pool are started when the pool is created, and as a first action they wait to read an id from the input queue. After getting an id, a thread performs the get status operation for all jobs associated with that id, inserts the id into the output queue, and reads the input queue again to get a new assignment. When the pool scheduler gets an id from the output queue, it can re-assign the jobs associated with that id.

poolthread.png

Figure 1: The pool thread based model.
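
To make the protocol concrete, the following is a minimal sketch of the model in Python, assuming the standard threading and queue modules; the get_status() function, the job names and the sizes of the subsets are illustrative stand-ins, not part of the actual JobTracking code.

```python
import queue
import threading

def get_status(job):
    """Hypothetical stand-in for the real get status operation on one job."""
    return "Running"

input_queue = queue.Queue()    # pool scheduler -> threads: ids to process
output_queue = queue.Queue()   # threads -> pool scheduler: ids already processed

# Jobs currently associated with each id (maintained by the pool scheduler).
jobs_by_id = {1: ["job%d" % i for i in range(50)],
              2: ["job%d" % i for i in range(50, 100)]}

def worker():
    while True:
        group_id = input_queue.get()        # wait for an assignment
        if group_id is None:                # sentinel used only to end this sketch
            break
        for job in jobs_by_id[group_id]:    # get the status of every job in the subset
            get_status(job)
        output_queue.put(group_id)          # return the token to the pool scheduler

# The pool: all threads are started once and then wait on the input queue.
threads = [threading.Thread(target=worker) for _ in range(10)]
for t in threads:
    t.start()

# The pool scheduler distributes the workload by inserting the ids.
input_queue.put(1)
input_queue.put(2)

# When an id appears in the output queue, its jobs can be re-assigned.
finished_id = output_queue.get()
print("group %d can now be updated and re-assigned" % finished_id)

# End of the sketch: wake every thread with a sentinel and wait for them.
for _ in threads:
    input_queue.put(None)
for t in threads:
    t.join()
```

Note that the queues carry only identifiers, never the jobs themselves, which is what makes the identifiers behave as tokens.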

As an example, let's assume that there are n = 10 threads in the pool and m = 100 jobs to assign. Note that all threads in the pool are already started and waiting for requests on the input queue. The pool scheduler can implement any policy, but let's assume that the current policy suggests partitioning the jobs into s = 2 sets, so that two threads get 50 jobs each. The pool scheduler then assigns id = 1 to one set of 50 jobs and id = 2 to the other set, inserting both id = 1 and id = 2 into the input queue. Two threads will be woken up. Let's assume that thread t = 1 gets id = 1 and thread t = 2 gets id = 2. The first thread will then get the status of the first set of jobs, and the second one the status of the other set. Let's suppose that thread t = 2 finishes first, inserting id = 2 into the output queue. The pool scheduler can then update the subset of jobs assigned to id = 2, determining how many jobs have finished (which will no longer be included in future get status operations) and how many new jobs there are. Let's assume that 20 jobs with id = 2 have finished and there are 10 new jobs. The pool scheduler can decide to assign id = 2 to these new jobs as well and insert id = 2 into the input queue. A thread will be woken up and will start processing these 40 jobs. Note that any idle thread can be selected, not necessarily thread t = 2.

Model properties

The model has a number of interesting properties:

  1. The model guarantees that no more than n threads will be in execution at any time, since the pool defines exactly n threads. However, threads can be added dynamically if necessary.
  2. Threads are not created and destroyed during processing. All threads are started together, and remain alive waiting for work to be performed.
  3. Since a thread is not associated with a fixed set of jobs to be watched, but with an id that works as a token, no two threads can be working on the same set of jobs at the same time.
  4. Even if threads work concurrently, they are synchronized under the control of the pool scheduler, since they perform work, report it, and wait for new work to be assigned. A thread has no fixed set of jobs to watch, so it is not necessary to add or remove jobs from the internal structures of the thread when the assignment changes.
  5. The pool scheduler has full control of the assignment policy. Even if the pool is defined with n threads, the pool scheduler can determine how many of them perform get status operations (see the example in the previous section, where only 2 threads are in execution in a pool with n = 10 threads).
  6. There is no relation between the number of job subsets (s) and the number of threads (n); they can be adjusted independently based on current needs. For example, the pool scheduler can partition the set of jobs into s = 30 subsets even if there are only n = 5 threads in the pool. In this case, the pool scheduler will insert the 30 identifiers into the input queue. Five threads will each get an id and start processing the associated jobs. When a thread finishes, it will look for more work in the queue, immediately getting the next id (see the sketch after this list).
  7. A token can be taken back from a thread by simply restarting that thread. In this way, the pool scheduler can perform stronger re-assignment operations if necessary.
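
As an illustration of point 6, here is a minimal sketch of the s > n case, again assuming the standard queue and threading modules; the polling work is replaced by a short sleep and the printed message is only for demonstration.

```python
import queue
import threading
import time

input_queue = queue.Queue()

def worker():
    while True:
        group_id = input_queue.get()
        if group_id is None:          # sentinel used only to end this sketch
            break
        time.sleep(0.1)               # stand-in for polling the jobs of the group
        print("processed group", group_id)

# s = 30 subsets but only n = 5 threads: the identifiers simply wait in the queue.
for group_id in range(1, 31):
    input_queue.put(group_id)
for _ in range(5):
    input_queue.put(None)             # one sentinel per thread, after all the ids

threads = [threading.Thread(target=worker) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The 30 identifiers simply wait in the input queue until one of the 5 threads becomes free, so s and n can be tuned independently.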

Implementation

The class diagram is shown in figure 2.

The class PoolScheduler implements the pool thread scheduler as discussed above. The method applyPolicy() implements the scheduling policy, arranging the jobs in groups and returning the list of ids of all the groups into which the job set has been partitioned.
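
A minimal sketch of what applyPolicy() could look like, assuming a simple fixed-size grouping policy; the snake-case function name, the chunk size of 50 and the job names are illustrative assumptions, not the actual policy of the component.

```python
import itertools

def apply_policy(jobs, group_size=50):
    """Partition the job list into groups of at most group_size jobs.

    Returns a dictionary mapping each group id to its list of jobs; the keys
    are the ids that the pool scheduler inserts into the input queue.
    """
    groups = {}
    it = iter(jobs)
    for group_id in itertools.count(1):
        chunk = list(itertools.islice(it, group_size))
        if not chunk:
            break
        groups[group_id] = chunk
    return groups

# Example: 100 jobs partitioned into 2 groups of 50 jobs each.
groups = apply_policy(["job%d" % i for i in range(100)])
print(sorted(groups))    # [1, 2]
```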

The class Worker implements the threads in the pool. The method run() gets one id from the input queue, updates the status of all jobs associated with that id, and returns the id through the output queue.
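
A minimal sketch of the Worker class, assuming Python threads, the two queues of the model and a JobStatus-like object exposing doWork(group); the sentinel value used to stop the thread is an assumption of this sketch.

```python
import threading

class Worker(threading.Thread):
    """A thread of the pool: repeatedly gets an id, polls its jobs, reports back."""

    def __init__(self, input_queue, output_queue, job_status):
        super().__init__()
        self.input_queue = input_queue
        self.output_queue = output_queue
        self.job_status = job_status             # shared JobStatus-like object

    def run(self):
        while True:
            group_id = self.input_queue.get()    # wait for an assignment
            if group_id is None:                 # assumed sentinel: no more work
                break
            self.job_status.doWork(group_id)     # get the status of the group's jobs
            self.output_queue.put(group_id)      # return the token to the scheduler

# Usage (illustrative): all workers share the two queues and the JobStatus instance.
# Worker(input_queue, output_queue, job_status).start()
```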

The class PoolThread implements the whole pool thread based model. It contains an instance of PoolScheduler and potentially many instances of the Worker class, one for each thread.

The class JobStatus deals with job handling operations. The method addNewJobs() gets the jobs from the system that are not yet assigned to any group, the method doWork(group) gets the status of all jobs associated with the group, and the method removeFinishedJobs(group) removes from the group the jobs that do not need to be watched any more. Note that a single instance of this class is defined, but since the methods are called by threads that each process a different subset of jobs, no race conditions arise.
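
A minimal sketch of the JobStatus class, assuming the group-to-jobs association is kept in a plain dictionary and that the real status query is represented by a hypothetical poll_job() function; in the actual component addNewJobs() would retrieve the unassigned jobs from the system rather than receive them as a parameter.

```python
def poll_job(job):
    """Hypothetical stand-in for the real per-job status query."""
    return "Running"

class JobStatus:
    """Keeps the association between group ids and the jobs to be watched."""

    def __init__(self):
        self.groups = {}        # group id -> list of jobs to be watched
        self.unassigned = []    # jobs not yet assigned to any group

    def addNewJobs(self, new_jobs):
        """Record jobs not yet assigned to any group (passed in here; in the
        real component they would be retrieved from the system)."""
        self.unassigned.extend(new_jobs)

    def doWork(self, group_id):
        """Get the status of all jobs associated with the group."""
        for job in self.groups.get(group_id, []):
            poll_job(job)

    def removeFinishedJobs(self, group_id):
        """Remove from the group the jobs that no longer need to be watched."""
        self.groups[group_id] = [job for job in self.groups.get(group_id, [])
                                 if poll_job(job) != "Done"]
```

Since a group id is held by at most one thread at a time, each call operates on a disjoint subset of the dictionary, which is why a single shared instance is enough.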

JTPoolThread.png

Figure 2: The class diagram of the pool thread based model.

The activity diagram of the pool scheduler is shown in figure 3. The scheduler gets information about new jobs, partitions them into subsets by assigning an identifier to each group, and inserts the identifiers into the input queue to distribute the workload to the threads. As soon as a thread finishes, the pool scheduler updates its information, adding new jobs and re-assigning the workload.
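
A minimal sketch of the scheduler loop described by the activity diagram, assuming the queue-based skeleton shown earlier; new_jobs_from_system(), job_is_finished() and the initial groups are hypothetical stand-ins.

```python
import queue

input_queue = queue.Queue()     # ids assigned to the threads
output_queue = queue.Queue()    # ids reported back by the threads

# Groups maintained by the pool scheduler: group id -> list of jobs.
groups = {1: ["jobA", "jobB"], 2: ["jobC"]}

def new_jobs_from_system():
    """Hypothetical stand-in for the query that returns newly submitted jobs."""
    return []

def job_is_finished(job):
    """Hypothetical stand-in for the check on the last known status of a job."""
    return False

def scheduler_loop():
    # Initial assignment: distribute the workload by inserting the ids.
    for group_id in groups:
        input_queue.put(group_id)

    # Like the activity diagram, this loop runs for as long as jobs are monitored.
    while True:
        # Wait until some thread reports that its group has been processed.
        group_id = output_queue.get()

        # Update the group: drop finished jobs, add newly discovered ones
        # (the policy may instead create new groups for the new jobs).
        groups[group_id] = [j for j in groups[group_id] if not job_is_finished(j)]
        groups[group_id].extend(new_jobs_from_system())

        # Re-assign the updated group by re-inserting its id into the input queue.
        input_queue.put(group_id)
```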

admin.png

Figure 3: The activity diagram of the pool scheduler.

The activity diagram of a single thread is shown in figure 4. A thread waits for an id on the input queue. After getting the assignment, it gets the list of jobs to be watched and performs the get status operation for all of them. When the operation is completed, it inserts the corresponding id into the output queue.

thread.png

Figure 4: The activity diagram of a single thread.

The communication concerning the assignment is performed through the input and output queues, which are safe for concurrent operations. Since the identifiers work as tokens, no other thread or process can work on the same data, so no race conditions can arise, meaning that standard structures can be used to store information about the jobs. However, it is suggested to store this information in a database, since 1) a database server is available, 2) the implementation can benefit from transaction-based operations, and 3) it provides a means for an external audit of the thread pool based monitoring operation.
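
As an illustration of the database suggestion, here is a minimal sketch using SQLite (chosen only for the example; the actual database server and schema are not specified in this proposal), showing how the status updates of one group could be applied inside a single transaction.

```python
import sqlite3

conn = sqlite3.connect("jobtracking.db")   # illustrative file-based database
conn.execute("""CREATE TABLE IF NOT EXISTS jobs (
                    job_id   TEXT PRIMARY KEY,
                    group_id INTEGER,
                    status   TEXT)""")

def update_group_status(group_id, statuses):
    """Apply all status updates of one group in a single transaction.

    statuses is a hypothetical dictionary job_id -> new status, produced by
    the get status operation of the thread that holds group_id.
    """
    with conn:   # opens a transaction: committed on success, rolled back on error
        for job_id, status in statuses.items():
            conn.execute(
                "UPDATE jobs SET status = ? WHERE job_id = ? AND group_id = ?",
                (status, job_id, group_id))

update_group_status(2, {"job51": "Done", "job52": "Running"})
```

Since a group id is held by at most one thread at a time, the rows touched by different transactions are disjoint, and the table also provides the external audit trail mentioned above.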

-- CarlosKavka - 21 Aug 2007
