Faster processing of maintenance BibSched tasks in Inspire

Motivation

BibSched is the module responsible for managing tasks that modify data in Inspire. Some tasks executed by BibSched are high-priority and should not be blocked by long-running maintenance tasks. On this page I propose an infrastructure that would allow long-running tasks to run seamlessly from the point of view of the main Invenio installation.

The purpose of having all tasks managed by BibSched is to ensure the consistency of the data. This can also be achieved by other means.

The effort necessary to implement such a solution should be reasonably small.

Proposed solution

The solution relies on database replication. One database is selected as the master (currently the only database), and it sends every write operation to the replica, where the operation is replayed.

MainArchitecture.png

Execution of long data-oriented tasks

Long-running data-oriented tasks usually process a large portion of the database and produce some output. This category of tasks includes dumping the content of the entire database or calculating indices.

The property we need to ensure is that these tasks see the state of the database as it was when they started. Usually the results of these tasks are not crucial for the correct execution of other tasks in the BibSched queue.

The execution of a long-running task should start in the following manner:

Spawning.png

Replication should be stopped so that the task sees a consistent view of the database as of the moment it starts.
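A minimal sketch of this suspension step, assuming a MySQL-style replica on which the statements `STOP SLAVE SQL_THREAD` and `START SLAVE SQL_THREAD` pause and resume the applying of replicated changes. This page does not name the database engine or the commands, so both are assumptions. `suspended_replication` is a hypothetical helper, and `cursor` stands for any DB-API cursor connected to the replica. The sketch resumes replication as soon as the block ends, which is a simplification of the flow shown in the diagrams.

```python
from contextlib import contextmanager


@contextmanager
def suspended_replication(cursor):
    """Freeze the replica while a long-running task reads from it.

    Assumption: the replica accepts the MySQL statements below.
    With only the SQL thread stopped, changes from the master keep
    arriving in the relay log and are applied after the restart.
    """
    cursor.execute("STOP SLAVE SQL_THREAD")
    try:
        yield cursor
    finally:
        # Always resume, even if the task failed half-way through.
        cursor.execute("START SLAVE SQL_THREAD")
```

A task would wrap its read phase in `with suspended_replication(cursor): ...` so that every query inside the block sees the same frozen state.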

The task should write its results to a temporary file that can later be uploaded to the machine running the main daemons. After the task has finished, it should resynchronise with the BibSched queue:
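Writing the results to a temporary file could look like the sketch below. `write_results` is a hypothetical helper and the JSON format is an assumption; this page does not specify how results are stored or how the file reaches the main machine.

```python
import json
import os
import tempfile


def write_results(results, directory=None):
    """Write task results to a temporary file and return its path.

    The data is first written under a ".part" name and renamed once
    complete, so an uploader never picks up a half-written file.
    """
    fd, tmp_path = tempfile.mkstemp(suffix=".part", dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(results, handle)
    final_path = tmp_path[: -len(".part")] + ".json"
    os.replace(tmp_path, final_path)
    return final_path
```

The returned path is what the follow-up upload task would receive as its input.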

Finalize.png

The purpose of spawning a new task is to upload the newly calculated data (for example, an index) to the database. The task runs in the following manner:

Resume.png

Resuming replication has to involve replaying all the changes made since the suspension. For example, they can be redirected to a log file instead of the replica server and later replayed from this file.

The assumption is that uploading the results is much faster than calculating them.
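The suspend-and-replay behaviour can be illustrated with a toy in-memory model. This is only a sketch of the idea, not of any real replication mechanism: the `Replica` class, its methods and the record keys are all hypothetical, and the `pending` list stands in for the log file mentioned above.

```python
class Replica:
    """Toy replica: applies writes, or queues them while suspended."""

    def __init__(self):
        self.data = {}
        self.pending = []  # stands in for the log file
        self.suspended = False

    def receive(self, key, value):
        """Handle one write operation sent by the master."""
        if self.suspended:
            self.pending.append((key, value))
        else:
            self.data[key] = value

    def suspend(self):
        self.suspended = True

    def resume(self):
        # Replay in arrival order so the replica ends up in the
        # same state as the master.
        for key, value in self.pending:
            self.data[key] = value
        self.pending.clear()
        self.suspended = False


replica = Replica()
replica.receive("record:1", "v1")
replica.suspend()
replica.receive("record:1", "v2")  # arrives while the long task runs
snapshot = dict(replica.data)      # what the long task sees
replica.resume()
```

Here `snapshot` still holds the value from before the suspension, while `replica.data` has caught up with the master after `resume()`.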

Benefits

This scheme should keep the BibSched queue from being blocked for long periods of time and allow all tasks to pass quickly. It will not improve the performance of large upload tasks.

Besides efficiency, replicas may provide a robust fail-over mechanism. If the main database fails, one of the replicas might take over the responsibilities of the master without administrator intervention, increasing the reliability of the Inspire service.

Scalability

If we need more throughput, we can introduce new replicas:

Scalability.png

-- PiotrPraczyk - 28-Mar-2011
