Faster processing of Maintenance BibSched tasks in Inspire
Motivation
BibSched is the module responsible for managing the tasks that modify data in Inspire.
Some tasks executed by BibSched are high-priority and should not be blocked by long-running maintenance tasks.
On this page I propose an infrastructure that would make long-running tasks transparent from the point of view of the main Invenio installation.
The purpose of having all tasks managed by BibSched is to ensure the consistency of the data. This can also be achieved by other means.
The effort necessary to implement such a solution should be reasonably small.
Proposed solution
The solution relies on database replication.
Replication consists of designating one master database (currently the only one) and making it send every write operation to the replica, where it is replayed.
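To make the replication model concrete, here is a minimal Python sketch of write replay. It is a toy model only: `Database`, `Master` and the `(kind, key, value)` operation tuples are hypothetical names chosen for illustration, and a real database server transfers its write log to the replica asynchronously rather than calling it directly.

```python
from dataclasses import dataclass, field


@dataclass
class Database:
    """A toy key-value store standing in for a real database (hypothetical)."""
    rows: dict = field(default_factory=dict)

    def apply(self, op):
        # An operation is a (kind, key, value) tuple; only writes reach this method.
        kind, key, value = op
        if kind == "set":
            self.rows[key] = value
        elif kind == "delete":
            self.rows.pop(key, None)


@dataclass
class Master(Database):
    replicas: list = field(default_factory=list)

    def write(self, op):
        # Apply the write locally, then ship it to every replica to be replayed.
        self.apply(op)
        for replica in self.replicas:
            replica.apply(op)


replica = Database()
master = Master(replicas=[replica])
master.write(("set", "recid:1", "record one"))
master.write(("set", "recid:2", "record two"))
master.write(("delete", "recid:2", None))
print(replica.rows == master.rows)  # True
```

Because every write is replayed in the same order, the replica ends up in the same state as the master.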
Execution of long data-oriented tasks
Long-running data-oriented tasks usually process a large portion of the database and produce some output.
This category includes dumping the content of the entire database and calculating indices.
The property we need to ensure is that these tasks see the state of the database as it was when they started.
Usually, the results of these tasks are not crucial for the correct execution of other tasks in the BibSched queue.
The execution of a long-running task should start in the following manner:
Replication to the replica should be suspended, so that the task sees a consistent view of the database as it was before the task began.
The task should write its results to a temporary file that can later be uploaded to the main daemons machine.
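The two steps above can be sketched as follows, assuming a toy replica that buffers incoming writes while it is suspended. `Replica`, `run_maintenance_task` and the example `build_word_index` task are hypothetical; the point is that a write arriving mid-task does not leak into the task's result, and that the result ends up in a temporary file.

```python
import json
import os
import tempfile


class Replica:
    """Toy replica (hypothetical): while suspended, incoming writes are buffered, not applied."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.suspended = False
        self.pending = []

    def receive(self, key, value):
        # Called for every write shipped from the master.
        if self.suspended:
            self.pending.append((key, value))
        else:
            self.rows[key] = value


def run_maintenance_task(replica, compute):
    """Hypothetical driver: suspend replication, compute, write the result to a temp file."""
    replica.suspended = True  # step 1: stop replaying master writes
    result = compute(replica.rows)  # the task sees the state frozen at suspension
    fd, path = tempfile.mkstemp(suffix=".json")
    with os.fdopen(fd, "w") as out:  # step 2: results go to a temporary file
        json.dump(result, out)
    return path


replica = Replica({"recid:1": "quark", "recid:2": "gluon"})


def build_word_index(rows):
    # A write arrives from the master mid-task; it must not leak into the result.
    replica.receive("recid:3", "lepton")
    return {word: recid for recid, word in rows.items()}


path = run_maintenance_task(replica, build_word_index)
with open(path) as results:
    print(json.load(results))  # {'quark': 'recid:1', 'gluon': 'recid:2'}
print(replica.pending)  # [('recid:3', 'lepton')]
```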
After the task has finished, it should resynchronise with the BibSched queue by spawning a new task.
The purpose of spawning a new task is to upload the newly calculated data (for example, an index) to the database.
The new task runs in the following manner:
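A minimal sketch of the hand-off back to the queue, assuming a simple first-in, first-out queue. `ToyQueue` and `spawn_upload_task` are hypothetical names and do not reflect the actual BibSched API; the sketch only shows that the queue runs the short upload step, never the long calculation itself.

```python
from collections import deque


class ToyQueue:
    """Hypothetical stand-in for the BibSched queue: tasks run one at a time, in order."""

    def __init__(self):
        self.tasks = deque()
        self.log = []

    def submit(self, name, action):
        self.tasks.append((name, action))

    def run_pending(self):
        while self.tasks:
            name, action = self.tasks.popleft()
            action()
            self.log.append(name)


def spawn_upload_task(queue, database, results):
    """Called when the long task finishes outside the queue: enqueue a short upload task."""
    def upload():
        database.update(results)  # assumed to be fast compared with computing `results`
    queue.submit("upload-results", upload)


db = {}
queue = ToyQueue()
queue.submit("urgent-update", lambda: db.update({"recid:7": "new record"}))
spawn_upload_task(queue, db, {"index:quark": ["recid:1"]})
queue.run_pending()
print(queue.log)  # ['urgent-update', 'upload-results']
```

The urgent task is never stuck behind the calculation; it only shares the queue with the comparatively short upload.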
Resuming replication has to involve replaying all the changes made since the suspension (for example, they can be redirected to a log file instead of the replica server and later replayed from this file).
The assumption is that uploading the results is much faster than running the entire calculation.
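The log-file variant of suspension and resumption could look like the sketch below. `SuspendableReplica` and its JSON-lines log format are assumptions made for illustration; the essential property is that buffered changes are replayed in their original order before the replica goes live again.

```python
import json
import os
import tempfile


class SuspendableReplica:
    """Toy replica (hypothetical) that redirects writes to a log file while suspended."""

    def __init__(self, log_path):
        self.rows = {}
        self.log_path = log_path
        self.suspended = False

    def receive(self, key, value):
        if self.suspended:
            # Append the change to the on-disk log instead of applying it.
            with open(self.log_path, "a") as log:
                log.write(json.dumps([key, value]) + "\n")
        else:
            self.rows[key] = value

    def suspend(self):
        self.suspended = True

    def resume(self):
        # Replay every buffered change in its original order, then go live again.
        if os.path.exists(self.log_path):
            with open(self.log_path) as log:
                for line in log:
                    key, value = json.loads(line)
                    self.rows[key] = value
            open(self.log_path, "w").close()  # truncate the log once it is replayed
        self.suspended = False


fd, log_path = tempfile.mkstemp(suffix=".log")
os.close(fd)
replica = SuspendableReplica(log_path)
replica.receive("recid:1", "v1")
replica.suspend()
replica.receive("recid:1", "v2")
replica.receive("recid:2", "v1")
print(replica.rows)  # {'recid:1': 'v1'}
replica.resume()
print(replica.rows)  # {'recid:1': 'v2', 'recid:2': 'v1'}
```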
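Stated in terms of time, and using symbols introduced here only for illustration ($t_{\text{compute}}$ for the long calculation, $t_{\text{upload}}$ for loading its results), the queue is currently occupied for roughly the whole run, whereas under the proposed scheme it is occupied only by the upload:

$$
t_{\text{blocked}}^{\text{current}} \approx t_{\text{compute}} + t_{\text{upload}},
\qquad
t_{\text{blocked}}^{\text{proposed}} \approx t_{\text{upload}} \ll t_{\text{compute}}.
$$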
Benefits
This scheme should keep the BibSched queue from being blocked for long periods of time and allow all tasks to pass through quickly.
This scheme will not improve the performance of large upload tasks.
Besides efficiency, replicas may provide a robust fail-over mechanism.
In the case of a main database failure, one of the replicas might take over the responsibilities of the master without administrator intervention, increasing the reliability of the Inspire service.
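A toy illustration of automatic fail-over, assuming a periodic health check has already marked each server as healthy or not. `Server`, `fail_over` and the promotion rule (first healthy replica in list order) are hypothetical; a production setup would also have to check that the promoted replica has replayed every change before it accepts writes.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    healthy: bool = True
    role: str = "replica"


def fail_over(servers):
    """Return the current master, promoting the first healthy replica if the master is down."""
    master = next(s for s in servers if s.role == "master")
    if master.healthy:
        return master
    master.role = "failed"
    for candidate in servers:
        if candidate.role == "replica" and candidate.healthy:
            candidate.role = "master"  # takes over without administrator intervention
            return candidate
    raise RuntimeError("no healthy replica available to take over")


cluster = [Server("db1", role="master"), Server("db2", healthy=False), Server("db3")]
cluster[0].healthy = False  # simulate a main database failure
new_master = fail_over(cluster)
print(new_master.name)  # db3
```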
Scalability
If we need more throughput, we can introduce additional replicas.
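One simple way to use additional replicas, sketched under the assumption that long-running maintenance tasks are independent of each other, is to spread them over the replicas in round-robin order. The function and the task names below are hypothetical.

```python
from itertools import cycle


def assign_round_robin(tasks, replicas):
    """Spread long-running maintenance tasks over the available replicas in turn."""
    if not replicas:
        raise ValueError("at least one replica is required")
    assignment = {replica: [] for replica in replicas}
    for task, replica in zip(tasks, cycle(replicas)):
        assignment[replica].append(task)
    return assignment


tasks = ["dump-records", "rebuild-word-index", "rebuild-citation-index", "dump-authors"]
print(assign_round_robin(tasks, ["replica1", "replica2"]))
# {'replica1': ['dump-records', 'rebuild-citation-index'], 'replica2': ['rebuild-word-index', 'dump-authors']}
```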
--
PiotrPraczyk - 28-Mar-2011