bibsched in a nutshell
The rules
- bibsched is a task queue comparable to a state machine.
- tasks can be scheduled with specific (desired) runtimes or runtime windows
- tasks can have different priority and leapfrog lower priority tasks
- tasks can be mutually exclusive:
  CFG_BIBSCHED_NON_CONCURRENT_TASKS (('bibupload',), ('oaiharvest', 'arxiv-pdf-checker'), ('bibindex:exfirstauthor', 'webauthorprofile', 'bibauthorid'), ('bibindex:exactauthor', 'webauthorprofile', 'bibauthorid'), ('bibindex:authorcount', 'webauthorprofile', 'bibauthorid'), ('bibindex:author', 'webauthorprofile', 'bibauthorid'))
- tasks can be monotasks -- i.e. no other task is allowed to run concurrently: CFG_BIBTASK_MONOTASKS ('dbdump', 'inveniogc')
- multiple regular tasks of any priority can run concurrently: CFG_BIBSCHED_MAX_NUMBER_CONCURRENT_TASKS (8)
- tasks in principle can run on different hosts but this is not enabled due to problems with deadlocks
- there are periodic tasks with a given sleep time between runs and one-off tasks
- tasks are either in status waiting, scheduled, running, about to sleep, sleeping, about to stop, stopped, error, or states not displayed in the monitor like error acknowledged
- a periodic task past its sleep time becomes runnable again and enters the queue as a regular task with its given priority
- a runnable higher priority task will send a sleep signal to all lower priority tasks and take the first available slot after a task completes or sleeps or another slot becomes available
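The concurrency rules above can be sketched in a few lines of Python. This is a simplified illustration, not bibsched's actual code: the helper name can_start is hypothetical, task names are compared as plain strings, and priorities, runtime windows and sleep signals are left out. The three settings are copied from the values quoted in the rules.

```python
# Values copied from the settings quoted in the rules above.
NON_CONCURRENT_TASKS = (
    ('bibupload',),
    ('oaiharvest', 'arxiv-pdf-checker'),
    ('bibindex:exfirstauthor', 'webauthorprofile', 'bibauthorid'),
    ('bibindex:exactauthor', 'webauthorprofile', 'bibauthorid'),
    ('bibindex:authorcount', 'webauthorprofile', 'bibauthorid'),
    ('bibindex:author', 'webauthorprofile', 'bibauthorid'),
)
MONOTASKS = ('dbdump', 'inveniogc')
MAX_CONCURRENT_TASKS = 8


def can_start(candidate, running):
    """Hypothetical helper: may `candidate` start next to the `running` tasks?

    `running` is a list of task names; plain string equality is an
    assumption made for this sketch.
    """
    # No free slot left.
    if len(running) >= MAX_CONCURRENT_TASKS:
        return False
    # A monotask needs the queue to itself, and nothing may join a running monotask.
    if running and (candidate in MONOTASKS or any(r in MONOTASKS for r in running)):
        return False
    # Members of the same exclusion group never overlap
    # (a one-element group means "never two of these at once").
    for group in NON_CONCURRENT_TASKS:
        if candidate in group and any(r in group for r in running):
            return False
    return True
```

For example, can_start('arxiv-pdf-checker', ['oaiharvest']) is False because both names sit in the same exclusion group, while can_start('bibupload', ['oaiharvest']) is True.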
The game set
so with those rules you can see up to 8 running tasks in bibsched monitor
or you can see one task in status "about to sleep" and nothing else, because the next highest priority task is a monotask waiting on that task to sleep, and any conceivable scenario in between.
some tasks take up to an hour to sleep -- some tasks simply don't have sleep checkpoints.
some tasks are resource heavy and delay a lot of other tasks. periodic higher priority tasks will repeatedly leapfrog waiting tasks with lower priority, so the queue may appear not to move
like the game of life you have to watch for a while to see patterns emerge
The moves
- when a task is in error, look at the log: hit "l" in the monitor. you must understand the underlying issue to either ACK or restart the task
- if it is a periodic task, fix the underlying issue and initialize the task "i"
- if it is a one-time task fix the underlying error and either run the task again, or simply ACK it "k"
- if it is an important task that only runs inside a given time window, consider running it as a one-off outside that time window -- e.g. dbdump failed due to a network issue and we really want a full dbdump over the weekend -- schedule a new task doing that outside of the regular Saturday night time window
- then initialize the task in error state to run on regular schedule
- sometimes the queue looks stalled but there are jobs running below the fold -- i.e. scroll down to get the whole picture
- there are some resource intensive tasks which only run every 14 days -- they can make the queue appear jammed
- there are some long-running tasks which run every 30 days but might run for 10 days straight and take up resources. these will also affect the overall behaviour of the queue
- there are several tasks that frequently fail due to transient issues, some depending on external sources. this includes arxiv-pdf-checker, certain bibtasklets (prefix bst_..) etc. For periodic tasks you simply initialize. For one-off tasks it depends. If the task log contains an Asana reference, simply ACK the task. If the task log contains a "logs emailed to" line naming a curator, talk to said curator. Otherwise consider the type of task and whether information might be lost before you ACK it.
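The triage for frequently failing tasks can be written down as a small decision function. This is only a sketch of the reasoning above: the function name triage is hypothetical, and the substring checks are assumptions about what the log text looks like, so read the log yourself before acting on the result.

```python
def triage(is_periodic, log_text):
    """Hypothetical helper: suggest the next move for a task in error.

    The substring matches below are assumptions about the log wording.
    """
    if is_periodic:
        # Periodic tasks are simply put back on their schedule ("i" in the monitor).
        return 'initialize'
    text = log_text.lower()
    if 'asana' in text:
        # The problem is already tracked, so the task can be ACKed ("k" in the monitor).
        return 'ack'
    if 'logs emailed to' in text:
        # Somebody was notified; talk to that curator before touching the task.
        return 'talk to curator'
    # Anything else: check whether information might be lost before ACKing.
    return 'review before ack'
```

The order of the checks mirrors the text: periodic tasks first, then the two log hints, and a manual review as the fallback.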
How to play
when there is an actual deadlock -- which is possible but very rare by now (we fixed most common issues) -- you have to use sound judgement on which task to kill. it's not always obvious and you also want to re-schedule that task afterwards
sometimes the only cure is to directly alter the schTASK table, because the bibsched monitor won't react
sometimes you have to alter state or archive a whole bunch of tasks in schTASK table
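When you do have to edit schTASK by hand, it helps to keep the change narrow: name the task ids explicitly, require the status you expect them to be in, and check how many rows were touched. The sketch below shows that pattern against an in-memory sqlite3 stand-in so it can be run anywhere. The three-column table, the helper set_status and the status strings are assumptions made for illustration; the real table has more columns and its stored status values may be spelled differently.

```python
import sqlite3

# In-memory stand-in for the real table (assumption: simplified columns).
conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE schTASK (id INTEGER PRIMARY KEY, proc TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO schTASK (id, proc, status) VALUES (?, ?, ?)",
    [(1, 'bibupload', 'running'),
     (2, 'arxiv-pdf-checker', 'error'),
     (3, 'arxiv-pdf-checker', 'error'),
     (4, 'oaiharvest', 'error')],
)
conn.commit()


def set_status(conn, task_ids, old_status, new_status):
    """Hypothetical helper: change status only for the listed ids that are
    still in `old_status`; return the number of rows changed."""
    if not task_ids:
        return 0
    placeholders = ','.join('?' for _ in task_ids)
    with conn:  # commits on success, rolls back if the statement raises
        cur = conn.execute(
            "UPDATE schTASK SET status = ? WHERE status = ? AND id IN (%s)" % placeholders,
            [new_status, old_status] + list(task_ids),
        )
    return cur.rowcount


# Acknowledge two specific tasks; task 4 is deliberately left alone.
changed = set_status(conn, [2, 3], 'error', 'error acknowledged')
```

If the returned count differs from the number of ids you passed in, some task changed state in the meantime and is worth a second look.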
Most of the time things are pretty straightforward, though, and all these problem scenarios are only going to happen when Thorsten is away.
2019-04-09