System Design: BibCatalogue
1. Introduction
BibCatalogue is a tool that eases the tracking of cataloguing tasks, such as the status of batches in the holding pen or tasks partially forwarded to fellow cataloguers.
2. Use cases
Journal Harvest
An OAI harvest from a journal comes into the system and is placed in a holding pen; there are 100 new articles. There are two catalogers trained to handle this journal. When they log in at the start of their day, they see that there are 100 new articles, and they begin working on them. They take them one by one to check for duplicates (done where? SystemDesignBibEdit or SystemDesignBibMerge) and correct metadata in SystemDesignBibEdit. They never work on the same document, and they see the ticket count dropping as they both work.
Delegated Task
A cataloger is checking authors and institutions (using SystemDesignBibEdit) on a paper and finds that it has 2000 authors from the CMS collaboration. Author input for this paper needs to go through a different procedure, so the cataloger flags the record for further work by an expert in that routine. When that expert goes into the system, they can see the newly added ticket in the queue and launch the correct action on it.
Email response
A user emails that a citation is missing from our record for 0808.0123. A cataloger opens the email (i.e. in RT), clicks a link to the editing screen for this record, makes some changes using SystemDesignBibEdit, and clicks Submit Changes (1) and Done Ticket (2), which (1) submits the correction to BibUpload and (2) sends a stock reply back to the user notifying them of the change.
2 hard cases for unified search (to be postponed until after a functional prototype)
Ellis Harvest
An OAI harvest comes in with 2000 new documents. Juanita Ellis informs us that 35 of them are hers and she'd like them processed quickly. We accede to her request, and need to find, within RT, the tickets that correspond to author:ellis and work on them first.
Suspicious Folks
A cataloger is getting citation correction requests from the email address scoap3@scoap3.org. The cataloger finds many missed citations to papers of S. Mele and H. O'Connell that we haven't extracted. We discover that many of these "missed references" are not in the original paper, the revisions, or indeed anywhere. Now we'd like to review all the changes we've made based on tickets submitted from this email address.
3. Workflow
For several enrichment workflow examples, please see:
4. Mock-up screenshots
Setting the username/password for bibcatalog_system in user account.
Ticket management interface in the left of BibEdit
5. Architecture
Bibcatalog consists of functions that manipulate tickets in an external ticket request tracker. The first implementation will use RT as the tracker. Tickets can be created by various modules (e.g. bibharvest), and are accessed through the request tracker GUI or through its interface to BibEdit.
There is a notion of queues, if the request tracker supports them.
bibcatalog.py is configured with the following options:
- CFG_BIBCATALOG_SYSTEM (if this is not set, bibcatalog actions are not available; currently, the only supported value is 'RT')
- CFG_BIBCATALOG_RT_COMMAND (path and full command for the RT command line, like /bin/rt)
- CFG_BIBCATALOG_RT_URL (how the command line accesses RT)
- CFG_BIBCATALOG_QUEUES (an array of queues)
Moreover, a user id and password for RT access need to be specified for each cataloguer in user preferences.
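As an illustration only, the options above might be set as in the sketch below. Only the option names and the /bin/rt example come from this page; the URL and queue names are placeholders, and config_problem is a hypothetical helper (not part of bibcatalog.py) that mirrors the "empty string means OK" convention used by check_system.

```python
# Placeholder values; only the option names come from this design page.
CFG_BIBCATALOG_SYSTEM = "RT"                      # the only supported value
CFG_BIBCATALOG_RT_COMMAND = "/bin/rt"             # RT command line tool
CFG_BIBCATALOG_RT_URL = "https://rt.example.org"  # placeholder URL
CFG_BIBCATALOG_QUEUES = ["General", "Authors"]    # placeholder queue names


def config_problem(system, rt_command, rt_url):
    """Return '' if the settings look usable, else a short error string.

    Hypothetical helper, written for this sketch only.
    """
    if not system:
        return "bibcatalog disabled: CFG_BIBCATALOG_SYSTEM is not set"
    if system != "RT":
        return "unsupported CFG_BIBCATALOG_SYSTEM: %s" % system
    if not rt_command or not rt_url:
        return "RT command and RT URL must both be set"
    return ""


problem = config_problem(CFG_BIBCATALOG_SYSTEM,
                         CFG_BIBCATALOG_RT_COMMAND,
                         CFG_BIBCATALOG_RT_URL)
```

Queues are deliberately not checked, since a tracker may not support them.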
The classes are as follows:
- bibcatalog.py: creates an instance of the class that has been configured for this installation.
- bibcatalog_system.py: an interface; contains prototypes for each function. Subclasses must conform to this.
- bibcatalog_system_rt.py: an implementation for RT.
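The relationship between the three files can be sketched roughly as follows. This is a minimal sketch, not the actual code: the class names, the factory function get_bibcatalog_system, and the choice to return None when no tracker is configured are all assumptions, and only two of the ticket operations are shown.

```python
class BibCatalogSystem(object):
    """Interface: one prototype per ticket operation (two shown here)."""

    def check_system(self, uid):
        raise NotImplementedError

    def ticket_assign(self, uid, ticketid, to_user):
        raise NotImplementedError


class BibCatalogSystemRT(BibCatalogSystem):
    """Placeholder RT backend; the real RT calls are omitted here."""

    def check_system(self, uid):
        # Empty string means "OK"; a real backend would test RT access.
        return ""

    def ticket_assign(self, uid, ticketid, to_user):
        raise NotImplementedError("RT calls are omitted from this sketch")


# Maps a CFG_BIBCATALOG_SYSTEM value to its implementation class.
_BACKENDS = {"RT": BibCatalogSystemRT}


def get_bibcatalog_system(system_name):
    """Return a backend instance, or None if no tracker is configured."""
    backend_class = _BACKENDS.get(system_name)
    if backend_class is None:
        return None
    return backend_class()


system = get_bibcatalog_system("RT")
```

Keeping the lookup in one place means another tracker could be added by registering a new subclass, without touching the calling modules.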
6. API
bibcatalog.py consists of ticket operations, as follows.
- check_system(uid) returns an empty string if things are OK, and an error string otherwise.
- ticket_search(uid, recordid, subject, text, creator, owner, date_from, date_until, status, priority) searches tickets by various criteria.
- ticket_submit(uid, subject, recordid, text, queue, priority, owner) submits a ticket and initially sets its fields.
- ticket_assign(uid, ticketid, to_user) assigns a ticket to someone.
- ticket_set_attribute(uid, ticketid, attribute, new_value) sets an attribute. Valid attributes are the members of TICKET_ATTRIBUTES in bibcatalog_system.py.
- ticket_get_attribute(uid, ticketid, attrname) returns the value of an attribute.
- ticket_get_info(uid, ticketid, attrlist) returns ticket information as a dictionary.
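To show how these calls might fit together, here is a small sketch that runs against an in-memory stand-in rather than RT. The InMemoryCatalog class is hypothetical, it implements only a subset of the search criteria, and the return values (a ticket id from ticket_submit, a list of ids from ticket_search) are assumptions, since this page does not specify them.

```python
class InMemoryCatalog(object):
    """Toy stand-in for a tracker backend; tickets live in a dict."""

    def __init__(self):
        self._tickets = {}
        self._next_id = 1

    def ticket_submit(self, uid, subject, recordid, text="", queue="",
                      priority="", owner=""):
        ticketid = self._next_id
        self._next_id += 1
        self._tickets[ticketid] = {
            "subject": subject, "recordid": recordid, "text": text,
            "queue": queue, "priority": priority, "owner": owner,
            "status": "new",
        }
        return ticketid  # assumed return value

    def ticket_search(self, uid, recordid=None, owner=None, status=None):
        """Return ids of tickets matching every criterion that is given."""
        wanted = {"recordid": recordid, "owner": owner, "status": status}
        return sorted(
            tid for tid, ticket in self._tickets.items()
            if all(value is None or ticket[key] == value
                   for key, value in wanted.items())
        )

    def ticket_assign(self, uid, ticketid, to_user):
        self._tickets[ticketid]["owner"] = to_user

    def ticket_get_info(self, uid, ticketid, attrlist=None):
        ticket = self._tickets[ticketid]
        if attrlist is None:
            return dict(ticket)
        return {name: ticket[name] for name in attrlist}


catalog = InMemoryCatalog()
uid = 1  # hypothetical cataloguer id
first = catalog.ticket_submit(uid, "check authors", recordid=42, queue="Authors")
second = catalog.ticket_submit(uid, "check cites", recordid=42, queue="Cites")
catalog.ticket_assign(uid, first, "expert")

tickets_for_record = catalog.ticket_search(uid, recordid=42)
expert_tickets = catalog.ticket_search(uid, owner="expert")
info = catalog.ticket_get_info(uid, first, ["subject", "owner"])
```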
7. Talk
Ideas from Tibor and Travis' brainstorming session in Sept 08.
- We will use RT to manage workflow
- RT will certainly be used to handle incoming user requests as it is currently in SPIRES
- RT will also create tickets for every other task for catalogers to deal with
- RT will be used for inspire system level support, (possibly CDS as well???)
- RT needs to be scalable to thousands of tix per day as opposed to the hundreds it gets now
- this appears to be no problem, but travis will investigate
- RT authentication should be the same as authentication for catalogers in INSPIRE
- May point to a need to separate RT-INSPIRE from other RT queues at SLAC
- Investigate authentication possibilities (travis)
- Changesets in BibUpload (SystemDesignHoldingPen) should correspond with tickets in RT
- For the most part they will match 1 to 1; however, there may be some cases in which the match is not one to one
- Large collections of changesets that are generated from CernBibCheck and passed to HoldingPen may generate only one RT ticket such that cataloguers will need only look at one Ticket and resolve it all at once (configurable in BibCheck itself...)
- Some Changes are automatically propagated through the HoldingPen, requiring no cataloger interference. These might not need tickets, though they could have them if needed...
- Either the ChangeSetQueue (part of Holding Pen in BibUpload) will store RecID <-> Ticket number correspondence or it can be stored in a separate table in BibCatalog
- Use case journal Harvest above is satisfied by having BibHarvest open tickets (via BibCatalog) for every
- BibCatalog's primary responsibility is providing the Invenio wrapper around RT
- Invenio modules should talk only to BibCatalog, not to RT directly
- Can talk to RT via REST interface, perl API, email, possibly others
- We might consider RT talking to other modules directly but not for core functionality, which should be handled via BibCatalog.
- RT is thus abstracted out of Invenio, and could be replaced by other similar systems, homegrown or not
- History should be preserved
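One of the options above is to keep the RecID <-> ticket number correspondence in a separate table in BibCatalog. A rough sketch of such a table follows; the table name, column names, and helper functions are hypothetical, and sqlite3 is used only to keep the example self-contained, not because the design calls for it. The composite key allows the non one-to-one cases mentioned above, such as one ticket covering many records.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE recid_ticket ("
    " recid INTEGER NOT NULL,"
    " ticketid INTEGER NOT NULL,"
    " PRIMARY KEY (recid, ticketid))"
)


def link(recid, ticketid):
    """Record that a ticket concerns a record; repeated links are ignored."""
    conn.execute("INSERT OR IGNORE INTO recid_ticket VALUES (?, ?)",
                 (recid, ticketid))


def tickets_for_record(recid):
    rows = conn.execute(
        "SELECT ticketid FROM recid_ticket WHERE recid = ? ORDER BY ticketid",
        (recid,))
    return [row[0] for row in rows]


def records_for_ticket(ticketid):
    rows = conn.execute(
        "SELECT recid FROM recid_ticket WHERE ticketid = ? ORDER BY recid",
        (ticketid,))
    return [row[0] for row in rows]


# One ticket (7) covering two records, plus a second ticket on record 101.
link(101, 7)
link(102, 7)
link(101, 8)
link(101, 8)  # duplicate, ignored
```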
Implementation notes
Use case "Journal Harvest" requires a concept of "set of tickets related to task" in order to see the "diminishing ticket count". We currently do not have it.
Use case "Delegated task" can be almost trivially implemented using ticket_assign.
Use case "Email response" can be implemented using a suitable RT configuration.
Improvements on Working System
System functional as of 6/09, some comments:
- Add hover (or mini display) of subject line and queue in the panel for read and close in BibEdit
  - allows use cases like: 4 tickets for a new record ("cites", "authors", "Keywords", "metadata"), and the inputter just closes them as they do them.
- link to jumbo (ModifyAll) page, rather than basics page, so text can be entered
- Add Queue selection to comment page, as well as record ID (possibly minutes projected, but I doubt it)
Less important, but nice to have:
- RTFM would be a nice addition, to allow canned responses for certain situations (e.g. flag this record for conference addition).
  - Would be nice, but can also be handled by creating separate queues for common tasks
  - RTFM modifications needed are in git. RTFM itself is downloaded from the vendor
  - eventually needed
- Ticket creation from bibedit works nicely, except there appears to be no way to make one edit operation send a ticket to the right people by email. It will be in the queue, but not sent as email.
  - Could ignore this problem, just use the web based queues
  - Could also present users in bibedit with choice of queues in which to create ticket; annoying to bibedit though.
  - Not really an issue since:
    - if it is a generic ticket (i.e. no "person", just a queue) then you don't want email, just a queue
    - if you know who you want to give it to, you assign it to that person, and they get pinged, just not with your text, but who cares?