System Design: BibEdit
Proposal for a record curation interface for CDS Invenio and Inspire
Background and Motivation
At the time of writing, there are two partially implemented approaches to
record editing for Invenio: Invenio's own preliminary BibEdit and
EPFL's Curator. Moreover, text editing tools are frequently used for
ad hoc purposes. The SPIRES system, for example, has a sophisticated
command line editor.
This page does not discuss Curator's planned functionality, such as
collection checking. Batch job functionality, such as nightly runs of
record checking tools (duplicate checking, automated correction of
spelling mistakes), is not discussed here either. For the overall
architecture of the various cataloguing tools, please see SystemDesign.
The goal of the BibEdit tool is to provide a GUI for interactive
record editing, enrichment, correction and verification. Using the
editor, the user should be able to quickly correct typos, assign correct
(canonical) institute names/codes in the record from authority records,
and use the canonical form of the author's name, the publication's name,
the item's language etc.
The record editing philosophy is to make the common cases as efficient
as possible, and to have the data do as much work as possible before
the cataloguer sees/touches it. The database should be able to check
most things and, in common cases, make guesses that the cataloguer can
confirm. The number of actions and keystrokes by the inputter should
be kept to an absolute minimum.
To reach the goal, a use case example with a workflow and data sources
will be described. Assigning records/fields to specific cataloguers is
intentionally omitted. The cataloguer is any user with record editing rights.
Informal use case example
A set of new records has been entered or harvested. They are in a
holding pen. Checking the set has been assigned to the cataloguer. He/she
logs in and sees a list of the set's records.
The cataloguer checks the records. Before BibEdit is invoked, the BibMerge
module (see SystemDesignBibMerge) provides a GUI for eliminating or merging
duplicate records. Some records are duplicates or represent items of no
interest; these are deleted by the cataloguer. Then BibEdit is called together
with BibCheck on each record. For records that do not need improving, he/she
uses the save function and the record is saved in the production database.
For record X, the institute name is misspelled/unknown. The cataloguer
selects the institute field and selects the correct name from a pop-up
list.
The language field for the record is missing. The cataloguer selects
the correct language in a pop-up list.
One of the references of the record has a misspelled journal name. The
cataloguer selects the correct name in a pop-up.
Keywords are missing. The cataloguer selects them in a pop-up list.
GUI
Mock-up screen shots
Workspace, BibCheck as selected menu item with some fictive checking:
Workspace, History as selected menu item:
Screen shots of current version
URL and navigation
Generic URL:
/edit/
Record specific URL (initial): <SITE_URL>/record/<RECID>/edit/
BibEdit is built to operate without any page reloads at all, giving users an
interface reminiscent of a desktop application. However, this requires
BibEdit to use a generic URL rather than a URL that changes with the state of
the application (such as which record is being edited). The reason is that
changing the URL itself makes the page reload in all modern browsers,
defeating the original purpose.
Therefore, when called with a non-generic URL, like
/record/50/edit/, BibEdit will redirect to a generic URL like
/edit/#state=edit&recid=50. The latter URL might not seem more generic than the
first, but this is actually a trick: the fragment part of the URL (what comes
after the #, the hash) is used to keep track of state and can be freely updated
by the script without causing page reloads.
If correctly implemented, this allows the user to use the browser navigation
buttons (Back, Forward, Reload) with expected results.
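The redirect and the fragment-based state can be sketched as follows. This is a minimal illustration in Python, not BibEdit's actual code (in the browser the equivalent logic would live in the client script). The function names to_generic_url and parse_state are hypothetical; only the two URL shapes are taken from the text above.

```python
import re
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical helper names; only the URL shapes come from the text above.
RECORD_EDIT_PATH = re.compile(r"^/record/(\d+)/edit/?$")


def to_generic_url(path):
    """Map a record-specific path to the generic URL with a state fragment."""
    match = RECORD_EDIT_PATH.match(path)
    if match is None:
        return "/edit/"  # no record given: plain generic URL
    fragment = urlencode({"state": "edit", "recid": match.group(1)})
    return "/edit/#" + fragment


def parse_state(url):
    """Read the application state back from the fragment of a generic URL."""
    fragment = urlsplit(url).fragment
    # parse_qs returns a list of values per key; keep the first one.
    return {key: values[0] for key, values in parse_qs(fragment).items()}


print(to_generic_url("/record/50/edit/"))         # /edit/#state=edit&recid=50
print(parse_state("/edit/#state=edit&recid=50"))  # {'state': 'edit', 'recid': '50'}
```

Because the state lives entirely in the fragment, restoring a view after Back, Forward or Reload amounts to calling the equivalent of parse_state on the current URL.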
This also facilitates calling BibEdit with custom arguments, like some action
to perform, some particular view or filter to use etc.
One use case: a cataloguer wants to push author proofing onto a Giva
expert. A new cataloguing task, "check long author list for record 1234",
would be created via a new RT ticket containing a link of the form:
http://hep-inspire.net/record/1234/edit?action=edit-long-author-list
so that the Giva specialist can follow the link to resolve this task.
Other actions of this kind may include (a) 'check record 1234', meaning
to run BibCheck on it and open the warnings/propositions inside BibEdit,
or (b) 'add field 298 with the value Xyzzy to the record and let the
cataloguer approve it', which some background task may generate.
'Adding a field' should therefore ideally be callable via URLs, so that
the editor window can be prefilled with the data and the human only sees
the proposed actions and, ideally, only has to click "accept and submit".
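As a rough sketch of how such a task link could be interpreted on arrival, the Python function below extracts the record ID and the requested action. The function name parse_action_link and the default action "edit" are assumptions made for illustration; the link format is the one quoted above.

```python
from urllib.parse import parse_qs, urlsplit


def parse_action_link(url):
    """Extract the record ID and the requested action from a task link."""
    parts = urlsplit(url)
    # The path is expected to look like /record/<RECID>/edit
    segments = [segment for segment in parts.path.split("/") if segment]
    if len(segments) != 3 or segments[0] != "record" or segments[2] != "edit":
        raise ValueError("not a record edit link: %r" % url)
    if not segments[1].isdigit():
        # Rejects malformed IDs such as '45asdf'.
        raise ValueError("invalid record ID: %r" % segments[1])
    query = parse_qs(parts.query)
    action = query.get("action", ["edit"])[0]  # assumed default action
    return {"recid": int(segments[1]), "action": action}


link = "http://hep-inspire.net/record/1234/edit?action=edit-long-author-list"
print(parse_action_link(link))
# {'recid': 1234, 'action': 'edit-long-author-list'}
```

The returned dictionary could then be carried over into the fragment of the generic URL, so that the requested action survives the redirect.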
Editor
BibEdit tries to gather all major parts of the curation functionality in one page,
so that the cataloguer can avoid navigating between different screens when
editing a record. Special and less frequently used interfaces, like a user
settings dialog, could either be on a separate page or in a dialog.
The left side menu allows record-operations (submit record, delete record, cancel editing)
and field-operations (add new fields, delete selected fields).
To edit, the user only needs to click on the field she would like to
edit, which makes the field editable using an autogrowing textarea
control. Her changes are saved when the field loses focus, or when she presses
Return (Save) or Tab (Save and Edit next). Changes can be cancelled by pressing
ESC.
Some fields are supported by BibKnowledge through BibCheck, which periodically
checks for completion suggestions as the user types.
Subfields can be reordered using the up and down arrows available on their
left side or by using hotkeys after selecting the field.
New subfields can be added using the plus buttons on the right side of fields.
Holding Pen integration
When editing a record, it is desirable to be able to see all the pending changes associated with it. Such functionality can be provided by the Holding Pen integration. An additional panel should be displayed in the left part of the user interface, showing all the Holding Pen entries corresponding to the record.
Each change should allow two actions: apply and delete. Applying a change means adding change suggestions (described in a different section of this document) to the editor interface, which the user can then accept or reject. When a change is applied, the Holding Pen entry should be removed automatically. The delete action should remove the Holding Pen entry (from the interface and from the database) without previewing the changes.
Besides those, two global operations should be provided: removing all the changes and applying all of them. Their semantics should be equivalent to delete and apply, respectively, executed on all the changes in chronological order.
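The apply/delete semantics and the two global operations can be sketched as below. This is only an illustration under stated assumptions: the class and attribute names (HoldingPenEntry, HoldingPenPanel, harvested_at) are hypothetical, changes are treated as opaque values, and each entry is assumed to carry a sortable timestamp.

```python
from dataclasses import dataclass, field


@dataclass
class HoldingPenEntry:
    entry_id: int
    recid: int
    harvested_at: str   # sortable timestamp, e.g. ISO 8601 (assumed)
    changes: list       # proposed changes, opaque in this sketch


@dataclass
class HoldingPenPanel:
    entries: dict = field(default_factory=dict)      # entry_id -> entry
    suggestions: list = field(default_factory=list)  # shown in the editor

    def add(self, entry):
        self.entries[entry.entry_id] = entry

    def apply(self, entry_id):
        """Turn an entry into change suggestions and remove the entry."""
        entry = self.entries.pop(entry_id)
        self.suggestions.extend(entry.changes)

    def delete(self, entry_id):
        """Remove an entry without previewing its changes."""
        del self.entries[entry_id]

    def _chronological_ids(self):
        ordered = sorted(self.entries.values(),
                         key=lambda e: (e.harvested_at, e.entry_id))
        return [e.entry_id for e in ordered]

    def apply_all(self):
        for entry_id in self._chronological_ids():
            self.apply(entry_id)

    def delete_all(self):
        for entry_id in self._chronological_ids():
            self.delete(entry_id)


panel = HoldingPenPanel()
panel.add(HoldingPenEntry(2, 10, "2009-01-02T00:00:00", ["second change"]))
panel.add(HoldingPenEntry(1, 10, "2009-01-01T00:00:00", ["first change"]))
panel.apply_all()
print(panel.suggestions)  # ['first change', 'second change']
```

Sorting on (harvested_at, entry_id) uses the entry identifier as a tie-breaker, which is one possible answer to the ordering problem when several changes are harvested at the same time.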
The issues:
- Should the deletion of Holding Pen entries be integrated with the Undo functionality? [Should the Holding Pen entries be restored?]
- The Holding Pen entries should have an additional identifier, since one harvesting process can retrieve several changes to the same record
- Should it be possible to apply the changes directly, without proposing them as record changes? [Maybe three operations then: Apply, Propose, Delete?]
- Is there a time stamp inside MARC? When several changes are harvested at the same time, it should be possible to tell which one comes first
BibCheck
Dynamic functionality use case #1
The user types something into a field, and the input is used as a lookup
key for the true value to be inserted.
I want to insert institution code 14567 (which is "SLAC, SSRL"), but I
know neither the name nor the number.
I type "Menlo Park CA" in the institution field and BibKnowledge returns a
list of institutions that have this address ("SLAC", "SLAC, SSRL", "KIPAC,
Menlo Park"). I pick from this list and the resulting code name is stored in
the field.
This is a (sophisticated) lookup to determine the correct value of
one field, based on user input.
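A minimal sketch of such a lookup is shown below. It does not use the real BibKnowledge interface: the in-memory list, the function name lookup_institutions and the normalization rule are all assumptions, and only the code 14567 for "SLAC, SSRL" comes from the text (the other codes are made-up placeholders).

```python
# Toy stand-in for the institution knowledge base. Only the code 14567 for
# "SLAC, SSRL" comes from the text; the other codes are made-up placeholders.
INSTITUTIONS = [
    {"code": 14567, "name": "SLAC, SSRL", "address": "Menlo Park, CA"},
    {"code": 90001, "name": "SLAC", "address": "Menlo Park, CA"},
    {"code": 90002, "name": "KIPAC, Menlo Park", "address": "Menlo Park, CA"},
    {"code": 90003, "name": "Example Institute", "address": "Elsewhere"},
]


def _normalize(text):
    """Lower-case and drop punctuation: 'Menlo Park, CA' -> 'menlo park ca'."""
    cleaned = "".join(c if c.isalnum() else " " for c in text.lower())
    return " ".join(cleaned.split())


def lookup_institutions(user_input):
    """Return institutions whose name or address contains the typed text."""
    needle = _normalize(user_input)
    if not needle:
        return []
    return [inst for inst in INSTITUTIONS
            if needle in _normalize(inst["name"])
            or needle in _normalize(inst["address"])]


for inst in lookup_institutions("Menlo Park CA"):
    print(inst["code"], inst["name"])
```

The cataloguer would pick one of the returned entries, and the editor would store its code and name in the field.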
Dynamic functionality use case #2
The second case is maintaining a field that depends on another. In the
above case, we might imagine storing the institution's numerical code and
the institution's name (100_i and 100_u respectively). If the
numerical code is present, then we should be able to edit that field and
see 100_u change based on the 110a value in the institution record having
code = 100_i. Possibly one might like this to work in the opposite
direction as well? Or perhaps it is saner to define a "primary" field and a set of
secondary/derived fields.
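The "primary field with derived fields" option could look roughly like this. The sketch assumes that 100_i is the primary subfield and 100_u the derived one, and models the institution records as a plain dictionary; the function name set_institution_code is hypothetical.

```python
# Institution records keyed by numerical code; 110a holds the canonical name.
INSTITUTION_RECORDS = {
    14567: {"110a": "SLAC, SSRL"},
}


def set_institution_code(field_100, new_code, institutions=INSTITUTION_RECORDS):
    """Set the primary subfield 100_i and recompute the derived subfield 100_u."""
    inst = institutions.get(new_code)
    if inst is None:
        raise KeyError("unknown institution code: %r" % new_code)
    updated = dict(field_100)    # leave the caller's field untouched
    updated["i"] = new_code
    updated["u"] = inst["110a"]  # derived value, never edited directly
    return updated


field_100 = {"a": "Doe, J.", "i": 1, "u": "Old name"}
print(set_institution_code(field_100, 14567))
# {'a': 'Doe, J.', 'i': 14567, 'u': 'SLAC, SSRL'}
```

With a single primary subfield the update only ever flows one way, which avoids having to decide which side wins when both are edited.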
Full checking
After editing a record one can call BibCheck to check the validity of
certain fields in the record. The return data from BibCheck can be divided
into automatic changes to the record, suggested changes to the record and
warnings.
Automatic changes will be performed, but can be marked (with green as in the
mock-up) and include an informative message about what was changed and why.
Suggested changes can be presented so that the user can choose between fully
accepting the change, ignoring the change or doing some sort of manual merge.
The user also has the option of accepting or ignoring all changes for the
entire record.
In much the same way the warnings can be displayed in their respective fields
(or where that field should have been, if it's missing) and then be removed as
soon as the user has resolved the problem (i.e. we recheck the field when the
user has made a change).
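The three kinds of return data could be handled along these lines. The actual BibCheck return format is not specified here, so the CheckResult structure, the flat tag-to-value record model and the function name apply_check_result are all assumptions for the sake of illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CheckResult:
    automatic: dict = field(default_factory=dict)  # tag -> (new value, reason)
    suggested: dict = field(default_factory=dict)  # tag -> (new value, reason)
    warnings: dict = field(default_factory=dict)   # tag -> message


def apply_check_result(record, result, accepted=()):
    """Apply automatic changes, plus the suggested changes the user accepted.

    Returns the new record and the messages to display next to each field.
    """
    new_record = dict(record)
    messages = {}
    for tag, (value, reason) in result.automatic.items():
        new_record[tag] = value
        messages[tag] = "changed automatically: " + reason
    for tag, (value, reason) in result.suggested.items():
        if tag in accepted:
            new_record[tag] = value
        else:
            messages[tag] = "suggestion ignored: " + reason
    for tag, message in result.warnings.items():
        # A warning may refer to a field that is missing from the record.
        messages.setdefault(tag, "warning: " + message)
    return new_record, messages


record = {"journal": "Phys Rev", "title": "A title"}
result = CheckResult(
    automatic={"journal": ("Phys. Rev.", "canonical journal name")},
    suggested={"title": ("A Title", "capitalization")},
    warnings={"language": "field is missing"},
)
new_record, messages = apply_check_result(record, result, accepted={"title"})
print(new_record)
print(messages)
```

Rechecking a field after the user edits it would then simply replace or drop the corresponding entry in the messages.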
History functionality
Listing, viewing, comparing and reverting to earlier revisions of a record.
It is likely that some of this functionality, in particular comparing
revisions and displaying them simultaneously, belongs in or utilizes the
BibMerge module.
Architecture
Python Web-framework
The new version of BibEdit will be rewritten to use the standard Invenio page
handler.
Other options, like keeping the existing admin page handler or changing to the
Django framework, were considered but discarded.
AJAX framework
For the Javascript framework we decided on jQuery, which has a good usage and
support record, appears to be a very clean and easy implementation and
was a favorite among the developers concerned. And though jQuery in its
original form has a small footprint, it is highly extendable with plug-ins
and separately combinable libraries for UI/effects if needed.
Use of AJAX
We will provide a richer UI supported by AJAX technology, e.g., to provide
completion suggestions and dynamic checking of field values.
New web applications, like those from Google (e.g., Gmail, Google Calendar,
iGoogle), demonstrate what is possible to achieve with fully AJAX-based
applications. However, since that is also where most people have
experienced such applications, they are bound to have high expectations,
which means we must strive to make BibEdit as user-friendly,
responsive and flawless as possible.
Support for users with special needs, or with ancient or low-feature browsers
(like text-only or without script support), will be limited, since we have a
rather small user base and can require them to use a smaller set of
browsers for this application. Of course the application should still
recognize users with unsupported systems and politely inform them of what they
need to do to be able to use the application.
The application will also gain from running on modern computers with the newest
possible Javascript engines, which give dramatic improvements in performance.
Still, performance remains an important issue to be addressed by the developer,
with continuous testing on relevant platforms and optimization of the system as
needed.
Command Line Interface (CLI)
Today's BibEdit CLI supports basic history functions (listing, viewing, diffing
and reverting to earlier revisions of a record). These functions, along with
multiedit, are the tools most needed by administrators and superusers and will
still be supported in the new version.
Interface with cataloguing workflow
Users will want to 'browse' or work through records, so BibEdit must be
aware that the user is editing a record in the holding pen and provide
buttons like 'Previous', 'Next', 'Save and next' and a return to the
cataloguer interface on exit.
'diff/merge' strategy instead of locking of records/fields
There have been discussions on how to avoid conflicts or loss of data as a
result of changes being made in parallel, the most obvious example being that
two cataloguers edit the same record at the same time. The solution to this
problem has been to lock records while they are being edited or, more recently,
also while they are in the queue to be uploaded. This strategy has its own
problems, however, such as obstructing the curation workflow, unintended
overwrites, and costly and complex lock checking.
Therefore a more CVS-style solution has been proposed, where a changed record
is diffed against the original so that only the changed fields need to
be submitted for uploading. If changes happen in parallel and there is no
overlap between the sets of fields being changed, the changes can easily be
merged into the record. If there are conflicts that can't be resolved without
human intervention, the record should be returned to the source together with
the proposed changes so that a manual merge can be performed.
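The diff-then-merge idea can be illustrated with a small field-level sketch. Records are modelled here as plain dictionaries mapping a tag to its value, which is a simplification of real MARC records, and the function names diff_fields and merge_changes are hypothetical.

```python
def diff_fields(original, edited):
    """Return {tag: new_value} for every field that differs from the original.

    A value of None means the field was deleted.
    """
    changes = {}
    for tag in set(original) | set(edited):
        if original.get(tag) != edited.get(tag):
            changes[tag] = edited.get(tag)
    return changes


def merge_changes(original, current, changes):
    """Merge a changeset made against `original` into the `current` record.

    Returns (merged_record, conflicts). Fields that were changed in `current`
    to a different value in the meantime are reported as conflicts and left
    untouched, so that a human can merge them manually.
    """
    merged = dict(current)
    conflicts = {}
    for tag, new_value in changes.items():
        current_value = current.get(tag)
        if current_value == original.get(tag) or current_value == new_value:
            if new_value is None:
                merged.pop(tag, None)
            else:
                merged[tag] = new_value
        else:
            conflicts[tag] = {"theirs": current_value, "yours": new_value}
    return merged, conflicts


original = {"245": "Old title", "100": "Doe, J"}
mine = {"245": "New title", "100": "Doe, J"}
current = {"245": "Old title", "100": "Doe, John"}  # changed in parallel

merged, conflicts = merge_changes(original, current, diff_fields(original, mine))
print(merged)     # {'245': 'New title', '100': 'Doe, John'}
print(conflicts)  # {}
```

In the example the two edits touch different fields, so they merge cleanly; had both changed field 245 to different values, that tag would appear in conflicts instead.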
An important topic has been where to place this functionality, in particular
whether it should be placed in BibEdit or in BibUpload. We have settled on a
solution that tries to get the advantages of both by implementing parts of the
checking in both modules. By doing a diff in BibEdit one can submit two
streams, an append stream and a change stream, instead of replacing the full
record as is done today. Also, if conflicts are detected, the cataloguer can
be alerted instantly and do the merging on the fly. By also doing a
diff in BibUpload one is guaranteed to catch all conflicts, even for
submissions coming from sources other than BibEdit or being submitted
asynchronously from off-site mirror editors.
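One possible reading of the two streams is sketched below: fields that did not exist before go into the append stream, while fields whose values were modified go into the change stream. The record model ({tag: [values]}), the function name split_streams and the choice to represent a deleted field as an empty value list in the change stream are all assumptions, not the agreed design.

```python
def split_streams(original, edited):
    """Split an edit into an append stream and a change stream.

    Records are modelled as {tag: [values]}. Unchanged fields are not
    submitted at all.
    """
    append_stream = {}
    change_stream = {}
    for tag in sorted(set(original) | set(edited)):
        old, new = original.get(tag), edited.get(tag)
        if old == new:
            continue                  # unchanged: nothing to upload
        if old is None:
            append_stream[tag] = new  # field did not exist before
        else:
            # Modified field; a deleted field becomes an empty list (assumed).
            change_stream[tag] = new if new is not None else []
    return append_stream, change_stream


original = {"245": ["Old title"], "100": ["Doe, J"]}
edited = {"245": ["New title"], "100": ["Doe, J"], "041": ["eng"]}
print(split_streams(original, edited))
# ({'041': ['eng']}, {'245': ['New title']})
```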
Note that to be able to diff and merge in BibUpload it would be necessary to
tag every MARCXML snippet systematically with revision numbers, to know precisely
which version of the record the changes should be diffed against. The tag 005
seems to be a good candidate for this sort of revision stamping. This may
pose some challenges, as submissions from external sites can't be expected to
use this revision system.
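A sketch of how a 005-based revision stamp might be produced and checked follows. In MARC 21, field 005 holds the date and time of the latest transaction in the form yyyymmddhhmmss.f; using it as a revision identifier, and the function names and the three outcome labels below, are assumptions made for illustration.

```python
from datetime import datetime


def make_revision_stamp(moment):
    """Format a datetime in the MARC 21 field 005 pattern (yyyymmddhhmmss.f)."""
    return moment.strftime("%Y%m%d%H%M%S") + ".0"


def check_revision(submitted_stamp, current_stamp):
    """Classify a submission by the revision it was made against."""
    if not submitted_stamp:
        return "unstamped"   # e.g. from an external site; needs a fallback
    if submitted_stamp == current_stamp:
        return "up-to-date"  # changes can be applied directly
    return "outdated"        # diff against the stamped revision, then merge


current = make_revision_stamp(datetime(2009, 3, 1, 12, 30, 5))
print(current)                                      # 20090301123005.0
print(check_revision("20090301123005.0", current))  # up-to-date
print(check_revision("20090228090000.0", current))  # outdated
print(check_revision(None, current))                # unstamped
```

The "unstamped" case corresponds to submissions from external sites, for which some fallback policy would still have to be decided.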
The supporting functionality to handle actions related to diffing and merging
records is to be placed in the BibRecord module.
Modules
Server (Python)
This will be updated once the application is moved to the standard page handler.
Client (Javascript)
Javascript modules:
Feature list
Specific features to implement or look into. Users can use this list to see if
features they would like to request are already 'in the works'.
Feature list for first big inputting test
| Feature | Priority |
| UI: Extendable left side menu | |
| Searching for records and browsing hits in editor | |
| Create empty record | High |
| New record templates | High |
| Clone record | High |
| UI: Confirmation dialogs before cancel/submit/delete | Low |
| Change public name of application to 'Record Editor' | Medium |
| UI: Allow for ordering fields and subfields with hotkeys | High |
| Basic autocompletion of field content | High |
Feature list for future releases
| Feature | Priority |
| Undo last / last few simple actions | High |
| Field/subfield copy/cut & paste [1] | High |
| Remove subfield moving arrows (replaced by copy/cut & paste + hotkeys) | High |
| BibCheck integration | High |
| Field templates, 'Add field from list' | High |
| Cross-links with record merger | Medium |
| Cross-links with fulltext file editor | Medium |
| Cross-links with holdings/items editor | Medium |
| Take human tags as valid input | Low |
| History: Resurrect interface (w/ BibMerge support) | Medium |
| UI: More options after cancel/submit/delete [2] | Medium |
| MARCXML mode [3] | Low |
| Warn, in some discreet way, that the record has unsubmitted changes | Low |
| UI: Move focus in a logical way after different edit actions | Low |
| UI: Tab or some other combo does 'Save and edit next' when editing | Low |
| UI: Select by marking start and end of selection | Low |
| Filter to restrict which fields to display | Medium/Low |
| Internationalization | Medium/Low |
| Direct editing of field-/subfield codes | High |
| UI: Optional view: Horizontal, aligned display of subfields | Low |
| UI: Button to let you follow URLs [4] | Medium |
| UI: Show receipt for last few actions in the status area | Low |
| UI: Adding subfield at custom position in field | Low |
| Logging | Low |
| User settings (default format, customized views, language (?) ...) | Low |
| AJAX: Handling of users with inadequate browser-/script-support | Low |
| UI: Add WYSIWYG I18N keyboard for accent entering | Medium |
1: Important cloned fields that should be unique (report number, system
number, publ. info) should be either not cloned at all, or cloned as
hints printed under empty field values, or grayed out or something, so
that it is clear we are not going to produce dupes.
2:
- Go back to record.
- On submit: Call BibMerge with submitted and the previous revision.
- On submit: Display receipt of submitted record.
- On delete: Undelete.
3: Basically a pure text editor for editing MARCXML. Could be useful
for example as a simple copy/paste mechanism across records.
4: If BibEdit displays a field known to contain URLs (such as 856),
and if the value of a subfield starts with 'http', then print an icon or
a link named "visit this URL" next to it, which would open that URL in
a new tab/window.
Feature ideas
These are ideas that are still open for discussion and refinement, or features
with too low priority to be listed above (the 'someday/maybe' category).
- QuickKey: "An idea surfaced of a QuickKey type functionality, where common long phrases can be entered with a simple key combination." Defined by whom?
- Pure text view: "Be able to display a bare text record from a button/function on the BibEdit form to be able to paste text from one record to another record. Copy/Paste of multiple chunks difficult in BibEdit, don't want to have to search record in another screen." How could we implement something like this?
- LaTeX preview: Preview LaTeX formulas using jsMath (either field by field or the whole record)
- Record media type: Show a head note/title to reflect the media type (e.g., publication, book, proceeding) of the displayed record
Developer tasks
Developer task list for this and future releases, in addition to the feature list.
Tasks will usually be of a more technical nature or very internal to the system, so this
list is not meant to be of particular interest to the general user.
Completed tasks (previous releases)
| Task | Priority |
| Migrate to Invenio standard page handler | |
| Update BibEdit entry points to use new page handler | |
| Investigate handling of line breaks in textareas | |
| Rename 'Show' options | |
| Packaging / GIT public commit | |
| Fix: Debug history view | |
| Fix: Validate capital indicators if CERN site | |
| Fix: Figure out the problem with refresh in IE (and Firefox?) | |
| Fix: Wash record ID (don't accept '45asdf') | |
| Fix: Detect and warn if jQuery is missing | |
| AJAX: Requests should handle logout/timeout in a better way | |
| Use jQuery for hiding elements and storing data (instead of hidden form fields) | |
Tasks (next release)
| Task | Priority |
| Ability to tell if record is in its original state (even after being changed) | Low |
| Blocking a cataloguer from having multiple open edit sessions on the same record | Low |
Tasks (future releases)
| Task | Priority |
| AJAX: Record integrity checking | Medium/Low |
| AJAX: Detect and notify the client if something goes wrong server side (in the Python code, not in the transfer) | Medium/Low |
| AJAX: Transaction IDs and logging | Low |
| AJAX: Error handling | Low |
| AJAX: Concurrent requests / request queuing (might not be necessary) | Low |
| Diff/merge: Personalized cache/XML files | Low |
| Diff/merge: Revision stamping | Low |
| Diff/merge: Changeset committing | Low |
| Diff/merge: Implement on-submit diffing w/ conflict resolution (BibMerge supported) | Low |
Tasks (before any release)
- Web tests and regression tests
- Browser performance and compatibility tests
  - IE7
  - IE8
  - FF2
  - FF3
  - Safari
  - Opera
  - Chrome
  - Iceape
- Pylint
- JSLint / JSMIN
- Packaging
- User tests on dev machine(s)
- git commit
- Put into production
See also