System Design: BibEdit

Proposal for a record curation interface for CDS Invenio and Inspire

Background and Motivation

At the time of writing, there are two partially implemented approaches to record editing for Invenio: Invenio's own preliminary BibEdit and EPFL's Curator. Moreover, text editing tools are frequently used for ad hoc purposes. The SPIRES system, for example, has a sophisticated command line editor.

The aim of this page is not to discuss Curator's planned functionality, such as collection checking. Batch job functionality, like nightly runs of record checking tools (duplicate checking, automated correction of spelling mistakes), is not discussed here either. For an overall architecture of all the various cataloguing tools, please see SystemDesign.

The goal of the BibEdit tool is to provide a GUI for interactive record editing, enrichment, correction and verification. Using the editor, the user should be able to quickly correct typos, assign correct (canonical) institute names/codes in the record from authority records, and use the canonical form of the author's name, the publication's name, the item's language etc.

The record editing philosophy is to make the common cases as efficient as possible, and to have the data do as much work as possible before the cataloguer sees/touches it. The database should be able to check most things and, in common cases, make guesses that the cataloguer can confirm. The number of actions and keystrokes by the inputter should be kept to an absolute minimum.

To reach the goal, a use case example with a workflow and data sources will be described. Assigning records/fields to specific cataloguers is intentionally omitted. The cataloguer is any user with record editing rights.

Informal use case example

A set of new records has been entered or harvested. They are in a holding pen. Checking the set has been assigned to the cataloguer. He/she logs in and sees a list of the set's records.

The cataloguer checks the records. Before BibEdit is invoked, the BibMerge module (see SystemDesignBibMerge) provides a GUI for eliminating or merging duplicate records. Some records are duplicates or represent items of no interest; the cataloguer deletes them. Then BibEdit is called together with BibCheck on each record. For records that do not need improving, he/she uses the save function and the record is saved in the production database.

For record X, the institute name is misspelled/unknown. The cataloguer selects the institute field and selects the correct name from a pop-up list.

The language field for the record is missing. The cataloguer selects the correct language in a pop-up list.

One of the references of the record has a misspelled journal name. The cataloguer selects the correct name in a pop-up.

Keywords are missing. The cataloguer selects them in a pop-up list.

GUI

Mock-up screen shots

Workspace, BibCheck as selected menu item with some fictitious checking:

bibcheck.png


Workspace, History as selected menu item:

history.png

Screen shots of current version

bibedit.png


bibedit_add_fields.png

URL and navigation

Generic URL: /edit/

Record specific URL (initial): <SITE_URL>/record/<RECID>/edit/

BibEdit is built to operate without any page reloads at all, giving users an interface reminiscent of a desktop application. However, this requires BibEdit to use a generic URL, not a URL that changes depending on the state of the application (such as which record is being edited). The reason is that changing the path or query part of the URL makes the browser reload the page, defeating the original purpose.

Therefore, when called with a non-generic URL like /record/50/edit/, BibEdit will redirect to a generic URL like /edit/#state=edit&recid=50. The latter URL might not seem more generic than the former, but this is actually a trick: the fragment part of the URL (what comes after the #, the hash) is used to keep track of state and can be freely updated by the script without causing page reloads.

If correctly implemented, this allows the user to use the browser navigation buttons (Back, Forward, Reload) with expected results.
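The redirect from a record-specific URL to the generic URL can be sketched as follows. This is only an illustration in Python, not BibEdit's actual code: the function names to_generic_url and parse_fragment are hypothetical, and only the URL shapes /record/<RECID>/edit/ and /edit/#state=edit&recid=<RECID> are taken from the text.

```python
import re
from urllib.parse import parse_qs, urlencode

# Record-specific entry points such as /record/50/edit/ (trailing slash optional).
RECORD_EDIT_PATH = re.compile(r"^/record/(\d+)/edit/?$")


def to_generic_url(path):
    """Map a record-specific path to the generic editor URL.

    The application state moves into the fragment, so later state
    changes do not force a page reload. Returns None for other paths.
    """
    match = RECORD_EDIT_PATH.match(path)
    if match is None:
        return None
    fragment = urlencode({"state": "edit", "recid": match.group(1)})
    return "/edit/#" + fragment


def parse_fragment(fragment):
    """Turn 'state=edit&recid=50' into {'state': 'edit', 'recid': '50'}."""
    return {key: values[0] for key, values in parse_qs(fragment).items()}
```

Calling to_generic_url("/record/50/edit/") gives "/edit/#state=edit&recid=50"; the client script would read the same state back with something equivalent to parse_fragment.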

This also facilitates calling BibEdit with custom arguments, like some action to perform, some particular view or filter to use etc.

One use case: a cataloguer wants to push author proofing onto a Giva expert, so that a new cataloguing task "check long author list for record 1234" would be created, via a new RT ticket, containing a link of the form:

http://hep-inspire.net/record/1234/edit?action=edit-long-author-list

so that the Giva specialist can follow the link to resolve this task.

Other actions of this kind may include (a) 'check record 1234', meaning to run BibCheck on it and open the warnings/propositions inside BibEdit, or (b) 'add field 298 with the value Xyzzy to the record and let the cataloguer approve it', which some background task may generate. 'Adding a field' should therefore ideally be callable via URLs, so that the editor window can be prefilled with the data and the human only sees the proposed actions and, ideally, only has to click on "accept and submit".
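How such task links could be generated and read back is sketched here in Python. Only the action parameter and the value edit-long-author-list appear in the example link; the action name add-field and the parameters tag and value are assumptions made for illustration, as are the function names.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_task_link(site_url, recid, action, **params):
    """Build an editor link that carries an action and optional arguments."""
    query = urlencode({"action": action, **params})
    return "%s/record/%s/edit?%s" % (site_url, recid, query)


def read_task_link(url):
    """Return the action and its arguments as a flat dictionary."""
    query = parse_qs(urlsplit(url).query)
    return {key: values[0] for key, values in query.items()}


# A background task proposing a new field (hypothetical parameter names).
proposal = build_task_link("http://hep-inspire.net", 1234, "add-field",
                           tag="298", value="Xyzzy")
```

The editor would then use the decoded arguments to prefill the proposed field and wait for the cataloguer to accept it.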

Editor

BibEdit tries to gather the major parts of the curation functionality in one page, so that the cataloguer can avoid navigating between different screens when editing a record. Special and more seldom used interfaces, like a user settings dialog, could either be on a separate page or in a dialog.

The left side menu allows record-operations (submit record, delete record, cancel editing) and field-operations (add new fields, delete selected fields).

To edit, the user only needs to click on the field she would like to edit, which makes the field editable using an autogrowing textarea control. Her changes are saved when the field loses focus, or when she presses Return (Save) or Tab (Save and Edit next). Changes can be cancelled by pressing Esc.

Some fields are supported by BibKnowledge through BibCheck, which periodically checks for completion suggestions as the user types.

suggesting.png

Subfields can be reordered using the up and down arrows available on their left side or by using hotkeys after selecting the field.

New subfields can be added using the plus buttons on the right side of fields.

Holding Pen integration

When editing a record, it is desirable to be able to see all the pending changes associated with it. Such functionality can be provided by the Holding Pen integration. An additional panel should be displayed in the left part of the user interface, showing all the Holding Pen entries corresponding to the record. Each change should allow two actions: apply and delete. Applying a change means adding change suggestions (described in a different section of this document) to the editor interface, where the user can accept or reject them. When a change is applied, the Holding Pen entry should be removed automatically. The delete action should remove the Holding Pen entry (from both the interface and the database) without previewing the changes.

Besides those, two global operations should be provided: removing all the changes and applying all of them. Their semantics should be equivalent to delete and apply, respectively, executed on all the changes in chronological order.
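The apply/delete semantics described above can be sketched in Python as follows. This is a minimal model under stated assumptions, not the planned implementation: the class names, the entry_id and harvested_at attributes, and the in-memory storage are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class HoldingPenEntry:
    entry_id: int      # hypothetical identifier of one pending change set
    harvested_at: str  # sortable timestamp, e.g. "2009-02-02T15:37:00"
    changes: list      # proposed field changes, opaque to this sketch


@dataclass
class HoldingPenPanel:
    entries: dict = field(default_factory=dict)      # entry_id -> entry
    suggestions: list = field(default_factory=list)  # shown in the editor

    def add(self, entry):
        self.entries[entry.entry_id] = entry

    def apply(self, entry_id):
        """Turn an entry into editor suggestions, then remove the entry."""
        entry = self.entries.pop(entry_id)
        self.suggestions.extend(entry.changes)

    def delete(self, entry_id):
        """Remove an entry without previewing its changes."""
        del self.entries[entry_id]

    def _chronological_ids(self):
        ordered = sorted(self.entries.values(),
                         key=lambda e: (e.harvested_at, e.entry_id))
        return [e.entry_id for e in ordered]

    def apply_all(self):
        """Equivalent to apply() on every entry, oldest first."""
        for entry_id in self._chronological_ids():
            self.apply(entry_id)

    def delete_all(self):
        """Equivalent to delete() on every entry, oldest first."""
        for entry_id in self._chronological_ids():
            self.delete(entry_id)
```

Sorting on (harvested_at, entry_id) is one possible way to keep the order well defined when two entries carry the same time stamp.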

The issues:

  • Should the deletion of Holding Pen entries be integrated with the Undo functionality? [Should the Holding Pen entries be restored?]
  • The Holding Pen entries should have some additional identifier, since one harvesting process can retrieve several changes to the same record.
  • Should it be possible to apply the changes directly, without proposing them as record changes? [Maybe three operations then: Apply, Propose, Delete?]
  • Is there a time stamp inside MARC? When several changes are harvested at the same time, it should be possible to tell which one comes first.

BibCheck

Dynamic functionality use case #1

Typing something in the field that you want to use as a lookup for the true value to be inserted.

I want to insert institution code 14567 (which is "SLAC, SSRL")

I don't know the code (neither the name nor the number).

I type (in the inst. field) "Menlo Park CA" and BibKnowledge returns a list of insts that have this address ("SLAC", "SLAC, SSRL", "KIPAC, Menlo Park"). I pick from this list and the resulting code and name are stored in the field.

This is doing a (sophisticated) lookup to determine the correct value of 1 field, based on user input.
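A lookup of this kind might look roughly like the following Python sketch. The in-memory list stands in for the BibKnowledge data, whose real interface is not described here; the function name is hypothetical, and codes that the text does not give are left as None rather than invented.

```python
# Stand-in for the institution knowledge base. Only code 14567 is given
# in the text; the other codes are unknown and therefore left as None.
INSTITUTIONS = [
    {"code": None, "name": "SLAC", "address": "Menlo Park CA"},
    {"code": "14567", "name": "SLAC, SSRL", "address": "Menlo Park CA"},
    {"code": None, "name": "KIPAC, Menlo Park", "address": "Menlo Park CA"},
    {"code": None, "name": "CERN", "address": "Geneva"},
]


def suggest_institutions(typed, institutions=INSTITUTIONS):
    """Return institutions whose name or address contains the typed text."""
    needle = typed.strip().lower()
    if not needle:
        return []
    return [inst for inst in institutions
            if needle in inst["name"].lower()
            or needle in inst["address"].lower()]


# The cataloguer types an address and picks one hit from the list.
hits = suggest_institutions("Menlo Park CA")
picked = next(inst for inst in hits if inst["name"] == "SLAC, SSRL")
```

A real lookup would presumably be more tolerant (abbreviations, word order, misspellings); plain substring matching is used here only to keep the sketch short.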

Dynamic functionality use case #2

The second case is maintaining a field that depends on another. In the above case, we might imagine storing the inst numerical code and the inst name (100_i and 100_u respectively). If the numerical code is present, then we should be able to edit that field and see the 100_u change based on the 110a value in the inst record having a code = 100_i. Possibly one might like this to work in the opposite direction as well? Or perhaps it is saner to define a "primary" field and a set of secondary/derived fields.
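The "primary field with derived fields" variant can be sketched as follows, treating 100_i as primary and 100_u as derived. The dictionary layout and the function name are assumptions made for this Python illustration; only the subfield codes and the 110a lookup come from the text.

```python
# Institution records keyed by numerical code; 110a holds the name.
INST_RECORDS = {"14567": {"110a": "SLAC, SSRL"}}


def sync_derived_name(field_100, inst_records):
    """Return a copy of the field with 100_u derived from 100_i.

    If the code is missing or unknown, the field is returned unchanged
    so that the cataloguer's manual input is not lost.
    """
    updated = dict(field_100)
    inst = inst_records.get(updated.get("i"))
    if inst is not None:
        updated["u"] = inst["110a"]
    return updated
```

With one field declared primary, the update only ever flows in one direction, which avoids the question of which value wins when both subfields have been edited.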

Full checking

After editing a record one can call BibCheck to check the validity of certain fields in the record. The return data from BibCheck can be divided into automatic changes to the record, suggested changes to the record and warnings.

Automatic changes will be performed, but can be marked (with green as in the mock-up) and include an informative message about what was changed and why.

Manual changes can be presented so that the user can choose between fully accepting the change, ignoring the change or doing some sort of manual merge. The user also has the option of accepting or ignoring all changes for the entire record.

In much the same way the warnings can be displayed in their respective fields (or where that field should have been, if it's missing) and then be removed as soon as the user has resolved the problem (i.e. we recheck the field when the user has made a change).
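The three kinds of BibCheck return data could be handled along these lines. This Python sketch assumes a simple result structure that BibCheck is not known to use: the CheckResult class, its kind values and the function name are hypothetical, and a record is simplified to one value per tag.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckResult:
    kind: str                        # "automatic", "suggested" or "warning"
    tag: str                         # field the result refers to
    message: str                     # what was changed or is wrong, and why
    new_value: Optional[str] = None  # proposed value; unused for warnings


def process_check_results(record, results):
    """Apply automatic changes; collect suggestions and warnings for the UI.

    Returns (updated_record, auto_notes, suggestions, warnings). The
    input record is left untouched.
    """
    updated = dict(record)
    auto_notes, suggestions, warnings = [], [], []
    for result in results:
        if result.kind == "automatic":
            updated[result.tag] = result.new_value
            auto_notes.append((result.tag, result.message))
        elif result.kind == "suggested":
            suggestions.append(result)
        elif result.kind == "warning":
            warnings.append(result)
        else:
            raise ValueError("unknown result kind: %r" % result.kind)
    return updated, auto_notes, suggestions, warnings
```

The auto_notes list carries the informative messages to display next to automatically changed fields, while suggestions and warnings wait for the user's decision.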

History functionality

Listing, viewing, comparing and reverting to earlier revisions of a record. It is likely that some of this functionality, in particular comparing revisions and displaying them simultaneously, belongs in or utilizes the BibMerge module.

Architecture

Python Web-framework

The new version of BibEdit will be rewritten to use the standard Invenio page handler.

Other options, like keeping the existing admin page handler or changing to the Django framework, were considered but discarded.

AJAX framework

For the Javascript framework we decided on jQuery, which has a good usage and support record, appears to be a very clean and easy implementation and was a favorite among the developers concerned. And though jQuery in its original form has a small footprint, it is highly extendable with plug-ins and separately combinable libraries for UI/effects if needed.

Use of AJAX

We will provide a richer UI supported by AJAX technology, e.g., to provide completion suggestions and dynamic checking of field values.

New web applications, like those from Google (e.g., Gmail, Google Calendar, iGoogle), demonstrate what is possible to achieve with fully AJAX-based applications. However, since that is also where most people have experienced such applications, they are bound to have high expectations, which means we must strive to make BibEdit as user-friendly, responsive and flawless as possible.

Support for users with special needs, ancient or low-feature browsers (like text-only or without script support) will be limited, since we have a rather small user base and can require them to use a smaller set of browsers for this application. Of course the application should still recognize users with unsupported systems and politely inform them of what they need to do to be able to use the application.

Also, the application will gain from running on modern computers with the newest possible Javascript engines, which give dramatic improvements in performance. Still, performance remains an important issue to be addressed by the developer with continuous testing on relevant platforms and optimization of the system as needed.

Command Line Interface (CLI)

Today's BibEdit CLI supports basic history functions (listing, viewing, diffing and reverting to earlier revisions of a record). These functions, along with multiedit, are the tools most needed by administrators and superusers and will still be supported in the new version.

Interface with cataloguing workflow

Users will want to 'browse' or work through records, so BibEdit must be aware that the user is editing a record in the holding pen and provide buttons like 'Previous', 'Next', 'Save and next' and a return to the cataloguer interface on exit.

'diff/merge' strategy instead of locking of records/fields

There have been discussions on how to avoid conflicts or loss of data as a result of changes being done in parallel, the most obvious example being that two cataloguers edit the same record at the same time. The solution to this problem has been to lock records if they are being edited, or recently, also if they are in the queue to be uploaded, but this strategy has its own problems, such as obstructing the curation workflow, unintended overwrites and costly and complex lock checking.

Therefore a more CVS-style solution has been proposed, where a changed record will be diffed against the original, so that only the changed fields need to be submitted for uploading. If changes happen in parallel and there is no overlap between the fieldsets being changed, the changes can easily be merged into the record. If there are conflicts that can't be solved without human intervention, the record should be returned to the source together with the proposed changes so that a manual merge can be performed.
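The diff-then-merge idea can be sketched in Python as follows. This is a simplified illustration, not the BibRecord implementation: a record is reduced to a dictionary with one value per tag (real MARC fields are repeatable and have subfields), and the function names are hypothetical.

```python
def diff_fields(original, edited):
    """Return {tag: new_value} for each tag that differs.

    A value of None marks a field that was deleted in `edited`.
    """
    changed = {}
    for tag in set(original) | set(edited):
        if original.get(tag) != edited.get(tag):
            changed[tag] = edited.get(tag)
    return changed


def merge_changes(original, current, changes):
    """Apply `changes` (made against `original`) to `current`.

    `current` is the record as it looks now, possibly modified by
    someone else in the meantime. A change conflicts when the same field
    was changed in parallel to a different value. Returns
    (merged_record, conflicts), where conflicts maps a tag to the pair
    (current_value, proposed_value) and is left for a manual merge.
    """
    merged = dict(current)
    conflicts = {}
    for tag, new_value in changes.items():
        if current.get(tag) not in (original.get(tag), new_value):
            conflicts[tag] = (current.get(tag), new_value)
            continue
        if new_value is None:
            merged.pop(tag, None)
        else:
            merged[tag] = new_value
    return merged, conflicts
```

Non-overlapping parallel edits merge cleanly, while overlapping ones end up in conflicts and leave the current value untouched until a human decides.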

An important topic has been where to place this functionality, in particular whether it should be placed in BibEdit or in BibUpload, and we have settled on a solution that tries to get all the advantages by implementing parts of the checking in both modules. By doing a diff in BibEdit one can submit two streams, an append stream and a change stream, instead of replacing the full record as is done today. Also, if conflicts are detected, the cataloguer can be alerted instantly and do the merging on the fly. By also doing a diff in BibUpload one is guaranteed to catch all conflicts, even for submissions coming from sources other than BibEdit or being submitted asynchronously from off-site mirror editors.
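Splitting an edited record into the two streams might look like this in Python. As an assumption for the sketch, a record is a dictionary with one value per tag and the function name is hypothetical; how deleted fields would be submitted is not covered by the text and is left out.

```python
def split_streams(original, edited):
    """Split an edited record into an append stream and a change stream.

    The append stream holds fields that are new in `edited`; the change
    stream holds fields whose value differs from `original`. Deleted
    fields are not covered by either stream in this sketch.
    """
    append_stream = {tag: value for tag, value in edited.items()
                     if tag not in original}
    change_stream = {tag: value for tag, value in edited.items()
                     if tag in original and original[tag] != value}
    return append_stream, change_stream
```

Unchanged fields appear in neither stream, so a submission only touches what the cataloguer actually edited.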

Note that to be able to diff and merge in BibUpload it would be necessary to tag every MARCXML snippet systematically with revision numbers, to know precisely which version of the record the changes should be diffed against. The tag 005 seems to be a good candidate for this sort of revision stamping. This may pose some challenges, as submissions from external sites can't be expected to use this revision system.
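In MARC 21, field 005 holds the date and time of the latest transaction in the form yyyymmddhhmmss.f. A revision stamp based on it could be produced and checked as in this Python sketch; the function names and the rule "an unstamped snippet always needs a diff" are assumptions, not decided behaviour.

```python
from datetime import datetime


def format_005(moment):
    """Format a datetime as a MARC 21 field 005 value (yyyymmddhhmmss.f)."""
    tenths = moment.microsecond // 100000
    return moment.strftime("%Y%m%d%H%M%S") + ".%d" % tenths


def is_based_on_current(snippet_005, current_005):
    """True if a submitted snippet was made against the current revision.

    Snippets without a stamp (for example from external sites) are
    treated as not based on the current revision, so they are diffed.
    """
    return snippet_005 is not None and snippet_005 == current_005
```

For example, format_005(datetime(2009, 10, 14, 15, 37, 0)) gives "20091014153700.0".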

The supporting functionality to handle actions related to diffing and merging records is to be placed in the BibRecord module.

Modules

Server (Python)

This will be updated once the application is moved to the standard page handler.

Client (Javascript)

Javascript modules:

javascript_modules.png

Feature list

Specific features to implement or look into. Users can use this list to see if features they would like to request are already 'in the works'.

Feature list for first big inputting test

| Feature | Priority |
| UI: Extendable left side menu | |
| Searching for records and browsing hits in editor | |
| Create empty record | High |
| New record templates | High |
| Clone record | High |
| UI: Confirmation dialogs before cancel/submit/delete | Low |
| Change public name of application to 'Record Editor' | Medium |
| UI: Allow for ordering fields and subfields with hotkeys | High |
| Basic autocompletion of field content | High |

Feature list for future releases

| Feature | Priority |
| Undo last / last few simple actions | High |
| Field/subfield copy/cut & paste [1] | High |
| Remove subfield moving arrows (replaced by copy/cut & paste + hotkeys) | High |
| BibCheck integration | High |
| Field templates, 'Add field from list' | High |
| Cross-links with record merger | Medium |
| Cross-links with fulltext file editor | Medium |
| Cross-links with holdings/items editor | Medium |
| Take human tags as valid input | Low |
| History: Resurrect interface (w/ BibMerge support) | Medium |
| UI: More options after cancel/submit/delete [2] | Medium |
| MARCXML mode [3] | Low |
| Warn, in some discreet way, that the record has unsubmitted changes | Low |
| UI: Move focus in a logical way after different edit actions | Low |
| UI: Tab or some other combo does 'Save and edit next' when editing | Low |
| UI: Select by marking start and end of selection | Low |
| Filter to restrict which fields to display | Medium/Low |
| Internationalization | Medium/Low |
| Direct editing of field-/subfield codes | High |
| UI: Optional view: Horizontal, aligned display of subfields | Low |
| UI: Button to let you follow URLs [4] | Medium |
| UI: Show receipt for last few actions in the status area | Low |
| UI: Adding subfield at custom position in field | Low |
| Logging | Low |
| User settings (default format, customized views, language (?) ...) | Low |
| AJAX: Handling of users with inadequate browser-/script-support | Low |
| UI: Add WYSIWYG I18N keyboard for accent entering | Medium |

1: Important cloned fields that should be unique (report number, system number, publ. info) should be either not cloned at all, or cloned as hints printed under empty field values, or grayed out or something, so that it is clear we are not going to produce dupes.

2:

  • Go back to record.
  • On submit: Call BibMerge with submitted and the previous revision.
  • On submit: Display receipt of submitted record.
  • On delete: Undelete.

3: Basically a pure text editor for editing MARCXML. Could be useful for example as a simple copy/paste mechanism across records.

4: If BibEdit displays a field known to contain URLs (such as 856), and if the value of a subfield starts with 'http', then print an icon or a link named "visit this URL" next to it, that would open that URL in a new tab/window.
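The rule in note 4 can be expressed as a small check. The following Python sketch assumes subfields are available as (code, value) pairs; the function name and the set URL_FIELD_TAGS are hypothetical, with 856 being the only tag named in the text.

```python
URL_FIELD_TAGS = {"856"}  # fields known to contain URLs; 856 is the example given


def followable_urls(tag, subfields):
    """Return the subfield values that should get a 'visit this URL' link.

    `subfields` is a list of (code, value) pairs for one field.
    """
    if tag not in URL_FIELD_TAGS:
        return []
    return [value for _code, value in subfields if value.startswith("http")]
```

Values in other fields are ignored even if they start with 'http', which keeps the link from appearing in, say, a title that happens to begin that way.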

Feature ideas

These are ideas that are still open for discussion and refinement, or features with too low priority to be listed above (the 'someday/maybe' category).
  • QuickKey: "An idea surfaced of a QuickKey type functionality, where common long phrases can be entered with a simple key combination." Defined by whom?
  • Pure text view: "Be able to display a bare text record from a button/function on the BibEdit form to be able to paste text from one record to another record. Copy/Paste of multiple chunks difficult in BibEdit, don't want to have to search record in another screen." How could we implement something like this?
  • LaTeX preview: Preview LaTeX formulas using jsMath (either field by field or the whole record)
  • Record media type: Showing a head note/title to reflect the media type (e.g., publication, book, proceeding) of the displayed record

Developer tasks

Developer task list for this and future releases. It comes in addition to the feature list. Tasks will usually be of a more technical nature or very internal to the system, so this list is not meant to be of particular interest to the general user.

Completed tasks (previous releases)

| Task | Priority |
| Migrate to Invenio standard page handler | |
| Update BibEdit entry points to use new page handler | |
| Investigate handling of line breaks in textareas | |
| Rename 'Show' options | |
| Packaging / GIT public commit | |
| Fix: Debug history view | |
| Fix: Validate capital indicators if CERN site | |
| Fix: Figure out the problem with refresh in IE (and Firefox?) | |
| Fix: Wash record ID (don't accept '45asdf') | |
| Fix: Detect and warn if jQuery is missing | |
| AJAX: Requests should handle logout/timeout in a better way | |
| Use jQuery for hiding elements and storing data (instead of hidden form fields) | |

Tasks (next release)

| Task | Priority |
| Ability to tell if record is in its original state (even after being changed) | Low |
| Blocking a cataloguer from having multiple open edit sessions on the same record | Low |

Tasks (future releases)

| Task | Priority |
| AJAX: Record integrity checking | Medium/Low |
| AJAX: Detect and notify the client if something goes wrong server side (in the Python code, not in the transfer) | Medium/Low |
| AJAX: Transaction IDs and logging | Low |
| AJAX: Error handling | Low |
| AJAX: Concurrent requests / request queuing (might not be necessary) | Low |
| Diff/merge: Personalized cache/XML files | Low |
| Diff/merge: Revision stamping | Low |
| Diff/merge: Changeset committing | Low |
| Diff/merge: Implement on submit diffing w/conflict resolution (BibMerge supported) | Low |

Tasks (before any release)

  • Web tests and regression tests
  • Browser performance and compatibility tests
    • IE7
    • IE8
    • FF2
    • FF3
    • Safari
    • Opera
    • Chrome
    • Iceape
  • Pylint
  • JSLint / JSMIN
  • Packaging
  • User tests on dev machine(s)
  • git commit
  • Put into production

See also

Topic attachments
| Attachment | History | Size | Date | Who | Comment |
| bibcheck.png | r2 r1 | 64.7 K | 2008-08-13 - 15:38 | LarsChristianRaae | |
| bibedit.png | r1 | 78.0 K | 2009-02-02 - 15:37 | LarsChristianRaae | BibEdit Beta screenshot |
| bibedit_add_fields.png | r1 | 55.2 K | 2009-02-02 - 15:38 | LarsChristianRaae | BibEdit Beta adding fields screenshot |
| bibedit_editing.png | r1 | 15.7 K | 2009-02-03 - 14:43 | LarsChristianRaae | |
| editing.png | r1 | 14.2 K | 2008-08-13 - 15:41 | LarsChristianRaae | |
| history.png | r2 r1 | 45.5 K | 2008-08-13 - 15:40 | LarsChristianRaae | |
| javascript_modules.png | r1 | 46.7 K | 2008-11-14 - 16:06 | LarsChristianRaae | Javascript Modules |
| recedit1.png | r1 | 7.6 K | 2008-01-18 - 12:05 | MarkoNiinimaki | |
| recedit2.png | r1 | 10.2 K | 2008-01-18 - 12:05 | MarkoNiinimaki | |
| suggesting.png | r1 | 16.2 K | 2008-08-13 - 15:41 | LarsChristianRaae | |
Topic revision: r48 - 2009-10-14 - JocelyneJerdelet
 