System Design: BibEditSpecialModes

Proposal for BibEdit Special Modes for Author, Reference and Keyword fields

General Purpose

To give cataloguers a better interface for editing crucial parts of the metadata (authors, references, keywords) that can sometimes appear in large numbers (e.g. 5000 authors) and need to be treated in a special way (e.g. checked and validated against a knowledge base).

BibEditSpecialModes are intended to use keyboard shortcuts for faster editing, colour coding to indicate field status (e.g. field has an error, field is OK, field has a proposal), AJAX for accessing knowledge bases, and dropdown lists for proposals.


The cataloguer uses BibEdit (see SystemDesignBibEdit) to edit the metadata of a record. From there, a link, button, or keyboard shortcut should lead to the special editing mode (a separate page), where only the fields of the chosen category (authors, references, keywords) can be edited. (A cataloguer who prefers to edit in the regular BibEdit can do so by simply not following the link.)

There the cataloguer should be able to step through each field and change, delete, insert, and approve its content, and view errors and proposed correct values.

When special editing is finished, the cataloguer will return to normal editing mode (BibEdit) where the rest of the record metadata can be edited.

Desired actions

Authors Editing Mode

See Also: SystemDesignBibEditSpecialModeForAuthors

Requirements to preserve Existing Functionality:

  • Check 100__ and 700__ fields (authors) against a knowledge base.
    • The knowledge base is currently just the set of authors in HEP that are "complete" (i.e. that have completed this task already)
  • For each author ($$a subfield) that exists in the KB, mark it as a correct author.
  • For each author ($$a) that does not exist, flag it as possibly incorrect/worthy of extra attention
  • Check if affiliations ($$u) match a KB
    • KB is the collection "institutions"
  • If $$u does not exist (or is not in the KB), allow the cataloger to choose likely affiliations from guesses (in this order):
    • Make affs same as previous author
    • prior affiliations of this author
    • affiliations entered already on this paper
    • prior affiliations of other authors on this paper
    • cataloger manual search of institution collection
  • If cataloger can find no appropriate match in the KB
    • allow them to enter name of affiliation as they like
    • generate bibcatalog task related to this record for the INST manager to deal with (task should indicate which affil is new)
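
A minimal Python sketch of the author and affiliation checks listed above, assuming the knowledge bases can be loaded into in-memory sets. `AUTHOR_KB`, `INSTITUTIONS_KB`, `AuthorField` and the helper functions are hypothetical names, not an existing Invenio API. The guess order follows the list above; manual search of the institutions collection remains the fallback when no guess fits.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for the real knowledge bases: the "complete"
# HEP authors and the "institutions" collection.
AUTHOR_KB = {"Abe, Alice", "Chivukula, Carol"}
INSTITUTIONS_KB = {"SLAC", "Fermilab", "Princeton U.", "Tokyo U.", "Chicago U."}


@dataclass
class AuthorField:
    tag: str                                   # "100" or "700"
    name: str                                  # $$a
    affs: list = field(default_factory=list)   # $$u values, in order


def author_status(author):
    """'OK' if the $$a is in the author KB, otherwise 'CHECK'."""
    return "OK" if author.name in AUTHOR_KB else "CHECK"


def unknown_affs(author):
    """$$u values that do not match the institutions KB."""
    return [u for u in author.affs if u not in INSTITUTIONS_KB]


def affiliation_guesses(previous, own_history, on_paper, coauthor_history):
    """Ordered, de-duplicated guesses following the priority list:
    same as previous author, this author's prior affiliations,
    affiliations already on this paper, co-authors' prior affiliations."""
    guesses = []
    if previous is not None and previous.affs:
        guesses.append("Same as " + previous.name)
    for source in (own_history, on_paper, coauthor_history):
        for aff in source:
            if aff not in guesses:
                guesses.append(aff)
    return guesses


def new_affiliation_task(recid, aff):
    """Hypothetical payload for a bibcatalog task for the INST manager."""
    return {"recid": recid, "queue": "INST", "subject": "New affiliation: " + aff}
```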

Obvious enhancements that should be included later:

  • Create a separate collection of author names (HEPNAMES)
  • If author is not in the base, propose authors with a similar name and show their recent affiliation
  • If author has not previously been at this affiliation generate a warning, showing other recent affils
  • If author is not in the base and is accepted by the cataloguer, generate a bibcatalog task for the AUTHOR manager to deal with
  • Affiliations are stored as $$i, their key in the institutions collection instead of (or in addition to) $$u
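
For the planned similar-name proposals and affiliation warnings, here is a sketch that uses Python's standard `difflib.get_close_matches` as the similarity measure. The `HEPNAMES` dictionary (name to most recent affiliation), its contents, and the 0.6 cutoff are assumptions for illustration; the real HEPNAMES collection and its matching rules may differ.

```python
import difflib

# Hypothetical HEPNAMES data: author name -> most recent affiliation.
HEPNAMES = {
    "Barker, Bob": "Fermilab",
    "Barker Jr., Bob": "SLAC",
    "Baker, Rob": "CERN",
    "Chivukula, Carol": "SLAC",
}


def similar_authors(name, n=3, cutoff=0.6):
    """(name, recent affiliation) pairs for the closest known names, best first."""
    matches = difflib.get_close_matches(name, list(HEPNAMES), n=n, cutoff=cutoff)
    return [(m, HEPNAMES[m]) for m in matches]


def affiliation_warning(name, aff, history):
    """Return a warning if `aff` is not among the author's previous affiliations.
    `history` maps author name -> list of past affiliations, oldest first."""
    past = history.get(name, [])
    if aff in past:
        return None
    recent = ", ".join(past[-3:]) or "none"
    return f"{name} has not been at {aff} before (recent: {recent})"
```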

Use Cases

1) A new paper comes in from arXiv with 4 authors (Alice Abe, Bob Barker Jr., Carol Chivukula, and Dave D'Agostino). Alice is from SLAC, Bob and Carol are from Fermilab and SLAC, and Dave is from SLAC. However, arXiv's OAI harvest supplies only the author names, not the affiliations.

The inputter goes to BibEdit to check the record, clicks on AuthorEdit, and gets something like:

[]  -> information to be shown
{}  -> choices to be offered
<> -> my comments about what is needed... 

Abe, Alice [OK]  {Affs = SLAC, Princeton U., Tokyo U.}    <all from Alice's history as an author>

Jr., Bob Barker [CHECK] { Affs = Same as Alice, Fermilab, SLAC, Chicago U.} <Check author name, as the Jr. was misplaced><Offer the choice of making the affs the same as the previous author's, or of choosing from the guesses>

Chivukula, Carol [OK] { Affs = Same as Bob, SLAC, IIT Kanpur, Fermilab } <SLAC/Kanpur are offered from Carol's history, Fermilab because we chose it for Bob>

Agostino, Dave D' [CHECK] {Affs = Same as Carol, SLAC, INFN, Napoli, Napoli U., Fermilab} <Again check the author, since OAI split the name at the apostrophe>
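
The [CHECK] markers on Bob and Dave could be raised by a simple heuristic that spots names split in the wrong place: a generational suffix in the surname position, or a surname prefix left dangling at the end of the given names. The suffix and prefix lists below are illustrative assumptions, not a complete rule set.

```python
# Illustrative lists only; a production rule set would be larger.
SUFFIXES = {"Jr.", "Jr", "Sr.", "Sr", "II", "III", "IV"}
SURNAME_PREFIXES = {"D'", "d'", "O'", "de", "da", "van", "von"}


def suspicious_name(name):
    """Heuristic: True if a 'Surname, Given names' string looks mis-split."""
    if "," not in name:
        return True                  # not in 'Surname, Given' form at all
    surname, given = (part.strip() for part in name.split(",", 1))
    if surname in SUFFIXES:
        return True                  # e.g. "Jr., Bob Barker"
    tokens = given.split()
    if not tokens:
        return True                  # empty given-name part
    last = tokens[-1]
    # e.g. "Agostino, Dave D'": a surname prefix stranded after the given name
    return last in SURNAME_PREFIXES or last.endswith("'")
```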

The inputter will click "SLAC" for Alice; edit "Jr., Bob Barker" -> "Barker Jr., Bob", then click "SLAC" and "Fermilab" (order matters); click "Same as Bob" for Carol; and edit "Agostino, Dave D'" -> "D'Agostino, Dave", then click "SLAC".

The resulting record will be

100$$a Abe, Alice $$u SLAC
700$$a Barker Jr., Bob $$u SLAC $$u Fermilab
700$$a Chivukula, Carol $$u SLAC $$u Fermilab
700$$a D'Agostino, Dave $$u SLAC
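
The click sequence above can be modelled as a small replay function: each author's chosen affiliations are appended in click order, and a "Same as previous" choice copies the previous author's list. The `SAME` token, the function names, and the (tag, name, affs) tuples are assumptions for this sketch; with the clicks described above it reproduces the four record lines.

```python
SAME = "SAME"   # hypothetical token for the "Same as <previous author>" choice


def apply_clicks(names, clicks):
    """names: corrected author names in order (the first becomes 100, the rest 700).
    clicks: name -> list of chosen affiliations or SAME, in click order.
    Returns a list of (tag, name, affs) tuples."""
    record = []
    for i, name in enumerate(names):
        affs = []
        for choice in clicks.get(name, []):
            if choice == SAME:
                sources = record[-1][2] if record else []
            else:
                sources = [choice]
            for aff in sources:
                if aff not in affs:   # keep click order, drop duplicates
                    affs.append(aff)
        record.append(("100" if i == 0 else "700", name, affs))
    return record


def to_marc_lines(record):
    """Render the tuples in the $$-notation used on this page."""
    return [f"{tag}$$a {name}" + "".join(f" $$u {u}" for u in affs)
            for tag, name, affs in record]
```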

2) A paper from ATLAS comes in from arXiv with 1 author listed, but it actually has 2,500 authors.

In this case authextract should run (it may already have been run via bibcatalog); otherwise the inputter can start it manually or via bibcatalog.

When the special mode is opened, there should already be a suggested author list from authextract (refextract for authors), so it should look like:

100$$a Aas, G. $$u Saclay
700$$a Abe, A. $$u CERN
etc.

The inputter should see this list and be able to view all the authors and affiliations (possibly page by page, though it is fine if it is all on one page) and edit any given one. Just as above, entries that don't match expectations should be highlighted. However, inputters are not expected to check/change/add affiliations on an author-by-author basis (though they should have this capability).

Instead, a button that shows each affiliation just once (i.e. sort | uniq over $$u) could be useful, so that each can be edited as needed, with the edit applied as a search-and-replace over the whole record. In other words, if AuthExtract reports "CERN Geneve" for all authors who are actually at CERN, the inputter should be able to correct that once. (Find/replace in Firefox or Emacs, or multiEdit, might be options here; either way, editing 2,500 affiliations one at a time is clearly not workable.)
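
The "show all affs" button described above amounts to a sort/uniq over every $$u value plus a record-wide replace. A minimal sketch, assuming the author list is held as (tag, name, affs) tuples with mutable affiliation lists; the function names are hypothetical.

```python
from collections import Counter


def unique_affiliations(record):
    """sort | uniq -c over all $$u values: sorted (affiliation, count) pairs."""
    counts = Counter(u for _tag, _name, affs in record for u in affs)
    return sorted(counts.items())


def replace_affiliation(record, old, new):
    """Replace one $$u value everywhere in the record; return the number changed."""
    changed = 0
    for _tag, _name, affs in record:
        for i, u in enumerate(affs):
            if u == old:
                affs[i] = new
                changed += 1
    return changed
```

With 2,500 authors the inputter then edits only the short list returned by `unique_affiliations`, and each correction is applied once for the whole record.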

If there are affiliations on the paper that were not extracted, they generally go at the end in a special 700 field with no $$a, just $$u.

References Editing Mode

Requirements to preserve Existing Functionality:

  • Check 999C5 fields (references) against a knowledge base.
    • KB is citation index (i.e. check to see if the paper in 999C5 is in Invenio as a record, using the citation index)
  • For each reference that exists, show the 100$$a (first author) of the cited record, along with the first few words of its title and its journal/report info
  • If the reference is not in the base, display a warning and allow editing of the field
  • If two different 999C5 fields reference the same document in their $$r (report number) and $$s (journal reference) subfields, merge them into one 999C5 field automatically
  • If one 999C5 field contains multiple references ($$r and $$s that resolve to different records), allow the inputter to split it into two 999C5 fields
  • Allow insertion of new 999C5 fields when necessary; the inputter will type the $$r or $$s directly, and there will be no $$m
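
The merge and split rules above both depend on resolving each field's $$r and $$s to record ids. A sketch under simplifying assumptions: each 999C5 field is a dict with one value per subfield code, and the two lookup tables (with made-up record ids) stand in for the report-number and journal indexes; all names are hypothetical. Fields that resolve to no record are left alone, and a field that resolves to two records is only reported, since splitting needs the inputter's decision.

```python
# Hypothetical lookup tables standing in for the report-number and journal indexes.
REPORT_INDEX = {"hep-th/0504124": 101, "arXiv:0812.0675": 102}
JOURNAL_INDEX = {"Phys.Rev.,D77,665434": 101}


def resolve(ref):
    """Set of record ids that the $$r and $$s of one 999C5 field point to."""
    ids = set()
    if ref.get("r") in REPORT_INDEX:
        ids.add(REPORT_INDEX[ref["r"]])
    if ref.get("s") in JOURNAL_INDEX:
        ids.add(JOURNAL_INDEX[ref["s"]])
    return ids


def merge_duplicates(refs):
    """Merge 999C5 fields that resolve to the same single record.
    Returns new dicts; the input list is not modified."""
    merged, by_id = [], {}
    for ref in refs:
        ref = dict(ref)
        ids = resolve(ref)
        if len(ids) == 1:
            (rid,) = ids
            if rid in by_id:
                # Fold this field's subfields into the first field for the record.
                for code, value in ref.items():
                    by_id[rid].setdefault(code, value)
                continue
            by_id[rid] = ref
        merged.append(ref)
    return merged


def needs_split(ref):
    """True if one field's $$r and $$s resolve to different records."""
    return len(resolve(ref)) > 1
```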

Use Cases

The document comes from arXiv and the references from refextract; consider a few sample 999C5 fields:

999C5$$m Smith et al, Phys.Rev.D77:665434,2006  [hep-th/0504124]$$sPhys.Rev.,D77,665434 $$r hep-th/0504124
999C5$$m T. Brooks arXiv:0812.0675 $$r arXiv:0812.0675
999C5$$m foobarbaz 2008,1932 $$s foobarbaz,2008,1932
999C5$$r arXiv:0804.5611
999C5$$m arXiv/0805-1123
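
Working with these fields first means splitting them into subfields. Here is a minimal parser for the $$-notation used on this page, assuming "$$" never occurs inside a subfield value. It keeps (code, value) pairs in order, so repeated codes survive; the function names are hypothetical.

```python
def parse_field(line):
    """Split e.g. '999C5$$m ... $$s ... $$r ...' into (tag, [(code, value), ...])."""
    tag, *chunks = line.split("$$")
    subfields = [(chunk[0], chunk[1:].strip()) for chunk in chunks if chunk]
    return tag.strip(), subfields


def first_value(subfields, code):
    """First value for a subfield code, or None if absent."""
    return next((value for c, value in subfields if c == code), None)
```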

The inputter should see something like:

Phys.Rev.D77:665434,2006 [hep-th/0504124]  [G. Smith]  <the info here should be found by looking up phys.rev.,d77,665434 in the journal index, not by parsing the $$m field>

JHEP 04(2009)001 [arXiv:0812.0675] [T. Brooks]  <note that by looking up the report number, we found more information to show>

foobarbaz 2008,1932 [Unknown][CHECK]  <note that this could not be found by a journal search, so we show $$s and ask it to be checked>

arXiv:0804.5611 [S. Mele] < note that this one had no $$m, but we know the author from looking up the report number>

arXiv/0805-1123 [Unknown][CHECK]  <note this was not in standard form, so refextract found no $$r or $$s>
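
The display lines above could be produced by looking up $$s in the journal index first, then $$r in the report index, and never parsing $$m. A sketch with a hypothetical in-memory record store; the record ids and the `RECORDS`/`*_INDEX` tables are assumptions, and the bibliographic data is taken from the examples above. It assumes every stored record has a first author.

```python
# Hypothetical record store and indexes; real lookups would query Invenio.
RECORDS = {
    101: {"first_author": "G. Smith", "pubinfo": "Phys.Rev.D77:665434,2006",
          "report": "hep-th/0504124"},
    102: {"first_author": "T. Brooks", "pubinfo": "JHEP 04(2009)001",
          "report": "arXiv:0812.0675"},
    103: {"first_author": "S. Mele", "report": "arXiv:0804.5611"},
}
JOURNAL_INDEX = {"Phys.Rev.,D77,665434": 101}
REPORT_INDEX = {"hep-th/0504124": 101, "arXiv:0812.0675": 102, "arXiv:0804.5611": 103}


def display(ref):
    """One display line for a 999C5 field given as {subfield code: value}."""
    recid = JOURNAL_INDEX.get(ref.get("s"))
    if recid is None:
        recid = REPORT_INDEX.get(ref.get("r"))
    if recid is None:
        # Not found: show what we have and ask for a check.
        shown = ref.get("s") or ref.get("r") or ref.get("m", "")
        return f"{shown} [Unknown][CHECK]"
    rec = RECORDS[recid]
    author = f"[{rec['first_author']}]"
    if rec.get("pubinfo"):
        report = f" [{rec['report']}]" if rec.get("report") else ""
        return f"{rec['pubinfo']}{report} {author}"
    return f"{rec.get('report', '')} {author}"
```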

The inputter, looking at the actual paper, clicks "OK" on #1, #2 and #4 above and deletes (or edits) #3. She edits #5 into the standard form arXiv:0805.1123; the editing mode then shows [T. Simko] as the author and shows that the paper is published in Phys.Lett.B454:111-114,2009, which the inputter confirms.
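
Rewriting a loosely written identifier such as arXiv/0805-1123 into the standard form could also be offered as a suggestion before the inputter edits it by hand. The regular expression below is an assumption covering new-style identifiers (YYMM.NNNN or YYMM.NNNNN, with an optional version that is dropped), not an official arXiv grammar; old-style ids such as hep-th/0504124 are left alone.

```python
import re

# Illustrative pattern for loosely written new-style arXiv identifiers.
_ARXIV_LOOSE = re.compile(r"arxiv\s*[:/]?\s*(\d{4})[.\-_](\d{4,5})(v\d+)?", re.IGNORECASE)


def normalize_arxiv(text):
    """Return 'arXiv:YYMM.NNNN(N)' if a loose new-style id is found, else None."""
    m = _ARXIV_LOOSE.search(text)
    if m is None:
        return None
    return f"arXiv:{m.group(1)}.{m.group(2)}"
```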

She also notices that between #1 and #2 there was another reference on the paper, and inserts it by typing arXiv:0805.1123, which is then checked for first author and reference.

There are a few other actions that inputters might perform on this list. Moving entries around is unlikely but possible. Saving and/or undoing would be nice. Something we currently do is re-extract the references (i.e. run refextract on this record again); however, this might better be part of BibMerge. This is what we were discussing while I was there, regarding merging long strings of references.

Occasionally refextract fails, and the references then have to be input by hand. For this, one should simply be able to type $$r or $$s strings, hit Return or Tab, and continue typing the next string, while the checks run in the background as one proceeds (or at the end).
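
A sketch of such a fast-entry loop: each typed string is classified as $$r or $$s as soon as it is entered, and its lookup is submitted to a thread pool so typing is never blocked. The patterns and the `check` stub are assumptions for illustration; a real version would reuse refextract's report-number and journal knowledge and query the citation index. Strings that fit neither pattern are flagged rather than stored as $$m, since hand-entered references carry no $$m.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Rough, illustrative patterns only.
REPORT_RE = re.compile(r"^(arXiv:\d{4}\.\d{4,5}|[a-z-]+(\.[a-z]{2})?/\d{7})$", re.IGNORECASE)
JOURNAL_RE = re.compile(r"^[^,]+,[^,]+,[^,]+$")   # e.g. "Phys.Rev.,D77,665434"


def classify(entry):
    """Return (subfield code, value); code is None if unrecognised."""
    entry = entry.strip()
    if REPORT_RE.match(entry):
        return ("r", entry)
    if JOURNAL_RE.match(entry):
        return ("s", entry)
    return (None, entry)


def check(subfield):
    """Hypothetical background check; here it only reports whether the
    entry was recognised. A real check would query the citation index."""
    code, value = subfield
    return code, value, code is not None


def enter_references(entries, executor):
    """Classify each entry immediately and queue its check on `executor`."""
    subfields = [classify(e) for e in entries]
    futures = [executor.submit(check, sf) for sf in subfields]
    return subfields, futures
```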

Keywords Editing Mode

Requirements to preserve Existing Functionality:

  • Check 6531_ fields (keywords) against a taxonomy base.
  • If keyword is not legal, propose close alternatives. (Also provide notification system)
  • Propose synonyms for keywords that exist in the base, and expand abbreviations.
  • For keyword combinations, if not legal, check the reverse order of words. If the reverse order is legal, automatically apply the correction.
  • Check only keyword fields whose $$9 subfield is not set to "BibClassify".
  • A simple text editor to add/modify keywords
  • A simple way to change the sequence (e.g. move the most important to the top)
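
The keyword checks above could be combined into one function. This sketch assumes a small in-memory taxonomy and abbreviation table (both hypothetical; the real taxonomy base would be much larger) and uses `difflib.get_close_matches` as a stand-in for "close alternatives". The status strings are illustrative.

```python
import difflib

# Hypothetical taxonomy and abbreviation table.
TAXONOMY = {"quantum chromodynamics", "lattice field theory", "supersymmetry", "dark matter"}
ABBREVIATIONS = {"QCD": "quantum chromodynamics", "SUSY": "supersymmetry"}


def check_keyword(kw):
    """Return (status, value or suggestions) for one keyword."""
    if kw in TAXONOMY:
        return "OK", kw
    if kw in ABBREVIATIONS:
        return "EXPAND", ABBREVIATIONS[kw]
    words = kw.split()
    if len(words) > 1:
        reversed_kw = " ".join(reversed(words))
        if reversed_kw in TAXONOMY:
            return "FIXED", reversed_kw      # applied automatically
    return "CHECK", difflib.get_close_matches(kw, sorted(TAXONOMY), n=3, cutoff=0.6)


def keywords_to_check(fields):
    """fields: dicts with 'a' (keyword) and optional '9' (source).
    Skip the ones produced by BibClassify."""
    return [f for f in fields if f.get("9") != "BibClassify"]
```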

Transfer of BibClassify keywords to DESY keywords

  • Full display of the BibClassify output (incl. occurrences) for selection
  • 2 options:
    • copy all
    • mark some and copy selected
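
Both transfer options reduce to the same operation with an optional selection. This sketch assumes the BibClassify output is available as (keyword, occurrences) pairs and the DESY keywords as a plain list. The function names are hypothetical, and how the copied keywords are tagged in MARC is left open.

```python
def by_occurrence(bibclassify_kws):
    """Order for display: most frequent first (ties keep their original order)."""
    return sorted(bibclassify_kws, key=lambda kv: kv[1], reverse=True)


def transfer_keywords(bibclassify_kws, desy_kws, selected=None):
    """Copy BibClassify keywords into the DESY keyword list.
    selected=None copies all; otherwise only keywords in `selected` are copied.
    Keywords already present are not duplicated; existing order is kept."""
    result = list(desy_kws)
    for kw, _occurrences in bibclassify_kws:
        if (selected is None or kw in selected) and kw not in result:
            result.append(kw)
    return result
```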

-- AnnetteHoltkamp - 11 Mar 2009

The Graphical User Interface

TODO: to be written as soon as the requirements are clearer and more complete.

-- KyriakosLiakopoulos - 13 Feb 2009

Topic revision: r7 - 2009-12-11 - unknown
Copyright © 2008-2023 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.