Currently, all HEP ingestion except for the arXiv categories that we harvest directly is processed in a standalone workflow at DESY. This functionality needs to be ported to INSPIRE. It relies on holdingpen2 to work on records before their final ingestion into INSPIRE. This is a proposal (from the DESY point of view) for what the INSPIRE workflow could look like.

Updated slides from CERN Meeting Jan 2014: workflow.pdf

To start with: a minimum workflow for arXiv.
No matching / merging - assume all records are new.
For hep* nothing changes.
For all other articles:

  • harvest to holdingpen2 incl. fulltext
  • run BibClassify (more procedures later, which might use references)
  • whatever we now harvest by complete category (CC) goes directly to INSPIRE
  • depending on the output of BibClassify, ask a curator to select the article; pre-fill the form with the available info (see the mock-up below and the full slides)
  • ingest selected articles incl. added information to INSPIRE
  • create tickets (or other means of workflow) for ingested articles
    CC, not hep*, N>0:
    same GUI as for selection - without selection button
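
The routing described in the list above can be sketched as follows. This is only an illustration, not existing INSPIRE or holdingpen2 code: the names `Route` and `route_record` are hypothetical, the set of completely harvested categories is passed in because it is not listed on this page, and the sketch assumes that a single matching category is enough to decide the route.

```python
from enum import Enum


class Route(Enum):
    UNCHANGED = "unchanged"                  # hep*: nothing changes
    DIRECT_INGEST = "direct_ingest"          # complete category (CC): straight to INSPIRE
    CURATOR_SELECTION = "curator_selection"  # everything else: holdingpen2 + curator


def route_record(categories, complete_categories):
    """Decide how a harvested arXiv record is handled.

    categories: arXiv categories of the record, e.g. ["astro-ph", "nucl-ex"].
    complete_categories: categories harvested completely (CC). The real list
    is not given on this page, so the caller has to supply it.
    """
    # Assumption: any hep* category keeps the record on the existing path.
    if any(cat.startswith("hep") for cat in categories):
        return Route.UNCHANGED
    # Assumption: any completely harvested category sends it directly to INSPIRE.
    if any(cat in complete_categories for cat in categories):
        return Route.DIRECT_INGEST
    # All other articles wait in holdingpen2 for BibClassify and a curator.
    return Route.CURATOR_SELECTION
```

Whether the decision should look at all categories or only at the primary one is left open here; the sketch checks all of them.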

Once this is running, DESY can stop harvesting all of arXiv.

The mock-ups for the GUI / holdingpen action are examples for arXiv and for a journal.
The layout is almost arbitrary; it would be nice if the action and input fields were on the right side. The select and CORE buttons are to be replaced by 3 buttons: reject / select / CORE.
The numbers shown between the input area and the keywords come from several procedures. We run BibClassify three times: on the full text to extract CORE keywords, on title/abstract (metadata) for automatic INSPIRE keywords, and with an anti-HEP ontology for anti-keywords (Anti-KW). We check how many of the resolvable references are in INSPIRE-HEP, how many of them are CORE papers, and how many CORE papers the authors have written. DESY can provide the code to get these numbers. Some of these numbers should also be displayed in the HP maintable. Can you add color? Green for positive info, red for negative info.
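
As a rough illustration of the reference numbers and the color request, the sketch below counts resolvable references and maps indicators to a color. It is not the DESY code mentioned above. All names (`ReferenceCounts`, `count_references`, `indicator_color`, the indicator keys) are hypothetical, references are assumed to arrive as already resolved identifiers (or `None` when unresolvable), and treating anti-keywords as the only "negative" indicator is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceCounts:
    resolvable: int  # references that could be resolved to an identifier
    in_inspire: int  # resolvable references found in INSPIRE-HEP
    core: int        # resolvable references that are CORE papers


def count_references(references, inspire_ids, core_ids):
    """Count references per class.

    references: resolved identifiers, or None for unresolvable references.
    inspire_ids / core_ids: sets of identifiers (hypothetical inputs; the
    real lookup would query INSPIRE).
    """
    resolved = [ref for ref in references if ref is not None]
    return ReferenceCounts(
        resolvable=len(resolved),
        in_inspire=sum(1 for ref in resolved if ref in inspire_ids),
        core=sum(1 for ref in resolved if ref in core_ids),
    )


# Assumption: which indicators count as positive or negative info.
POSITIVE = {"core_keywords", "inspire_refs", "core_refs", "core_papers_by_authors"}
NEGATIVE = {"anti_keywords"}


def indicator_color(name, value):
    """Return 'green', 'red', or None (no highlighting) for one number."""
    if value > 0 and name in POSITIVE:
        return "green"
    if value > 0 and name in NEGATIVE:
        return "red"
    return None
```

Zero values are left uncolored in this sketch, so the table only highlights numbers that carry information.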

For the holdingpen maintable:

yes, usually over 2 lines or more
something like Chin.J.Phys. 52 (2014) 707 or arXiv:1403.2174. For arXiv, the categories go in the 2nd line: astro-ph, nucl-ex
not needed. Either not available yet or part of identifier
as long as we can filter by date we don't need it in the display
Type, Status
do we need this displayed?
CORE info
in 2 lines: number of references // number of keywords. Really just bare numbers, possibly with color. E.g.
10 | 14 | 15
1 | 1/3 | 0
Reject / Accept / CORE

green: existing stuff / red: new stuff

Minimum workflow for arXiv

-- KirstenSachs - 09 Apr 2014

Topic revision: r4 - 2014-07-31 - KirstenSachs
