Workflow for electronic journal feeds at DESY.
Temporary solution until holdingpen2 is ready. It works without SPIRES but still inherits a lot from the old SPIRES workflow.

alerts (desydoc.py)

  • via email: a batch job checks the mailbox, harvests via ftp-server, ... (what else?)
  • subroutines for different journals
    • IOP, PTEP: via xslt, creates INSPIRE xml
    • other journals: old harvesting, creates dok-file (SPIRES format)
    • APS: harvested at CERN via xslt, manual scp to DESY. Alert and harvest at DESY still to be set up.
  • convert spires.dok to inspire.xml (inpuspi.pl, spitoinspi.py)
  • starts retrieval
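
The per-journal routing described above could be sketched as a small lookup. This is only an illustration: the function name and the route labels are made up for this sketch and are not taken from desydoc.py.

```python
# Hypothetical routing sketch; names and labels are assumptions, not desydoc.py code.
XSLT_JOURNALS = {"IOP", "PTEP"}  # converted directly to INSPIRE xml via xslt


def harvest_route(journal):
    """Return a label for the conversion path a journal alert takes."""
    if journal in XSLT_JOURNALS:
        return "xslt-to-inspire-xml"
    if journal == "APS":
        # harvested at CERN, then copied to DESY by hand
        return "cern-xslt-then-scp"
    # everything else: old harvesting into a dok-file, converted afterwards
    return "dok-then-convert"
```

A table like this keeps the journal-specific subroutines in one place, so a new journal only needs one new entry.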

retrieval (retinspire.pl)

  • 4 versions of the information needed:
    1. dok.file (old spires format) for book keeping
    2. big xml-file containing all records for matching
    3. single xml-files of each record for upload
    4. abstract files for each record for bibclassify
  • can start either from one large xml containing several records, or from the dok-file plus the single xml-files for each record; the other files are created from these. The abstract-file contains title, abstract, pacs-text and free keywords.
  • modify information for matching in big xml-file:
    1. delete DOI for errata, publisher notes etc. (still buggy)
    2. if there are more than 150 authors, delete all authors except the first (assuming author xml is available)
    3. dok-file may contain arXiv ID from ADS, copy to xml
    4. dok-file may contain CNUM, copy to xml
  • run bibmatch (to be improved):
    for details see InputtingJournalsBibMatch
  • show results:
    • add recid of all possible matches to dok-file
    • create html page of possible matches. Try to determine which search or fuzzy mode produced each result and add this info to the html. Existing records are displayed in brief format. Example: http://www.desy.de/~sachs/iop211.html
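
The author truncation for matching (more than 150 authors, keep only the first) could look roughly like the sketch below. It assumes MARCXML without namespaces, with the first author in tag 100 and additional authors in tag 700. The function name is hypothetical, and retinspire.pl itself is Perl, so this is not its implementation.

```python
import xml.etree.ElementTree as ET

MAX_AUTHORS = 150  # threshold from the list above


def truncate_authors(record):
    """Drop all tag-700 author fields if the record has more than MAX_AUTHORS authors.

    Assumption: first author is in datafield 100, further authors in datafield 700.
    """
    fields = record.findall("datafield")
    first = [f for f in fields if f.get("tag") == "100"]
    others = [f for f in fields if f.get("tag") == "700"]
    if len(first) + len(others) > MAX_AUTHORS:
        for field in others:
            record.remove(field)
    return record
```

Only the copy used for matching would be modified this way; the single xml-files kept for upload retain the full author list.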

matching by curator

  • check the output of bibmatch and edit the dok-file accordingly:
    • for a match: exactly one line IRecID=1234567 (the recid of the matching record)
    • for no match: exactly one empty IRecID= line
  • not implemented yet: add cnum to dok-file, re-run retinspire foo.dok to include search for author+cnum
  • move dok-file to directory zu_punkten
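
A check that the curator left the dok-file in one of the two expected states might be sketched as follows. The layout assumed here (one `IRecID=...` entry per line, tag name case-insensitive) and the function name are assumptions for illustration only.

```python
import re

# Assumption: each IRecID entry sits on its own line as "IRecID=<value>".
IRECID_LINE = re.compile(r"^\s*IRecID\s*=\s*(\S*)\s*$", re.IGNORECASE)


def curator_decision(dok_lines):
    """Classify a record from its IRecID lines.

    Returns ("match", recid), ("no-match", None) or ("unresolved", values).
    """
    values = [m.group(1) for m in map(IRECID_LINE.match, dok_lines) if m]
    if len(values) == 1:
        if values[0].isdigit():
            return ("match", int(values[0]))
        if values[0] == "":
            return ("no-match", None)
    # zero lines, several candidate lines, or a non-numeric value
    return ("unresolved", values)
```

Records reported as "unresolved" still have several candidate recids (or none at all) and would need another look before the file is moved on.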

additional information in dok file (enrichdoki.py)

  • run bibclassify on abstract file
  • search for authors in inspire
  • additional information in dok-file:
  • automatic pre-selection:
    • D?: one core KW/pacs
      more than 2 authors: have one core-paper in common
      1 or 2 authors: more than 9 core-papers
    • C?: more than one core KW/pacs
    • D: everything from this journal is selected (e.g. JHEP)
    • d: many articles from this journal are selected (e.g. Astropart.Phys.)
  • move dok-file to directory enriched
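
Part of the automatic pre-selection can be sketched as below. The sketch covers only the journal-level flags and the keyword-count flags. The author-based criteria listed under D? are left out, because the list does not say how they combine with the keyword criterion. The function name and the choice to return all applicable flags are assumptions, not enrichdoki.py behaviour.

```python
# Journal-level flags; only the two examples given in the list above.
JOURNAL_FLAGS = {"JHEP": "D", "Astropart.Phys.": "d"}


def preselect(journal, n_core_keywords):
    """Return the list of pre-selection flags that apply to one article.

    n_core_keywords: number of core keywords/PACS found for the article.
    """
    flags = []
    if journal in JOURNAL_FLAGS:
        flags.append(JOURNAL_FLAGS[journal])
    if n_core_keywords > 1:
        flags.append("C?")
    elif n_core_keywords == 1:
        flags.append("D?")
    return flags
```

Returning every applicable flag avoids guessing a precedence between journal-level and keyword-level hints; the physicist makes the final choice in the next step anyway.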

select articles by physicist

based on information in dok-file (tags case-insensitive)

  • IRecID=C -> select as core paper
  • IRecID=D -> select without core tag
  • to add core tag to already existing record:
  • add fieldcode
  • add cnum to dok-file for conference articles
    • 3c: C12/09/30.2
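
A simple sanity check for the CNUM value could catch typos before upload. The pattern below is inferred from the single example above (C, then yy/mm/dd, then a numeric suffix) and treats the suffix as optional; both points are assumptions, as is the function name.

```python
import re

# Assumed shape: "C" + yy/mm/dd + optional ".n" suffix, e.g. C12/09/30.2
CNUM_PATTERN = re.compile(r"^C\d{2}/\d{2}/\d{2}(\.\d+)?$")


def looks_like_cnum(value):
    """Return True if value has the assumed CNUM shape."""
    return CNUM_PATTERN.match(value.strip()) is not None
```

The check is purely syntactic: it does not verify that the date is valid or that the conference exists in INSPIRE.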

prepare for upload (cleamxml.py) by physicist

  • create xml files in inspire/insert, inspire/append and inspire/holdingpen for upload
    info from dok-file and single xml-files of the records
  • convert journal names and affiliations
  • checker for fieldcodes
  • delete affiliations for non-core papers
    non-core → no curation of affiliations
  • add automatic INSPIRE keywords
  • add tags: batchupload, temporary record, published
  • for POS add FFT field
  • for merges: delete title if ~identical
    write xml-file for append with some info that is always added and take this out of holdingpen record: abstract, ...??
  • what else??
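
Dropping affiliations for non-core papers could be sketched like this. It assumes MARCXML without namespaces, author fields in tags 100 and 700, and affiliations in subfield u. The function name is hypothetical and this is not taken from the actual script.

```python
import xml.etree.ElementTree as ET

AUTHOR_TAGS = {"100", "700"}  # assumed author fields


def strip_affiliations(record, is_core):
    """Remove affiliation subfields (code "u") from author fields of non-core records."""
    if is_core:
        return record
    for field in record.findall("datafield"):
        if field.get("tag") in AUTHOR_TAGS:
            # findall returns a list, so removing while looping is safe
            for sub in field.findall("subfield"):
                if sub.get("code") == "u":
                    field.remove(sub)
    return record
```

For example, a non-core record whose first author has subfields a and u keeps only subfield a, while a core record is returned unchanged.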

notification for cataloger and upload by physicist

  • dokirt.py: will eventually create an RT ticket; for now it creates an html page with links for BibEdit and a pdf of the first 2 pages of each article. Example: http://www.desy.de/~sachs/iop211_curation.html
  • senddokmail.py: creates and sends email to cataloger
  • copy xml files from DESY to CERN for upload

curation of records by cataloger

  • open up to 20 records in BibEdit, link given via email or web-page
  • new records: check everything, incl. references if we have them
    mostly without references at this stage since data are based on dok-file
  • merges:
    • "apply changes" for holding pen record
    • go through all prompts: "add changes as a new field" / "discard changes" one-by-one
    • finally "apply all the changes"
  • delete "temporary record"
Topic revision: r2 - 2013-08-21 - KirstenSachs
 