Record structure comparison between SPIRES and Invenio

SPIRES

Bare SPIRES record (not comprehensive)

As promised, here is a "bare" SPIRES record, selected (somewhat) randomly, just to give an idea:


 IRN = 3418758;
 DOC-TYPE = Preprint;
 REPORT-NUM = SLAC-PUB-9865;
 ASTR;
   AUTHOR = Brooks, T.C.;
   AUTHOR = Convery, M.E.;
   AUTHOR = Davis, W.L.;
   AUTHOR = DelSignore, K.W.;
   AUTHOR = Jenkins, T.L.;
   AUTHOR = Kangas, E.;
   AUTHOR = Knepley, M.G.;
   AUTHOR = Kowalski, K.L.;
   AUTHOR = Taylor, C.C.;
   AFFILIATION = Case Western Reserve U.;
 ASTR;
   AUTHOR = Oh, S.H.;
   AUTHOR = Walker, W.D.;
   AFFILIATION = Duke U.;
 ASTR;
   AUTHOR = Colestock, P.L.;
   AUTHOR = Hanna, B.;

   AUTHOR = Martens, M.;
   AUTHOR = Steets, J.;
   AFFILIATION = Fermilab;
 ASTR;
   AUTHOR = Ball, R.;
   AUTHOR = Gustafson, H.R.;
   AUTHOR = Jones, L.W.;
   AUTHOR = Longo, M.J.;
   AFFILIATION = Michigan U.;
 ASTR;
   AUTHOR = Bjorken, J.D.;
   AFFILIATION = SLAC;
 ASTR;
   AUTHOR = Abashian, A.;
   AUTHOR = Morgan, N.;
   AFFILIATION = Virginia Tech.;
 ASTR;
   AUTHOR = Pruneau, C.A.;
   AFFILIATION = Wayne State U.;
 COL-NOTE = MiniMax Collaboration;
 TITLE = Analysis of charged particle / photon correlations in hadronic multiparticle production;
 PUB-NOTE = Phys.Rev.D55:5667-5680,1997;
 SLAC-TOPICS = SLAC, there, 09/96;
 DATE = Sep 1996;
 JOUR-SUB = Phys.Rev.D;
 PPF-SUBJECT = Experimental, S;
 P = 35;
 PPA = 9716;
 PPF = 9639;
 CITATION = PHLTA,B217,169;
 CITATION = PHLTA,B266,482;
 CITATION = IMPAE,A7,4189;
 CITATION = APPOA,B23,561;
 CITATION = PHRVA,D46,246;
 CITATION = HEP-PH 9211282;
 CITATION = NUPHA,B399,395;
 CITATION = PHRVA,D51,2482;
 CITATION = HEP-PH 9411329;
 CITATION = HEP-PH 9501210;
 CITATION = PRLTA,72,970;
 CITATION = RPPHA,58,611;
 CITATION = JTPLA,33,67;
 CITATION = APNYA,66,509;
 CITATION = IMPAE,A2,1447;
 CITATION = IMPAE,A4,1527;
 CITATION = MPLAE,A8,2747;
 CITATION = JTPLA,59,585;
 CITATION = PHRVA,D49,5805;
 CITATION = HEP-PH 9503325;
 CITATION = PHRVA,D9,3113;
 CITATION = NUIMA,138,241;
 CITATION = NUIMA,140,533;
 CITATION = PHLTA,B206,707;
 CITATION = ZEPYA,C43,75;
 CITATION = HEP-PH 9309235;
 CITATION = BAPSA,41,902;
 CITATION = BAPSA,41,938;
 CITATION = PHRVA,D50,6811;
 CITATION = PRPLC,65,151;
 CITATION = NUPHA,B370,365;
 CITATION = PHRVA,D48,5;
 CITATION = CPHCB,46,43;
 EXPERIMENT = FNAL-E-0864;
 PPFIN-ACCT = LIRYG;
 DESY-KEYWORDS = data analysis method;
 DESY-KEYWORDS = hadron hadron, interaction;
 DESY-KEYWORDS = anti-p p, annihilation;
 DESY-KEYWORDS = multiple production;
 DESY-KEYWORDS = charged particle, hadroproduction;
 DESY-KEYWORDS = photon, associated production;
 DESY-KEYWORDS = pi, charged particle;
 DESY-KEYWORDS = pi0;
 DESY-KEYWORDS = multiplicity, moment;
 DESY-KEYWORDS = symmetry, chiral;
 DESY-KEYWORDS = critical phenomena;
 DESY-KEYWORDS = pi, condensation;
 DESY-KEYWORDS = numerical calculations, Monte Carlo;
 DESY-ABS-NUM = D96-20944;
 DESY-CLASS-CODE = G;
 DESY-CLASS-CODE = D;
 DATE-UPDATED = 07/08/2004;
 ACCOUNT-UPD = LI.KAL;
 DATE-ADDED = 09/18/1996;
 ACCT-ADDED = CITES;
 TT = Analysis of charged-particle/photon correlations in hadronic  multiparticle production.;
 BULL = HEPPH-9609375;
 URL = ADSABS;
   URLDOC = 1997PhRvD..55.5667B;
 URL = PHRVA-D;
   URLDOC = V55/P05667/;
 URL = SLACPUB;
   URLDOC = 9865;
 TIME-UPD = 11:47:06;
 PACS = 13.87.Ce;
 PACS = 14.40.Aq;
 PACS = 14.70.Bh;
 NEW-DESY-CHECK = Preprint - Brooks, T.C. (rec.Sep.96) 35 p.;
 CATDATE = 09/20/1996;
 CATTIME = 10:44:34;
 DOI = 10.1103/PhysRevD.55.5667;

Notes

  • Note I haven't changed this a bit from what SPIRES naturally spits out. We can, of course, reformat it within SPIRES, or, since this format is easy to parse, we can do it outside of SPIRES.
  • Note that SPIRES contains more information than what is shown here. SPIRES has a concept of virtual elements in a record, which are treated by formats and searches as if they are real elements, but are actually calculated from other information when requested.
    • For example, citecount is the total count of citations to this article. It is not shown above, but if one asks for it, SPIRES immediately calculates the number of papers currently citing this one and presents the number as
       CITECOUNT = 43;
      These virtual elements are not always "meta-metadata", like citecount, but we do need to be cognizant of them because we may need to store them as real fields in Invenio (?), or perhaps as some sort of stored procedure.
    • Another example would be URLs to this paper. For those that don't use DOI (and for some who do) we store the full URL as a display-time calculated value (virtual element) that is derived from information about the publisher website, etc. The same goes for DOI-based URLs and arXiv ones. It is very handy to have these values accessible for format writing as if they are real data in the record, but it is also handy to have them stored in other ways. I'm interested to hear how Invenio provides this sort of thing.
  • Note also that there is other information about this article in other places in SPIRES, which I have not included here in this first pass (abstracts, arXiv categories, citations as extracted from the paper, possibly an author extraction...).
  • Note that some elements are expressed as
    ELEMENT = value, value, value;
    but this is only a display convenience in SPIRES; the values are actually stored in repeated elements.
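
To illustrate the remark that this format is easy to parse outside of SPIRES, here is a minimal Python sketch. It is not an existing SPIRES or Invenio tool. The function name parse_spires and the returned dict layout are made up for illustration, and the sketch assumes one element per line, with sub-elements of a structure (ASTR, URL) indented more deeply than the structure's own line, as in the sample above.

```python
import re

# One element per line: "NAME = value;" or a bare structure header "NAME;".
LINE_RE = re.compile(
    r"^(?P<indent>\s*)(?P<name>[A-Za-z0-9-]+)\s*(?:=\s*(?P<value>.*?))?;\s*$"
)


def parse_spires(text):
    """Parse bare SPIRES output into a list of element dicts.

    Each dict has "name", "value" (None for a bare header such as ASTR)
    and "children" (the more deeply indented lines that follow it).
    """
    elements = []
    top_indent = None
    for raw in text.splitlines():
        if not raw.strip():
            continue  # blank lines carry no information
        match = LINE_RE.match(raw)
        if match is None:
            raise ValueError(f"unparseable line: {raw!r}")
        indent = len(match.group("indent"))
        if top_indent is None:
            top_indent = indent  # the first element sets the top level
        entry = {
            "name": match.group("name"),
            "value": match.group("value"),
            "children": [],
        }
        if indent > top_indent and elements:
            elements[-1]["children"].append(entry)
        else:
            elements.append(entry)
    return elements


sample = """ IRN = 3418758;
 ASTR;
   AUTHOR = Oh, S.H.;
   AUTHOR = Walker, W.D.;
   AFFILIATION = Duke U.;
 URL = SLACPUB;
   URLDOC = 9865;
 DOI = 10.1103/PhysRevD.55.5667;
"""
parsed = parse_spires(sample)
```

Repeated elements (CITATION, DESY-KEYWORDS, ...) simply show up as repeated entries in the list, which matches how SPIRES stores them. Values are kept verbatim and are not split on commas, since author names contain commas themselves.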

-- TravisBrooks - 18 Jun 2007

XML SPIRES Record


<goal_record>
 <irn>3418758</irn>
 <doc-type>Preprint</doc-type>
 <report-num>SLAC-PUB-9865</report-num>
 <astr>
  <astr1>
   <author>Brooks, T.C.</author>
  </astr1>
  <astr1>
   <author>Convery, M.E.</author>
  </astr1>
  <astr1>
   <author>Davis, W.L.</author>
  </astr1>
  <astr1>
   <author>DelSignore, K.W.</author>
  </astr1>
  <astr1>
   <author>Jenkins, T.L.</author>
  </astr1>
  <astr1>
   <author>Kangas, E.</author>
  </astr1>
  <astr1>
   <author>Knepley, M.G.</author>
  </astr1>
  <astr1>
   <author>Kowalski, K.L.</author>
  </astr1>
  <astr1>
   <author>Taylor, C.C.</author>
  </astr1>
  <affiliation>Case Western Reserve U.</affiliation>
 </astr>
 <astr>
  <astr1>
   <author>Oh, S.H.</author>
  </astr1>
  <astr1>
   <author>Walker, W.D.</author>
  </astr1>
  <affiliation>Duke U.</affiliation>
 </astr>
 <astr>
  <astr1>
   <author>Colestock, P.L.</author>
  </astr1>
  <astr1>
   <author>Hanna, B.</author>
  </astr1>
  <astr1>
   <author>Martens, M.</author>
  </astr1>
  <astr1>
   <author>Steets, J.</author>
  </astr1>
  <affiliation>Fermilab</affiliation>
 </astr>
 <astr>
  <astr1>
   <author>Ball, R.</author>
  </astr1>
  <astr1>
   <author>Gustafson, H.R.</author>
  </astr1>
  <astr1>
   <author>Jones, L.W.</author>
  </astr1>
  <astr1>
   <author>Longo, M.J.</author>
  </astr1>
  <affiliation>Michigan U.</affiliation>
 </astr>
 <astr>
  <astr1>
   <author>Bjorken, J.D.</author>
  </astr1>
  <affiliation>SLAC</affiliation>
 </astr>
 <astr>
  <astr1>
   <author>Abashian, A.</author>
  </astr1>
  <astr1>
   <author>Morgan, N.</author>
  </astr1>
  <affiliation>Virginia Tech.</affiliation>
 </astr>
 <astr>
  <astr1>
   <author>Pruneau, C.A.</author>
  </astr1>
  <affiliation>Wayne State U.</affiliation>
 </astr>
 <col-note>MiniMax Collaboration</col-note>
 <title>Analysis of charged particle / photon correlations in hadronic
multiparticle production</title>
 <pub-note>Phys.Rev.D55:5667-5680,1997</pub-note>
 <slac-topics>SLAC, there, 09/96</slac-topics>
 <date>Sep 1996</date>
 <jour-sub>Phys.Rev.D</jour-sub>
 <ppf-subject>Experimental, S</ppf-subject>
 <p>35</p>
 <ppa>9716</ppa>
 <ppf>9639</ppf>
 <citation>PHLTA,B217,169</citation>
 <citation>PHLTA,B266,482</citation>
 <citation>IMPAE,A7,4189</citation>
 <citation>APPOA,B23,561</citation>
 <citation>PHRVA,D46,246</citation>
 <citation>HEP-PH 9211282</citation>
 <citation>NUPHA,B399,395</citation>
 <citation>PHRVA,D51,2482</citation>
 <citation>HEP-PH 9411329</citation>
 <citation>HEP-PH 9501210</citation>
 <citation>PRLTA,72,970</citation>
 <citation>RPPHA,58,611</citation>
 <citation>JTPLA,33,67</citation>
 <citation>APNYA,66,509</citation>
 <citation>IMPAE,A2,1447</citation>
 <citation>IMPAE,A4,1527</citation>
 <citation>MPLAE,A8,2747</citation>
 <citation>JTPLA,59,585</citation>
 <citation>PHRVA,D49,5805</citation>
 <citation>HEP-PH 9503325</citation>
 <citation>PHRVA,D9,3113</citation>
 <citation>NUIMA,138,241</citation>
 <citation>NUIMA,140,533</citation>
 <citation>PHLTA,B206,707</citation>
 <citation>ZEPYA,C43,75</citation>
 <citation>HEP-PH 9309235</citation>
 <citation>BAPSA,41,902</citation>
 <citation>BAPSA,41,938</citation>
 <citation>PHRVA,D50,6811</citation>
 <citation>PRPLC,65,151</citation>
 <citation>NUPHA,B370,365</citation>
 <citation>PHRVA,D48,5</citation>
 <citation>CPHCB,46,43</citation>
 <experiment>FNAL-E-0864</experiment>
 <ppfin-acct>LIRYG</ppfin-acct>
 <desy-keywords>data analysis method</desy-keywords>
 <desy-keywords>hadron hadron, interaction</desy-keywords>
 <desy-keywords>anti-p p, annihilation</desy-keywords>
 <desy-keywords>multiple production</desy-keywords>
 <desy-keywords>charged particle, hadroproduction</desy-keywords>
 <desy-keywords>photon, associated production</desy-keywords>
 <desy-keywords>pi, charged particle</desy-keywords>
 <desy-keywords>pi0</desy-keywords>
 <desy-keywords>multiplicity, moment</desy-keywords>
 <desy-keywords>symmetry, chiral</desy-keywords>
 <desy-keywords>critical phenomena</desy-keywords>
 <desy-keywords>pi, condensation</desy-keywords>
 <desy-keywords>numerical calculations, Monte Carlo</desy-keywords>
 <desy-abs-num>D96-20944</desy-abs-num>
 <desy-class-code>G</desy-class-code>
 <desy-class-code>D</desy-class-code>
 <date-updated>07/08/2004</date-updated>
 <account-upd>LI.KAL</account-upd>
 <date-added>09/18/1996</date-added>
 <acct-added>CITES</acct-added>
 <tt>Analysis of charged-particle/photon correlations in hadronic  multiparticle production.</tt>
 <bull>HEPPH-9609375</bull>
 <url-str>
  <url>ADSABS</url>
  <urldoc>1997PhRvD..55.5667B</urldoc>
 </url-str>
 <url-str>
  <url>PHRVA-D</url>
  <urldoc>V55/P05667/</urldoc>
 </url-str>
 <url-str>
  <url>SLACPUB</url>
  <urldoc>9865</urldoc>
 </url-str>
 <time-upd>11:47:06</time-upd>
 <pacs>13.87.Ce</pacs>
 <pacs>14.40.Aq</pacs>
 <pacs>14.70.Bh</pacs>
 <new-desy-check>Preprint - Brooks, T.C. (rec.Sep.96) 35 p.</new-desy-check>
 <catdate>09/20/1996</catdate>
 <cattime>10:44:34</cattime>
 <doi>10.1103/PhysRevD.55.5667</doi>
</goal_record>


  • This is just the straightforward XML translation of the above record. The only oddity is the extra "astr1" surrounding the author elements, which makes this a bit noisy. This is trivial for us to generate.
    • note to cognoscenti- I did this by set format $genform.xml displaying a hep record, this generates a format definition that I stored in formats as LI.TCB.PREPRINT.XML
  • This does not include calculated elements, or information from associated databases (abstracts/pre-url/others) both of which eventually need to be considered.
  • We should use this record as a starting point for SPIRES output, devise translations to import it into Invenio, then see what else we need to include in this record or as subsidiary dbs.

Explanation of Non-intuitive SPIRES fields

  • SLAC-Topics - if the document involved a SLAC author, this is filled in to help us track special things about these papers. This probably should have no analog in Inspire, but instead should be moved to our local repository or a connected local DB.
  • PPF-Subject - a very brief subject classification including subject info as well as "S"=> Published. Now deprecated in favor of field code (16 values) and type code (8 values), but not all records have been translated yet. The values of FC and TC are available on our wiki and I can paste it in here eventually...
  • P - pages (number of pages in the paper - usually we use the eprint version, which may be substantially different from the published version)
  • ppf - the YYWW (i.e. last 2 digits of year, and the "weeknumber" - 1-52) when the paper was input. Here "input" means that all human-assisted checks have been done. The appearance of this number serves as a sign that the record does not need these checks unless something else happens to it, and also serves as a workflow management tool (stats on papers input/remaining, etc). There is also an email alerting service based on sending out records after input, but this is relatively unimportant.
  • ppa - the YYWW that the paper obtained published information (i.e. it was a preprint, but now it is published). This generates another alerting service, but other than that, it is unimportant.
  • DESY-CLASS-CODE - This is another category coding that will eventually be rolled into FC
  • date/acct upd/add - these 4 elements track the most recent date and account that touched the record, as well as the original account and time for the record's creation
  • cattime/date - these two elements refer to the date and time of human checks inputting as described above. Thus catdate and ppf are redundant information.
  • NEW-DESY-CHECK - standardized rep nr + inst + date + date received + pages. to be matched against R? to be discontinued? (Annette)
  • TT - tex-title. This is the title as written by the author upon submission to arXiv.org; it often contains TeX markup. (TT also contains title variants from DESY. To be discontinued. Need to define clear common rules (Annette).) The regular title is modified by SPIRES to attempt to standardise various words, abbreviations and symbols to make things more readable and consistent. We actually need 3 titles:
    • a title that creates search terms (including words from old titles, common spelling changes, acronyms, etc.)
    • a title that represents the way the author put it on arXiv/journal/etc
    • a title that displays symbols correctly
  • DESY-PUB-NOTE - since 2001 book info that does not fit into PBN or CPBN - where should it go? Before 2001 all pub info from DESY - needs to be merged into PBN + CPBN. (Annette)
  • DESY-CHECK - complete bibl info from DESY used till 1996 (consisting of DPBN + NDCK). Need to check whether all info has been merged correctly into PBN, CPBN, R? (Annette)

Comprehensive SPIRES Record

Rather than retype fields in the table below (I started...) I will include below the full set of SPIRES elements in its own table - this is from the internal record documentation, modified slightly by me to fit in the table

  • Elem type - Fixed/Optional/Required/Virtual
    • opt means that it may or may not occur
    • fix or req means it must occur
    • vir means it is calculated at display time from other values in the record or elsewhere in the database(s) - these are moved to the end in a separate table unless they are in a structure in this main table
  • Occ - R(epeatable)/NR(non-repeatable)
  • Element - the name of the element. Some elements are in structures that can be repeated. The structure is listed as an element, then sub-elements are listed with ":" in front so you can see they are in the structure listed above. In 2 cases (circ and abstracts) there are virtual links to all elements available from other databases.
  • Status - O (obsolete), D (deprecated) or S (suspicious: there is probably a better way, but it is still in current use). All others are current.
  • Notes - most notes are in the mapping table below, but some fields only appear here.

Bibliographic elements

Elem type (see above) Occ Element Status Notes
Fix NR IRN
Fix NR DOC-TYPE D All records have this, but I don't know that it is used or maintained properly...
Opt R REPORT-NUM
Opt R Structure: ASTR
Opt R Structure: :ASTR1
Req NR ::AUTHOR
Opt NR ::DESY-AUTHOR S authors containing umlauts from DESY (Annette)
Vir NR ::AUTHOR-SORT S Calculated trivially from author -easy to do in other ways
Opt R :AFFILIATION   These must match authority file, institutions database
Vir R :COUNTRY   Calculated by looking up country information in inst. db
Vir R :DESYLOOKUP   Alt way of writing inst name calc from inst. db
Opt R CORP-AUTHOR   allows an entity to write paper, not personal name
Opt R COL-NOTE
Opt NR TITLE
Opt R PUB-NOTE
Opt R EXTRA-PUB-NOTE O  
Opt R SLAC-TOPICS S Slac coding for admin reasons
Opt R DATE    
Opt NR JOUR-SUB    
Opt NR LANGUAGE   Language it is written in
Opt NR PPF-SUBJECT D ready to be translated to FC/TC
Opt NR HOLDINGS S This is important...do we link to local holdings catalogs?
Opt NR AV D Is it on Microfiche...?
Opt NR P   pages
Opt R CITATION
Opt R MEETING-NOTE S This currently contains conf. info, but should rather be part of a conf structure looking up auth. file
Opt R NOTE   Free form notes for display
Opt R REPORT-CANCEL   To allow a non-displaying record
Opt NR TRANS-NOTE O?? ???
Opt R SUBJECT-HEADING O? ??
Opt R LIB-NEWS-CLASS O Library subject codes
Opt R Structure: CPN   Conf. info should replace Meeting Note and cnum
Req NR :CONF-PUB-NOTE    
Opt NR :CALL-NUM D? Call num of proceedings
Opt R :CPBNX    
Opt NR :CONF-PUB-JOUR    
Opt R EXPERIMENT   Authority file- experiments database
Opt R TITLE-CHANGE-J D (use old title instead) Searched with title
Opt R DESY-PUB-NOTE S Should be combined with PBN
Opt R DESY-KEYWORDS    
Opt R SLAC-EXPERIMENT O?  
Opt R DESY-CLASS-CODE O to be mapped into FC (Annette)
Opt NR DESY-CHECK O bibl info from DESY till 1996 (Annette)
Opt R NEW-DESY-CHECK S preprint bibl info from DESY since 1997 (Annette)
Opt R CALTECH-TAG O? ??
Opt R ENERGYRANGE-CODE O? one-digit code for energy range of reactions (Annette)
Opt R SLAC-DETECTOR O? ??
Opt R TITLE-VARIANT S Searched with title, used for acronyms
Opt R OTHER-AUTHOR S Searched with author
Opt R TT   Tex or arXiv title or DESY title variant
Opt R BULL   arXiv number
Opt R DESYR S standardized rep nr from DESY - to be discontinued? (Annette)
Opt R Structure: URL-STR
Req NR :URL   Key to lookup in URL list (pre-url)
Opt R :URLDOC   Record specific piece of URL
Opt R :URLNOTE O ???
Vir NR :TRUE-URL   Calculated from URL, URLDOC, and info from pre-url file -> the actual location
Opt R PR-STATUS O? ???
Opt R PACS    
Opt R TOPCIT S Divides recs into broad citation categories for searching
Opt R CONF-NUMBER S key to conferences file, should be part of cpn...
Opt R OLD-TITLE   Searched with title, used for title changes
Opt R CERNKEY   ;-)
Opt R FIELD-CODE    
Opt R TYPE-CODE    
Opt R FREE-KEYWORDS   Used for non-controlled vocab keywords (i.e. author supplied)
Opt R DOI    
Opt R TEXKEY   key of the record in latex cite format; not yet used, but important for latex users

Workflow elements

Elem type (see above) Occ Element Status Notes
Opt R DATE-RECEIVED O used only if DATE unknown - basically equivalent to DATE-ADDED? (Annette)
Opt R SLAC-DIST O slac-specific coding
Opt NR PPA   YYWW of journal addition
Opt R PPF   YYWW of checking of record
Opt R PPFIN-ACCT person who checked record
Opt R DESY-ABS-NUM   DESY's unique document id (after keywording) (Annette)
Opt R PDGSC   ???
Opt R SLAC-DATE D Date of SLAC registration of record (used in SLAC file instead)
Opt R HIDDEN-NOTE   Internal notes
Opt R CATDATE   Date of checking
Opt R CATTIME   Time of checking
Opt R TIME-UPD    
Opt R DATE-ADDED    
Opt R ACCT-ADDED
Opt R OLDPPA ???
Opt R XDRN   DESY mark/ref nr, removed when keyworded, useful for checking hep relevance (Annette)
Opt R DATE-UPDATED    
Opt R ACCOUNT-UPD
Opt R Structure: ORDER-STR O Ordering materials???
Req NR :SOURCE
Opt R :ORDER-DATE
Opt R :REQUESTER  
Opt R :ORDER-NOTE
Opt R :COST
Opt R :ORDER-PLACED-BY

Virtual (calculated) elements

Elem type (see above) Occ Element Status Notes
Vir NR DATE-SORT   falls through date, dateadded, etc until gets a date - guarantees a date and puts it in numeric sortable form
Vir NR JOURNAL-YEAR   year of jour pub (from pbn or dpbn)
Vir NR REPORTNO-SORT  
Vir NR SLAC-REPORTNO D rept nums that begin with SLAC
Vir NR PUB-CATEGORY   ??
Vir NR FIRST-AUTHOR    
Vir NR FIRST-AUTHORSORT
Vir NR JINDEX   jnl + vol (Annette)
Vir R LA-URL   arXiv url (Annette)
Vir R Structure: phan CIRC-STR (Subfile CIRC) Access to the circulation records for this object at SLAC
Vir R SLAC-EXP ??
Vir R LANL-NUMBER   arXiv number in real form (bull is currently screwed up, not the actual form arXiv used, fixing this RSN...)
Vir R ECONF-TITLE D econf related stuff
Vir NR YEAR   from date
Vir R SPICITE D the way it might appear in our citations calc from PBN should use Mycite instead (relies on this, but only internally)
Vir R SPICITE2 D ??
Vir R SPICITE3 D Same as spicite, but from dpbn
Vir R BBADDRESS O holdover from when we munged the arXiv ids
Vir R BBDESCRIP O ditto
Vir R CITECODEN O? ???
Vir R GETCONF S Fetches meeting info from conferences using CNUM; not used as it should be
Vir NR ALL-JOUR S similar to spicite, but includes year and combines pbn and dpbn
Vir NR PBN_DPBN_SORT D  
Vir NR ALL-JOURTITLE S  
Vir NR JOURNAL-PAGE   Page of journal, from spicite/pbn/dpbn
Vir R SPICITE4   ???
Vir NR BOX S SLAC specific location in storage (from "Storage" )
Vir NR AUTHCOUNT   number of authors- convenient from Authors
Vir NR CITECOUNT   number of papers citing this one, complicated, but very important calculation, from citation of other records
Vir NR DISPJ   preferred way to display journal information
Vir NR MYCITE   The various objects this paper might be cited by in (SPIRES) ref. lists. from spicite, lanl, report, spicite3
Vir NR CITEFORM   the various ways that the above objects might appear (i.e. with or without a volume letter, etc.)
Vir R MYCITES S Mycite, broken up into repeating elements.
Vir R PRIMARCH   The main arXiv category for the paper (not crosslisted) - from abstracts file
Vir R ARCH   arXiv categories incl cross listings (Annette)
Vir R Structure: phan ABSTRACTS (Subfile ABSTRACTS)   access to the abstracts file which includes arXiv information
Vir R BOTHREFS   The references of this paper, listed as both eprint refs and journals (easier to check for dupes/accuracy/etc)
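
To make the idea of a virtual element concrete, here is a minimal Python sketch of the fall-through described for DATE-SORT in the table above. It is not how SPIRES implements it. The function name date_sort, the flat dict used as a record, and the inclusion of DATE-UPDATED as the last fallback are assumptions; the two date layouts are the ones seen in the sample record ("Sep 1996" and "09/18/1996"). Month names are parsed with %b, which depends on the locale and assumes English abbreviations.

```python
from datetime import datetime

# Date layouts seen in the sample record: "09/18/1996" and "Sep 1996".
_FORMATS = ("%m/%d/%Y", "%b %Y")


def date_sort(record, fields=("DATE", "DATE-ADDED", "DATE-UPDATED")):
    """Return the first parseable date among `fields` as an int YYYYMMDD.

    Returns None if no field holds a parseable date. SPIRES itself is said
    to guarantee a date; this sketch does not.
    """
    for field in fields:
        value = record.get(field)
        if not value:
            continue
        for fmt in _FORMATS:
            try:
                parsed = datetime.strptime(value.strip(), fmt)
            except ValueError:
                continue  # try the next layout, then the next field
            return int(parsed.strftime("%Y%m%d"))
    return None


record = {"DATE": "Sep 1996", "DATE-ADDED": "09/18/1996"}
sortable = date_sort(record)
```

A date without a day, such as "Sep 1996", is sorted as the first of the month here. Whether such a value should be stored as a real field in Invenio or computed on demand is exactly the open question raised in the notes above.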

Invenio

Record structure documentation

Record structure examples

1. For any record in CDS, the internal MARCXML format is available as an output format option via the detailed record page. For example, search for hep-th/0102003, click on the proposed detailed record link, then on the MARCXML output format in order to inspect it.

2. To describe MARC markup, one usually uses two notations:

a) human-friendly:

100 $a Ellis, John $e editor

b) machine-friendly (MARCXML):

<datafield tag="100" ind1=" " ind2=" ">
  <subfield code="a">Ellis, John</subfield>
  <subfield code="e">editor</subfield>
</datafield>
          

In the examples below I'll use the MARCXML notation.
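
The two notations carry the same information, so converting between them is mechanical. The following Python sketch turns a human-friendly line into a MARCXML datafield using only the standard library. The function name marc_line_to_xml is made up for illustration, indicators are assumed to be blank unless passed in, and the sketch assumes subfield values contain no "$" character.

```python
import re
import xml.etree.ElementTree as ET


def marc_line_to_xml(line, ind1=" ", ind2=" "):
    """Convert e.g. '100 $a Ellis, John $e editor' to a <datafield> element."""
    tag, _, rest = line.strip().partition(" ")
    field = ET.Element("datafield", {"tag": tag, "ind1": ind1, "ind2": ind2})
    # Each subfield is "$" plus a one-character code, followed by its value.
    for code, value in re.findall(r"\$(\w)\s+([^$]*)", rest):
        subfield = ET.SubElement(field, "subfield", {"code": code})
        subfield.text = value.strip()
    return field


field = marc_line_to_xml("100 $a Ellis, John $e editor")
xml_text = ET.tostring(field, encoding="unicode")
```

The resulting element has the same tag, indicators and subfields as the MARCXML example above, without the line breaks and indentation.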

3. The above SPIRES test record in MARCXML would be:

<record>

<!-- IRN = 3418758; -->

<controlfield tag="001">3418758</controlfield>

<!-- DOC-TYPE = Preprint; -->

<datafield tag="690" ind1="C" ind2=" ">
  <subfield code="a">PREPRINT</subfield>
</datafield>

<!-- REPORT-NUM = SLAC-PUB-9865; -->

<datafield tag="037" ind1=" " ind2=" ">
  <subfield code="a">SLAC-PUB-9865</subfield>
</datafield>

<!-- ASTR;
   AUTHOR = Brooks, T.C.;
   AUTHOR = Convery, M.E.;
   AUTHOR = Davis, W.L.;
   AUTHOR = DelSignore, K.W.;
   AUTHOR = Jenkins, T.L.;
   AUTHOR = Kangas, E.;
   AUTHOR = Knepley, M.G.;
   AUTHOR = Kowalski, K.L.;
   AUTHOR = Taylor, C.C.;
   AFFILIATION = Case Western Reserve U.;

 ASTR;
   AUTHOR = Oh, S.H.;
   AUTHOR = Walker, W.D.;
   AFFILIATION = Duke U.;
 ASTR;
   AUTHOR = Colestock, P.L.;
   AUTHOR = Hanna, B.;

   AUTHOR = Martens, M.;
   AUTHOR = Steets, J.;
   AFFILIATION = Fermilab;
 ASTR;
   AUTHOR = Ball, R.;
   AUTHOR = Gustafson, H.R.;
   AUTHOR = Jones, L.W.;
   AUTHOR = Longo, M.J.;
   AFFILIATION = Michigan U.;
 ASTR;
   AUTHOR = Bjorken, J.D.;
   AFFILIATION = SLAC;
 ASTR;
   AUTHOR = Abashian, A.;
   AUTHOR = Morgan, N.;
   AFFILIATION = Virginia Tech.;
 ASTR;
   AUTHOR = Pruneau, C.A.;
   AFFILIATION = Wayne State U.;

-->

<datafield tag="100" ind1=" " ind2=" ">
  <subfield code="a">Brooks, T.C.</subfield>
  <subfield code="u">Case Western Reserve U.</subfield>
</datafield>
<datafield tag="700" ind1=" " ind2=" ">
  <subfield code="a">Convery, M.E.</subfield>
  <subfield code="u">Case Western Reserve U.</subfield>
</datafield>
<datafield tag="700" ind1=" " ind2=" ">
  <subfield code="a">Davis, W.L.</subfield>
  <subfield code="u">Case Western Reserve U.</subfield>
</datafield>
<datafield tag="700" ind1=" " ind2=" ">
  <subfield code="a">DelSignore, K.W.</subfield>
  <subfield code="u">Case Western Reserve U.</subfield>
</datafield>
<datafield tag="700" ind1=" " ind2=" ">
  <subfield code="a">Jenkins, T.L.</subfield>
  <subfield code="u">Case Western Reserve U.</subfield>
</datafield>
<datafield tag="700" ind1=" " ind2=" ">
  <subfield code="a">Kangas, E.</subfield>
  <subfield code="u">Case Western Reserve U.</subfield>
</datafield>
<datafield tag="700" ind1=" " ind2=" ">
  <subfield code="a">Knepley, M.G.</subfield>
  <subfield code="u">Case Western Reserve U.</subfield>
</datafield>
<datafield tag="700" ind1=" " ind2=" ">
  <subfield code="a">Kowalski, K.L.</subfield>
  <subfield code="u">Case Western Reserve U.</subfield>
</datafield>
<datafield tag="700" ind1=" " ind2=" ">
  <subfield code="a">Taylor, C.C.</subfield>
  <subfield code="u">Case Western Reserve U.</subfield>
</datafield>
<datafield tag="700" ind1=" " ind2=" ">
  <subfield code="a">Oh, S.H.</subfield>
  <subfield code="u">Duke U.</subfield>
</datafield>
<datafield tag="700" ind1=" " ind2=" ">
  <subfield code="a">Walker, W.D.</subfield>
  <subfield code="u">Duke U.</subfield>
</datafield>
<datafield tag="700" ind1=" " ind2=" ">
  <subfield code="a">Colestock, P.L.</subfield>
  <subfield code="u">Fermilab</subfield>
</datafield>
<datafield tag="700" ind1=" " ind2=" ">
  <subfield code="a">Hanna, B.</subfield>
  <subfield code="u">Fermilab</subfield>
</datafield>
<datafield tag="700" ind1=" " ind2=" ">
  <subfield code="a">Martens, M.</subfield>
  <subfield code="u">Fermilab</subfield>
</datafield>
<datafield tag="700" ind1=" " ind2=" ">
  <subfield code="a">Steets, J.</subfield>
  <subfield code="u">Fermilab</subfield>
</datafield>
<datafield tag="700" ind1=" " ind2=" ">
  <subfield code="a">Ball, R.</subfield>
  <subfield code="u">Michigan U.</subfield>
</datafield>
<datafield tag="700" ind1=" " ind2=" ">
  <subfield code="a">Gustafson, H.R.</subfield>
  <subfield code="u">Michigan U.</subfield>
</datafield>
<datafield tag="700" ind1=" " ind2=" ">
  <subfield code="a">Jones, L.W.</subfield>
  <subfield code="u">Michigan U.</subfield>
</datafield>
<datafield tag="700" ind1=" " ind2=" ">
  <subfield code="a">Longo, M.J.</subfield>
  <subfield code="u">Michigan U.</subfield>
</datafield>
<datafield tag="700" ind1=" " ind2=" ">
  <subfield code="a">Bjorken, J.D.</subfield>
  <subfield code="u">SLAC</subfield>
</datafield>
<datafield tag="700" ind1=" " ind2=" ">
  <subfield code="a">Abashian, A.</subfield>
  <subfield code="u">Virginia Tech.</subfield>
</datafield>
<datafield tag="700" ind1=" " ind2=" ">
  <subfield code="a">Morgan, N.</subfield>
  <subfield code="u">Virginia Tech.</subfield>
</datafield>
<datafield tag="700" ind1=" " ind2=" ">
  <subfield code="a">Pruneau, C.A.</subfield>
  <subfield code="u">Wayne State U.</subfield>
</datafield>

<!-- COL-NOTE = MiniMax Collaboration; -->

<datafield tag="710" ind1=" " ind2=" ">
  <subfield code="g">MiniMax Collaboration</subfield>
</datafield>

<!-- TITLE = Analysis of charged particle / photon correlations in hadronic multiparticle production; -->

<datafield tag="245" ind1=" " ind2=" ">
  <subfield code="a">Analysis of charged particle / photon correlations in hadronic multiparticle production</subfield>
</datafield>

<!-- PUB-NOTE = Phys.Rev.D55:5667-5680,1997; -->

<datafield tag="773" ind1=" " ind2=" ">
  <subfield code="a">10.1103/PhysRevD.55.5667</subfield>
  <subfield code="c">5667-5680</subfield>
  <subfield code="p">Phys. Rev. D</subfield>
  <subfield code="v">55</subfield>
  <subfield code="y">1997</subfield>
</datafield>

<!-- SLAC-TOPICS = SLAC, there, 09/96; -->

FIXME: what is this for?

<!-- DATE = Sep 1996; -->

<datafield tag="269" ind1=" " ind2=" ">
  <subfield code="c">1996-09-00</subfield>
</datafield>

<!-- JOUR-SUB = Phys.Rev.D; -->

NOTE: we use 773 like for PUB-NOTE; if there is no volume/page
information, it means it was submitted to that particular journal.

<!-- PPF-SUBJECT = Experimental, S; -->

<datafield tag="650" ind1="1" ind2="7">
  <subfield code="a">Experimental, S</subfield>
</datafield>

FIXME: what is S?

<!--

 P = 35;
 PPA = 9716;
 PPF = 9639;

-->

FIXME: what is P, PPA, PPF?

<!--

 CITATION = PHLTA,B217,169;
 CITATION = PHLTA,B266,482;
 CITATION = IMPAE,A7,4189;
 CITATION = APPOA,B23,561;
 CITATION = PHRVA,D46,246;
 CITATION = HEP-PH 9211282;
 CITATION = NUPHA,B399,395;
 CITATION = PHRVA,D51,2482;
 CITATION = HEP-PH 9411329;
 CITATION = HEP-PH 9501210;
 CITATION = PRLTA,72,970;
 CITATION = RPPHA,58,611;
 CITATION = JTPLA,33,67;
 CITATION = APNYA,66,509;
 CITATION = IMPAE,A2,1447;
 CITATION = IMPAE,A4,1527;
 CITATION = MPLAE,A8,2747;
 CITATION = JTPLA,59,585;
 CITATION = PHRVA,D49,5805;
 CITATION = HEP-PH 9503325;
 CITATION = PHRVA,D9,3113;
 CITATION = NUIMA,138,241;
 CITATION = NUIMA,140,533;
 CITATION = PHLTA,B206,707;
 CITATION = ZEPYA,C43,75;
 CITATION = HEP-PH 9309235;
 CITATION = BAPSA,41,902;
 CITATION = BAPSA,41,938;
 CITATION = PHRVA,D50,6811;
 CITATION = PRPLC,65,151;
 CITATION = NUPHA,B370,365;
 CITATION = PHRVA,D48,5;
 CITATION = CPHCB,46,43;

-->

<datafield tag="999" ind1="C" ind2="5">
   <subfield code="m">E. Witten</subfield>
   <subfield code="p">253</subfield>
   <subfield code="t">Adv. Theor. Math. Phys.</subfield>
   <subfield code="v">2</subfield>
   <subfield code="y">1998</subfield>
</datafield>
[...]

<!-- EXPERIMENT = FNAL-E-0864; -->

<datafield tag="693" ind1=" " ind2=" ">
  <subfield code="e">FNAL-E-0864</subfield>
</datafield>

<!-- PPFIN-ACCT = LIRYG; -->

<datafield tag="693" ind1=" " ind2=" ">
  <subfield code="a">LIRYG</subfield>
</datafield>

<!--

 DESY-KEYWORDS = data analysis method;
 DESY-KEYWORDS = hadron hadron, interaction;
 DESY-KEYWORDS = anti-p p, annihilation;
 DESY-KEYWORDS = multiple production;
 DESY-KEYWORDS = charged particle, hadroproduction;
 DESY-KEYWORDS = photon, associated production;
 DESY-KEYWORDS = pi, charged particle;
 DESY-KEYWORDS = pi0;
 DESY-KEYWORDS = multiplicity, moment;
 DESY-KEYWORDS = symmetry, chiral;
 DESY-KEYWORDS = critical phenomena;
 DESY-KEYWORDS = pi, condensation;
 DESY-KEYWORDS = numerical calculations, Monte Carlo;

-->

<datafield tag="653" ind1="1" ind2=" ">
  <subfield code="9">DESY</subfield>
  <subfield code="a">data analysis method</subfield>
</datafield>
<datafield tag="653" ind1="1" ind2=" ">
  <subfield code="9">DESY</subfield>
  <subfield code="a">hadron hadron, interaction</subfield>
</datafield>
[...]

<!-- DESY-ABS-NUM = D96-20944; -->

<datafield tag="088" ind1=" " ind2=" ">
  <subfield code="9">DESY</subfield>
  <subfield code="a">D96-20944</subfield>
</datafield>

<!--
 DESY-CLASS-CODE = G;
 DESY-CLASS-CODE = D;
-->

FIXME: what are these codes?

<!-- DATE-UPDATED = 07/08/2004; -->

FIXME: is this the date of metadata/author/fulltext update?

<!-- ACCOUNT-UPD = LI.KAL; -->

FIXME: username who updated record? or who has rights to do so?

<!-- DATE-ADDED = 09/18/1996; -->

<datafield tag="961" ind1=" " ind2=" ">
  <subfield code="c">20070121</subfield>
  <subfield code="x">19960918</subfield>
</datafield>

NOTE: creation ($x) and modification ($c) dates are also usually
stored elsewhere (bibrec), not in metadata

<!-- ACCT-ADDED = CITES; -->

FIXME: username who created record? (tag 859 in this case). Or expresses other rights?

<!-- TT = Analysis of charged-particle/photon correlations in hadronic  multiparticle production.; -->

FIXME: what is TT?

<!-- BULL = HEPPH-9609375; -->

<datafield tag="037" ind1=" " ind2=" ">
  <subfield code="a">hep-ph/9609375</subfield>
</datafield>
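
NOTE: a minimal sketch of turning a BULL value into the arXiv-style report number stored in 037. It assumes BULL values have the form ARCHIVE-NNNNNNN with the archive name upper-cased and its hyphen dropped; only HEPPH is attested in this record, the other table entries are guesses by analogy, and the function name is hypothetical.

```python
import re

# Assumed SPIRES spellings of old-style arXiv archive names.
# Only "HEPPH" appears in the record above; the rest are illustrative guesses.
ARCHIVES = {
    "HEPPH": "hep-ph",
    "HEPTH": "hep-th",
    "HEPEX": "hep-ex",
    "HEPLAT": "hep-lat",
}


def bull_to_report_number(bull):
    """Convert e.g. 'HEPPH-9609375' into 'hep-ph/9609375'."""
    match = re.fullmatch(r"([A-Z]+)-(\d{7})", bull.strip())
    if match is None or match.group(1) not in ARCHIVES:
        # Refuse to guess: an unknown archive needs a human decision.
        raise ValueError("unrecognised BULL value: %r" % bull)
    return "%s/%s" % (ARCHIVES[match.group(1)], match.group(2))


print(bull_to_report_number("HEPPH-9609375"))
```

A lookup table is used because the position of the hyphen in the archive name cannot be recovered from the SPIRES spelling alone.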

<!--
 URL = ADSABS;
   URLDOC = 1997PhRvD..55.5667B;
 URL = PHRVA-D;
   URLDOC = V55/P05667/;
 URL = SLACPUB;
   URLDOC = 9865;
-->

<datafield tag="856" ind1="4" ind2=" ">
  <subfield code="u">http://foo/bar</subfield>
  <subfield code="y">Fulltext in ADS</subfield>
</datafield>

NOTE: Invenio can either store external links directly, or store only the
external IDs and create the links dynamically from rules and knowledge bases.
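
NOTE: the second option (store the external ID, build the link from a rule) might look roughly like the sketch below. The rule table stands in for a knowledge base; the example.org URL templates and the "Fulltext at SLAC" label are placeholders invented for illustration, and make_856 is a hypothetical helper, not an Invenio function.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table: SPIRES URL type -> (URL template, link label).
# The example.org addresses are placeholders, not real resolver URLs.
LINK_RULES = {
    "ADSABS": ("http://example.org/ads/{doc}", "Fulltext in ADS"),
    "SLACPUB": ("http://example.org/slacpub/{doc}", "Fulltext at SLAC"),
}


def make_856(url_type, urldoc):
    """Build an 856 4_ datafield from a SPIRES URL / URLDOC pair."""
    template, label = LINK_RULES[url_type]
    field = ET.Element("datafield", {"tag": "856", "ind1": "4", "ind2": " "})
    ET.SubElement(field, "subfield", {"code": "u"}).text = template.format(doc=urldoc)
    ET.SubElement(field, "subfield", {"code": "y"}).text = label
    return field


field = make_856("ADSABS", "1997PhRvD..55.5667B")
print(ET.tostring(field, encoding="unicode"))
```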

<!-- TIME-UPD = 11:47:06; -->

Note: see DATE-UPDATED; Invenio stores dates with a granularity of seconds, so the time needs no separate field.

<!--
 PACS = 13.87.Ce;
 PACS = 14.40.Aq;
 PACS = 14.70.Bh;
-->

<datafield tag="650" ind1="1" ind2="7">
  <subfield code="2">PACS</subfield>
  <subfield code="a">13.87.Ce</subfield>
</datafield>
<datafield tag="650" ind1="1" ind2="7">
  <subfield code="2">PACS</subfield>
  <subfield code="a">14.40.Aq</subfield>
</datafield>
<datafield tag="650" ind1="1" ind2="7">
  <subfield code="2">PACS</subfield>
  <subfield code="a">14.70.Bh</subfield>
</datafield>
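
NOTE: repeated SPIRES elements such as PACS map to one datafield per value. A minimal sketch, assuming each element sits on its own "NAME = value;" line as in the dump above (multi-line values are not handled); both function names are hypothetical.

```python
import xml.etree.ElementTree as ET

SPIRES_LINES = """
 PACS = 13.87.Ce;
 PACS = 14.40.Aq;
 PACS = 14.70.Bh;
"""


def spires_values(text, element):
    """Return the values of every 'element = value;' line in a SPIRES dump."""
    values = []
    for line in text.splitlines():
        name, sep, value = line.strip().partition(" = ")
        if sep and name == element:
            values.append(value.rstrip(";").strip())
    return values


def pacs_to_650(text):
    """Build one 650 17 datafield per PACS code, with $2 set to 'PACS'."""
    fields = []
    for code in spires_values(text, "PACS"):
        field = ET.Element("datafield", {"tag": "650", "ind1": "1", "ind2": "7"})
        ET.SubElement(field, "subfield", {"code": "2"}).text = "PACS"
        ET.SubElement(field, "subfield", {"code": "a"}).text = code
        fields.append(field)
    return fields


for field in pacs_to_650(SPIRES_LINES):
    print(ET.tostring(field, encoding="unicode"))
```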


<!-- NEW-DESY-CHECK = Preprint - Brooks, T.C. (rec.Sep.96) 35 p.; -->

FIXME: what is NEW-DESY-CHECK?

<!--
 CATDATE = 09/20/1996;
 CATTIME = 10:44:34;
-->

FIXME: how does this differ from DATE-UPDATED and ACCOUNT-UPD?

<!-- DOI = 10.1103/PhysRevD.55.5667; -->

Note: usually stored with the publication reference in 773, see above.

</record>

An exhaustive record example

FIXME. But see Invenio markup documentation links.

Mapping of SPIRES and Invenio bibliographic fields

| SPIRES | Invenio | Notes | SPIRES O(bsolete)/D(eprecated) | SPIRES Example |
| IRN (NR) | 001 | record ID | | 7199236 |
| DOC-TYPE (NR) | 980 | collection indicator | D | Preprint |
| REPORT-NUM | 037 | we also use 088 to store additional report numbers | | SLAC-REPRINT-1999-091 |
| ASTR | 100 or 700 | first author into 100, additional authors into 700 (ASTR is the name of a block of authors with the same affiliation; it has no value itself) | | |
| AUTHOR | 100/700 $a | author name | | Brooks, Travis C. |
| DESY-AUTHOR (NR) | ??? | alternate author name for non-ASCII chars | | AUTHOR = Stohr, J.; DESY-AUTHOR = Stoehr, J.; |
| AFFILIATION | 100/700 $u | author affiliations (repeatable) | | St. Petersburg, INP |
| CORP-AUTHOR | | allows author to be entity rather than name | D??? | IRN = 4719409; CORP-AUTHOR = NIKHEF, Amsterdam; |
| COL-NOTE | 710 | collaboration | | L3 Collaboration |
| TITLE | 245 | title | | Recalculation of proton Compton scattering in perturbative QCD |
| PUB-NOTE | 773 | publication reference | | Phys.Rev.D61:032003,2000 |
| SLAC-TOPICS | | SLAC "Keywords" for SLAC documents | D | |
| DATE | 269 | imprint | | Apr 2001 (actually stored in internal format as 20010400) |
| JOUR-SUB | 773 | we use 773 as for PUB-NOTE; if there is no volume/page information, it means the paper was submitted to that particular journal | | Phys.Rev.D |
| PPF-SUBJECT | 65017 | subject category | D | |
| P (NR) | | number of pages | | |
| CITATION | 999 | references | | |
| EXPERIMENT | 693 $a | experiment | | |
| DESY-KEYWORDS | 6531 $9 DESY | keywords attributed by DESY | | |
| DESY-CLASS-CODE | | ??? | D or O | |
| TT | | arXiv title/TeX title | | |
| BULL | 037 | arXiv report number | | |
| URL | 8564 | external fulltext links; note that Invenio can either store external links or just external IDs and create links dynamically based on some rules and knowledge bases | | |
| TIME-UPD | 961, but usually stored elsewhere | see DATE-UPDATED; Invenio stores dates with a granularity of seconds | | |
| PACS | 65017 | PACS subject categories; various subject categories can be stored in 650 | | |
| DOI | 773 $a | DOI is stored with the publication reference | | |
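
The mapping table can also be kept as a lookup structure, for instance to list which elements of a SPIRES record have no Invenio target yet. The sketch below transcribes only part of the table, ignores subfields, and uses a hypothetical helper name; it is an illustration, not the logic of the conversion stylesheet.

```python
# Partial transcription of the mapping table: SPIRES element -> Invenio MARC tag.
# None marks elements for which the table lists no Invenio tag.
SPIRES_TO_MARC = {
    "IRN": "001",
    "DOC-TYPE": "980",
    "REPORT-NUM": "037",
    "COL-NOTE": "710",
    "TITLE": "245",
    "PUB-NOTE": "773",
    "SLAC-TOPICS": None,
    "DATE": "269",
    "JOUR-SUB": "773",
    "CITATION": "999",
    "TT": None,
    "BULL": "037",
}


def unmapped_elements(element_names):
    """Return the SPIRES elements with no known Invenio tag, in input order."""
    # Elements missing from the table are reported too, since .get() yields None.
    return [name for name in element_names if SPIRES_TO_MARC.get(name) is None]


print(unmapped_elements(["IRN", "TITLE", "SLAC-TOPICS", "TT", "NEW-DESY-CHECK"]))
```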

SPIRES-to-Invenio Record Conversion Tools

A SPIRES2MARC.xsl stylesheet is available in the DevelopmentInspireCodeRepository repository, under the bibconvert directory. See the README file located there. Example usage:

$ ls -l spires.xml                                      # records previously dumped from SPIRES into 'spires.xml'
$ bibconvert -c SPIRES2MARC.xsl < spires.xml > marc.xml # convert records to MARCXML
$ xmllint --format marc.xml                             # inspect nicely formatted MARCXML
$ xmllint --noout marc.xml                              # check that the XML is well-formed
$ xmlmarclint marc.xml                                  # check compliance with the MARCXML standard
$ bibupload -ir marc.xml                                # upload records into Invenio in insert-or-replace mode
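
Before uploading, it can be useful to confirm that the converted file contains the expected number of records. A minimal sketch using only the Python standard library; the function name and the sample data are illustrative assumptions, and this does not replace the xmllint and xmlmarclint checks.

```python
import xml.etree.ElementTree as ET


def count_records(marcxml_text):
    """Parse MARCXML and return the number of <record> elements.

    Raises xml.etree.ElementTree.ParseError if the input is not well-formed.
    """
    root = ET.fromstring(marcxml_text)
    # Compare local names so the count works with or without an XML namespace.
    return sum(1 for el in root.iter() if el.tag.rsplit("}", 1)[-1] == "record")


SAMPLE = """<collection>
  <record><controlfield tag="001">3418758</controlfield></record>
</collection>"""

print(count_records(SAMPLE))
```

For a real conversion, pass in the contents of marc.xml instead of SAMPLE.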
Topic revision: r13 - 2008-02-06 - TiborSimko