-- MikeSullivan - 18-Aug-2011

Reference-Counting Differences between Inspire and Spires

A population of HEP records shows a discrepancy in citation counts between Spires and Inspire.

Here is a breakdown of the scope of the problem:

Citation-count difference     Records    Higher count in Inspire
All discrepancies (total)       11000
1                                8076    76
2-6                              2600    16
7-20                              300
21+                               100
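A breakdown like the one above could be tallied as in the sketch below. It assumes one (Spires count, Inspire count) pair per record. The function names and the toy data are hypothetical and are not the scripts used to produce these numbers.

```python
from collections import Counter

# Hypothetical helpers: bucket per-record count differences into the rows above.
def bucket(diff: int) -> str:
    """Map an absolute citation-count difference (>= 1) to a breakdown row."""
    if diff == 1:
        return "1"
    if diff <= 6:
        return "2-6"
    if diff <= 20:
        return "7-20"
    return "21+"

def breakdown(counts):
    """Tally records per row, and those where Inspire's count is higher.

    counts: iterable of (spires_count, inspire_count) pairs, one per record.
    Records whose counts agree are not discrepancies and are skipped.
    """
    total = Counter()
    inspire_higher = Counter()
    for spires, inspire in counts:
        diff = abs(spires - inspire)
        if diff == 0:
            continue
        row = bucket(diff)
        total[row] += 1
        if inspire > spires:
            inspire_higher[row] += 1
    return total, inspire_higher

# Toy data only, not the real record counts.
total, higher = breakdown([(14, 18), (11, 12), (5, 5), (30, 2)])
print(dict(total), dict(higher))
```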

One of the discrepancies above involves the two records identified by these spicites and PBNs:

 IRN = 564052;
 TITLE = On the Nonlinear Quantum Field;
 PUB-NOTE = Nuovo Cim.A46:1-16,1978;
 Inspire record:  http://inspirebeta.net/search?ln=en&p=fin+irn+564052&action_search=Search  (cited by 11 records)

 IRN = 291226;
 TITLE = Conformal Relativity: A Theory of Mass. 1. Survey of Theoretical Results;
 PUB-NOTE = Nuovo Cim.B46:1-15,1978;
 PUB-NOTE = Erratum-ibid.B48:311-312,1978;
 Inspire record:  http://inspirebeta.net/search?ln=en&p=fin+irn+291226&action_search=Search  (cited by 11 records)

In Spires, both of these records are referenced by the following HEP record:

 IRN = 1956078;
 CITATION = APNYA,148,346;
 CITATION = APNYA,157,181;
 CITATION = PHRVA,156,1546;
 CITATION = PRLTA,52,1713;
 CITATION = JMAPA,27,1523;
--->  CITATION = NUCIA,46,1;

In the reference list, this reference appears without the A or B series letter in the
volume, and Spires-HEP counts it as a citation of both papers.

This is of course incorrect; Inspire is right not to count this particular reference.
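The letter-blind matching described above can be modeled as in the minimal sketch below. It assumes spicites of the form JOURNAL,VOLUME,PAGE, where the volume may carry a series letter. The helper names are hypothetical, and this is not the actual Spires counting code.

```python
import re

# Hypothetical model of a letter-blind matching rule, not the real Spires code.
def letterless_key(spicite: str) -> tuple:
    """Reduce e.g. 'NUCIA,A46,1' to ('NUCIA', '46', '1') by dropping volume letters."""
    journal, volume, page = spicite.split(",")
    return journal, re.sub(r"[A-Za-z]", "", volume), page

def letter_blind_matches(reference: str, records: list) -> list:
    """Every record whose spicite equals the reference once volume letters are ignored."""
    key = letterless_key(reference)
    return [rec for rec in records if letterless_key(rec) == key]

records = ["NUCIA,A46,1", "NUCIA,B46,1"]
# The letter-less reference from IRN 1956078 matches, and is counted for, both papers.
print(letter_blind_matches("NUCIA,46,1", records))
```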

Past teams of Spires programmers wrote search and counting rules to turn the thousands of
ambiguous references into countable cites, reasoning that volume/page collisions were very
unlikely.

For thousands of papers there was no actual collision: the journal volume number and page were
unique whether or not the volume carried an A, B, C, D, etc. letter. These letter-less references
created a larger reference-count discrepancy between Spires and Inspire, on the order of 30k.

I corrected these records in Spires and passed them on to Inspire, reducing the discrepancy to
its current level.
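The unique-match case can be sketched as follows. The records are indexed by their letter-less key, and a letter-less reference is rewritten only when exactly one record shares that key. The function names and the NUCIA,A12,345 record are hypothetical toy data.

```python
import re
from collections import defaultdict

# Hypothetical helpers; toy data below.
def letterless_key(spicite):
    """Drop series letters from the volume: 'NUCIA,A46,1' -> ('NUCIA', '46', '1')."""
    journal, volume, page = spicite.split(",")
    return journal, re.sub(r"[A-Za-z]", "", volume), page

def build_index(records):
    """Group the known record spicites by their letter-less key."""
    index = defaultdict(list)
    for rec in records:
        index[letterless_key(rec)].append(rec)
    return index

def resolve(reference, index):
    """Return the corrected spicite, or None if the reference is ambiguous or unknown."""
    candidates = index.get(letterless_key(reference), [])
    if reference in candidates:
        return reference  # already an exact match; leave it alone
    if len(candidates) == 1:
        return candidates[0]  # unique volume/page: safe to add the letter
    return None  # zero or several candidates: needs another method

index = build_index(["NUCIA,A46,1", "NUCIA,B46,1", "NUCIA,A12,345"])
print(resolve("NUCIA,12,345", index))  # -> NUCIA,A12,345
print(resolve("NUCIA,46,1", index))    # -> None (collision)
```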

For genuine collisions such as NUCIA,A46,1 versus NUCIA,B46,1 above, I made substantial gains
in the cite counts by building tables of co-references and counting them. Thus, if a paper with
the reference NUCIA,46,1 also contained a dozen other references in common with NUCIA,A46,1 and
none co-referencing NUCIA,B46,1, I changed the reference in the Spires record from NUCIA,46,1
to NUCIA,A46,1.

Indeed, this was the case for the papers above. As a result of the changes I batched into
Spires-HEP, NUCIA,B46,1 lost 4 citations and NUCIA,A46,1 kept the same count.

In INSPIRE-HEP, NUCIA,A46,1 gained the four citations and NUCIA,B46,1 kept the same count,
which more accurately reflects the references in the papers.

There is still the above discrepancy in IRN 1956078.  In this case, there were no co-references
to go by.
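The co-reference approach can be sketched as below. This sketch assumes that "co-references" are the references that appear in the same reference list as an explicit, lettered citation of each candidate. The function names, the min_overlap threshold, and the toy journal codes X, Y, Z and W are hypothetical. When there are no co-references to go by, as for IRN 1956078, the sketch returns None.

```python
from collections import Counter, defaultdict

# Hypothetical co-reference disambiguation; toy data below.
def build_coreference_table(reference_lists):
    """For each spicite, count the other spicites cited alongside it.

    reference_lists: iterable of reference lists (lists of spicites), one per paper.
    """
    table = defaultdict(Counter)
    for refs in reference_lists:
        unique = set(refs)
        for ref in unique:
            for other in unique - {ref}:
                table[ref][other] += 1
    return table

def disambiguate(ambiguous, candidates, paper_refs, table, min_overlap=1):
    """Pick the candidate whose co-references overlap this paper's other references.

    Returns None unless exactly one candidate reaches min_overlap and every
    other candidate has no overlap at all.
    """
    others = set(paper_refs) - {ambiguous}
    scores = {}
    for c in candidates:
        seen_with = table.get(c, Counter())
        scores[c] = sum(1 for r in others if seen_with[r] > 0)
    winners = [c for c, s in scores.items() if s >= min_overlap]
    rivals_clear = all(s == 0 for c, s in scores.items() if c not in winners)
    if len(winners) == 1 and rivals_clear:
        return winners[0]
    return None

table = build_coreference_table([
    ["NUCIA,A46,1", "X,1,1", "Y,2,2"],  # explicit citation of the A series
    ["NUCIA,B46,1", "Z,3,3"],           # explicit citation of the B series
])
pair = ["NUCIA,A46,1", "NUCIA,B46,1"]
print(disambiguate("NUCIA,46,1", pair, ["NUCIA,46,1", "X,1,1", "Y,2,2"], table))  # -> NUCIA,A46,1
print(disambiguate("NUCIA,46,1", pair, ["NUCIA,46,1", "W,9,9"], table))           # -> None
```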

In this process, I systematized the references and cites as far as possible so that the letter
precedes the volume number. In this way I reconciled a further 10k citations, but I missed a
number of records. I concentrated on the letter-less problem first, and only corrected the
letter order in the records I touched while doing so.

The exception was NUPHZ (yes there always has to be an exception), where I placed the letter after
the volume because this was how it was listed in the Pub-note for almost all the papers.
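The letter-placement convention, including the NUPHZ exception, can be expressed as a small normalizer. This sketch assumes that a volume is a run of digits with optional letters before or after it. The names are hypothetical, and NUPHZ,A42,270 is an invented input.

```python
import re

# Hypothetical normalizer. Journals whose pub-notes put the letter after the volume:
LETTER_AFTER = {"NUPHZ"}

# Optional letters, digits, optional letters, e.g. 'A46', '46A', '46'.
VOLUME = re.compile(r"([A-Z]*)(\d+)([A-Z]*)")

def normalize_spicite(spicite: str) -> str:
    """Move volume letters in front of the number (after it for NUPHZ)."""
    journal, volume, page = spicite.split(",")
    m = VOLUME.fullmatch(volume.upper())
    if m is None:
        return spicite  # unexpected volume format: leave untouched
    before, digits, after = m.groups()
    letters = before + after
    volume = digits + letters if journal in LETTER_AFTER else letters + digits
    return f"{journal},{volume},{page}"

print(normalize_spicite("NUCIA,46A,1"))    # -> NUCIA,A46,1
print(normalize_spicite("NUPHZ,A42,270"))  # -> NUPHZ,42A,270
```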

Remaining Discrepancies:

1)  Papers such as NUCIA,A46,1 above, where I was unable to resolve a collision with another journal by database means.
2)  A substantial number of papers where the letter exists but is reversed.
3)  Apparently plentiful problems with some CONFP and ECONF references. I haven't investigated these closely.
4)  About 4k that I have not looked at closely.

I also discovered a problem in Inspire: it improperly counts bad references.
I will open an RT ticket on this, but in summary:

The SPIRES record IRN=3053713, with spicite NUPHZ,42,270, is cited 18 times in Inspire and 14 times in Spires.
Among the citing records in Inspire is the 6th record:


In its list of references you see CITATION,42,270, imported verbatim from Spires. This record
really does cite NUPHZ,42,270, but any other paper with a matching volume and page gets credited
with the citation as well.
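The overcounting can be reproduced with the hypothetical model below, which assumes matching on volume and page only. TOYJ,42,270 is an invented record, and the code is not Inspire's actual matching logic. Requiring the journal code to match stops the spurious credit. For CITATION,42,270, however, it also drops the genuine citation, so the reference itself needs its journal code restored.

```python
# Hypothetical model of citation matching; toy data below.
def credited_records(reference, records, require_journal):
    """Records that receive a citation from `reference` under a matching rule.

    Spicites are JOURNAL,VOLUME,PAGE. With require_journal=False only volume
    and page are compared, so the journal code in the reference is ignored.
    """
    ref_journal, ref_volume, ref_page = reference.split(",")
    credited = []
    for rec in records:
        journal, volume, page = rec.split(",")
        if (volume, page) != (ref_volume, ref_page):
            continue
        if require_journal and journal != ref_journal:
            continue
        credited.append(rec)
    return credited

records = ["NUPHZ,42,270", "TOYJ,42,270", "NUCIA,A46,1"]
# Volume/page only: the genuine cite plus a spurious one.
print(credited_records("CITATION,42,270", records, require_journal=False))
# Journal required: no spurious credit, but the genuine cite is lost too.
print(credited_records("CITATION,42,270", records, require_journal=True))
# With the journal code restored, only the cited paper is credited.
print(credited_records("NUPHZ,42,270", records, require_journal=True))
```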
Topic revision: r1 - 2011-08-18 - MikeSullivan