Mappings to rules for BibCheck

Mappings of the tasks below to a CernBibCheck style language.

;; S16 Conference Information
;;  actually should store only code and t; Formats should fetch c,d,f
(check-field-instance ("111" 0  9999 :mandatory-subfields "cdfgt"))

;;S18  S19 Journal References
(check-field-instance("773" :mandatory-subfields "p" :group-optional-subfields "cvy"  :inclusive-optional-subfields "nxa"))
(check-field-replace-subfield-content-via-kbr("773" $$p "<journal_kbr>" report )) 

;;S20 Dates  should be repeated for all date fields (260c, 269c, 111d,f, 773y, 961c,x), or could use regexp
(check-field-subfield-content ('260' $$c 4 10 :content "date"))

;;S21  pages should be 61 or 61-70  (could conceivably check that end page is larger?)
(check-field-subfield-content-regexp('773' $$c "\\d+\\-?\\d*"))

;; S22 affiliations correct   (should we replace with inst code in 100/700i?)
(check-field-replace-subfield-content-via-kbr("100" $$u "<inst kbr>" report ))
(check-field-replace-subfield-content-via-kbr("700" $$u "<inst kbr>" report )) 

;; S34 exp codes   coll_exp.kbr should contain collaboration names, and the experiment codes that generally match them
(check-field-replace-subfield-content-via-kbrs("693" '("710" $$g "<coll_exp kbr>" match-exactly )) )
(check-field-replace-subfield-content-via-kbr("693" $$e "<exp kbr>" report )) 

;;S41/42  Correct published and other collections
(check-field-instance('980' 1 9999 :inclusive-mandatory-subfields "ac"))
;; peer_review.kbr should contain names  of journals we consider to be published and '$$aPUBLISHED'
(check-field-replace-subfield-content-via-kbrs("980" '("773" $$p "<peer_review.kbr>" match-exactly )) )  
;;need a way to remove published tag?
;;need a way to do rest of FC/TC assignments??  

(check-field-subfield-content-via-kba('980' $$a "<collections.kba>") )

;;Now translate inprocs from the HEP filedefn

;; check that arXiv number is properly formatted??  use regexp + $$9 conditional on 037a??  Also other report numbers are standardized via
;;   s/[\=\/\s\-]+/-/g
;; but not arXiv!

;; check that citations (999C5) are properly formatted? 999C5r should look like eprint, 999C5s should have journal vol page? via regexp?  worthwhile?
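
The report-number substitution quoted in the notes above could be sketched in Python as follows. This is only an illustration, not BibCheck code: the function name normalize_report_number and the detection of arXiv numbers by an "arXiv:" prefix are assumptions made for the example.

```python
import re

# Runs of '=', '/', whitespace or '-' collapse to a single '-',
# mirroring the substitution s/[\=\/\s\-]+/-/g quoted in the notes.
SEPARATORS = re.compile(r"[=/\s\-]+")


def normalize_report_number(value):
    """Standardize a report number; arXiv identifiers are left untouched.

    Recognizing arXiv numbers by their 'arXiv:' prefix is an assumption
    made for this sketch, not a rule taken from the source.
    """
    value = value.strip()
    if value.lower().startswith("arxiv:"):
        return value
    return SEPARATORS.sub("-", value)
```

With this sketch, "SLAC PUB 1234" would become "SLAC-PUB-1234", while "arXiv:0801.0697" would pass through unchanged.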




New CernBibCheck functions proposed

Based on what I've found in CernBibCheck I might add the following functions:

  • Might be nice to have a way of allowing manual exceptions to any rule? I.e. in metadata add something that causes BibCheck to always pass certain rules
  • check-field-instance f-tag sf-code :group-optional-subfields str1: all subfields in str1 must appear if any appear, i.e. they must appear as a group.
  • check-field-subfield-content f-tag sf-code :content predicate: add the predicate "date".
  • check-field-subfield-content-regexp f-tag sf-code regexp action-when-no-match value-to-add
  • Testing uniqueness? Handled by control fields?
  • Looking up in a knowledge base that is a collection, not a file?
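
The intended meaning of :group-optional-subfields (all or none of the listed subfields) might be sketched like this in Python. The function name check_group_optional and the representation of subfield codes as a string are assumptions for illustration only; this is not how CernBibCheck is implemented.

```python
def check_group_optional(present_subfields, group):
    """Return True if the codes in `group` appear all together or not at all.

    `present_subfields` is a string of the subfield codes found in one
    field instance, e.g. "pcvy"; `group` is the str1 argument, e.g. "cvy".
    """
    found = set(group) & set(present_subfields)
    return not found or found == set(group)
```

For the 773 rule above with group "cvy", a field carrying only $$p would pass, a field carrying p, c, v and y would pass, and a field carrying p, c and v but no y would fail.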

Descriptions of tasks

Brief functional descriptions of the jobs described in the SLAC Automated section of ComparisonSlacFermilabDesyCernEnrichmentScripts

The numbers here should match that table.

Spirestasks #9 Proceedings information into CONFERENCE

Adding published proceedings information to the CONFERENCE subfile

SPIRES protocols: conf.check.proceedings

conf.published.proc

Searches through the entire BOOKS subfile for records with the CONF-NUMBER element, makes a list of all the CONF-NUMBERs.

Searches the CONFERENCE subfile for all records with no PUB-NOTE element, makes a list of all the CNUMs. Each list is sorted, then compared to find lines found in both lists, i.e. conference numbers found in the BOOKS subfile (proceedings have been published), but with no publication note in the CONFERENCE file. This list is retained. The protocol fetches information from the BOOKS subfile for each conference one by one (book call number, title, editor) in a format ready to merge into the CONFERENCE record. The information is presented on the screen to an experienced person for review and possible editing before being merged into the CONFERENCE subfile record.
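
The comparison of the two lists amounts to a set intersection. A minimal Python sketch, assuming both lists have already been extracted as plain strings (the function name is hypothetical):

```python
def conferences_missing_pub_note(books_conf_numbers, conferences_without_pub_note):
    """Return the conference numbers that occur in both lists, sorted.

    These are conferences whose proceedings exist in BOOKS but whose
    CONFERENCE record has no PUB-NOTE yet.
    """
    return sorted(set(books_conf_numbers) & set(conferences_without_pub_note))
```

Using sets removes repeated conference numbers, so the explicit sort-and-compare step of the protocol collapses into one expression.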

Spirestasks #16 Add MEETING-NOTE to HEP

Finding HEP records with CNUM and adding MEETING-NOTE if needed

SPIRES protocols: hep.check.meeting.note

hep.add.meeting.note

Checks records added to HEP (we check last 30 days) which include the element CNUM, but do not have the element MEETING-NOTE. Checks the RESULT for records with more than one occurrence of the element CNUM and stores the IRN separately to be updated manually, since there will be 2 or more occurrences of element MEETING-NOTE. Using the virtual element GETCONF to get MEETING-NOTE information from the CONF subfile, the protocol uses the format ADD.MN to collect a text file listing IRN, CNUM and MEETING-NOTE for the records, to be checked and merged into the proper HEP records.
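
The separation of records with several CNUMs from those that can be handled directly might look like the sketch below. The tuple layout (IRN, list of CNUMs, flag for an existing MEETING-NOTE) is an assumption for the example, not the SPIRES record structure.

```python
def split_by_cnum_count(records):
    """Split records lacking a MEETING-NOTE into two IRN lists.

    `records` is an iterable of (irn, cnums, has_meeting_note) tuples.
    Returns (automatic, manual): records with exactly one CNUM, and
    records with two or more CNUMs that need a manual update.
    """
    automatic, manual = [], []
    for irn, cnums, has_meeting_note in records:
        if not cnums or has_meeting_note:
            continue  # nothing to add for this record
        if len(cnums) > 1:
            manual.append(irn)
        else:
            automatic.append(irn)
    return automatic, manual
```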

Spirestasks #18 Check new PUB-NOTE

Checking and correcting PUB-NOTE

SPIRES protocols: hep.check.newlyadded.journals (makes text files for tasks no.18-21)

hep.check.new.pbn

Checks newly updated records (we check last 3 days, but not TODAY) for records with a PUB-NOTE (Published in ...). The text file alljour.list is created or added to with lines containing the IRN of each record and the PUB-NOTE element:

Published in AIP Conf.Proc.45:138-147,1978.

This text file is checked by an experienced person for anomalies in the PUB-NOTE element. Any mistakes can be corrected by searching for the PUB-NOTE and editing the record.

Spirestasks #19 SPICITE elements starting with NONE

Correcting SPICITE elements starting with NONE

SPIRES protocols: hep.check.newlyadded.journals (makes text files for tasks no.18-21)

hep.fix.bad.pbn

Checks newly updated records (we check last 3 days, but not TODAY) for SPICITE elements starting with NONE, indicating that the PUB-NOTE could not be turned into a SPICITE element with the normal five-letter CODEN. The text file nones.list is created or added to and contains the IRN of each record and the SPICITE virtual element in this form:

41095820 NONE,29,(1955)1

Using the IRN, a record is accessed and presented on the screen to an experienced person with choices to edit the record, skip to the next record or quit out of the protocol. The PBN element is edited so that the virtual element SPICITE will contain the correct five-letter CODEN. The line in the text file nones.list is then deleted and the IRN in the next line is read in order to process the next record.
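
Reading a line of nones.list and testing for the NONE prefix could be sketched as follows. The function names are hypothetical, and the only format assumed is the "IRN, whitespace, SPICITE" layout shown in the example line above.

```python
def parse_nones_line(line):
    """Split a nones.list line into (irn, spicite)."""
    irn, spicite = line.split(None, 1)
    return irn, spicite.strip()


def needs_coden(spicite):
    """True if the SPICITE value still lacks a proper journal CODEN."""
    return spicite.startswith("NONE")
```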

Spirestasks #20 Incorrect or missing DATE elements

Checking and correcting incorrect or missing DATE elements

SPIRES protocols: hep.check.newlyadded.journals (makes text files for tasks no.18-21)

hep.fix.bad.dates

Checks newly updated records (we check last 3 days, but not TODAY) for records with incorrect or missing DATE elements. Creates the text file noyear.list containing the IRN, DATE and DATE-RECEIVED elements:

  • 6171249 Jul 1001
  • 6213588 1915
  • 6205461
  • 7560320 Dec 2007

Reading the first IRN, the protocol searches for the record in the HEP subfile and displays the record on the screen with choices to edit the record, skip to the next record or quit out of the protocol. An experienced person will edit and update the record. The line containing the IRN is then deleted from the text file noyear.list and the next line is read.

Spirestasks #21 Bad page numbering

Check and correct page numbers

SPIRES protocols: hep.check.newlyadded.journals (makes text files for tasks no.18-21)

hep.fix.bad.page

Checks newly updated records (we check last 3 days, but not TODAY) for records with weird page ranges, e.g. 64-, 46-35, by subtracting the first page number from the second in the PUB-NOTE element and calling the difference DELTA. A text file is created, or lines are added to an existing file, for all records where the DELTA is a negative number or 0. The text file contains the IRN, DELTA and the PUB-NOTE element with the journal title converted to a five-letter CODEN:

  • 6045812 -41 CSJAAA,5,41-.2005
  • 6099572 -82 JPAGB,A38,4665-4583.2005
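
The DELTA values in these examples are consistent with "end page minus start page", with a missing end page counted as 0. A minimal Python sketch under that reading (the function names, and the choice to return None for a single page with no range, are assumptions):

```python
import re

# Start page, hyphen, optional end page, e.g. "4665-4583" or "41-".
PAGE_RANGE = re.compile(r"^(\d+)-(\d*)$")


def page_delta(page_range):
    """Return end page minus start page, or None if there is no range.

    A missing end page (as in '41-') counts as 0, which reproduces the
    DELTA of -41 in the example list.
    """
    match = PAGE_RANGE.match(page_range.strip())
    if match is None:
        return None
    start, end = match.groups()
    return (int(end) if end else 0) - int(start)


def is_suspicious(page_range):
    """True when DELTA is negative or 0, the condition for listing a record."""
    delta = page_delta(page_range)
    return delta is not None and delta <= 0
```

A check like this would also cover the open question in the S21 rule of whether the end page is larger than the start page, which the plain regexp cannot express.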

Reading the IRN, a protocol searches for the first IRN in the HEP subfile, displays the record to be edited manually, and updates the record, then deletes the line containing the IRN in the text file (so the process can be interrupted and continued at any time) and the next line is read to process the next record.
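
The "process the first line, then delete it" pattern shared by these protocols can be sketched as a small resumable work list. This is an illustration in Python, not the SPIRES protocol; process_worklist and the handle callback are hypothetical names.

```python
from pathlib import Path


def process_worklist(path, handle):
    """Process the lines of a work list one at a time, deleting each when done.

    `handle(line)` returns True when the line has been dealt with, or
    False to stop (the equivalent of quitting the protocol). Unprocessed
    lines stay in the file, so a later run picks up where this one left off.
    Returns the list of lines still pending.
    """
    path = Path(path)
    lines = path.read_text().splitlines()
    while lines:
        if not handle(lines[0]):
            break
        lines.pop(0)
        # Rewrite the file after every record so an interruption loses nothing.
        path.write_text("".join(line + "\n" for line in lines))
    return lines
```

Rewriting the file after each record is slow for long lists but keeps the file as the single source of truth about what remains to be done.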

Spirestasks #29 Duplicate REPORT-NUM or SPICITE elements

Checking and correcting records with duplicate REPORT-NUM or SPICITE elements

SPIRES protocols: hep.check.duplicate

hep.deal.wwith.duplicates

Checks records updated in the last 30 days (excluding those with HIDDEN-NOTE = rdupok; these are records that have been checked and may have duplicate REPORT-NUM elements for some reason). Collects a text file of all REPORT-NUM and SPICITE elements, which is then sorted and uniqued. Each line is made a search command in HEP, and IRNs with a RESULT of 2 or more are checked for HIDDEN-NOTE = dupok: and collected into another text file. This text file is again sorted and uniqued and has the form:

  • SPICITE = PRTEA,4,37;
  • REPORT-NUM = rl 78 059;

Each line is made a search command in HEP and the resulting set of 2-4 records is presented on the screen to an experienced person who will edit, annotate or delete the affected records. Before going on to the next record, the person is given a menu of choices: deleting the line containing the search because the duplication has been fixed and going on to the next set of records, editing some more, adding HIDDEN-NOTE = rdupok; to records which cannot be changed (so they don't come up in the initial search again and again), skipping this set of records or quitting out of the program. The line in the text file containing the SPICITE or REPORT-NUM element is then deleted and the next line is read to start the next search.
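
The core of the duplicate check, grouping records by identifier and keeping identifiers shared by two or more records, might be sketched as below. This simplifies the protocol (it works on an in-memory list rather than issuing searches in HEP), and the tuple layout and function name are assumptions.

```python
from collections import defaultdict


def find_duplicates(records):
    """Map each shared identifier to the sorted IRNs that carry it.

    `records` is an iterable of (irn, identifiers, hidden_notes), where
    identifiers are strings such as "SPICITE = PRTEA,4,37". Records
    marked with the hidden note "rdupok" are skipped.
    """
    by_identifier = defaultdict(set)
    for irn, identifiers, hidden_notes in records:
        if "rdupok" in hidden_notes:
            continue
        for identifier in identifiers:
            by_identifier[identifier].add(irn)
    return {
        identifier: sorted(irns)
        for identifier, irns in sorted(by_identifier.items())
        if len(irns) > 1
    }
```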

Spirestasks #33 New authors in HEP

Checking new authors noticed at input into HEP

SPIRES protocols: ppfin.checks

hep.check.new.authors

At the time of input into HEP each author name is checked in HEP. If the name does not appear in HEP, the person doing the input is prompted to check for misspelling and given a chance to correct the input. If the author is new, the name is added to a text file in the form:

arXiv:0709.0009:Daniel, Scott F.

Reading the first line, the author name is presented to an experienced person with choices to browse the author name in PRE-FOLIOAUTH, correct the name, skip this name or quit the protocol. PRE-FOLIOAUTH contains all the author names in HEP; using the format AUTHCHECK, it displays the author name with the number of records for the author in HEP. An experienced person will then decide whether to accept the new author name or correct it in the HEP record. The line containing this name in the text file is then deleted and the next line is read, presenting the next AUTHOR.
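
Splitting a line of the new-authors file and looking up the record count might be sketched like this. The function names and the dictionary of record counts are hypothetical, and the sketch assumes the author name itself contains no colon.

```python
def parse_new_author_line(line):
    """Split 'arXiv:0709.0009:Daniel, Scott F.' into (paper id, author name).

    The split is made at the last colon, since the paper identifier
    itself contains one.
    """
    paper_id, _, author = line.strip().rpartition(":")
    return paper_id, author.strip()


def is_new_author(author, record_counts):
    """True if the author has no records in the supplied name -> count mapping."""
    return record_counts.get(author, 0) == 0
```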

Spirestasks #34 Records with COLLABORATION, but no EXPERIMENT element

Checks records with the COLLABORATION element, but missing the EXPERIMENT element

SPIRES protocols: hep.check.exp

hep.add.experiments

Checks records in HEP updated the last 4 days with the COLLABORATION element, but no EXPERIMENT element (excluding deleted records, temporary records, and records with a HIDDEN-NOTE = noexpok, records that we have decided do not need the EXPERIMENT element). Stores IRN, BULL and COLLABORATION elements in a text file in the form:

7603878 arXiv:0801.0697 BABAR

Using the IRN on the first line, searches for the COLLABORATION in the EXPERIMENTS subfile and displays the record. Searches the EXPERIMENTS subfile to find the experiment connected with this collaboration and constructs the element EXPERIMENT = xxx; ready to be merged into the HEP subfile. Displays the proposed element, then gives choices to merge it into HEP, edit before merging, skip this record or quit out of the protocol. If the COLLABORATION is not found in the EXPERIMENTS subfile, offers choices to skip the record, look at the record on another screen, or merge HIDDEN-NOTE = noexpok; into the record. If the search in EXPERIMENTS results in more than one COLLABORATION, displays the records and suggests solving this on another screen, then again presents the choices to merge, skip, edit, etc. Finally the line containing the IRN, BULL and COLLABORATION is deleted in the text file and the next line is read.
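
The three outcomes of the lookup (one match, no match, several matches) could be sketched as follows. The mapping from experiment code to collaboration name stands in for the EXPERIMENTS subfile, and the function name and status strings are made up for the example.

```python
def propose_experiment(collaboration, experiments):
    """Return (status, elements) for a collaboration name.

    `experiments` maps an experiment code to its collaboration name.
    status is "merge" for exactly one match, "not-found" for none and
    "ambiguous" for several; elements holds the candidate
    "EXPERIMENT = code;" strings.
    """
    wanted = collaboration.strip().upper()
    codes = sorted(
        code for code, name in experiments.items()
        if name.strip().upper() == wanted
    )
    elements = ["EXPERIMENT = %s;" % code for code in codes]
    if not codes:
        return "not-found", elements
    if len(codes) > 1:
        return "ambiguous", elements
    return "merge", elements
```

In the "not-found" case a person would decide whether to add HIDDEN-NOTE = noexpok; to the record, so the sketch only reports the status and leaves the decision to the caller.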

Spirestasks #36 Missing FIELD-CODE element

Checks for missing FIELD-CODE elements and prompts for input

SPIRES protocols: hep.missing.fieldcode

hep.add.fieldcode

Checks records added to HEP for the last 4 days (excluding deleted records) to find records without the FIELD-CODE element and stores the IRN in a text file.

Using the IRN on the first line, attempts to construct a FIELD-CODE element from the BBDESCRIP element, presents it to an experienced person with choices to merge it into the record, otherwise prompts for a FIELD-CODE element to be typed in. Offers choices to merge, skip the record or quit, then erases the line containing the IRN that has been done and goes on to the next IRN.

-- TravisBrooks - 06 Feb 2008

Topic revision: r5 - 2008-09-04 - TravisBrooks
 