ID |
Name |
Description |
Proposed Solution |
Fully Automated |
S1 |
Create the SLAC leaklist |
Checks SLAC items in HEP which are leaks or misses, moves them to the SLAC file used by TechPub |
Rebuild at SLAC - Inst Repository |
Yes |
S2 |
Check old temps |
Check old temporary entries from core archives |
Workflow |
S5 |
PBN from arXiv |
Add PBN from arXiv |
Build |
S6 |
Missing topcited papers |
Missing HEP papers with near or more than 100 citations |
Build |
S7 |
Physrev |
Employs a protocol in subfile PHYSREV |
Build (Holding pen) |
S8 |
Spellcheck for CONF file |
Run CONF.SPELLCHECK on the CONF file |
CernBibCheck?? |
S9 |
Add pbn to CONF file |
Add published proceedings info to CONF file |
Not migrated yet |
S10 |
Physrev Cites |
Adding the cites from Physrev papers |
Build (holding pen) |
S11 |
SPIRES email |
User Email |
No need to migrate, need editor... |
S12 |
URL for SciDir, journals |
Find papers without URL, add the URL |
GoDirect/DOI |
S13 |
URL for other journals |
Find papers without URL, add the URL |
GoDirect/DOI |
S14 |
URL for Phys.Rev. papers |
Find papers without URL, add the URL |
GoDirect/DOI |
S15 |
Add to HEP |
Add documents to HEP from batch input file, protocol checks for duplicates |
Build (holding pen) |
S16 |
Add HEP meeting note |
Add MN to records with CNUM, but no MN |
CernBibCheck |
S17 |
PPF title spellcheck |
Spellchecking PPF titles |
CernBibCheck?? |
S18 |
Check new PBN |
Catch suspicious PBN, especially those with two or more journal references |
CernBibCheck |
S19 |
Fix NONE PBN |
Fix incorrectly coded journal entries (PBN,DPBN) |
CernBibCheck |
S20 |
Fix bad dates |
Updated preprints with bad or no date |
CernBibCheck |
S21 |
Fix PBN bad page range |
Fix PBN with bad page range |
CernBibCheck |
S22 |
Correct wrong affiliations |
Get a list of incorrect AFF and correct them |
CernBibCheck (With inst auth. file) |
S23 |
URL problems |
HEP URLs with typo in URL field |
Link checker? |
S27 |
SLAC TechPub |
Updates SLAC queue status that describes actions needed |
SLAC specific solution, must be built (inst repository harvest...) |
S28 |
SLAC-Coding |
SLAC items needing coding |
Slac Specific |
S29 |
Fix HEP duplicates |
Check duplicate report-num or spicite |
Build |
S31 |
Audit PPF papers |
Audit randomly chosen papers |
Workflow |
S33 |
Check new authors |
Check authors in the newauth file, put there by inputters |
Build |
S34 |
Check exp element |
Check records with cn, but no exp, add exp |
Build |
S35 |
Giva |
Retrieve lists of authors from PDF or copy from another report |
(See CERN C1, C2) Build |
S36 |
Add field-code |
Check for missing field-code and adding it to HEP |
Build |
S37 |
PPF input duplicates |
Duplicates discovered during PPF input |
Build/Workflow |
S38 |
Title changed in arXiv |
Puts arXiv title as main title, puts old HEP title into OLD-TITLE |
Build/ Holding pen |
S39 |
Duplicate Texkeys |
Removes duplicate texkeys |
CernBibCheck?? Not sure |
yes |
S40 |
Duplicate Cites |
Removes duplicate Citations to journal and eprint |
CernBibCheck |
yes |
S41 |
SCL |
Fix missing and incorrect SCL |
CernBibCheck |
yes |
S42 |
TranslateSCLFCTC |
Translate SCLs to FC and TC as they come through |
CernBibCheck |
yes |
S43 |
Inst Affiliation Change |
Change affiliations from inst changes submitted on web |
build fixed with ids...? |
yes |
S44 |
Process |
Processes all files |
not needed (bibindex) |
yes |
S45 |
Topcite |
Check for new topcited papers, only promotions, short job |
build? |
yes |
S46 |
Post updates to afs |
Old routine to post updates for mirrors (deprecated) |
not needed |
yes |
S47 |
Removes |
Populate removes db before processing |
not needed |
yes |
S48 |
Backups |
Make backup copies on sunspi5, where a separate job archives to TSM |
not needed |
yes |
S49 |
AddCites |
Add Cites from citesj to the database (expand to cites2?) |
build holding pen |
S50 |
Full Topcite |
Check for new topcited papers, incl. brand new, long job |
build (same as 45- may be easier in invenio) |
yes |
S51 |
SPIJOBS daily web listing |
Daily listing of HEP labs jobs |
not yet |
yes |
S52 |
Prepare CERN table |
Lists the PPF report numbers and the related IRN |
??? |
yes |
S54 |
Weeding FNAL and DESY hardcopy-at notes |
Strips hardcopy-at notes when url, pbn, etc appear |
formatting?? |
yes |
S55 |
DK related to SLAC |
Prepares a list of papers with DK related to SLAC without SLAC coding, mails to Ann and Travis |
not needed |
yes |
S56 |
Prepares institutions list for the web |
Prepares the list of HEP institutions for Web listing |
not yet |
yes |
S58 |
Experiments list for the web |
Prepares experiments list for the Web |
not yet |
yes |
S59 |
Add Theses |
Add Theses to HEP from UMI |
build holding pen |
S?? |
conferences |
..conf (only uses 13 and 5) addition of conferences |
no migration?? |
S?? |
..hep.correct.citation |
Makes certain mass citation changes... |
??? |
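As an illustration of the kind of check described in S29 (duplicate report-num), here is a minimal Python sketch. It assumes records are available as (record id, list of report numbers) pairs; the function name, the record ids and the report numbers are hypothetical and do not come from the SPIRES protocol itself.

```python
from collections import defaultdict


def find_duplicate_report_numbers(records):
    """Return {report_number: [record ids]} for numbers shared by several records.

    `records` is an iterable of (record_id, report_numbers) pairs.
    Report numbers are compared case-insensitively, ignoring outer whitespace
    (an assumption; the real protocol may normalise differently).
    """
    by_report = defaultdict(list)
    for record_id, report_numbers in records:
        for report_number in report_numbers:
            key = report_number.strip().upper()
            if record_id not in by_report[key]:
                by_report[key].append(record_id)
    # Keep only the report numbers that occur in more than one record.
    return {rn: ids for rn, ids in by_report.items() if len(ids) > 1}


# Hypothetical sample data, for illustration only.
records = [
    (1001, ["SLAC-PUB-0001"]),
    (1002, ["slac-pub-0001 ", "DESY-00-001"]),
    (1003, ["DESY-00-002"]),
]
duplicates = find_duplicate_report_numbers(records)
```

The resulting groups would still need a human decision (or a holding-pen step) before any records are merged.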
ID |
script |
description |
to do |
comments |
D1 |
springer.py, iop.py, many perl programs |
download journal toc or take publisher feeds as input, create hep records (desy format), checks for duplicates, store abstract and references |
add write module for inspire format, into holding pen |
no urgent need to convert |
D2 |
aipaut.pl |
extracts ASTR from AIP abstracts |
|
not part of aip.pl because of access restrictions |
D3 |
checker.pl |
checks hep records (desy format) against authority files for author names, affiliations, rep nrs, keywords, journal titles, field and type codes, pacs, exp nr; expands keyword abbreviations |
add kw check to CernBibCheck, immediate exp of kw abbrev |
|
D4 |
getps (perl) |
download a single arxiv paper, convert to ps, add barcode, print |
|
|
D5 |
addbarcode (perl) |
inserts barcodes (arxiv nr) on first page of arxiv ps files |
|
useful when adding keywords to hep records |
D6 |
eprepget (perl) |
calls getps for daily eprints of selected archives |
|
printed copies for keyworders |
D7 |
cronret.pl |
retrieves most recent papers from hep for authors of last week's eprints |
|
help for keyworders |
D8 |
inspacs.pl |
replaces pacs nrs by verbal descriptions |
|
help for keyworders |
D9 |
dowlist (perl) |
creates html file for weekly list of accessions (preprints, articles, books), sorted according to type and field codes |
|
|
D10 |
inpuspi.pl |
converts desy to spires format including some syntax checks |
|
obsolete |
D11 |
akwli (zsh) |
creates and prints automatic keywords for eprints of last week |
needs tag |
suggestions for keyworders |
D12 |
akwins.pl |
inserts automatic keywords (from title, abstract, author keywords) into file of journal records (desy format) |
|
only very quick human check |
D13 |
jnlcheck.pl |
finds new issues on publishers' web site |
|
to be replaced more and more by publisher feeds |
D14 |
doipdf.pl |
extracts doi's from file with journal records, retrieves pdf url from abstract file, downloads and prints pdf |
store pdf url |
for keywording |
D15 |
doiref.pl |
extracts doi from jnl records, checks first for existing reference file in refdir, then looks for pdf file in pdfdir, then looks for pdf url in abstract file to download pdf file and extract references from pdf, converts to spires format (raffle) |
|
|
D16 |
doira2spi.pl |
extracts doi's from jnl records, concatenates corresp abs and refs files as input for spires |
|
obsolete |
D17 |
|
select papers from toc, email lists..., tag (hep relevant, to be keyworded ...) |
holding pen |
flush unselected rest |
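Several of the scripts above (D14 to D16) start by extracting DOIs from a file of journal records. A minimal Python sketch of that first step follows. The DESY record format is not described here, so the sketch simply scans free text; the regular expression is a common heuristic rather than a full DOI grammar, and the `TI:`/`DOI:` tags and DOI values in the sample are hypothetical.

```python
import re

# Heuristic DOI pattern: "10.", a 4-9 digit registrant code, "/", then a suffix
# running up to whitespace, a quote or an angle bracket.
DOI_PATTERN = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')


def extract_dois(text):
    """Return the distinct DOIs found in `text`, in order of first appearance."""
    found = []
    for match in DOI_PATTERN.finditer(text):
        # Drop punctuation that commonly trails a DOI in running text.
        doi = match.group(0).rstrip(".,;)")
        if doi not in found:
            found.append(doi)
    return found


# Hypothetical record text, for illustration only.
sample = """
TI: Example title one
DOI: 10.1000/example.001
TI: Example title two
See doi:10.1000/example.002, and again 10.1000/example.001.
"""
dois = extract_dois(sample)
```

Each extracted DOI could then be used to look up the reference file, the PDF file or the PDF URL, as the descriptions of D14 and D15 outline.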
ID |
Name |
Description |
To do |
Comments |
C1 |
agiv500 |
Gets the list of authors from a paper in PostScript format: extracts the author pages and produces a file of authors and affiliations in Aleph500 format |
Improve extraction of affiliations. Accents are deleted; to be improved. |
The result file has to be edited with the sysno of the Aleph record and checked for errors. Affiliations have to be cleaned because they are not well extracted (done only for CERN papers) |
C2 |
agiv500pdf |
Gets the list of authors from a paper in PDF format: extracts the author pages and produces a file of authors and affiliations in Aleph500 format |
Improve extraction of affiliations. Accents are deleted; to be improved. |
Same as above |
C3 |
Aleph authority |
Authority database to standardise author names and add accents |
|
Previously also existed for periodical titles, but was replaced by a knowledge file used by the uploader tool and several other scripts. |
C4 |
upload22.x |
The uploader tool is a semi-automatic system that transforms bibliographic records from different sources into the structure supported by the local database. The upload process involves the following steps: (1) download of bibliographic data from the source; (2) extraction and transformation of the downloaded records into the local structure; (3) matching of records against the current contents of CDS. Every source has to be configured (currently 190 configurations for 190 different sources), and many knowledge files are integrated to format the data: http://doc.cern.ch/uploader/KB/ . These configurations are written and maintained by the library staff according to their needs. |
Improve the matching; there are problems when matching on title + first author. No problem when matching on the arXiv number is possible. |
Four types of result file after matching: a source.correct file to update an existing record and correct existing fields; a source.append file to update an existing record and add new fields; a source.new file to upload new records; a source.nc file to be checked manually because the result of the matching was ambiguous. This tool is the first version of bibconvert but is still used by the library staff. Uses many KBs. |
C5 |
bibconvert |
Used for converting arXiv XML metadata to CDS MARCXML format |
|
to be improved by using journal title knowledge file to clean publication references |
C6 |
Electronic submissions |
Used by secretaries or authors to submit CERN divisional papers, CERN theses, CERN Internal notes, scientific committee papers, conference announcements.......... |
|
Submission templates do some metadata enrichment during author submission. |
C7 |
check_format500 |
Checks a data file for format errors before upload to Aleph |
|
|
C8 |
chkenc |
Checks a data file for accent errors (alerts if not in UTF-8) before upload to Aleph |
|
|
C9 |
reportcheck.py |
This program prints a list of errors or missing or duplicate report numbers found in the input file (containing a list of report numbers), that start with the given department pattern |
|
done by Kyriakos; checks numbers of a series |
C10 |
dptmtsReportCheck.py |
This program takes a list of department patterns (like CERN-TH-) and, for every pattern, creates a file named after the department containing all errors associated with that department. It uses the 'reportcheck.py' program to find the errors |
|
Same as above |
C11 |
fft_aps.py |
Downloading fulltext from an input file of APS URLs corresponding to CERN affiliated papers. |
|
|
C12 |
fft_jhep.py |
Downloading fulltext from an input file of JHEP URLs corresponding to Open Access papers. |
|
|
C13 |
ALEPH Library Staff Menu |
Aleph utils used for extracting data from the database. These data are then used for global changes, to add new fields or correct existing fields according to very specific searches in CDS or in Aleph |
|
ALEPH used mostly for multi-record editing. Otherwise CDS queries (see item C14) are mostly used instead of ALEPH's iutils. |
C14 |
CDS query language |
Many searches (saved as Favorites) are run in CDS to detect cataloguing inconsistencies and for checking. Example: checking whether all papers from a special issue of JHEP have been entered in the database |
|
using search engine web api |
C15 |
CernBibCheck, Config AU |
To delete special characters in author names |
|
Example: Cart, &C corrected to Cart, C |
C16 |
CernBibCheck, ConfigRNba14et21pr260 |
To clean the tag 088, and the tag 260 with the tag 088, for the BAS 14 (theses) and the BAS 21 (books) |
|
General CernBibCheck comment: more multi-field condition checks wanted. |
C17 |
CernBibCheck, ConfigRNba11-16 |
To clean the tag 088, and the tag 269, with the tag 088, for the BAS 11 (preprints) and the BAS 16 (scientific committee) |
|
|
C18 |
CernBibCheck, ConfigBA13-773et260 |
To update the tag 260c (year) with the year of publication (tag 773y) |
|
|
C19 |
CernBibCheck, ConfigLatexCERN |
To detect LaTeX errors in the title (such as unpaired $ signs) in the tags 245 & 246 |
|
|
C20 |
CernBibCheck, Configsujetsxx |
To correct the XX subjects with the journal titles from the knowledge `773p---65017a' |
|
|
C21 |
CernBibCheck, Configlkravider |
To detect all the records which have an empty LKR tag (which cause noise with nchkall) |
|
|
C22 |
CernBibCheck, Config8564uniqETmanquants |
To obtain a list of records with a non-unique URL or with a missing URL |
|
|
C23 |
script3digitrncern (on top of CernBibCheck) |
To automatically format a series of CERN report numbers with 3 digits, so that they sort in complete order on the web (example: EP-1981-59 becomes EP-1981-059) |
|
|
C24 |
CernBibCheck, ConfigDoublonscernep |
To obtain a list of similar records with the same tags: 037, 088, 100, 245, 246 |
|
|
C25 |
CernBibCheck, ConfigBA11 |
To clean all the tags of the preprints |
|
|
C26 |
CernBibCheck, ConfigBA13 |
To clean all the tags of the articles |
|
|
C27 |
CernBibCheck, ConfigBA14 |
To clean all the tags of the theses |
|
|
C28 |
CernBibCheck, ConfigBA16SCICOM |
To clean all the tags of the scientific committee papers |
|
|
C29 |
CernBibCheck, 48 different knowledges |
For different configurations, just in the KB/PREPRINTS directory and a lot of others elsewhere |
|
wanted mgmt tool with links between various KBs (one field appears in many) |
C30 |
Autocheck |
Uses an Excel file of around 100 formulas (that I defined; I can also add new items or delete others). This program runs the formulas against CDS and automatically imports the results (files of errors) into my directory |
|
|
C31 |
cleantitle |
Nchkall does not run with the large knowledge base of journal titles, so this script checks the journal titles in a file against the knowledge base and updates them. Example: JHEP---J. High Energy Phys. |
|
|
C32 |
Xenu |
This program detects broken links in a file of URLs extracted from the database. It checks that the file exists and that it is not empty (0 bytes). However, the URL file is very difficult to prepare: all the different series of URLs have to be formatted so that Xenu recognises them, because of the set link, and there are many different forms of URL, different sorts of barcodes,... |
|
Link checker with manual workaround around setlink URLs. Invenio's fulltext indexing and fulltext document checking tools could be used instead. |
C33 |
Doublon |
This program detects duplicate records (similar tags: titles, authors, abstracts,...) |
|
Common Lisp based, just like CernBibCheck. To be integrated into Invenio. |
C34 |
Correct_journals.py |
This script detects errors in a knowledge base: 2 different good forms for the same journal title |
|
Small script to ensure the KB is okay. To be integrated into Invenio. |
C35 |
findsimilar.py |
This script detects similar good forms in the knowledge of journals |
|
Small script to ensure the KB is okay. To be integrated into Invenio. |
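The 3-digit formatting described in C23 can be illustrated with a short Python sketch. This is not the actual script3digitrncern, which runs on top of CernBibCheck; the function name is hypothetical, and the sketch assumes the sequence number is the last hyphen-separated part of the report number, as in the EP-1981-59 example.

```python
import re

# A report number ending in "-<digits>"; the prefix keeps its final hyphen.
TRAILING_NUMBER = re.compile(r'^(?P<prefix>.*-)(?P<number>\d+)$')


def pad_report_number(report_number, width=3):
    """Zero-pad the trailing sequence number of a report number to `width` digits.

    Numbers that already have `width` or more digits, and strings without a
    trailing "-<digits>" part, are returned unchanged.
    """
    match = TRAILING_NUMBER.match(report_number.strip())
    if match is None:
        return report_number
    return match.group("prefix") + match.group("number").zfill(width)


# Plain string sorting puts EP-1981-100 before EP-1981-59; padding fixes the order.
unpadded = ["EP-1981-100", "EP-1981-59"]
padded_sorted = sorted(pad_report_number(rn) for rn in unpadded)
```

Padding only the final part leaves the year and department prefix untouched, which is what makes the plain alphabetical order on the web match the numerical order of the series.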