Converting Insts to Inspire
Introduction
SPIRES has a database, institutions, that contains the allowed entities that can be used as affiliations on author lists. This db also contains information about each inst., including address, director, etc. At the very least the authority aspect of this db must be ported over so that input can work correctly. At best, the entire db can be moved and maintained in Inspire (but it is my contention that this should come later...). This db also contains the lookups commonly used by DESY, and by authors themselves, to refer to insts, so it serves as a knowledge base for the process of determining affiliations, and a nicely maintainable knowledge base at that.
Purpose
Some possible uses of inst:
- authority for affiliations in hep
- affiliation history on author page
  - finer granularity gives more meaningful profile
- authoritative list of HEP institutes
  - cern list not well maintained, probably to be replaced by Inspire inst
  - granularity on institutional level at least for HEP inst
- mailing lists
  - recent example: LHC bible
  - conference poster etc
  - institutional level advisable since mail to general university address might get dumped
- publication list / annual report for core HEP institutions
- institutional metrics
- SCOAP3 accounting
Requirements:
- granularity
  - departmental level at least for HEP institutes, preferably as well for closely related disciplines (astrophysics, nuclear physics...) to facilitate exchange
- core tag
- field code
- hierarchical structure
  - country -> univ -> department
- history
  - predecessor, successor (e.g. for mergers)
--
AnnetteHoltkamp - 05-Dec-2010
First Steps
In order to facilitate the creation of author list inputting tools, Marko and I are working on porting the db to an xmlMARC format like that used here
http://cdsweb.cern.ch/collection/HEP%20Institutes
I have created a SPIRES format (xmlinspire) for insts that spits out basic XML. I will probably need to modify this to unravel complicated lookups like country name (we use an indirect country code which then looks up a name, so that names of countries can change easily: political reality...). However, for the moment I will simply take this XML output at face value; more should eventually be done to get these subtleties. See the field mapping below for a blow-by-blow account of the mapping.
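To illustrate why the country code is kept indirect, here is a minimal Python sketch of the lookup. The table contents and the function name `resolve_country` are invented for illustration; the real codes and names live in SPIRES.

```python
# Hypothetical lookup table: records store only the code, so renaming a
# country means editing one entry here rather than every inst record.
COUNTRY_NAMES = {
    "DE": "Germany",
    "CH": "Switzerland",
    "US": "USA",
}


def resolve_country(code: str, names: dict = COUNTRY_NAMES) -> str:
    """Return the display name for a country code, or the code itself if unknown."""
    return names.get(code.strip().upper(), code)


print(resolve_country("de"))
print(resolve_country("ZZ"))  # unknown codes pass through unchanged
```

Unknown codes are passed through rather than raising, so a dump with an unexpected code still converts and can be cleaned up later.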
Don't forget to convert the couple of Latin-1 encoded chars, as in the HEP case (using a Linux box):
iconv -f LATIN1 -t UTF-8
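If iconv is not at hand, the same re-encoding can be sketched in Python. This is only an illustration of the step, not part of the actual conversion chain; the function name and the sample string are made up.

```python
def latin1_to_utf8(raw: bytes) -> bytes:
    """Re-encode Latin-1 bytes as UTF-8 (the same job iconv does above)."""
    return raw.decode("latin-1").encode("utf-8")


# Made-up sample containing non-ASCII Latin-1 characters.
sample = "Universität Zürich".encode("latin-1")
converted = latin1_to_utf8(sample)
print(converted.decode("utf-8"))
```

Every byte sequence is valid Latin-1, so the decode step never fails; the flip side is that running it on input that is already UTF-8 silently produces mojibake, so it should be applied only once.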
XSL
Created XSL to do the mapping and created /inspire/inst directory in cvs to contain this work. That has a makefile similar to the one in bibconvert, and the XML file of the inst. dump is in a similar ftp location. Travis tested the conversion, though he has not yet gotten the upload to work and create a new collection...
Field Mapping (Travis' suggestion)
There are far fewer elements here than in HEP, so this is relatively straightforward. Just using a few examples from the CDS site, I've guessed the following mapping:
| SPIRES Name | MARC field | notes |
| INST | 970b | Our key, used in HEP records. Could go in 001 possibly. Made up the "b" subfield |
| IC | --- | Unknown use at the moment. |
| IMC | 270z | Mailing/postal code. Chose "z" subfield at random for "zip". LOC: 270e |
| country.code | 270c | Our code for the country name, I can replace with name from lookup, but better to do this indirectly... |
| inst.catch.name | 110a | Our std name for the inst |
| address | 270a | our free form address, includes free form name (multiply occurring) |
| State.Code | 270s | our state/province code. LOC: 270e |
| Report.code | ---- | prefix on report numbers???? |
| DEPartment | --- | would be nice in 110b, but we didn't really use this... |
| city | 270b | city |
| desy.aff | 595a | DESY name for this aff 9=DESY |
| type | 980a | we tag various insts of note... just guessing where this goes... |
| director | 270p | I think you have contact, we have director.... |
| director-note | 270n |  |
| director-date | 270d |  |
| xtra-indexi | 595b | These are xtra words that should help find the inst, but often aren't in the record, i.e. common searches |
| desylookup | 595a | Ditto above, these are other ways of writing the name, useful for lookups |
| oaff | 595a | ditto above, not sure what the difference is here |
| phone.number | 270l | phone. LOC: 270k. 270l for fax. still needed? |
| email.contact | 270m | email (of contact/or director...) |
| date-updated | 961c | date we last touched it... |
| date-added | 961x | added date |
| note1 | 500a | random extra information, human readable... |
| district | ??? | lat long coordinates used to create maps |
| url | 856u | url for entity (usually inst, not dept) |
| time-zone-info | ??? | Time zone relative to UTC |
| street-address | 270a | street address |
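As a rough illustration of how this suggested mapping could be applied, here is a Python sketch that turns a flat dict of SPIRES elements into a MARCXML-style record. It is not the XSL that was actually written; the input shape, the `FIELD_MAP` name, the function name and the sample values are all assumptions, and only a subset of the rows above is included.

```python
import xml.etree.ElementTree as ET

# Subset of the suggested mapping: SPIRES element -> (MARC tag, subfield code).
FIELD_MAP = {
    "INST": ("970", "b"),
    "inst.catch.name": ("110", "a"),
    "address": ("270", "a"),
    "city": ("270", "b"),
    "country.code": ("270", "c"),
    "url": ("856", "u"),
}


def to_marcxml(inst: dict) -> ET.Element:
    """Build a <record> from a flat dict of SPIRES elements (assumed input shape)."""
    record = ET.Element("record")
    datafields = {}  # one <datafield> per MARC tag, created on first use
    for name, value in inst.items():
        if name not in FIELD_MAP:
            continue  # unmapped elements (e.g. IC) are skipped in this sketch
        tag, code = FIELD_MAP[name]
        if tag not in datafields:
            datafields[tag] = ET.SubElement(
                record, "datafield", {"tag": tag, "ind1": " ", "ind2": " "}
            )
        ET.SubElement(datafields[tag], "subfield", {"code": code}).text = value
    return record


# Illustrative values only, not taken from the real dump.
example = {"INST": "DESY", "inst.catch.name": "DESY", "city": "Hamburg", "IC": "ignored"}
print(ET.tostring(to_marcxml(example), encoding="unicode"))
```

Grouping by tag puts all 270 subfields into a single datafield, which is fine for one address per inst; multiply occurring elements such as address would need one datafield per occurrence instead.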
Field Mapping (final)
Based on CDS usage and the LoC MARC doc:
| marc field | subfield | content | spires field | Note |
| 110 | a | Institution | part of address | corporate name in native language, well-known acronym in brackets behind |
|  | b | Department | part of address | subordinate unit in native language, well-known acronym in brackets behind |
|  | t | newICN |  | HEP affil following new standards |
|  | u | inst.catch.name | ICN | HEP affiliation (spires name) |
|  | x |  | obsolete ICN | ICN of obsolete inst for which this inst should be used instead |
| 371 | a | address | part of address | street etc, city with postal code + additions (native language) |
|  | b | city | part of address | in English |
|  | c | state or province | part of address |  |
|  | d | country | country.name | in English |
|  | e | postal code | part of address | bare form |
|  | g | country code |  |  |
|  | x |  |  | "secondary" actually was used only once! for Castel Gandolfo http://inspirehep.net/record/905456 |
| 034 | d | longitude | district |  |
|  | f | latitude | district |  |
|  | 2 | source |  | e.g. bibcheck geocode including version |
|  | q | type |  | match type/quality |
| 035 | 9 | external identifier schema |  | "HAL" or "GRID" |
|  | a | external identifier |  | e.g. "grid.12345.1" |
| 043 | t | time zone |  |  |
| 372 | a | field of activity |  | University, Research center, Company |
| 410 | a | name variants | DLU, DESYAFF | + standard acronyms |
|  | 9 | source of name variant |  | desy... |
| 410 | g | xtra words | xtra-index | to help find insts |
| 510 | a | name of related inst | ICN 110__u |  |
|  | w | type of relation |  | a predecessor, b successor, t parent inst, r otherwise related |
|  | 0 | record nr of related inst |  |  |
|  | i | specify relation if $$w"r" |  |  |
| 595 | a | hidden note |  |  |
| 65017 | a | content classification |  | same as HEP categories, FC |
| 667 | a | nonpublic note | note1 |  |
| 6781 | a | historical data |  | administrative history |
| 680 | i | public note |  |  |
| 8564 | u | url |  | inst website |
| 961 | c |  | date-updated |  |
| 961 | x |  | date-added |  |
| 980 | a | tags | type | various tags (which are still useful?) |
| 980 | a | CORE | PPF/NON-PPF | should move to 690C, don't use NONCORE (default) |
|  | b | DEAD |  |  |
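The 510 rows are what carry the history and hierarchy requirements (predecessor, successor, parent). A small Python sketch of building such a field follows. It assumes that $i is only meaningful together with $w "r", which is how the table reads; the function name, the inst name and the record number are hypothetical.

```python
import xml.etree.ElementTree as ET

# Relation codes for 510 $w, as listed in the mapping table.
RELATION_CODES = {
    "a": "predecessor",
    "b": "successor",
    "t": "parent inst",
    "r": "otherwise related",
}


def related_inst_field(icn, relation, recid=None, detail=None):
    """Build a 510 datafield linking this inst to a related one."""
    if relation not in RELATION_CODES:
        raise ValueError(f"unknown relation code: {relation!r}")
    if detail and relation != "r":
        # Assumption: $i only specifies the relation when $w is "r".
        raise ValueError("$i is only used together with $w 'r'")
    field = ET.Element("datafield", {"tag": "510", "ind1": " ", "ind2": " "})
    for code, value in (("a", icn), ("w", relation), ("0", recid), ("i", detail)):
        if value is not None:
            ET.SubElement(field, "subfield", {"code": code}).text = str(value)
    return field


# Hypothetical example: this inst has a predecessor stored as record 12345.
field = related_inst_field("Example Inst", "a", recid=12345)
print(ET.tostring(field, encoding="unicode"))
```

Storing both the ICN in $a and the record number in $0 keeps the link readable for humans while letting tools follow it even after an ICN is renamed.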
Questions
- Do we want to maintain director information?
- Do we need report.code (report nr prefix)?
- Do we still need phone/fax? needs maintenance
- Where to store historical ICNs? Also in 595 with $9 Inspire?
Elements Not mapped
Several elements above were not mapped because I wasn't sure what to do with them, but they seemed important (except where noted). Below are ones that might be useful, but probably aren't...
Opt Sing String 00/09 REMOVAL.DATE, RD
Opt Sing Hex 00/0A LIST.CODE, LC, NOT-USED
Opt Sing String 00/0C INSTCODE, ICODE
Opt Mult String 00/12 ACCELERATOR, ACC
Opt Mult Hex 00/13 DESY-DATEUPD, ACC-NOTE, AN, DDATE
Opt Mult String 00/14 DESY-ACCTUPD, ACC-DATE, AD, DACCT
Opt Mult String 00/18 PO.BOX, PHONE-NOTE, PNOTE
Opt Mult String 00/1A FTS-NUMBER, FTSN
Opt Mult String 00/1B TELEX, TX
Opt Mult String 00/1C CABLE, CA, EXP, EXPERIMENT.CODE
Opt Mult Struc 00/1D COMPUTER-STR
Req Sing String 01/00 key . COMPUTER-NETWORK, NET, NETWORK
Opt Mult Struc 01/01 . NODE-STR
Req Sing String 02/00 key . . NODE-ADDRESS, NODE, NODE-ID
Opt Mult String 02/01 . . COMPUTER, COMP
Opt Mult String 02/02 . . DEPARTMENT, DEPT
Opt Mult String 02/03 . . MAIL-CONTACT, MCON
Opt Mult String 00/1F TIME-ZONE-INFO, TZ
Opt Mult Struc 00/20 TELECOPIER-STR, TELESTR
Req Sing String 03/00 key . TELECOPIER-NUM, FAX, TCOP
Opt Sing String 03/01 . TELECOPIER-NOTE, TNOT
Opt Mult Hex 00/22 BITNET.NODE, BN, TIME-UPDATED, TU
Opt Mult String 00/24 ACCTADD, AA
Opt Mult String 00/25 ACCTUPD, AUP
Opt Mult String 00/26 DAFF.UPLOW, DAFFU
Opt Mult String 00/27 CERN.AFF, CAFF
--
TravisBrooks - 24 Nov 2007