Using MARC on Inspire?

A discussion was started by Travis on the use of MARC in Inspire.

At SPIRES we've always rejected MARC as being unwieldy, inflexible, or both, depending on what one is trying to do. We are certainly not alone in this. There are two areas where I would like more information about the utility and necessity of MARC in Invenio.

1) MARC as the internal structure.

Using MARC as an internal record structure surely lessens flexibility (if it doesn't then why bother with it?), and what does it gain that couldn't be had with user-defined fields combined with input and output formats for MARCXML?

2) MARC as the admin interface.

Editing, formatting, and analysis tools should get away from using obscure notation like 100u etc., right?  I notice that the bibEdit module allows a verbose view that provides some mappings to logical names, and similarly bibFormat, however these are very incomplete in my instance. How does one build these mappings, and where are they used?  What about the APIs?  Do they allow using these mappings so that developers don't need the lookup table, or do Invenio developers get used to it just like SPIRES developers get used to our idiosyncrasies...?

One reason SPIRES has tremendous inertia is the amount of "holdover" items from days gone by that weren't eliminated when they outgrew their utility.  Reliance on MARC as an internal format smells a bit like one of those within Invenio. If we go to the trouble of moving off of SPIRES, I'd like to have as few of those holdovers as possible in a new system; otherwise we haven't gained much.

The floor is open for discussions

-- SalvatoreMele - 07 Jun 2007



Add your comments at the bottom of the page...


On Wed, 06 Jun 2007, Brooks, Travis C. wrote:

> To what extent is the use of MARC necessary in Invenio?

It is the main metadata representation in Invenio. One can work in another format for various tasks, but in the end a good mapping to MARC is required unless we modify Invenio internals considerably.

> At SPIRES we've always rejected MARC as being unwieldy, inflexible,
> or both, depending on what one is trying to do. We are certainly
> not alone in this.

Can you provide an example of your worries? MARC has always been flexible enough for us to store any kind of document type we needed, from preprints through multimedia material to institute addresses or exhibition objects.

> 1) MARC as the internal structure.
>
> Using MARC as an internal record structure surely lessens
> flexibility (if it doesn't then why bother with it?), and what does
> it gain that couldn't be had with user-defined fields combined with
> input and output formats for MARCXML?

MARC is a well-established standard in the library world with a semantics/syntax that permits one to encode any new metadata information without having to extend the standard. If you use a natural-looking user-defined field schema such as "Doe, JohnOn the foo and bar", then you are more likely to have to twist it along the way in order to store new kinds of information.

In spite of its age, I don't think MARC is much of a burden, since we work with a "modern" MARCXML representation of the standard, so we can use "modern" XML tools to manipulate it, etc.
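
To make the point about XML tooling concrete, here is a minimal sketch that reads a MARCXML record with Python's standard library. The record content is invented for illustration (it reuses the "Doe, John" example above), and the helper name `get_subfields` is an assumption of this sketch, not an Invenio function. Only the MARCXML namespace and the field meanings (100 $a author name, 100 $u affiliation, 245 $a title) come from the MARC 21 standard.

```python
import xml.etree.ElementTree as ET

# MARCXML namespace as defined by the Library of Congress "slim" schema.
NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Illustrative record; the values are made up for this sketch.
MARCXML = """\
<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="a">Doe, John</subfield>
    <subfield code="u">Example University</subfield>
  </datafield>
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">On the foo and bar</subfield>
  </datafield>
</record>
"""


def get_subfields(record, tag, code):
    """Return the text of every subfield `code` found in datafields `tag`."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    return [subfield.text for subfield in record.findall(path, NS)]


record = ET.fromstring(MARCXML)
author = get_subfields(record, "100", "a")       # what "100a" refers to
affiliation = get_subfields(record, "100", "u")  # what "100u" refers to
title = get_subfields(record, "245", "a")
print(author, affiliation, title)
```

The notation "100u" mentioned in the question corresponds to the `tag`/`code` pair passed to the helper here.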

> 2) MARC as the admin interface.
>
> Editing, formatting, and analysis tools should get away from using
> obscure notation like 100u etc., right?

Yes, for editing and other tasks, people can work with more sensible notation, hiding MARC away to a large extent.

> I notice that the bibEdit module allows a verbose view that provides
> some mappings to logical names, and similarly bibFormat, however
> these are very incomplete in my instance. How does one build these
> mappings, and where are they used?

They are taken from the "tag" table. The demo tag names listed there are indeed far from complete.
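
As a rough illustration of what such a name-to-tag mapping buys a developer, the sketch below looks values up by logical name instead of by MARC tag. Everything in it is hypothetical: the names, the six-character tag strings (three-digit tag, two indicators with "_" for blank, one subfield code) and the tuple-based record layout are assumptions made for this example, not the actual contents or schema of the "tag" table.

```python
# Hypothetical logical-name -> MARC tag mapping, in the spirit of the "tag" table.
# Assumed convention: 3-digit tag, two indicators ("_" means blank), subfield code.
TAG_NAMES = {
    "first author name": "100__a",
    "first author affiliation": "100__u",
    "title": "245__a",
}


def parse_tag(spec):
    """Split a 6-character spec such as '100__u' into its MARC parts."""
    if len(spec) != 6:
        raise ValueError(f"expected a 6-character tag spec, got {spec!r}")

    def blank(char):
        return " " if char == "_" else char

    return {
        "tag": spec[:3],
        "ind1": blank(spec[3]),
        "ind2": blank(spec[4]),
        "code": spec[5],
    }


def lookup(record, name):
    """Fetch values by logical name.

    `record` is a list of (tag, ind1, ind2, code, value) tuples; this flat
    layout is an assumption of the sketch, chosen to keep it short.
    """
    want = parse_tag(TAG_NAMES[name])
    key = (want["tag"], want["ind1"], want["ind2"], want["code"])
    return [value for *field, value in record if tuple(field) == key]


demo_record = [
    ("100", " ", " ", "a", "Doe, John"),
    ("100", " ", " ", "u", "Example University"),
    ("245", " ", " ", "a", "On the foo and bar"),
]
print(lookup(demo_record, "first author affiliation"))
```

With a layer like this, code asks for "first author affiliation" and only the mapping needs to know that it lives in 100 $u.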

> What about the APIs? Do they allow using these mappings so that
> developers don't need the lookup table, or do Invenio developers get
> used to it just like SPIRES developers get used to our
> idiosyncrasies...?

In a sense, both approaches have been observed in practice: here at CERN we got used to using MARC directly, while at EPFL the developers preferred, for many tasks, to use "pybliographer", which reads MARC into its own internal format that is then transformed as needed.

(As for the APIs, we should probably discuss them separately in depth, depending on the kind of APIs you are most interested in.)

> One reason SPIRES has tremendous inertia is the amount of "holdover"
> items from days gone by that weren't eliminated when they outgrew
> their utility. Reliance on MARC as an internal format smells a bit
> like one of those within Invenio.

I understand your concern, but I don't think MARC has outgrown its utility yet. It is still the most widely used standard in the library world. And, technically speaking, it is not too bad to work with thanks to MARCXML.

To quote Wikipedia on a similar question:

"The future of the MARC formats is a matter of some debate in the worldwide library science community. On the one hand, the formats are quite complex and are based on outdated technology. On the other, there is no alternative bibliographic format with an equivalent degree of granularity. The huge user base, billions of records in tens of thousands of individual libraries (including over 50,000 belonging to the OCLC consortium alone), also creates inertia." http://en.wikipedia.org/wiki/MARC_standards

-- TiborSimko - 07 Jun 2007


One question I still have is the following:

Yes, MARC is the standard in the library community, though there are plenty who feel it is outdated. However, using a standard is not always the best choice for the fundamental design of the system.

Standards necessarily inhibit flexibility. If they do not inhibit flexibility, then they don't do anything. Right? HTTP, to be basic, makes you trade stateful communication for a simple standard. SQL makes you adopt a certain language and methodology. So what is the fundamental tradeoff involved in MARC? There must be something you can't do in MARC that you could do if you designed your own data model.

Usually, as with HTTP or SQL, the tradeoff is a good one, and clearly worth making, but it is nice to know what you are giving up.

My worry is that MARC is not quite aligned with what we are doing (fulltext, citations, much associated metadata that is field specific) and that MARC's tradeoffs, while potentially useful for managing a collection of bibliographic objects, are not optimal for our usage.

Mind you, just because MARC isn't optimal, doesn't mean we can't use Invenio. Either we can eliminate MARC from Invenio, or simply willingly accept MARC's shortcomings because of other advantages.

-- Main.tbrooks - 12 Jun 2007


> My worry is that MARC is not quite aligned with what we are
> doing (fulltext, citations, much associated metadata that is
> field specific) and that MARC's tradeoffs, while potentially
> useful for managing a collection of bibliographic objects, are
> not optimal for our usage.

The citations and the information about fulltext can be nicely stored in MARC. An example of storing citations is the hep-th/0003295 demo site record; see its 999 tags. An example of storing fulltext metadata is the CERN-GE-0706020 photo; see the 856 tags for the various photo resolutions.
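
The two records mentioned are not reproduced here, so the following is only a sketch of the general shape: repeated 999 datafields for references and repeated 856 datafields for file locations, read back with the standard library. All values are invented. The 856 $u (URI) and $y (link text) subfields follow MARC 21; the 999 indicators and subfield codes are assumptions for illustration, since 999 is a locally defined field.

```python
import xml.etree.ElementTree as ET

NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Invented record: two references (999) and two file links (856).
MARCXML_DEMO = """\
<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="u">https://example.org/photo-small.jpg</subfield>
    <subfield code="y">small</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="u">https://example.org/photo-large.jpg</subfield>
    <subfield code="y">large</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="r">EXAMPLE-REPORT-001</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="s">J. Example 1 (2000) 1</subfield>
  </datafield>
</record>
"""


def fields(record, tag):
    """Return each datafield `tag` as a dict of subfield code -> text.

    A repeated subfield code inside one datafield would be overwritten;
    that is acceptable for this sketch but not for general MARC data.
    """
    return [
        {sf.get("code"): sf.text for sf in df.findall("marc:subfield", NS)}
        for df in record.findall(f"marc:datafield[@tag='{tag}']", NS)
    ]


demo = ET.fromstring(MARCXML_DEMO)
references = fields(demo, "999")
files = fields(demo, "856")
print(len(references), "references;", [f["y"] for f in files])
```

Because datafields are repeatable, a record can carry any number of references or file resolutions without a schema change.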

As discussed during the phone conference, we do not use MARC to store the information about the use of the record, such as the number of downloads etc. This is stored outside of MARC in SQL tables. We use MARC solely to store information about the metadata. Anyhow, if the usage of MARC seems unsatisfactory even for handling metadata, we can think of a nice abstraction layer on top of the current Invenio data model.

But let us see what concrete issues with MARC the ComparisonSlacCernRecordMarkup thread may bring.

P.S. I have moved your 12-Jun-2007 contribution to keep the contributions on this page in chronological order.

-- TiborSimko - 22 Jun 2007

Topic revision: r6 - 2008-02-05 - TiborSimko