Gone Flat Land: Why XML Seems Promising

A nerdy post on library science and the future of library cataloging.
image credit: "Tempus Fugit" by abbeyprivate
This essay was written as a requirement for an introduction to cataloging course. An entry like this is not typical Stones of Erasmus fare, but I post it for all my library and cataloging buddies out there. I warn you, though: I made a C+ in Cataloging. I took the course online. While I liked the Reference and Information Services course I took online (which garnered me an A+), I found the online Cataloging course more challenging. My satisfactory grade is most likely attributable to my difficulty keeping up with deadlines, but I also found the assignments hard to conceptualize. Most catalogers use a cataloging application (e.g., Connexion) on a PC to create MARC records or to copy catalog. For this class, though, we had to fill in the fields in a generic MS Word document, which I found terribly awkward. So, a word from the experienced: if you take an online class in cataloging, make sure you have access to a good MARC program.
Anyway, here is my report on XML from a C+ point of view. Enjoy:
MARC has been the standard bibliographic protocol for library data management since the 1960s. The protocol was initiated by the Library of Congress and served as a bridge between the physical card catalog and an electronic version. Early on in library automation, developers attempted to duplicate the format and operability of the physical card catalog in a digital clone. While this may ostensibly seem to work, the flexibility of the web, the interoperability of displayed knowledge, and the emergence of metalanguages, markup languages, and graphical user interfaces have all but made the MARC format seem like a DOS prompt compared to a Windows 7 interface. MARC is not Web 2.0, as I am sure we can all agree. But what is its alternative? XML (eXtensible Markup Language) seems to be an apt successor to the old-style automation.

MARC is a bridge technology. Catalogers who use MARC are stuck at the level of a floppy disk when we should be in the Google cloud. Now, I say “should” sparingly. Capitalism is partly to blame. Libraries are meant to exchange ideas freely. Corporations are made to make money. The original beauty of the MARC record (MARC stands for MAchine-Readable Cataloging) is its virtually universal compatibility. The non-proprietary structure of the MARC format allows programmers on any platform to design a graphical interface to manage, display, and manipulate the records. MARC was made in an optimistic era when computers were seen as potential portals into a new utopia.

When MARC was developed it was a huge data management victory. The Library of Congress needed an automated system to control its huge store of bibliographic data, an alternative to paper records. (1) Early computer programmers sought a file type that could hold data compactly in a world where every kilobyte of storage was expensive and the data was immense. More than forty years later, most libraries are still using the MARC format to display and manage bibliographic records. Today programmers worry far less about file size (Google gives users two gigabytes of free storage in their email accounts), so the rudimentary, rigid structure of MARC, where every byte is tabulated ad nauseam, seems obsolete.

For example, in MARC, the cataloger must record in the title field how many leading characters the system should skip so that it does not file or search on an initial “the.” The cataloger sets an indicator so the end application will ignore the “The” in a title such as “The Sailor Moon.” This kind of tedious data management, while it once served a purpose, can now be achieved by much simpler means. MARC still retains vestiges of its paper cousin, the card catalog. Fiander recommends, for example, that MARC toss out the notion of the “main entry” (25). MARC has been outmoded by formats that focus on the display of textual information rather than on a tripartite structural model. No matter which access point a data retrieval system retrieves, there is no need to have a field that is essentially “main” (25). In MARC, data must be placed in fields and subfields if it is to be displayed correctly on an OPAC. Although MARC is a complex structural descriptor, its overall granularity (its ability to parse information into schematic fields) is repetitive and overlapping (Tennant).
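To make the indicator business concrete, here is a minimal Python sketch. In MARC 21 the count of characters to skip lives in the second indicator of the title field (245). The function names and the short list of English articles are my own assumptions for illustration, not part of any cataloging tool.

```python
def filing_title(title: str, nonfiling_chars: int) -> str:
    """Return the form of the title used for sorting and searching.

    nonfiling_chars plays the role of the MARC indicator: the number of
    leading characters (the article plus its trailing space) to skip.
    """
    return title[nonfiling_chars:]


# "The " is four characters long, so the cataloger records 4 by hand.
print(filing_title("The Sailor Moon", 4))  # prints "Sailor Moon"

# The "simpler means": let the software spot a leading article itself.
ARTICLES = ("the ", "a ", "an ")  # assumed English-only list


def auto_filing_title(title: str) -> str:
    """Strip one leading article without any hand-counted indicator."""
    lowered = title.lower()
    for article in ARTICLES:
        if lowered.startswith(article):
            return title[len(article):]
    return title


print(auto_filing_title("The Sailor Moon"))  # prints "Sailor Moon"
```

The second function is only a rough stand-in; a real system would need article lists for every language in the collection, which is one reason the hand-set indicator existed in the first place.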

Over the years, new fields have been created to accommodate new pathways of access to knowledge. For example, with MARC21 catalog librarians can add a URL (Uniform Resource Locator) as an access point by adding 856 fields to the record. MARC’s subsequent renovations have allowed catalogers to help patrons locate information more effectively. The renovations of MARC seem to foreshadow the beauty of XML. I will explain it by quoting someone else: “XML enables you to identify chunks, such as biographies within a larger book of history or place descriptions from a novel or a travel guide” (Shatzkin). MARC CAN do this too. But it is more tedious. It is the inherent inflexibility of MARC that tarnishes it, not its sublime adherence to cataloging standards.

The beauty of MARC is its marriage to the Anglo-American Cataloging Rules and its strict adherence to bibliographic rules and regulations. The benefit of MARC is that, for the most part, it works. The goal is not to eliminate MARC’s inherent structural integrity, but to eliminate some of the dinosaur features that make MARC implausible as a structural design meant to survive the exponential leaps in document description. For example, as has been noted, MARC includes redundant information (seven or so fields where a personal name can be inserted) which could be reduced to one field. The redundancy persists because the MARC format, still tied to physical records, rests on the assumption that information is to be found on cards, not on a personal computer accessing a public OPAC.

Since MARC was created, data representation across platforms has become a necessity with the advent of the World Wide Web. The Web introduced us to HTML (HyperText Markup Language), which is essentially a language by which the end application can render textual and graphical information regardless of the platform. HTML uses tags to instruct how information is displayed. These tags are hierarchical and serve the needs of the display. If a record needs an italicized element, the librarian inserts an italic tag (such as <i>). With XML (eXtensible Markup Language), a cataloger follows a similar process, but instead of using tags to display information, the tags classify information. For example, in MARC I have to insert information into the proper numbered field. With XML, a cataloger simply tags part of the surrogate as a title, for example (Chan 460).

MARC is a flat structure and not conducive to flexible inclusions of information (Tennant). MARC’s strength is also its weakness. With MARC we are like the flatlanders happy in our two-dimensional universe. With XML we can step out and be transformational. XML is not married to its own protocol. MARC’s adherence to its own rigidity needs to be scrapped but its allegiance to authority control needs to be retained (roughly the idea of Fiander).

MARC’s rigid structure, conforming so closely to the Anglo-American Cataloging Rules, is an asset because bibliographic descriptors need strong bibliographic control, and no one denies that authority control is important. But retro-cataloging thousands of records takes money and time; it is a huge endeavor that won’t happen overnight (25-26).

While much work has been devoted to updating cataloging rules, etc., not much has been done to revolutionize “data description formats” along the lines of XML (Fiander 17). A problem with MARC is its archaic flat structure. If a person learns MARC, the structure of the format does not carry over to other information designs. With XML, the format lends itself to other data hierarchies, like HTML, etc. Learning XML builds a bridge to understanding other informational structures that have the potential to build on one another.
An XML document looks very similar to an HTML document: both wrap content in nested, angle-bracketed tags.
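As a rough illustration of that similarity, the Python sketch below puts an HTML-style line next to an XML-style record for the same book. The element names (record, title, author, published) are hypothetical and not drawn from any real cataloging schema.

```python
import xml.etree.ElementTree as ET

# HTML-style markup: the tags say how the text should LOOK.
html_snippet = "<p><i>Flatland</i> by <b>Edwin A. Abbott</b></p>"

# XML-style markup: the tags say what the text IS.
# Element names are hypothetical, not taken from a real schema.
xml_record = """
<record>
  <title>Flatland</title>
  <author>Edwin A. Abbott</author>
  <published>1884</published>
</record>
"""

# Because the tags carry meaning, a program can ask for the title by name.
record = ET.fromstring(xml_record)
title = record.findtext("title")
author = record.findtext("author")
print(title, "/", author)
```

Nothing in the HTML line tells a program which piece is the title; in the XML record the tag itself answers that question.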
XML is similar to MARC in that it is a descriptor structure for information, but it did not come out of the arena of Library Science. XML is unique in that it allows a document to contain information about the document within the document itself, in the same way that MARC’s directory is enclosed within the record itself (Chan 459). XML sprang from web development in the late nineties. In XML, a cover graphic for a book is more easily included, as is a Table of Contents. RSS feeds can be set up to let users see popularly accessed tables of contents, or feeds can display information topical to current events (see Sutherland). XML is like the old display table in libraries; librarians can now create information displays on current events without having to rove the shelves looking for interesting-looking books. The possibilities are endless; the problem, of course, is money.

XML’s designation as a meta-markup language allows it to flexibly insert information into a surrogate, enriching content, while at the same time not being confined to a flat rigidity (Chan 460). Insertion of data points should not be so arcane and tedious that one has to hire a tax attorney to translate terms and insertion points. I know the latter from experience: to put a Table of Contents into a MARC record is like translating hieroglyphic script. A Table of Contents in MARC is a series of fields and subfields that, in the end, make searching for the right information a chore; in XML, “it’s a breeze” (Tennant). With XML the cataloger has the flexibility to create records with appropriate labeling tags to fit the needs of the library, the museum, or whichever body is cataloging the record.
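Here is a small Python sketch of the Table of Contents point. A basic MARC contents note (field 505) conventionally strings the chapter titles together with " -- " separators, while an XML record can give each chapter its own element. The chapter titles, page numbers, and element names below are invented for illustration.

```python
import xml.etree.ElementTree as ET

# A basic MARC contents note packs every chapter into one string,
# conventionally separated by " -- ". Chapter titles are invented.
marc_505_a = "The early years -- Exile in Basel -- Last letters"
marc_chapters = [part.strip() for part in marc_505_a.split("--")]

# Hypothetical XML equivalent: each chapter is its own element and can
# carry extra detail, such as a starting page, as an attribute.
xml_toc = """
<contents>
  <chapter page="1">The early years</chapter>
  <chapter page="45">Exile in Basel</chapter>
  <chapter page="112">Last letters</chapter>
</contents>
"""
toc = ET.fromstring(xml_toc)


def find_chapter(toc, word):
    """Return (title, page) pairs for chapters whose title contains word."""
    return [
        (ch.text, int(ch.get("page")))
        for ch in toc.findall("chapter")
        if word.lower() in ch.text.lower()
    ]


print(marc_chapters)
print(find_chapter(toc, "basel"))  # [('Exile in Basel', 45)]
```

The MARC string can be split apart, but only by trusting the punctuation; the XML version keeps each chapter, and anything attached to it, addressable on its own.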

One of the challenges confronting most libraries seeking to implement a categorical changeover from MARC to XML-based records is cost: the cost of retroconversion, the cost of training librarians in the new file format. I think the greatest motivation to suck up the cost is interoperability. With XML, records are so much more flexibly managed, taking data management away from a “physical” record. With XML, it is easier to “catalog” information in a world that is heading away from physical placement on a shelf and toward information in a cloud. There needs to be a paradigm shift from the catalog helping the patron LOCATE information to helping the patron ACCESS information. With XML, for example, there is a greater potential to aggregate large amounts of data and to classify it within discernible categories based on an application written for that purpose. For example, with XML, I can write an application that displays the book jackets of all books with links to Amazon. While this is theoretically doable with MARC, I do not even want to contemplate how it would be done.
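A book-jacket application of that kind might look something like the Python sketch below. Everything specific in it is an assumption: the catalog layout, the element names (cover, store_url), and the example.org addresses standing in for real cover images and bookstore links.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical catalog; the URLs are placeholders, not real resources.
catalog_xml = """
<catalog>
  <book>
    <title>Flatland</title>
    <cover>https://example.org/covers/flatland.jpg</cover>
    <store_url>https://example.org/store/flatland</store_url>
  </book>
  <book>
    <title>Cataloging &amp; Classification</title>
    <cover>https://example.org/covers/cataloging.jpg</cover>
    <store_url>https://example.org/store/cataloging</store_url>
  </book>
</catalog>
"""


def jacket_gallery(xml_text):
    """Build an HTML fragment: one linked cover image per book."""
    items = []
    for book in ET.fromstring(xml_text).findall("book"):
        # escape() keeps characters like & and " safe inside HTML.
        title = escape(book.findtext("title", default=""))
        cover = escape(book.findtext("cover", default=""))
        link = escape(book.findtext("store_url", default=""))
        items.append(f'<a href="{link}"><img src="{cover}" alt="{title}"></a>')
    return "\n".join(items)


gallery = jacket_gallery(catalog_xml)
print(gallery)
```

The whole job is a loop over named elements, which is the sense in which XML lends itself to applications written for one purpose.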

With an XML document, there is more room for access rather than location. What needs to happen with XML, instead of hanging onto arcane structures, is to somehow find a way to marry the ingenuity of the computer programmer with the prodigy of the library cataloger. Perhaps the issue is not retroconversion or the problems of a learning curve, but a simple gap between programmers and librarians. What needs to be dismantled is the stereotype that librarians merely shelve books. Also, we need to eliminate librarians who are not willing to embrace information technology. Librarianship is not as quick as capitalism; perhaps that is a good thing, but I think after forty years, markup language is the way to go.
(1) The Library of Congress still maintains its “Main Card Catalog” accessible to researchers who are looking for items published before 1980 not cataloged in the library’s OPAC.
Chan, L. (1994). Cataloging and classification: An introduction. New York: McGraw-Hill. 403-412.

Fiander, D. (2001). Applying XML to the bibliographic description. Cataloging & Classification Quarterly, 33(2), 17-28.

Miller, D. (2000). XML and MARC: A choice or a replacement. Presented at the ALA joint MARBI/CC:DA Meeting, Chicago. http://xmlmarc.stanford.edu/ALA_2000.htm

Shatzkin, M. (2008). What the Hell Is XML? Publishers Weekly, 255(50), 29-30. Retrieved November 27, 2009, from Library Lit & Inf Full-Text database.

Sutherland, M., & Clark, J. (2009). Virtual Journal Room: MSU Libraries Table of Contents Service. Computers in Libraries, 29(2), 6-7, 41-3. Retrieved November 27, 2009, from Library Lit & Inf Full-Text database.

Tennant, R. (2002). MARC must die. Library Journal. Retrieved from http://www.libraryjournal.com/article/CA250046.html
