The Evolving Catalog

A woman using the card catalog at the main reading room of the Library of Congress, circa 1940. Photo: Library of Congress

OCLC printed its last library catalog cards on October 1, 2015, ending an era that lasted more than 150 years. As technology changes library cataloging, we look back at its history and forward into its future.

Today when we say “technology,” it is often shorthand for “computer technology.” Of course this is not the only technology in our lives, but it is the one that defines our modern age. A century and a half ago, the defining technology was electricity and all things electric.

The light bulb was literally the bright idea of the day. Today we have LED light bulbs that we can control with a smartphone app, turning on the lights when we are still on our way home, or creating a romantic atmosphere by changing the color and intensity of the light at the touch of a screen.

If we move back in time, we see ages defined by their technological innovations: steam power, water power, or the precision use of metals that made it possible to create accurate timepieces and to automate the production of fine cloth. We can go back to the printing press, clearly a defining technology for all that came after it. Printing technology depended both on innovation with metals and also on the development of papermaking techniques that greatly improved on previous writing surfaces, like sheepskin, papyrus, wax, clay, and stone.

Basically, it’s technology all the way back to fire and the first stone axes. We naturally take for granted the technologies that precede our own age, and we marvel at the ones that are new. Libraries of course have been technology-based from the beginning of their history. Some of the earliest libraries that we know of were furnished with writings in the form of scrolls. Medieval libraries held bound manuscripts. The big leap forward was the Gutenberg revolution and the concomitant increase in the production of copies of texts. The number of books increased, and their abundance made them more affordable. Other technologies also affected libraries, such as the development of electric lighting, which reduced the threat of fire and allowed readers to make use of the library outside of daylight hours.

In the 18th and 19th centuries, not only were more copies of books produced than ever before, but the numbers of new writings and new editions also grew. Library holdings thus increased as well, which led to difficulties in keeping up with an inventory of the items held. Today we assume that every library has a catalog, but even in the 1800s some libraries had no actual record of their holdings or relied on a brief author list. Much “finding” done in libraries at the time relied on the memory of the librarian. Charles Ammi Cutter (1837–1903), writing about the catalog of the Harvard College Library in 1869, took pity on the librarian overseeing a collection of 20,000 books without a proper catalog, who had to attempt to answer subject-based queries using only his own knowledge of the collection’s content.

The birth of the catalog

Pages from Lawrence University’s handwritten library catalog, circa 1855. Photos: Lawrence University Archives (Appleton, Wisc.)

The library catalog technology of Cutter’s day was a printed book. Printed book catalogs had the same advantages as books themselves: They could be produced in multiple copies and were highly portable. A library could give a copy of its catalog to another library, thus making it possible for users to discover, at a distance, that a library had the item sought. The disadvantages of the printed book catalog, however, became more serious as library collections grew and the rate of growth increased. A library catalog needed near-constant updating. Yet the time required to produce a printed book catalog (in an era in which printing required that each page be typeset) meant that the catalog could be seriously out of date as it came off the press. Updating such a catalog meant reprinting it in its entirety, or staving off an expensive new edition by producing supplementary volumes of newly acquired works, which then made searching quite tedious.

In the mid-1800s the library card catalog was already winning hearts and minds. Cutter attributed the 1861 development of the card catalog to Ezra Abbot (1819–1884), assistant librarian of Harvard College. Although neither the book catalog nor the card catalog meets all needs as efficiently as one would desire, the card catalog had already proven itself as an up-to-date instrument for library users and librarians alike.

Markus Krajewski, in his book on the history of card files, Paper Machines: About Cards and Catalogs, 1548–1929 (2011), shows that cards on paper slips had been used in earlier times, in particular by the early bibliographers and encyclopedists who needed to create an ordered presentation of a large number of individual entries. It was libraries, however, that demonstrated how useful and flexible the card catalog could be.

Cards were lauded by Melvil Dewey (1851–1931) in his introduction to early editions of his Decimal Classification, although his classification and “relativ index” in no way required the use of a card system. However, the “Co-Operation Committee” of the newly formed American Library Association (ALA) announced its decision on the standardization of the catalog card in 1877; not coincidentally, Dewey’s library service company, The Library Bureau, founded in 1876, was poised to provide the cards to libraries at a cost lower than custom-produced card stock.

The Library Bureau soon branched out into the provision of catalog furniture and a variety of card-based products for a growing business records market. In fact, before long, providing cards to libraries was only a small portion of The Library Bureau’s revenue as businesses and other enterprises in the United States and Europe turned to card systems for record keeping. Krajewski considers these card systems the early precursors of the computerized database because of the way they atomized data into manipulatable units and allowed the reordering of the data for different purposes.

Uniform innovations

It should be obvious that both the book catalog and the card catalog were themselves technologies, each with different affordances. They also were affected by related technological developments, such as changes in printing technologies. The typewriter brought greater uniformity to the card catalog than even the neatest “library hand” could, and undoubtedly increased the amount of information that one could squeeze into the approximately three-by-five-inch surface. When the Library of Congress (LC) developed printed card sets using the ALA standard size and offered them for sale starting in 1902, the use of the card catalog in US libraries was solidified.

Henriette Avram presents a magnetic tape containing 9,300 records to Richard Coward of the British National Bibliography, 1967. Photo: American Libraries, October 1989

After Dewey, the person who had the greatest effect on library technology was Henriette Avram (1919–2006), creator of the Machine Readable Cataloging (MARC) format. This was not only an innovation in terms of library technology, it was generally innovative in terms of the computing capability of the time. In the mid-1960s, when MARC was under development, computer capabilities for handling textual data were very crude. For example, look at a magazine mailing label. You will see uppercase characters only, limited field sizes, and often a lack of punctuation beyond perhaps a hash mark for apartment numbers. This is what all data looked like in 1965. However, libraries needed to represent actual document titles, author names, and languages other than English. This meant that the library data record needed to have variable length fields, full punctuation, and diacritical marks. Avram delivered a standard that was definitely ahead of its time.

Although the primary focus of the standard was to automate the printing of cards for LC’s card service, Avram worked with staff at LC and other libraries involved in the project to leverage the MARC record for other uses, such as the local printing of “new books” lists. To make these possible, the standard included nontext fields (in MARC known as “fixed fields”) that could be easily used by simple sort routines. The idea that the catalog could be created as a computerized, online access system from such records was still a decade away, but Librarian of Congress L. Quincy Mumford announced in his foreword to Avram’s 1968 document The MARC Pilot Project that MARC records would be distributed beginning in that year, and that this “should facilitate the development of automation throughout the entire library community.” And it did.
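As a rough illustration of why fixed-position codes suited the simple sort routines of the period, the sketch below builds a “new books” list by sorting a few records on a coded date read from a fixed character range. The record layout, the function names, and the sample titles are all hypothetical; this is not the actual MARC fixed-field definition.

```python
# Hypothetical records: each carries a fixed-width code string plus a
# variable-length title. The layout is an assumption for illustration only:
#   positions 0-5  date the record was entered (YYMMDD)
#   positions 6-8  language code
records = [
    {"fixed": "680312eng", "title": "A Later Acquisition"},
    {"fixed": "680105fre", "title": "Une acquisition en janvier"},
    {"fixed": "680220eng", "title": "Another New Title"},
]


def date_entered(record):
    """Return the coded entry date from its fixed character positions."""
    return record["fixed"][0:6]


def new_books_list(records, since):
    """List titles entered on or after `since` (YYMMDD), oldest first."""
    recent = [r for r in records if date_entered(r) >= since]
    return [r["title"] for r in sorted(recent, key=date_entered)]


print(new_books_list(records, "680201"))
# ['Another New Title', 'A Later Acquisition']
```

Because the date always sits in the same positions and in the same coded form, the routine never has to interpret the free text of the record, which is what made such fields easy to sort on.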

Dewey did not anticipate the availability of the LC printed card service when he proposed the standardization of the library catalog card, yet it was precisely that standardization that made it possible for libraries across America to add LC printed cards to their catalogs. Likewise, Avram did not anticipate the creation of the computerized online catalog during her early work on the MARC format, but it was the existence of years of library cataloging in a machine-readable form that made the Online Public Access Catalog (OPAC) a possibility.

Going online

The next development in library catalog technology was the creation of that computerized catalog. It would be great to be able to say that the move from the card catalog to the online catalog was done mainly with the library user’s needs in mind. That wasn’t my experience working on the University of California’s online catalog in the early 1980s. The primary motivators for that catalog were the need to share information about library holdings across the entire state university system (and the associated cost savings) and to move away from the expense and inefficiency of card production and the maintenance of very large card catalogs.

The library’s first union catalog was generated from fewer than half a dozen years of MARC records created on the systems provided by the Ohio College Library Center (later known solely as OCLC) and on the Research Libraries Group’s RLIN system. At the time it was developed, the larger libraries in the University of California system were running from 100,000 to 150,000 cards behind in filing into their massive card catalogs. This meant that cards entered the catalog about three months after the book was cataloged and shelved. For a major research library, having a catalog that was three months out of date, and that promised only to get worse as library staffing decreased due to budget cuts, made the online catalog solution a necessity.

Library catalogs moved online in the 1980s, some using the Dynix system. Photo: Skylarstrickland/Wikimedia

We, and by “we” I mean all of us in library technology during this time, created those first systems using the data we had, not the data we would have liked to have had. The MARC records that we worked with were in essence the by-product of card production. And now, some 35 years later, we are still using much the same data even though information technology has changed greatly during that time, potentially affording us many opportunities for innovation. Quite possibly the greatest mistake made in the last two to three decades was failing to create a new data standard that would be more suited to modern technology and less an imitation of the library card in machine-readable form. The MARC record, designed as a format to carry bibliographic data to the printer, was hardly suited to database storage and manipulation. That doesn’t mean that databases couldn’t be created, and to be sure all online catalogs have made use of database technology of some type to provide search and display capabilities, but it is far from ideal from an information technology standpoint.

The real problem is the mismatch of the models between the carefully groomed text of the catalog entry and the inherent functionality of the database management system. The catalog data was designed to be encountered in an alphabetical sequence of full headings, read as strings from left to right; strings such as “Tolkien, J. R. R. (John Ronald Reuel), 1892–1973” or “Tonkin, Gulf of, Region—Commerce—History—Congresses.” Following the catalog model of which Charles Cutter was a primary proponent, the headings for authors, titles, and subjects are designed to be filed together in alphabetical order in a “dictionary catalog.”
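The dictionary catalog model can be sketched in a few lines: author, title, and subject headings are interfiled in one alphabetical sequence, and each new card is placed at its filing position. This is only an approximation. Real filing rules are considerably more elaborate than a case-insensitive string comparison, the punctuation of the headings is simplified here, and every heading other than the two quoted above is a made-up sample.

```python
import bisect

# A tiny "dictionary catalog": one sorted sequence of
# (filing_key, kind, heading) tuples holding all three kinds of heading.
catalog = []


def file_card(catalog, kind, heading):
    """Insert a card at its alphabetical position in the single sequence."""
    bisect.insort(catalog, (heading.casefold(), kind, heading))


file_card(catalog, "subject", "Tonkin, Gulf of, Region--Commerce--History--Congresses")
file_card(catalog, "author", "Tolkien, J. R. R. (John Ronald Reuel), 1892-1973")
file_card(catalog, "title", "Two towers")
file_card(catalog, "subject", "Tonga--History")

# Reading the drawer from front to back: the heading string alone
# determines where a card sits, whatever kind of heading it is.
for _, kind, heading in catalog:
    print(f"{kind:8} {heading}")
```

The reader who browses such a sequence sees each heading among its alphabetical neighbors, which is the context the headings were written for.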

Managing and retrieving data

Database management systems, which are essential to permit efficient searching of large amounts of data, work on an entirely different principle from the sequential file. A database management system is able to perform what is called “random access,” which is the ability to go seemingly directly to the entry or entries that match the query. These entries are then pulled from the database as a set. A set of retrieved entries may be from radically different areas of the alphabetical sequence, and once retrieved are no longer in the context intended by the alphabetical catalog.

Database management systems include the ability to treat each word in a sentence or string as a separate searchable unit. This has been accepted as a positive development by searchers, and is now such a common feature of searching that today most do not realize that it was a novelty to their elders. No longer does a search have to begin at the same left-anchored entry determined by the library cataloging rules; no longer does the user need to know to search “Tonkin, Gulf of” and not “Gulf of Tonkin.” Oddly enough, in spite of the overwhelming use of keyword searching in library catalogs, which has been shown to be preferred by users even when a left-anchored string search was also available, library cataloging has continued its focus on headings designed for discovery via an alphabetical sequence. The entire basis of the discovery mechanism addressed by the cataloging rules has been rendered moot in the design of online catalogs, and the basic functioning of the online catalog does not implement the intended model of the card catalog. Parallel to the oft-voiced complaint that systems developers simply did not understand the intention of the catalog, the misunderstanding actually goes both ways: Significant differences in retrieval methods, that is, sequential discovery on headings versus set retrieval on keywords, did not lead to any adaptation of cataloging output to facilitate the goals of the catalog in the new computerized environment. Library systems remain at this impasse, some 35 years into the history of the online catalog. The reasons for this are complex and have both social and economic components.
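The difference between the two retrieval methods can be made concrete with a small sketch. It assumes a simple in-memory word index in place of a real database management system, and the function names and the third sample heading are invented for illustration.

```python
import re
from collections import defaultdict

headings = [
    "Tonkin, Gulf of, Region--Commerce--History--Congresses",
    "Tolkien, J. R. R. (John Ronald Reuel), 1892-1973",
    "Mexico, Gulf of--Navigation",
]


def words(text):
    """Split a string into lowercase words, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def left_anchored(headings, query):
    """Card-catalog style: match only headings that begin with the query."""
    q = query.casefold()
    return [h for h in headings if h.casefold().startswith(q)]


def build_index(headings):
    """Map each word to the set of positions of the headings containing it."""
    index = defaultdict(set)
    for pos, heading in enumerate(headings):
        for word in words(heading):
            index[word].add(pos)
    return index


def keyword_search(headings, index, query):
    """Set retrieval: headings containing every query word, in any order."""
    sets = [index.get(w, set()) for w in words(query)]
    hits = set.intersection(*sets) if sets else set()
    return [headings[pos] for pos in sorted(hits)]


index = build_index(headings)
print(left_anchored(headings, "Gulf of Tonkin"))          # []
print(keyword_search(headings, index, "Gulf of Tonkin"))  # the Tonkin heading
```

The keyword search succeeds where the left-anchored search fails, but what it returns is a set of matching entries pulled out of the sequence, not a position among neighboring headings.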

It is not easy to explain why change was not made at this point in our technology history, but at least one of the factors was the failure to understand that cataloging is a response to technical possibilities. Whether the catalog is a book, a card file, or an online system, it can only be implemented with the technology that is available.

Unlike most other communities, the library community continues to develop some key data standards that it claims are “technology neutral.” It is, however, obvious that any data created today will be processed by computers, will be managed by database software, will be searched using database search capabilities, and will be accessed by users over a computer network. One ignores this technology at great peril.