The Past Is Prologue

Archiving the internet

January 26, 2015

Vannevar Bush’s concept of the memex.

The January 26 New Yorker has an article about Brewster Kahle, the Internet Archive, and its Wayback Machine. Jill Lepore’s article is titled “The Cobweb” and it sits under the heading of Annals of Technology. Take a look. In many ways it’s about us, what we value, and the challenges we face.

We tend to think of the web as fixed in time, immutable, just as stable as any collection of books. But we know it’s not. Lepore gives the example of the exuberant post by Igor Girkin about his group’s downing of Malaysia Airlines Flight 17 on July 17, 2014, in the Donetsk region of Ukraine. Just two hours later the post was taken down and would have been lost forever were it not for the Wayback Machine. She also cites many footnotes in legal and scholarly articles that point to web resources that are no longer there. The web is constantly changing. It’s like the ocean. The tide comes in bringing new shells and when it goes out it takes some of those shells with it. The beach never looks the same.
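The Wayback Machine that rescued Girkin’s post can also be queried programmatically. As a minimal sketch — assuming the Internet Archive’s public availability endpoint at `https://archive.org/wayback/available`, which returns JSON describing the closest archived snapshot of a URL — a check for an archived copy of a page might look like:

```python
import json
import urllib.parse
import urllib.request

WAYBACK_API = "https://archive.org/wayback/available"


def availability_url(page_url, timestamp=None):
    """Build the query URL for the Wayback Machine availability API.

    `timestamp` (YYYYMMDDhhmmss) asks for the snapshot closest to that
    moment; omit it to ask for the most recent capture.
    """
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return WAYBACK_API + "?" + urllib.parse.urlencode(params)


def closest_snapshot(page_url, timestamp=None):
    """Return the URL of the closest archived snapshot, or None if none."""
    with urllib.request.urlopen(availability_url(page_url, timestamp)) as resp:
        data = json.load(resp)
    snap = data.get("archived_snapshots", {}).get("closest")
    return snap["url"] if snap and snap.get("available") else None
```

A call such as `closest_snapshot("vk.com/strelkov_info", "20140717")` would ask for the capture nearest the day Flight 17 was shot down; whether anything comes back depends entirely on what the crawlers happened to save.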

Brewster Kahle’s vision of a complete archive of the internet is magnificent, but preserving something that changes minute to minute is a tall order, even today.

For me, one of the most intriguing aspects of the article is the recounting of the visionaries who set the stage for the internet and how their visions started with libraries. Lepore doesn’t directly mention Vannevar Bush, but his July 1945 Atlantic Monthly article “As We May Think,” in which he discussed his concept of “memex,” really got people thinking of what might someday be possible. Bush’s memex was a physical desk that included storage for microfilm copies of a library of books and journals, several display screens to view the microfilms, a scanner to add content, and a printer. The memex would give the scholar the capability to link documents and passages through a coding system. Memex was to be a version of external memory for the scholar.

Twenty years later, J. C. R. Licklider, then working at Bolt, Beranek, and Newman in Cambridge, Massachusetts, was tapped by the Council on Library Resources (funded by the Ford Foundation) to lead a two-year study on the future of libraries. His study was published by MIT in 1965 as Libraries of the Future (PDF file). Licklider saw the printed page as a good display medium but, as Lepore writes, “bad at storing, organizing, and retrieving” information. She continues quoting Licklider: “‘We should be prepared to reject the schema of the physical book itself,’ he argued, and to reject ‘the printed page as a long-term storage device.’ The goal of the project was to imagine what libraries would be like in the year 2000. Licklider envisioned a library in which computers would replace books and form a ‘network in which every element of the fund of knowledge is connected to every other element.’”

From the user’s point of view, the main criteria for the system Licklider envisioned were:

  1. Be available when and where needed.
  2. Handle both documents and facts. (“Facts,” used here in a broad sense, refers to items of information or knowledge derived from one or more documents and not constrained to the form or forms of the source passages. It refers also to items of information or knowledge in systems or subsystems that do not admit subdivision into document-like units.)
  3. Permit several different categories of input, ranging from authority-approved formal contributions (e.g., papers accepted by recognized journals) to informal notes and comments.
  4. Make available a body of knowledge that is organized both broadly and deeply — and foster the improvement of such organization through use.
  5. Facilitate its own further development by providing tool-building languages and techniques to users and preserving the tools they devise and by recording measures of its own performance and adapting in such a way as to maximize the measures.
  6. Provide access to the body of knowledge through convenient procedure-oriented and field-oriented languages.
  7. Converse or negotiate with the user while he formulates his requests and while responding to them.
  8. Adjust itself to the level of sophistication of the individual user, providing terse, streamlined modes for experienced users working in their fields of expertness, and functioning as a teaching machine to guide and improve the efforts of neophytes.
  9. Permit users to deal either with metainformation (through which they can work “at arm’s length” with substantive information), or with substantive information (directly), or with both at once.
  10. Provide the flexibility, legibility, and convenience of the printed page at input and output and, at the same time, the dynamic quality and immediate responsiveness of the oscilloscope screen and light pen.
  11. Facilitate joint contribution to and use of knowledge by several or many co-workers.
  12. Present flexible, wide-band interfaces to other systems, such as research systems in laboratories, information-acquisition systems in government, and application systems in business and industry.
  13. Reduce markedly the difficulties now caused by the diversity of publication languages, terminologies, and “symbologies.”
  14. Essentially eliminate publication lag.
  15. Tend toward consolidation and purification of knowledge instead of, or as well as, toward progressive growth and unresolved equivocation. It may be desirable to preserve, in a secondary or tertiary store, many contributions that do not qualify as “solid” material for the highly organized, rapidly accessible nucleus of the body of knowledge.
  16. Evidence neither the ponderousness now associated with overcentralization nor the confusing diversity and provinciality now associated with highly distributed systems. (The user is presumably indifferent to the design decisions through which this is accomplished.)
  17. Display desired degree of initiative, together with good selectivity, in dissemination of recently acquired and “newly needed” knowledge.

I provide these 17 user-based criteria to give a sense of the depth of thinking in this study and to invite comment on how far we have come toward meeting these criteria as well as comment on which of these criteria are no longer relevant or desirable.

Even more fascinating is Licklider’s depiction of how a user would interact with the system he envisions. Have a look at pages 45–58 and follow step by step through an interaction (he calls it a conversation) between a user and the system using a real research question. Licklider takes 1994 as a target date for the system to be in place. Elements of the system were in place years earlier, but is the entire system in place even today?

One might wonder how this relates to our present ebook issues and the work of the Digital Content Working Group. Well, libraries (particularly academic libraries) provide access to vast arrays of online content for researchers and scholars. Journals in electronic formats are widely available, though often extremely expensive. Books in electronic form are extremely expensive and not as widely available. In public libraries, journal literature is widely available through online databases, but ebooks are not, due to cost and inadequate retrieval systems.

It’s 2015 and we haven’t really met the vision put forward in 1965 for the year 2000. There’s no doubt that the computing power, information storage, and software are ready to go. What is lacking is the funding libraries need to invest in online resources, cooperation from the publisher and vendor communities, and a clear vision and plan for statewide or even national access to the electronic content our users require.