Preserving the Born-Digital Record

Many more questions than answers

May 28, 2015

We are in trouble. The world is producing vast amounts of born-digital material. The volume, complexity, and dynamism of this information challenge us to think creatively about its capture, organization, and long-term preservation and usability. What is the role of the library? Is this a source of failure or opportunity for the global library community? Internet pioneer Vint Cerf warns us about the risk of a “digital dark age” if we do not develop the technologies, tools, financial resources, shared responsibilities, and will to address this risk to our cultural, scientific, societal, and community records. 

This is an issue of integrity, of the collective adherence to a code and standard of values, of maintaining human records as complete, unimpaired, and undivided as possible. The ability to consult the evidence and sources used by researchers and authors will be lost if those digital records are not available. The ability to research and investigate the history and current state of our world will be compromised if born-digital materials are gone or changed. The ability to access the sources of record will be difficult if they are deposited and dispersed into multiple and disparate sites. This is the challenge of repository chaos.

At the core of born-digital content preservation and archiving are four principles. We must hold the content (the archive as repository), because we cannot preserve what we have not collected. We must enable access (the repository as persistence). We must secure the content (the archive as curation). And we must take care of the content (the repository as steward).

Born-digital content comes in an ever-expanding array of forms and formats. Consider just the following examples:

  • published and licensed works such as e-journals, ebooks, e-video, and e-audio, from commercial and trade sources, from academic publishers, from the growing array of independent publishers and distributors, and the revolution in self-publishing and self-distribution
  • the output of e-government
  • online learning and training materials
  • research data from universities and corporations
  • social media in all of its wonderful expressions
  • electronic archives that come with personal papers
  • organizational records, including email, manuscripts, business papers, and financial information
  • websites and web documents
  • pictorial images
  • spatial data and longitudinal observations
  • software applications, both proprietary and open source
  • video games
  • medical data, with the inherent challenges of patient privacy
  • live feeds, like RSS and news information from around the world
  • visualizations and simulations
  • interoperable metadata, like MARC, BIBFRAME, and schema.org

And so on, with so many new things that will grow in intensity and intricacy.

New technologies are feeding the explosion in born-digital content. Each year Educause and the New Media Consortium publish the Horizon Report, which documents and describes important developments. Some examples from the past few years will illustrate the symbiosis and demonstrate the explosion:

  • mobile devices and tablets
  • cloud computing with distributed processing and applications
  • geo-everything, such as geolocation and geotagging
  • the personal web and customized management of online content
  • linked data connecting and relating structured information
  • semantic-aware applications that link meaning to answers
  • smart objects and smart spaces that connect information and the physical world
  • open content with wide distribution and repurposing
  • massive open online learning experiences
  • electronic books and the array of platforms and applications
  • Big Data and big science driving new forms of research information management
  • games as learning tools with participation and interaction
  • visualizations that bring meaning and understanding to data

The challenge of born-digital content comes at the point when libraries are confronting critical trends. We are experiencing rapidly shifting user behaviors and expectations. We are trying to figure out how to move away from redundant, inefficient library operations and aging service paradigms. We recognize the need to achieve scale and network effects through aggregation in an environment of advanced open architecture and the acceleration of collective innovation. We are facing metadata chaos in terms of quality, currency, and accuracy. We face a new economic context and a mandate for systemic change. We are not sure how to deal with conditions of massive surveillance, security meltdowns, threats to network neutrality, and corporate control of the infrastructures of discovery and content.

How does born-digital content fit into what libraries do? Libraries select, acquire, and synthesize information. Libraries enable users to navigate, disseminate, interpret, understand, use, and apply information. Libraries preserve and archive information. These activities are carried out in support of teaching and learning, research and scholarship, and community health and development. We respond to a societal and global mandate. How will our roles and processes be extended to embrace born-digital content, or will the massive challenges spawn a new vision, purpose, method, and system?

Quality equals content plus functionality. How do we make sure that the born-digital content is preserved but also remains usable long term? That means that we understand and accommodate the important characteristics of digital information:

  • accessibility and availability, with no constraints on time and geography
  • the searchability and researchability (i.e., being able to ask new questions)
  • the currency and real-time nature of the information
  • its dynamism and fluidity and linkability
  • the collaborative and interactive elements
  • its encyclopedic potential but also its modularity
  • its volatility and fragility

Born-digital resources also force us to consider the relationship among form, text, and function, where content is no longer tied to format. We are encouraged to be more sensitive to context, renderability, and versioning over time. We see the inevitability of physical and format obsolescence, the importance of authenticity and provenance, and the role of standards such as globally unique identifiers.

The scope, depth, and cost of the threat mean that individual libraries cannot advance born-digital content preservation on their own. We need to radicalize cooperation, promoting new combinations and new public–private partnerships through national and global systemic strategies. Whether it is the creation of centers of excellence, or new thinking about mass production, or new infrastructures, or new initiatives and programs, we must start from a position of collaboration so as to maximize quality, productivity, and innovation. An excellent example of such an effort in the US is the work of the Digital Preservation Network (DPN), a backbone of diverse preservation infrastructure replication built on sound principles of audit and rights succession. We will not have the technologies, tools, workflows, or standards unless we work together in new ways.

It will be challenging to create a robust and successful born-digital content preservation capacity without new thinking about copyright. Libraries are capturing and preserving digital materials as fair use. Efforts to produce new exceptions or limitations in Section 108 of the Copyright Act for purposes of digital preservation have not been successful. Our law is out of sync with technology and user needs. Where does the preservation of born-digital content intersect with orphan works, with transformative use, with the public interest? What should be the relationship between licensing and copyright limitations?

What about the issue of open content and proprietary rights? How do we manage national copyright provisions in a global networked context?

How many libraries have well-developed plans for born-digital content capture, description, and preservation? How many libraries have put in place the funding to enable and sustain these plans?

How are those agencies and foundations that fund libraries and support learning and scholarship responding to the challenge? Do we truly understand user expectations for digital content and how it will be used? What digital content has persistent value, and how will we make sound decisions on what to collect and preserve? How will persistence and quality be ensured? How will collaborative efforts be structured and good governance and sustainability ensured? What is needed for operational, organizational, and architectural scalability? It is our predetermined professional role, fate, and destiny to serve society’s interest and to take on responsibility through the collective library for the preservation of born-digital content.