Saving Digital Ephemera

Who’s collecting and archiving podcasts, tweets, emails, and other fleeting content?

January 4, 2016


In 2005, before the words “podcast” and “boom” ever appeared in the same sentence, an archivist named Jason Scott, proprietor of textfiles.com, attempted to collect every podcast in existence. (Many of those first files are still sitting on DVD discs in Scott’s attic.)

Larger institutions have also tried to preserve digital ephemera, including the Library of Congress (LC), which reached an agreement with Twitter in 2010 to build an onsite research archive.

“Archiving and preserving outlets such as Twitter will enable future researchers access to a fuller picture of today’s cultural norms, dialogue, trends, and events to inform scholarship, the legislative process, new works of authorship, education, and other purposes,” reads a 2013 white paper from LC on the topic.

However, Twitter’s users now send 200 billion tweets per year, and at that scale LC’s project eventually became unsustainable.

Academic libraries are helping to fill the void with social media research and data collection. George Washington Libraries, at George Washington University in Washington, D.C., created an open source tool called Social Feed Manager to capture social media data for research, archiving, and academic work. In 2013, the project received a $24,550 Sparks! Ignition Grant from the Institute of Museum and Library Services (IMLS).

Likewise, Syracuse University’s School of Information Studies has created Social Media Tracker, Analyzer, and Collector Toolkit at Syracuse (STACKS), an open source project that collects and analyzes social media data related to the 2016 presidential campaign.

“It’s getting less and less expensive to save things digitally; it’s less of an issue,” says Rachael Bower, director of the Internet Scout Research Group at the University of Wisconsin–Madison. “On the other hand, storing oodles and oodles of digital material with no easy way to access it, to look through it and know what you have, doesn’t seem ideal either.”

Librarians and archivists must also consider the speed with which technology evolves. An archival copy of a podcast, say, must include the relevant software to play the actual show, even if advances in computing will eventually make that software obsolete.

Alexis Rossi, director of media and access at Internet Archive (IA), which maintains the Wayback Machine—a program that constantly browses the internet to record and replicate websites at specific moments in time—says preserving digital files remains a subjective task, for the most part.

“People self-select,” she says. “Somebody has decided that I have this amazing collection of [personal] material, and I need to find a home for it.”

But IA’s mission, as Rossi describes it, is “to archive all of human knowledge and to make it accessible to everyone.” She says more than 2 million people use the site daily, and her colleagues are working to make it more searchable. (IA recently received a couple of large grants, including one for more than $350,000 from IMLS to help expand the capacity for national web archiving.)

The Wayback Machine also houses collections of podcasts and blogs on the site, where individuals upload their own material onto secure IA servers. Scott, the textfiles.com proprietor, now works at IA.

Stanford University Libraries, meanwhile, with the assistance of a recent $685,000 National Leadership Grant for Libraries from IMLS, is developing the second phase of ePADD, an open source discovery module that will provide researchers with easier access to email archives.

But most of the work around born-digital content is still preliminary. Kari R. Smith, a digital archivist at the Massachusetts Institute of Technology, says that umbrella organizations like ALA and the Society of American Archivists have round tables and working groups that are constantly looking at how to describe and capture this kind of material, and at how to ensure like-minded people don’t waste finite resources on projects with duplicate aims.

“Making sure you’ve got some sense of why you’re preserving what you’re preserving long term,” Bower says, “is incredibly critical.”


One thought on “Saving Digital Ephemera”

  1. We can’t start second-guessing ourselves about the “whys” and “hows” of storing and accessing digital (or analog) information, otherwise we run a risk of depriving the future of information that new technologies could have accessed and that new inquiries could have discovered. We need to become hoarders in the best sense of the word and on the grandest of scales – amassing with zeal and abandon and storing with care and attention. A project such as the University of Lincoln’s forensic examination of medieval wax seals is just one of many examples of how new technologies and methods are being used to discover and extract information from previously overlooked sources. In the digital realm, one only needs to consider the many and growing intersections of big data and digital humanities projects to realize that the more data that projects such as Chronicling America’s digital newspaper repository can offer, the better. Now imagine, with more than a nod to Nicholson Baker’s excellent “Double Fold”, that we had not only retained all of the physical newspapers that were poorly microfilmed and destroyed in the past but also all of their inserts, comics, and advertisements, which were discarded without having been microfilmed. Newer digital technologies could have created superior scans of a far richer store of information for discovery by an even broader spectrum of disciplines. The mission and primary focus of each and every institution should be, to paraphrase Alexis Rossi, to archive all of its knowledge and to make it accessible to everyone. Even, at a certain level, attempting to identify redundant aims across institutions could lead to misguided decisions. My copy of an incunable might appear to be the same as yours and, for example, not be selected for digitization in a consortium digitization project. 
However, future technologies could have the potential to discover certain previously indiscernible characteristics via the digital copy (say, in a woodcut or the typeface) that would have indicated distinction and perhaps even uniqueness. With apologies to Ruskin, go to the digital, rejecting nothing, selecting nothing, and scorning nothing…. The future will be grateful.
