Saving Digital Ephemera

Who’s collecting and archiving podcasts, tweets, emails, and other fleeting content?

January 4, 2016

In 2005, before the words “podcast” and “boom” ever appeared in the same sentence, an archivist named Jason Scott, proprietor of textfiles.com, attempted to collect every podcast in existence. (Many of those first files are still sitting on DVD-R discs in Scott’s attic.)

Larger institutions have also tried to preserve digital ephemera. Among them is the Library of Congress (LC), which reached an agreement with Twitter in 2010 to build an onsite research archive.

“Archiving and preserving outlets such as Twitter will enable future researchers access to a fuller picture of today’s cultural norms, dialogue, trends, and events to inform scholarship, the legislative process, new works of authorship, education, and other purposes,” reads a 2013 white paper from LC on the topic.

However, Twitter’s users now send 200 billion tweets per year, and at that scale LC’s project eventually became unsustainable.

Academic libraries are helping to fill the void with social media research and data collection. George Washington Libraries at George Washington University in D.C. created an open source tool called Social Feed Manager to capture social media data for research, archiving, and academic work. In 2013, the project received a $24,550 Sparks! Ignition Grant from the Institute of Museum and Library Services (IMLS).

Likewise, Syracuse University’s School of Information Studies has created Social Media Tracker, Analyzer, and Collector Toolkit at Syracuse (STACKS), an open source project that collects and analyzes social media data related to the 2016 presidential campaign.

“It’s getting less and less expensive to save things digitally; it’s less of an issue,” says Rachael Bower, director of the Internet Scout Research Group at the University of Wisconsin–Madison. “On the other hand, storing oodles and oodles of digital material with no easy way to access it, to look through it and know what you have, doesn’t seem ideal either.”

Librarians and archivists must also consider the speed with which technology evolves. An archival copy of a podcast, say, must include the relevant software to play the actual show, even if advances in computing will eventually make that software obsolete.

Alexis Rossi, director of media and access at Internet Archive (IA), which maintains the Wayback Machine—a program that constantly browses the internet to record and replicate websites at specific moments in time—says preserving digital files remains a subjective task, for the most part.

“People self-select,” she says. “Somebody has decided that I have this amazing collection of [personal] material, and I need to find a home for it.”

But IA’s mission, as Rossi describes it, is “to archive all of human knowledge and to make it accessible to everyone.” She says more than 2 million people use the site daily, and her colleagues are working to make it more searchable. (IA recently received a couple of large grants, including one for more than $350,000 from IMLS to help expand the capacity for national web archiving.)

The Wayback Machine also houses collections of podcasts and blogs on the site, where individuals upload their own material onto secure IA servers. IA is also where textfiles.com’s Scott now works.

Stanford University Libraries, meanwhile, with the assistance of a recent $685,000 National Leadership Grant for Libraries from IMLS, is developing the second phase of ePADD, an open source discovery module that will provide researchers with easier access to email archives.

But most of the work around born-digital content is still preliminary. Kari R. Smith, a digital archivist at the Massachusetts Institute of Technology, says that umbrella organizations like ALA and the Society of American Archivists have round tables and working groups that are constantly looking at how to describe and capture this kind of material, and at how to ensure like-minded people don’t waste finite resources on projects with duplicate aims.

“Making sure you’ve got some sense of why you’re preserving what you’re preserving long term,” Bower says, “is incredibly critical.”