Historians Await Access to the Library of Congress’s Twitter Archive

May 17, 2010

The microblogging service Twitter has donated its entire archive of tweets, totaling billions of 140-character posts dating back to March 2006, to the Library of Congress.

"The Twitter digital archive has extraordinary potential for research into our contemporary way of life," said Librarian of Congress James H. Billington. "This information provides detailed evidence about how technology-based social networks form and evolve over time. The collection also documents a remarkable range of social trends. Anyone who wants to understand how an ever-broadening public is using social media to engage in an ongoing debate regarding social and cultural issues will have need of this material.”

Highlights of the collection include the first-ever tweet from Twitter co-founder Jack Dorsey; President Obama's tweet after winning the 2008 presidential election; two tweets by photojournalist James Buck, who was arrested in Egypt and whose use of Twitter set off events that contributed to his freedom; and Green Revolution tweets related to protests of the 2009 Iranian presidential elections.

"It's very exciting that tweets are becoming part of history," Twitter co-founder Biz Stone wrote on Twitter’s blog, "The open exchange of information can have a positive global impact."

How they'll be used

"We are interested in offering collections of tweets that are complementary to some of the Library's digital collections: for example, the National Elections Web Archive or the Supreme Court Nominations Web Archive," explained Library of Congress spokesperson Matt Raymond on an LC FAQ about the acquisition, two weeks after he blogged, "It boggles my mind to think what we might be able to learn about ourselves and the world around us from this wealth of data. And I'm certain we'll learn things that none of us now can even possibly conceive.”

The collection is not yet in LC's possession, and specific plans for its use and any partnerships based around the archive have not yet been drawn up. "We look at it as kind of a case study for taking in digital content," LC Director of Integration Management Beth Dulabahn told American Libraries, noting that LC will have to address both technical issues, such as information-finding tools, and policy matters, such as who will be able to access the collection.

Privacy has already been considered. Only public tweets will be included in the archive. "A tiny percentage of accounts are protected but most of these tweets are created with the intent that they be publicly available," Stone wrote. Deleted tweets, private account information, and linked information such as pictures and websites will not be included. And only tweets at least six months old will be made available for research.

Those policies may be revised in the future, but "until we can see what content is there, we can't begin to think about what kind of restrictions to place on it," Dulabahn said.

Historical significance

While the announcement was greeted with some easily predicted snark (the first comment on LC's FAQ about the gift said, "It's critical the future generations know what flavor burrito I had for lunch," for example), some historians see great significance in the collection.

"Twitter is tens of millions of active users," wrote Daniel J. Cohen, associate professor of history at George Mason University, in the April 30 New York Times. "There is no archive with tens of millions of diaries… Twitter is of the moment; it's where people are the most honest."

"I think Twitter will be one of the most informative resources available on modern-day culture, including economic, social, and political trends, as well as consumer behavior and social trends," said Margot Gerritson, head of Stanford (Calif.) University's Center of Excellence for Computational Approaches to Digital Stewardship, an LC partner.

For its part, LC observed in an April 14 press release announcing the gift that "The archive follows in the Library's long tradition of gathering individuals' firsthand accounts of history, such as 'man on the street' interviews after Pearl Harbor; the September 11, 2001, Documentary Project; the Veterans History Project; and StoryCorps." The collection expands the Library of Congress's web-based holdings by about 3%, adding about 5 terabytes of data to the 167 terabytes of information—including legal blogs, political candidate websites, and websites of members of Congress—that it has archived since 2000.