Archiving Against the Clock

Libraries and universities join forces to save government data

By Timothy Inklebarger | June 1, 2017

Justin Schell (standing, top right), director of Shapiro Design Lab at the University of Michigan’s Shapiro Undergraduate Library, was an organizer of a data rescue event at the library in January. Photo: Christina Czuhajewski

It’s been more than four months since Donald Trump was inaugurated as the 45th president of the United States, and librarians and data scientists are hard at work to preserve government research they fear could be lost or removed by his administration.

The effort began at the University of Toronto and Van Pelt Library at the University of Pennsylvania prior to Trump’s inauguration and has since spread to as many as two dozen universities and libraries across the US and Canada.

The fear that government research and information—particularly that produced by the Environmental Protection Agency, NASA, and the National Oceanic and Atmospheric Administration—could be lost is not unfounded. It happened under the administration of former Canadian Prime Minister Stephen Harper, who according to news reports allowed fishery and oceanographic data to be destroyed after taking office in 2006.

It can happen here, too, says Dawn Walker, a PhD student at the Faculty of Information at the University of Toronto, who has helped organize data recovery events at various universities.

Data scientists are using two methods to gather the publicly available data from government websites. Web crawlers scan websites and collect information and vital data sets for storage. But the more complex data sets require computer-savvy individuals—some of whom call themselves “baggers”—to write custom computer code to collect the information. Many of those scripts can be used on different pages, according to Walker. She notes that the Environmental Data and Governance Initiative—a network of academics and nonprofits working to preserve government data—has created an extension that can be added to the Google Chrome web browser that allows users to nominate data sets for archiving.

The data undergoes a thorough review process to ensure the integrity of the information collected, she says. The data sets are then downloaded to a repository at datarefuge.org, where the information is publicly available.
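The article does not describe the tooling behind that review. As a rough illustration only, one common way to check that archived files have not changed is to record a checksum for each file at download time and compare it again later. The sketch below assumes that approach; the function names are hypothetical and are not taken from any project named here.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(directory):
    """Map each file's relative path to its checksum (hypothetical helper)."""
    root = Path(directory)
    return {
        p.relative_to(root).as_posix(): sha256_of_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_changed_files(directory, manifest):
    """List manifest entries that are now missing or whose checksum differs."""
    root = Path(directory)
    changed = []
    for name, expected in manifest.items():
        target = root / name
        if not target.is_file() or sha256_of_file(target) != expected:
            changed.append(name)
    return changed
```

A manifest built when a data set is first downloaded can be stored alongside it; running find_changed_files later returns an empty list if every recorded file still matches.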

Justin Schell, director of Shapiro Design Lab at the University of Michigan’s Shapiro Undergraduate Library and organizer of a data rescue event, says he first learned of the guerrilla archiving events after reading about the University of Toronto’s hackathon in December. He says he’s been working on the project every day since.

Schell says collecting the data is only part of the challenge in preserving it. Verifying the data’s validity, organizing and describing it, and ensuring that it is in an accessible format are just as vital, he says.

Those tasks entail, in many cases, contacting government officials, scientists, academics, and others who are knowledgeable about climate data. That includes cross-referencing data with studies available at data repositories like the Inter-university Consortium for Political and Social Research, one of the largest in the world, Schell says.

It’s not uncommon for those collecting research and scientific data to end up with multiple and differing copies of the same information, Schell says. He says that leaves archivists and librarians to answer the question: “Which one is verifiable?”

“We’re trying to better understand what’s in these data sets to make sure we’re not just getting part of the picture,” he says, adding that effort takes “a lot of networking.”

Walker says librarians also have been instrumental in the “bagging” process, leading the conversation about best practices in archiving and preserving the material.

“It’s really exciting to see how this has been a way for people with expertise in a variety of areas to come together [on the issue],” she says. “Now there’s a public interest in [archiving the data], and a lot of people are looking for ways to continue that momentum.”

TIMOTHY INKLEBARGER is a writer living in Chicago.
