Data, Data Everywhere

As the Big Data beast fattens, will privacy and ethics get gobbled up?

By Joseph Janes | April 16, 2012

The Michigan Theater is at 603 East Liberty St. in Ann Arbor. Athletes Tom Brady and Cal Ripken have the same body mass index, 27—lower than Dr. Phil’s but higher than Abraham Lincoln’s. Austria’s fertility rate peaked in 1963 and has been falling steadily ever since. Q Lending Inc., of Coral Gables, Florida, received the smallest bailout from the TARP program, at $10,000.

I’m sure you found all of these as fascinating as I did, undoubtedly also wondering where this was going. These facts and a few gazillion others come to you courtesy of Factual, the brainchild of mathematician Gilad Elbaz, who gave us the company that is now Google’s AdSense. In Factual’s 500 terabytes of storage, there’s data from sources governmental and private, on topics broad and narrow, profound and trivial. It’s worth a wander through the website and its featured data sets to see just what it’s been vacuuming up.

A feature article in the March 24 New York Times tells us the company’s plan is “to build the world’s chief reference point for thousands of interconnected supercomputing clouds,” and goes on to describe Factual’s clientele and how they use the product. It also names a few competitors, including Infochimps, Gnip, and of course Wolfram Alpha, which partially powers Siri. Factual, by the way, is hiring; its “data specialist” jobs sound more than a little familiar, even if the page describing them lists 2010–2011 internship opportunities. Oops. I guess bad data can creep in everywhere.

This came hard on the heels of the announcement that the Statistical Abstract of the United States had been saved at the last moment by ProQuest. I’m glad of that; it seemed a shame that the government no longer felt it was worth publishing. I should be clear: I’ve never been a fan of the Abstract. (I’m a World Almanac sort of guy.) While its various elements are valuable and come in handy, the way in which it was organized—particularly the index that gave table numbers rather than pages—seemed stubbornly user-hostile to me. And the web version, consisting of large PDF slabs of tables, has gone from understandably simple to gratingly low-tech. Adding Excel versions was nice, though the whole thing still comes off as antediluvian.

Maybe ProQuest will attend to these shortcomings. In any event, Factual and the Abstract make for a sharp and illustrative counterpoint. One way of thinking about compiling Lots of Data is to organize it by category (which perhaps yields some context and texture) and add some metadata and a search mechanism, all in the service of providing access, so individual people can find a specific fact or set of facts in answer to a question.

Another way, only now feasible, is to mush it all together and see what can be learned. Not by an individual, necessarily, but rather by throwing tons of computing power at it to see what emerges. Both are attempts to somehow wrap our arms and minds around the vertiginous scope and complexity of data being generated and stored every second.

The name “Big Data” gets thrown around a lot to denote this massive-data-conglomeration phenomenon. We’re told this will be an opportunity for information-focused people to collect, curate, manage, organize. All likely true, and all worth pursuing as extensions of work we’re familiar with.

Go one step further, though. How about professionals who work to humanize this field? Those who think about questions of privacy, authority, quality, authenticity, rationality, and ethicality. Who center these processes in efforts to better the human condition and the lives of individuals. Who build tools to gyre and gimbal in the taffeta of data to find just the right thread for a person in need. Somebody like, I don’t know, a reference librarian . . . but that’s another story.

JOE JANES is associate professor at the Information School of the University of Washington.

