Unlocking the Riches of HathiTrust

It’s a whole new world for digital access, a roundtable of fair-use experts agrees

January 16, 2013


The legality of digital fair use was upheld this past October, when US District Court Judge Harold Baer granted summary judgment to the defendants in the Authors Guild’s year-old lawsuit against the HathiTrust library collaborative, which sought to block the use of HathiTrust’s growing repository of millions of full-text book scans. Calling the project “the enduring work of libraries,” HathiTrust Executive Director John Wilkin told American Libraries the organization continues to plan “more and better” uses of its scanned content. An appeal is pending. Meanwhile, bloggers Karen Coyle, Barbara Fister, and James Grimmelmann shared with AL how they see this decision shaping the future of sharing digitally preserved print materials.

AMERICAN LIBRARIES: What does the HathiTrust decision mean for libraries that are considering digitizing their own holdings, either through Google or on their own?

BARBARA FISTER (Library Babel Fish): This is encouraging for librarians. Judges seem to be recognizing the imbalance of power between rights holders and the general public. The Constitution clearly meant to use government-granted limited monopolies as an incentive to advance science and culture, but legislation that has extended the reach of copyright—and the penalties for violating it—has primarily benefited owners of intellectual property, not the public, and not creators who are severely limited in how they can build on others’ work. It is encouraging that judges feel what libraries do is not only legal but worth defending. This decision should embolden us to go forward with projects that we believe are legal without quite so much anxiety about potential penalties. Of course, the Authors Guild is appealing, so we’ll see what happens next.

KAREN COYLE (Coyle’s InFormation): It is clear that the judge’s decision-making was based heavily on the uses made of the materials—keyword-based indexing of texts, with the results including only page numbers and the number of times the keyword term appears on the page, and providing full text of documents to users who qualify as visually disabled. Libraries wishing to make other uses, such as providing some materials for classrooms, must look elsewhere.

JAMES GRIMMELMANN (The Laboratorium): It provides both potential legal cover for digitizing books and some compelling uses for the scans. Most libraries are not going to be building their own full-text search engines, but all libraries have print-disabled patrons. The decision opens up major new possibilities for giving them access to the full range of a library’s collection.

Does the ruling apply just to Google scans? What about the Digital Public Library of America (DPLA) and other digital repositories that scan works?

FISTER: Though the case was primarily about Google Books, it has much wider implications. The judge’s interpretation of fair use, including indexing, the need to preserve the cultural record, and making books accessible to the blind, is a stirring endorsement of library values.

COYLE: I see nothing in the ruling that limits it to Google scans. Many libraries have done some scanning, often under the exceptions provided for libraries in Section 108 of the copyright code. Note that the digitization of books in the public domain is not under discussion in this suit or in any of the other lawsuits against Google and its partners. The question is only whether in-copyright books can be digitized for these designated purposes. Public domain works, of course, carry no restrictions on digitization, copying, or any other uses.

GRIMMELMANN: The ruling technically applies only to Google’s library partners named as defendants. Its effects are likely to extend much further. DPLA has not, so far, been an effort to digitize in-copyright books and make them available to patrons. That’s not likely to change in the short term; even under HathiTrust’s reasoning, a library can’t simply digitize a book and put it online.

What are the implications of this decision for orphan works?

FISTER: None, at this point. HathiTrust suspended its orphan works program when the Authors Guild objected to HathiTrust’s procedures, and the judge would not rule on the legality of something that HathiTrust was not actually doing at the time of the suit. In a way, this in itself is significant. The judge also said that rights holders couldn’t forbid libraries from doing something on the grounds that someday, maybe, rights holders would find a way to monetize what libraries were doing. It pretty much was a declaration that there is no “future maybe” tense in judicial language.

COYLE: For now, HathiTrust will give orphan works the same fair use treatment as works with identifiable copyright owners. The question of whether an institution like HathiTrust can release orphan works through some process of due diligence is open and mostly uninvestigated.

GRIMMELMANN: The decision will put orphan works in search engines and open them up to researchers who work with entire corpuses of books (rather than with individual ones). This may help reunite some orphans with their copyright owners, reducing the scale of the problem. And it will also inform the inquiry the Copyright Office is making for ways forward on orphan works.

Is this a green light for Bing and other search engines to claim “fair use” for search-engine indexing of copyrighted works, as long as the indexing does not lead to full-text access?

FISTER: It certainly is encouraging that a judge has said creating an index using digital copies of works is transformative enough to be a fair use, even if it’s on a large scale. That said, indexing books seems to have been a far more contentious issue than indexing the web for the purposes of search. Somehow, most people seem to think the concept of copyright pertains uniquely to texts that sometimes take a nondigital form and forget that it also pertains to websites. There haven’t been large-scale objections on copyright grounds to caching copies of web pages, even though the issues are parallel.

COYLE: We still need a ruling in one more case before we know whether for-profit use will be found to be fair use: Authors Guild v. Google, the remaining thread in the now-seven-year saga of authors and publishers versus Google’s Book Search project.

GRIMMELMANN: Search engines have had a green light for years, thanks to cases on online search and automatic plagiarism detection in student term papers. This case just makes the green light a little brighter. At some point, even the most cautious pedestrians should feel safe stepping out into the street.

What does this decision mean for further digitization and online research in the digital humanities? What types of initiatives have been on hold pending the court ruling?

FISTER: I’m not sure which projects have been affected, but I suspect many have either been halted or never even conceived because the penalty for making a mistake, even innocently, is so high. When you could be fined $150,000 for putting online one text that you thought was in the public domain but isn’t, it tends to discourage creativity and scholarship. I suspect some fields have been luckier than others. People teaching and doing research on pre–20th-century classics and history have had more opportunities to do digital work. This leads to some peculiar situations. Students are often perplexed that they have free online access to the Catholic Encyclopedia (published in 1907, so out of copyright), yet the New Catholic Encyclopedia, published in 2003 (now with Vatican II!), is only available in print where I work [Gustavus Adolphus College library].

COYLE: The proposed settlement between Google and representatives of authors and publishers included some specific text relating to “nonconsumptive uses” of the corpus of digital texts—uses in which the texts were not read by humans and did not take the place of reading the texts. Keyword indexing was one use, but there was also a desire on the part of humanities scholars to be able to do sophisticated text analysis using this data, a question that I am not confident has been settled by this case.

GRIMMELMANN: How on earth would one either determine a fair fee to do an influence analysis on 20th-century novels or fairly divide the revenues among novelists? By recognizing the potential in this still-quite-young field, Judge Baer’s opinion gives research in the digital humanities the freedom to grow and mature.

How does the decision in the Georgia State University e-reserves case align with the HathiTrust decision?

FISTER: Both decisions are hopeful signs that the courts are willing to wrestle with the constitutional questions about the nature and social purpose of copyright, and to work through them with more care than we have seen from Congress in recent years. Both decisions respect what libraries traditionally do, and they uphold the value of our doing it in an era when rights holders are finding ways to restrict library uses of digital texts. Though Congress is explicitly given the job of promoting science and the “useful arts” by granting limited monopolies, these court cases underscore that only part of that bargain has been upheld and that this imbalance is harmful to us all.

COYLE: This is another case where the judge was very supportive of education and fair use. One key similarity was that the GSU judge studied the actual number of uses of many of the contested works and concluded that unused digitized works were not infringements. I see a trend where digitization itself is not a copyright offense, which makes it much easier to engage in “just in case” mass digitization.

GRIMMELMANN: University libraries that use digital tools to enable their ordinary research and teaching mission win their lawsuits. Judges have learned that information technology is now part of the basic fabric of higher education; they would no more rip out the computers than rip out the desks. Where teachers and students are reading significant portions of books or articles and there is a reasonable way to pay copyright owners for that, courts are asking universities to pay, and they are generally quite willing to. But there never has been—even in the digital age—an expectation that every page, every word is metered.

Is Google no longer required to ask publishers if they would like to opt out of Google Books scanning? Will fair use trump contractual agreements that limit the scope of fair use?

FISTER: Here’s the way I remember this whole thing evolving. Amazon launched Search Inside one day in 2003, having digitized the contents of a lot of books without rights holders’ permission. (Publishers, who held publication rights, were consulted, but not authors, who in most cases held copyright to the works Amazon digitized.) It was generally accepted as an enhancement of Amazon’s sales platform, not as an abrogation of anyone’s rights. At the same time, Google was trying to work with publishers to make the contents of books searchable, but couldn’t get much traction. Publishers couldn’t see any advantage to collaborating with a search engine, and they were worried their content might escape and scamper around the web. Google upped the ante by getting a handful of libraries to allow it to scan their content. Though Google claimed to be like a library, it soon turned into a bookseller, which has given it better leverage for negotiating with publishers, who recently settled their lawsuit against Google so they could have a profitable working relationship. The Authors Guild was agreeable to a settlement with Google until the court threw it out, even though the ground the Guild assumed in representing authors as a class has eroded out from under it.

Google originally felt no need to ask rights holders whether they wanted their published works indexed. It based the legality of its efforts on fair use. When it looked as if Google could reach a settlement with authors and publishers, it dropped that argument in exchange for a mutually beneficial and exclusive arrangement. The court rejected the settlement.

As I understand it, fair use is not something that contracts can necessarily limit; it’s a right inherent in the granting of copyright, a limit to the rights owners’ monopoly that benefits science and culture.

COYLE: This suit did not make any determinations relating to Google itself. In the seven or more years since the publishers joined the authors in the lawsuit against Google, publishers and Google have found a point of mutual benefit that covers a large swath of current and previously published materials. This brings up another question: Is there any reason for Google to continue its library partners project, now that it has a very large number of publisher participants (estimated to run to the tens of thousands)? There are numerous cases of books represented only by metadata in the Google Book Search database, which is evidence that digitizing “all the books in the world” is still an elusive goal.

GRIMMELMANN: The opt-out did not meaningfully enter into the decision. My sense is that if the opt-out had been called to Judge Baer’s attention more prominently, he might have emphasized its importance. At the moment, it seems that publishers have made their peace with Google and are uninterested in opting out of scanning, so it is primarily authors who would care about it. For Google’s commercial search engine, there is a reasonable case that the opt-out is necessary; that case is weaker when applied to noncommercial libraries.

Does digitization now mean the lack of copyright? How will authors secure rights under the ruling?

FISTER: Rights holders still have the upper hand. There is no lack of copyright. There is, in fact, far too much in the way of restrictions. The extension of the term of copyright from 14 years [as legislated in 1790] to the life of the author plus 70 years [the Sonny Bono Copyright Term Extension Act of 1998] is a significant erosion in the balancing act that the framers of the Constitution envisioned. So is the fact that everything is copyrighted by default, without any action being taken. When assertive rights holders are forced to recognize that there are limits to their control, they tend to base their outrage on a sense that their work is property that they are grudgingly sharing with the world, but only on their own terms. In fact their work is not like real estate or goods, and copyright is merely a limited monopoly granted to them by the state. If I publish my work, it becomes in some sense public, and the public has rights. Deal with it.

COYLE: Copyright is alive, well, and unaltered. Fair use, however, has been reaffirmed, with some eloquence, as a necessary social compact to further the creation of new knowledge. Recent actions by publishers and some authors’ representatives could be seen as attempts to favor commercial gain over the social value of knowledge creation, including some publishers’ refusal to sell ebooks to libraries, as well as attempts to deny first sale rights. This ruling argues strongly for the constitutional view that the copyright monopoly is valid only in that it encourages more knowledge creation. Fair use is the balancing act between authors’ rights to control their work and the right of society to make use of it.

GRIMMELMANN: Copyright abides. The basis of this decision is that digitization to create search-engine indexes and make copies available to the blind does not interfere with the ordinary markets in which books are sold. If there had been evidence that the scans were leaking out or actually inhibiting sales of digital editions, the case would have been very different.

How will this decision change the face of humanities research in five years? Are there factors other than copyright that have hampered the digitization of one discipline over another?

FISTER: Apart from legal concerns, there are economic and cultural issues at play in the disciplines that shape how scholars shift gears to accommodate change. People will eventually expect digital access to the literature of all fields, and the restrictions we have placed on sharing texts and images will adjust to meet the needs of scholars who want to be able to measure, mine, remix, and reuse content in ways that were unimaginable 50 years ago. In the past, scholars expected some library, somewhere, to have every book they might run across in a footnote. Soon, scholars may expect to have digital access to all those books. In the STEM fields, open access has been making strides in large part because scientists don’t always have the access they need to the literature they create and want to share. In the humanities, the legacy of the past will be tricky to negotiate—particularly the massive number of 20th-century books of uncertain copyright status—but these court decisions are encouraging libraries and scholars to explore ways of making this literature more discoverable and useful.

COYLE: Hurdle number one is the plain fact that most materials that humanities scholars wish to work with are not digitized, including large numbers of public domain texts. Those that are available are not always in comparable formats that would allow them to be studied together. The great advantage of Google Books is that its books were digitized with more or less a single generation of technology, making them a truly viable research corpus. The question for educational institutions, libraries, and research organizations today is how to fill in the key missing members of the corpus, and do so quickly and coherently, should Google turn its attention elsewhere.

GRIMMELMANN: Some disciplines, like computer science and physics, have been composing their papers in structured typeset formats for years. Others, like physiology and art history, rely heavily on illustrations and other nontextual material. Their different relationships to print and to the written word will affect their relationships to digitization as well.

