Web-Scale Discovery Services

Finding the right balance

January 14, 2014

Marshall Breeding

Web-scale discovery services, tools that search seamlessly across a wide range of local and remote content and provide relevance-ranked results, have the ambitious goal of providing a single point of entry into a library’s collections. The four major vendors are OCLC, EBSCO, ProQuest, and Ex Libris. Ideally, these services index content from all possible online providers as well as the library’s local holdings. After four years of development, these products have come close to this ideal, but gaps persist.

Discovery services face complex challenges. For example, to accommodate the concerns of proprietary-content providers, discovery services must differentiate publicly available search results from content offered only to authenticated users. In addition, access to resources must be aligned with each library’s subscription and database selections. Discovery services allow access only to resources that a library is authorized to use through paid subscriptions, open access licenses, or the public domain. Often a library subscribes to both the discovery service and the content resource, an arrangement that produces the best results.
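The access rules described above can be pictured as a simple filter over search results. The following is a minimal sketch in Python, not any vendor's actual implementation: the Record fields, the access labels, and the visible_results function are all hypothetical names chosen for illustration, and the rule that some records are shown only to authenticated users is modeled as a single assumed flag.

```python
from dataclasses import dataclass

# Access types that need no paid subscription (labels are assumptions).
FREE_ACCESS = {"open_access", "public_domain"}


@dataclass(frozen=True)
class Record:
    title: str
    collection: str            # hypothetical identifier of the source database
    access: str                # "subscription", "open_access", or "public_domain"
    auth_only: bool = False    # assumed flag: provider restricts display to signed-in users


def visible_results(records, subscribed, authenticated):
    """Return the records a given user may see.

    A record is kept when the library is entitled to it (free content or a
    subscribed collection) and, if the provider restricts display, the user
    is authenticated.
    """
    visible = []
    for rec in records:
        entitled = rec.access in FREE_ACCESS or rec.collection in subscribed
        if not entitled:
            continue
        if rec.auth_only and not authenticated:
            continue
        visible.append(rec)
    return visible


# Illustrative data only; the titles and collection names are made up.
sample = [
    Record("Open article", "oa-journals", "open_access"),
    Record("Licensed article", "db-one", "subscription", auth_only=True),
    Record("Unlicensed article", "db-two", "subscription"),
    Record("Old monograph", "archive", "public_domain"),
]
guest_view = visible_results(sample, subscribed={"db-one"}, authenticated=False)
patron_view = visible_results(sample, subscribed={"db-one"}, authenticated=True)
```

In this sketch a guest sees only the open access and public domain items, while a signed-in patron also sees the article from the subscribed database. The unlicensed article never appears for either user.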

Database publishers tend to make their materials available to discovery service providers to improve access and increase usage statistics, thus encouraging library renewals. In many cases, the full text of e-journal articles and other content is indexed along with the basic metadata required to retrieve citations.

Abstracting and indexing (A&I) services, such as those provided by Thomson Reuters/ISI, PubMed, and PsycINFO, are one area of controversy. These products are based on proprietary information, including structured discipline-specific vocabularies, abstracts, and other elements that provide great value to the discovery process. In the form of standalone databases, A&I services offer precise search tools that aid researchers in finding scholarly articles. But A&I providers frequently express concerns that web-scale discovery services will weaken interest in their products.

The current business environment surrounding web-scale discovery reflects these complications. Of the major vendors, two (EBSCO and ProQuest) also publish major aggregations of content based on proprietary subject indexes. Because of a variety of business concerns, these two companies do not yet fully cooperate with competing discovery service providers. EBSCO, for example, does not provide metadata associated with its popular EBSCOhost databases to other discovery services for inclusion in their indexes, though it does offer access to the EBSCO Discovery Service API for libraries with mutual subscriptions. This type of noncooperation has been a point of frustration. The Orbis Cascade Alliance, a consortium of academic libraries in the Pacific Northwest, has engaged in a public discussion of its concerns about EBSCO not providing metadata to Ex Libris for inclusion in its Primo Central index.

It is also very difficult to quantify the relative coverage of these discovery services. One major consideration in selecting a service involves determining how comprehensively each product covers the library’s collection. This process involves a careful analysis of library holdings versus the stated coverage lists provided by each service.
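As a rough starting point for the holdings-versus-coverage comparison described above, a library could match the identifiers in its own holdings against each vendor's published coverage list. The sketch below assumes both are available as lists of ISSNs. The function names, the service labels, and the sample ISSNs are placeholders, not real data. It also ignores factors a real analysis would need to weigh, such as date ranges and whether full text or only citations are indexed.

```python
def normalize_issn(raw):
    """Strip hyphens and whitespace and uppercase the check character."""
    return raw.replace("-", "").strip().upper()


def coverage_report(holdings, coverage_lists):
    """Compare a library's held ISSNs against each service's coverage list.

    holdings:        iterable of ISSN strings the library holds
    coverage_lists:  dict mapping a service name to an iterable of ISSNs
    Returns a dict of per-service counts, share covered, and missing ISSNs.
    """
    held = {normalize_issn(issn) for issn in holdings}
    report = {}
    for service, listed in coverage_lists.items():
        covered = held & {normalize_issn(issn) for issn in listed}
        report[service] = {
            "covered": len(covered),
            "total": len(held),
            "share": len(covered) / len(held) if held else 0.0,
            "missing": sorted(held - covered),
        }
    return report


# Placeholder ISSNs and service names, for illustration only.
holdings = ["1234-5678", "2345-6789", "3456-789X", "4567-8901"]
coverage_lists = {
    "service_a": ["12345678", "2345-6789", "9999-0000"],
    "service_b": ["3456-789x"],
}
report = coverage_report(holdings, coverage_lists)
for service, stats in report.items():
    print(service, f"{stats['share']:.0%}", "missing:", stats["missing"])
```

The "missing" list is often more useful than the percentage, since it shows which specific titles a library would need to reach through another route.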

A related concern involves how discovery services rank search results. A fully objective relevance ranking would order results without bias toward any given content provider. If the discovery service company is also a major content provider, libraries need reassurance that search results are not skewed.

Discovery services can play a vital role in a library’s strategic infrastructure. But it’s not a one-size-fits-all arena. The needs of public and academic libraries, for example, differ enormously. Libraries can select from a variety of options to deliver the best user experience, but they must be well-equipped with data and perspective as they place their bets in this critical area.