
Libraries value patron privacy. Yet a scan of current practices reveals uneven activation of the basic technology to secure web-based library systems. Encryption of data presented in online catalogs, discovery services, and other resources is essential to protect privacy. Without encryption, the content that patrons search for, view, or download is easily intercepted. These online streams of communications deserve the same protection granted to circulation records, but few libraries are taking even minimal steps to encrypt this data.
Secure communication on the web provides two important benefits:
- identifying the website authoritatively
- enabling encrypted communications between the user’s browser and the server that provides the resource
Encryption algorithms transform the data into a seemingly garbled form that, if intercepted, cannot be deciphered.
The use of a secure communication protocol (HTTPS) provides the best approach available today for protecting patron privacy. With HTTPS, a page remains encrypted from the time it is transmitted by the web server until it is displayed on the user’s browser. The information remains impervious to eavesdropping throughout its route, even if it passes through unsecured wireless networks or other points of vulnerability. The use of HTTPS has expanded from securing passwords and credit cards to all types of online services, and it is now widespread among commercial services, including Facebook, Twitter, and all Google services.
Enabling encryption on web-based resources has never been easier. Encryption with the HTTPS protocol requires minimal computing resources and is not difficult to implement. The user’s browser will indicate that the transmission is secure. Chrome, for example, identifies a fully valid and secure site with a green padlock and shows HTTPS in the URL; clicking on the padlock displays the details of the certificate.
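The details a browser shows behind the padlock can also be read programmatically. The following is a minimal sketch using Python's standard `ssl` module, not a tool used for this report; the function names and the sample hostname in the usage note are hypothetical.

```python
import socket
import ssl


def make_verifying_context() -> ssl.SSLContext:
    """Return a client context that verifies certificates and hostnames."""
    # create_default_context() turns on certificate validation and hostname
    # checking against the system's trusted certificate authorities.
    return ssl.create_default_context()


def fetch_certificate(host: str, port: int = 443, timeout: float = 10.0) -> dict:
    """Connect to host over TLS and return its validated certificate.

    Raises ssl.SSLCertVerificationError if the certificate is not trusted
    or does not match the hostname.
    """
    context = make_verifying_context()
    with socket.create_connection((host, port), timeout=timeout) as raw_sock:
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            return tls_sock.getpeercert()


def summarize_certificate(cert: dict) -> dict:
    """Pull out the kind of fields a browser shows behind the padlock."""
    # "subject" and "issuer" are tuples of RDNs; each RDN is a tuple of
    # (name, value) pairs, so rdn[0] is the first pair in each RDN.
    subject = dict(rdn[0] for rdn in cert.get("subject", ()))
    issuer = dict(rdn[0] for rdn in cert.get("issuer", ()))
    return {
        "issued_to": subject.get("commonName"),
        "issued_by": issuer.get("organizationName"),
        "valid_until": cert.get("notAfter"),
    }
```

Calling `summarize_certificate(fetch_certificate("catalog.example.org"))` (a placeholder hostname) would report who the certificate was issued to, who issued it, and when it expires, or raise an error if the certificate does not validate.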
My Library Technology Report, Privacy and Security for Library Systems (vol. 52, no. 4), aims to assess the extent to which libraries use encryption to secure their patron-facing interfaces. In December 2015, I inspected the websites of representative groups of libraries, including members of the Association of Research Libraries (ARL) and the 25 largest public libraries in the US. These libraries are the most likely to have the technical capability and financial resources to implement secure systems. The data represents a snapshot of current practices and a baseline to measure changes that are taking place. Here are some of the key observations:
- Out of 124 ARL member libraries, only 16 (13%) use HTTPS on their main websites.
- Out of the 95 ARL member libraries that feature an online catalog search on their websites, only 12 (14%) default to HTTPS for search activity.
- Out of the 100 ARL member libraries that feature a discovery service on their websites, only 17 (17%) default to HTTPS for search activity.
- Out of the 25 large public libraries, only two (8%) use HTTPS on their main websites, and only seven (28%) default to HTTPS for catalog search activity.
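A check of whether a site "defaults to HTTPS" can be approximated in a few lines. The sketch below is one possible way to automate it and is not the method used in the study: it requests the plain `http://` address, follows any redirects, and looks at the scheme of the address finally served. The function names are hypothetical.

```python
from urllib.parse import urlsplit
from urllib.request import urlopen


def uses_https(url: str) -> bool:
    """True if the given address uses the https scheme."""
    return urlsplit(url).scheme.lower() == "https"


def final_url(start_url: str, timeout: float = 10.0) -> str:
    """Follow redirects from start_url and return the address finally served."""
    with urlopen(start_url, timeout=timeout) as response:
        return response.geturl()


def defaults_to_https(hostname: str) -> bool:
    """True if a plain http:// request for the site ends up on https://."""
    return uses_https(final_url(f"http://{hostname}/"))
```

A site that answers on `https://` only when asked explicitly, without redirecting plain `http://` requests, would count as not defaulting to HTTPS under this test.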
The results of this study are alarming. My vendor survey of library automation systems shows that all have the technical capacity for encrypted secure communications. Only a small percentage of libraries have implemented encryption for their online catalogs or discovery services. Similarly, few implement their websites with security, which is also a standard capability of commercial and open-source web servers or content management systems.
We could attribute this lapse to gaps in awareness or a lack of expertise in reconfiguring implementations. Vendors and libraries can partner to reshape the security landscape quickly if this is identified as a priority.
This is an important discussion to be having. Privacy issues are a topic that libraries are ideally suited to cover — both internally and publicly. Most of us leave enormous data trails behind us, and the legal (if not necessarily ethical) uses of that data are many and varied.
There is an important component of web privacy that is not covered in the article. HTTPS encrypts both the content of a web site and the data that you post to it. That seems like it covers everything, doesn’t it? But it doesn’t: every web page also has an address, called the URL (Uniform Resource Locator). It’s that cryptic text in the little box at the top of your browser. The body might be encrypted, but HTTPS cannot do anything to hide the URL. Why is this important? If you query a certain popular OPAC for books about privacy, you do so via this request:
http://www.mylibrary.com/eg/opac/results?query=privacy&qtype=keyword&fi%3Asearch_format=&locg=1
Notice that this address contains two interesting pieces of information:
– The site name (which I have changed above to “www.mylibrary.com”)
– The information you are seeking (“query=privacy … qtype=keyword”)
This information travels insecurely across the Internet, regardless of whether you are using HTTP or HTTPS. Anyone in the world can find out (if they want to) that I was searching for books on privacy. The best that can be done is to disguise which PC sent the request, so it cannot be traced back to me. (The Tor browser can do this. But that’s a separate, and much larger, topic.)
From a technical perspective, this is an easy problem to fix. There are ways to send information to a web server that don’t involve the URL, and that would be encrypted by HTTPS. But it appears that authors of some OPAC software either do not consider or do not understand the privacy implications of their design choices.
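One such way is to send the search terms in the body of a POST request rather than in the query string. The sketch below builds such a request with Python's standard library without sending it. The endpoint and field names are borrowed from the example address above, and whether any particular OPAC accepts searches by POST is an assumption.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_search_request(base_url: str, query: str, qtype: str = "keyword") -> Request:
    """Build a POST request that carries the search terms in the body."""
    # The form fields are encoded into the request body, so they never
    # appear in the address itself.
    form_body = urlencode({"query": query, "qtype": qtype}).encode("ascii")
    return Request(
        base_url,
        data=form_body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Hypothetical endpoint, modeled on the example address above.
request = build_search_request("https://www.mylibrary.com/eg/opac/results", "privacy")
```

Here `request.full_url` contains no search terms, while `request.data` holds `query=privacy&qtype=keyword`. A search sent this way also stays out of the browser's address bar and history, and typically out of web server access logs, which usually record the requested address but not the request body.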
Finally, for those who don’t see any need to worry whether someone is tapping into their online work and play, I leave you with this thought from Edward Snowden: “Arguing that you don’t care about the right to privacy because you have nothing to hide is no different than saying you don’t care about free speech because you have nothing to say.”
For HTTPS, the domain of the site you are visiting is visible to eavesdroppers (if only while the connection with the web server is first being established, before encryption begins), but the full URL is not. The browser doesn’t send the HTTP request (which includes the URL path and query, headers, POST data, etc.) until the secure connection has been fully established with the web server.
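To make that split concrete, here is a small sketch that takes an address and separates the part an eavesdropper can learn from the parts that travel inside the encrypted connection. The function name and the dictionary keys are hypothetical, and the split simply follows the description in this comment.

```python
from urllib.parse import parse_qs, urlsplit


def exposure_under_https(url: str) -> dict:
    """Split an https:// address into observable and encrypted parts."""
    parts = urlsplit(url)
    return {
        # Learned from connection setup, before encryption begins.
        "visible_to_eavesdropper": {"hostname": parts.hostname},
        # Sent only after the secure connection is established.
        "encrypted_in_transit": {
            "path": parts.path,
            "query": parse_qs(parts.query),
        },
    }


example = exposure_under_https(
    "https://www.mylibrary.com/eg/opac/results"
    "?query=privacy&qtype=keyword&fi%3Asearch_format=&locg=1"
)
```

For the OPAC search quoted earlier in the thread, the only observable item is `www.mylibrary.com`; the path and the `query=privacy` parameter are in the encrypted portion.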
Thanks, Dave. I should probably have known that…. I feel better now that I do.
What is often more disturbing than the lack of HTTPS for uses such as catalog search activity is the number of libraries not even using HTTPS for basic member login activity, particularly public libraries in the UK. It’s important to protect data such as user search details, but it should be a given that passwords, logins, and personal data are encrypted. Of course, although they shouldn’t, users often share passwords and PINs between their accounts, so a library can’t think only about the data it holds for a user. If it reveals a PIN that the individual also uses for their bank account, it is then just as culpable for anything that happens to that account.
Then, for more disturbing news, the same is often true of the mobile apps for patrons built by major library suppliers. These often don’t ensure that they use HTTPS. The difference is that, unlike any major web browser, mobile operating systems don’t tell a user whether an app is using an encrypted connection. So users are none the wiser that details such as their user ID and PIN are being sent unencrypted, easy enough for anyone on the same network to intercept and view. That in turn could give someone access to fines, loan history, name and address, email address, cloud storage, bank details….
This article focuses on one isolated solution, HTTPS, but the reality is that security and privacy have to be treated systematically. I put forth these additional steps that libraries can take based on best practices at the 80+ public libraries we manage at Library Systems & Services:
– encrypt data both at rest (in the database) and in transit (including backups)
– authenticate servers via security certificates, the digitally signed credentials issued by certificate authorities that assure users the server is legitimate
– deploy well-regarded virus and spam protection, and firewalls to prevent suspicious activity both at the library and at the server side
– use Active Directory Authentication to assign rights and privileges, and to give IT the ability to shut down or limit access when problems arise
– back up incrementally, perform full data backups on the weekends, and take routine snapshots at four-hour intervals for key applications to minimize the impact of a breach
– have a disaster recovery plan approved by insurance providers
– beware of consumer-oriented wireless access points (use of a local library network or ideally the cloud is a better option).
So yes, libraries can do a lot to protect personal data and a comprehensive plan is the place to start.
Dave Maxfield
Chief Information Officer
LS&S