Artificial intelligence (AI) and machine learning are everywhere, giving driving directions and identifying objects in photographs. They are so ingrained in our technology that people often don't realize that what they're experiencing is a machine learning system. Everyone with a smartphone carries an AI system that uses machine learning.
For example, Google’s Android operating system records, measures, and collects information and sends that data to servers. These servers use billions of data points collected from tens of millions of users as input for their machine learning systems. When you ask an Android phone to show you pictures from the beach, a complex set of data moves back and forth between your phone and Google’s servers, comparing your photos to the billions in its data set. The search results include pictures that the AI decided were most likely to be related.
Since Google has billions of photos to assess and millions of people helping it train its AI, the decisions that the AI makes are generally good. But an AI is only as effective as its training data and the weights it learns as it is trained to make decisions. If the data is biased, contains bad examples of decision making, or is simply collected in such a way that it doesn't represent the full problem set, the system will produce broken, unrepresentative, or bad outputs.
Apple, on the other hand, has chosen to run its AI and machine learning by analyzing and weighting your data locally, on the iOS devices themselves. Your device applies the same kinds of machine learning algorithms, using Apple's pretrained models, to recognize your photos, but the photos aren't pushed to Apple's servers. Because each data set is analyzed locally, there is no shared decision making as there is with Google. Each device must do the heavy lifting itself, rather than relying on remote servers for the bulk of the work.
Where data privacy and security are concerned, localized machine learning has an advantage. If photos and data never need to move back and forth between server and client, and providers never need to store and host that data, the data's vulnerability to attack is greatly reduced.
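The architectural difference can be sketched in a few lines of code. This is a hypothetical illustration, not Apple's or Google's actual API: the class and function names are invented, and the toy "model" is a stand-in for a real pretrained neural network shipped to the device. The point is the data flow, in which raw image bytes never leave the device; only locally computed tags are ever used.

```python
# Hypothetical sketch of on-device inference: raw data stays local.
# A real system would run a pretrained neural network over the pixels;
# here a toy marker scan stands in for the model, purely to show the flow.

class LocalPhotoTagger:
    """Tags photos on the device; image bytes are never uploaded."""

    # Toy stand-in for a pretrained model's learned associations.
    KEYWORD_LABELS = {
        b"sand": "beach",
        b"wave": "beach",
        b"cake": "birthday",
    }

    def tag(self, image_bytes: bytes) -> list[str]:
        # Real systems run a neural network here; we scan for marker
        # bytes only to keep the example self-contained and runnable.
        return sorted(
            {label for marker, label in self.KEYWORD_LABELS.items()
             if marker in image_bytes}
        )


def search_photos(photos: dict[str, bytes], query: str) -> list[str]:
    """Search runs entirely on the device: tags are computed and
    matched locally, so nothing is sent to a remote server."""
    tagger = LocalPhotoTagger()
    return [name for name, data in photos.items() if query in tagger.tag(data)]
```

A server-side design would instead upload `photos` to a data center and run the model there, which is exactly the exposure the local approach avoids.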
The examples above focus on object and image recognition in photos by a machine learning system. This is only one of dozens of uses for AI and machine learning systems.
It’s also easy to see how an AI system could help libraries and archives create metadata from digitization projects. AI systems can be trained to recognize locations from a single photograph—including where the photographer was standing—based on angle, geography, and other factors. These systems can be enormously useful in processing and cataloging archives, making collections more discoverable.
As more libraries and library vendors move into developing AI and machine learning systems, we should be sensitive to the privacy implications of collecting and storing the data that’s needed to train and update those systems. As with existing systems where we outsource data collection and retention to vendors, libraries need to be aware of the mechanisms by which that data is protected and how it may be shared with others through training sets. Where libraries can provide local analysis in the style of Apple and iOS, they should.
The opportunities for new machine learning systems to reshape large portions of library work will be rich and varied. While it will be some time before AI can conduct full conversations or reference interviews with students and patrons, the use of AI as an increasingly powerful lever inside other systems will progress quickly over the next three to five years. Libraries should watch these systems as they develop, work with vendors, and create their own services and systems so that our values and ethics are baked into the technology at the outset.