Many libraries are eagerly digitizing their materials and making them accessible online. This enthusiasm often stems from patrons, who are excited to make use of the resources on the internet, or from administrators, who are intent on elevating the library’s public profile.
But the push for digitization often puts pressure on library staff to make digital objects available before they have been properly prepared. The work can turn into a numbers game in which the quantity of images becomes more important than the quality of the descriptive metadata attached to them. The result is images that are either minimally described or else tagged with whimsical, crowdsourced descriptors. While these might seem like quick, easy, and convenient solutions, they do not make the images as findable as they would be with authoritative, consistent, and detailed (low-level) metadata.
Here is one way to look at it: The quality of metadata is inversely proportional to the speed with which digital objects can be uploaded and published online. The more detailed and descriptive the metadata, the longer each record takes to complete and process. Less detailed descriptions take less time to create, and the records can be ingested more quickly by the repository’s digital asset management system.
Simply put, records with broad or general (high-level) descriptions populate the database faster. This may work for born-digital records in some collections, but it is not really feasible for extensive collections of historical photographs, where each image is unique and an item-level description is desirable or required. Minimal descriptors may work for smaller photographic collections, but maintaining this standard will have an adverse effect as the collection grows.
Retrieving the best images
The findability of items in a large collection is directly proportional to the level of description for each digital object. This might seem like a no-brainer, but if description is to be consistent throughout the database, the ultimate size of the collection should be taken into account at the outset when the depth of description is standardized.
A researcher can easily work through a small collection that has only a few descriptors; a results list of 10 hits in a database of 100 records is quickly reviewed. The researcher who obtains a longer results list, however, will have a proportionally more difficult time identifying the material they are looking for.
For example, at the same rate of one hit for every 10 records, 1,000 records will yield 100 hits; 50,000 records will yield 5,000 hits; and so on. In an online environment, users will click their way through the hits. You can recommend narrowing the search terms, but if the metadata is at a high level, the chances of refining the search successfully are minimal.
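The arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustration, not a model of any real search system: the function name `expected_hits`, the fixed 10 percent match rate, and the assumption that each extra descriptor a user adds cuts the list to a tenth of its size are all hypothetical simplifications.

```python
def expected_hits(collection_size: int,
                  match_rate: float = 0.10,
                  refinements: int = 0,
                  refinement_rate: float = 0.10) -> int:
    """Estimate the length of a results list.

    match_rate: assumed share of records matched by the first search term.
    refinements: number of additional descriptors the user can search on.
    refinement_rate: assumed share of the remaining hits that each
        additional descriptor keeps (a hypothetical, uniform figure).
    """
    hits = collection_size * match_rate * (refinement_rate ** refinements)
    return round(hits)


if __name__ == "__main__":
    # High-level metadata: only one broad term to search on.
    for size in (100, 1_000, 50_000):
        print(f"{size:>6} records -> {expected_hits(size):>5} hits")

    # Low-level metadata: two further descriptors are available to narrow on.
    print("narrowed:", expected_hits(50_000, refinements=2), "hits")
```

Under these assumptions, the results list grows in step with the collection when only one broad term is available. It shrinks to a reviewable size only when the records carry enough descriptors to narrow on.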
Even if you are planning to digitize a small collection, you should give serious consideration to implementing low-level descriptive standards. Collections often grow or merge unexpectedly, or they can become associated with others as part of repository-wide or regional collections. In this type of federated environment, insisting on detailed descriptive standards throughout your institution will permit you to collate items from various collections into one results set. Just as important, you will have consistent descriptive standards from the outset, without needing to go back and edit or redescribe everything. Using hyperlinked descriptors (keywords or subject headings) pulled from controlled vocabulary lists makes this all the more meaningful by grouping together similar records. The more descriptors, the better the functionality and the findability.
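One way to picture how controlled-vocabulary descriptors group similar records across collections is a simple index from each descriptor to the records that carry it. The sketch below is hypothetical throughout: the record identifiers, collection names, subject terms, and the helper functions `build_index` and `search` are invented for illustration and do not reflect any particular digital asset management system.

```python
from collections import defaultdict

# Hypothetical item-level records from two separate collections,
# described with terms from a shared controlled vocabulary.
RECORDS = [
    {"id": "A-001", "collection": "City Archives",
     "descriptors": {"Bridges", "Floods"}},
    {"id": "A-002", "collection": "City Archives",
     "descriptors": {"Bridges", "Parades"}},
    {"id": "B-001", "collection": "County Historical Society",
     "descriptors": {"Bridges", "Floods", "Railroads"}},
    {"id": "B-002", "collection": "County Historical Society",
     "descriptors": {"Railroads"}},
]


def build_index(records):
    """Map each descriptor to the set of record ids that carry it."""
    index = defaultdict(set)
    for record in records:
        for term in record["descriptors"]:
            index[term].add(record["id"])
    return index


def search(index, *terms):
    """Return the ids of records tagged with every one of the given terms."""
    if not terms:
        return set()
    return set.intersection(*(index.get(term, set()) for term in terms))


if __name__ == "__main__":
    index = build_index(RECORDS)
    # One shared term collates records from both collections.
    print(sorted(search(index, "Bridges")))
    # Each additional descriptor narrows the results set.
    print(sorted(search(index, "Bridges", "Floods")))
```

Because both collections draw on the same vocabulary, a single term retrieves records from each, and every additional descriptor gives the researcher another way to narrow the list.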
The folly of crowdsourcing
Finally, crowdsourcing can be used as a supplement to detailed metadata, but it should never be considered a replacement for it or the standard for an entire collection. Though it presents an alluring, interactive vision, crowdsourcing does little to increase the findability of records within a database.
An informal, unscientific survey that I conducted via the Society of American Archivists metadata discussion list revealed that only a handful of institutions with large photograph collections used crowdsourcing (fewer than I expected). When I examined these, I could not find many comments or tags. Even though I had specifically requested examples of collections that were not hosted on social media sites like Flickr, survey respondents inevitably offered social media sites as crowdsourcing examples. Even in large Flickr collections, when images lacked in-depth description, they also lacked meaningful comment and were untagged. This indicates to me that without an adequate level of description, the images were not accessible or findable by those who wished to participate.
Often, when users did comment on images, they did so without providing useful information that might help to identify or describe the people or places depicted. I found comments like “Great photo!” or “Nice hair,” but little else that added to the description. The most successful crowdsourcing initiatives were sites with small subsets of images that consisted of “mystery photos” or photos needing identification. These were always small, manageable groups that users could easily work through. Crowdsourcing does get users who are predisposed to participate interested and invested in the project, which is a good thing. But, at least for now, it does little to advance description and access. Crowdsourcing should be considered nothing more than added value.
And so, while the quality of metadata is inversely proportional to the speed of processing, the findability of images in a large photographic collection is directly proportional to the level of description applied to each record. Item-level records in a large collection with little or no descriptive metadata are of little use to anyone, even to willing participants in crowdsourcing efforts. In order for photographs to be useful to researchers, they must be described adequately. As a result, low-level descriptive standards should be put in place before the launch of any potentially large online digital photograph collection—regardless of the extra time involved.