(First of a series of 3 posts. See also Part 2: Problems with OCLC’s catalog policy, and Part 3: How to respond?)
A new bibliographic record distribution policy from OCLC threatens to split the library community from the increasingly large and valuable data sharing resources and applications on the Web, if it doesn’t simply fracture the library community internally. In this post, I’ll try to explain the basic issues. I hope to follow this with specific critiques of OCLC’s policy, and suggestions of useful responses. (If you’d just like a concise overview of the controversy, this Inside Higher Ed article is a good place to start.)
Librarians often lament that people bypass the well-written, informative resources collected by libraries in favor of material on the open Web. One reason for this practice is that Web content is widely harvested, indexed, and annotated by a variety of entities. Some are well-known, large-scale services like Google, Technorati, or Delicious. Some are smaller services that specialize in a particular niche. Each of these services reuses openly accessible data to make Web content easy to find, search, and annotate. Information about what’s in libraries (“metadata”, in library parlance) has also been compiled with great detail and effort, but it tends to be locked up behind individual libraries’ online public access catalogs (OPACs), which are often less usable, and less visible, than services like Google.
It doesn’t have to be this way. If you follow this blog, for instance, you’re aware of the DLF ILS Discovery Interfaces recommendation, which is meant to free library metadata from the constraints of the OPAC and make it available to a wide variety of discovery applications. And a growing number of Web services like Open Library, LibraryThing, WorldCat.org, and Google Book Search aggregate metadata from a variety of libraries and other knowledge sources. They open up interesting new ways for people to find, use, and annotate books and other knowledge sources. Bringing library resources into the light through services like these helps readers find the best information, and helps libraries fulfill their missions.
These services, as well as libraries themselves, rely on aggregating the metadata they need from a wide variety of sources. Creating catalog records for books is a laborious and painstaking process, one that would be too labor intensive for most libraries acting on their own. So librarians long ago agreed to partition the work, exchange their records, and enhance them jointly, through the use of shared cataloging and union catalogs that combined the different libraries’ records. Union catalogs were first devised well before the Web, when libraries mainly traded information just among themselves. Union catalog participants typically contribute their own catalog records, and pay a subscription fee for the right to retrieve and reuse records from the union catalog. One current union catalog, OCLC’s WorldCat, has absorbed other union catalogs over time and is currently much larger than any other of its kind. The WorldCat.org website gives free public access to some, but not all, of the information in the subscription-based WorldCat.
In effect, libraries are paying their staff to create catalog records, giving them to OCLC, and then paying to get them back. This isn’t necessarily a bad thing if you’re paying a reasonable price for a useful service, such as OCLC provides with its WorldCat service. But such arrangements can turn exploitative or obstructionist over time. We in libraries are all too aware of this when we see the invoices for the journals we subscribe to, where the articles are written and refereed for free by our faculty. A subscription can in some cases cost as much as a new car for just one year of a single journal. And the scholars who wrote the articles typically sign away their rights to get them published, and then can be surprised to discover that they are restricted from redistributing or reusing what they themselves wrote.
An alternative advocated by many library professionals (myself included) is open access, where intellectual content can be freely shared and reused. We have lots of arguments about how open access can lower costs, increase visibility, and promote the global spread of knowledge. The arguments are not just about economics and philanthropy, but about improving scholarship. For instance, when data is freely shared, it can be fruitfully reused, repurposed, remixed, and reanalyzed in new scholarship and teaching. Yet, even while libraries have promoted open access, open access has not been the principal way in which we have shared and distributed our own cataloging.
At Palinet’s future of cataloging forum, which I attended earlier this year, I heard folks in various parts of libraries start to speak up about opening up access to our own cataloging data. My own talk at that forum argued for opening access to catalog data, and recommended doing so via Creative Commons Attribution-ShareAlike licenses. There were also OCLC staff at the same forum, and while they did not promise anything specific, I got the sense that they were planning on opening up access to their WorldCat records.
The new policy does clarify how individual researchers and libraries can reuse and repurpose WorldCat records in some useful ways. Unfortunately, it also explicitly asserts OCLC control over these records, in a way that threatens to dampen much of the sharing and independent collective action that can make our library metadata much more visible and useful. In a followup post, I’ll summarize the problems I see in the policy, and then suggest some things that we might do to help free our library metadata for the benefit of our users.