As we begin Open Access Week, it’s worth noting the importance of open access not only to research articles, data, teaching materials, and the like, but also to books. We are fortunate not only that millions of historic volumes are now openly accessible from various digitization projects, but also that many recent volumes are available as open access from a variety of academic presses, government and nonprofit agencies, and other individuals and groups.
One thing that’s been slowing down the use of open access books is that information about them has not been distributed and exploited as widely as it could be. While you can search for lots of books in places like Google Books and the Internet Archive, the interfaces and metadata they provide are not always ideal for finding what you want. It may be hard to find multi-volume works, for instance, or to browse by subject area to the level of detail that researchers expect, or to find information about books not actually managed by the site you’re searching.
Part of the problem is that the library community faces its own open access issues with its cataloging data. Many libraries use OCLC’s WorldCat to collaborate on cataloging books, but WorldCat is not open access, as defined by projects like the Budapest Open Access Initiative (which uses a definition that includes free reuse and redistribution of “open access” material by anyone). After months of debate (including some discussion on this blog), OCLC decided to adopt a policy that allows access and reuse of WorldCat-mediated data by OCLC members, but limits use and redistribution outside the membership.
A number of libraries and library-related organizations, however, have taken a more open approach. For instance, several German libraries and the biblios.net project make their bibliographic data available for reuse without restriction. The British Library is now making its bibliographic data generally available for non-commercial use. And the Open Knowledge Foundation has also released a draft of working principles for open bibliographic data, recommending that bibliographic data be made available with as few restrictions as possible (ideally, with public domain dedication).
Once you’ve opened your data, lots of people can reuse and adapt it in useful ways. For instance, I have harvested metadata provided without restriction by Hathi Trust on over 1 million freely readable online volumes they have in their digital collections. Today I have made it browsable and searchable on The Online Books Page. Not only does this let users search across lots of books digitized by Google, Microsoft, and various other projects large and small, but it also provides ways of exploring the Hathi collection that were not previously possible, such as browsing through subject maps of the collection. (See, for instance, how you can explore various battles and campaigns of the American Revolution, with both Hathi and non-Hathi titles.) My announcement on The Online Books Page has more details about the new Hathi books, and the new “extended shelves” that will eventually include additional collections as well.
The Hathi data I’m using is not as rich as full MARC catalog data would be. (I’m getting it from their OAI data export, which strips out some information from the original catalog records, and I’m currently using their Dublin Core data instead of their MARC data.) Fortunately, I can use other open data to make automated improvements to the data I get from Hathi. In particular, I’m using open subject authority data provided by the Library of Congress to automatically update many of the subject headings in the Hathi data, so that they’re compatible with present-day cataloging practice. (I describe the basic technique in a previous post.) In the future, I plan to use further data sources and automated methods to make author names and subject assignments for books more consistent and complete as well.
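As a rough illustration of this kind of automated update (a minimal sketch, not the actual code behind The Online Books Page), the Python below reads the subject fields from one Dublin Core record, of the sort an OAI export returns, and swaps in current headings where a lookup table has one. The sample record, the `HEADING_UPDATES` table, and the function names are all assumptions made up for the example. In practice the table would be built from the Library of Congress subject authority data rather than written by hand.

```python
import xml.etree.ElementTree as ET

# Namespace used for Dublin Core elements in oai_dc records.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical sample record, written by hand for this sketch.
SAMPLE_RECORD = """\
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>An example volume</dc:title>
  <dc:subject>European War, 1914-1918.</dc:subject>
  <dc:subject>Trench warfare</dc:subject>
</oai_dc:dc>
"""

# Hypothetical lookup table: normalized older heading -> current heading.
# A real table would be derived from subject authority records.
HEADING_UPDATES = {
    "european war, 1914-1918": "World War, 1914-1918",
}


def normalize(heading):
    """Collapse whitespace, drop trailing periods, and lowercase for matching."""
    return " ".join(heading.split()).rstrip(".").lower()


def updated_subjects(record_xml, updates):
    """Return the record's subject headings, replacing any found in `updates`."""
    root = ET.fromstring(record_xml)
    subjects = []
    for element in root.iter("{%s}subject" % DC_NS):
        heading = (element.text or "").strip()
        if not heading:
            continue  # skip empty subject elements
        # Fall back to the heading as given when no newer form is known.
        subjects.append(updates.get(normalize(heading), heading))
    return subjects


if __name__ == "__main__":
    for subject in updated_subjects(SAMPLE_RECORD, HEADING_UPDATES):
        print(subject)
```

Matching on a normalized form of the heading is a design choice for the sketch: it tolerates small differences in punctuation and capitalization between records, at the cost of possibly merging headings that differ only in those details.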
I hope the new extended shelves will be useful to users of both The Online Books Page and Hathi Trust’s online book collection. Others are free to reuse the same data I used to create similar, or better, book searching and browsing indexes. I’d like to thank Hathi Trust, the Library of Congress, Google, and the other digitization, preservation and copyright-clearance partners of Hathi for providing the open data that makes it possible to liberate all of these books. And I’d love to hear from readers browsing the new, extended online bookshelves.
What about Internet Archive’s Open Library? I see it as a wonderful open access competitor to WorldCat, and it’s collaborative.
OpenLibrary has a lot of potential, yes, and I hope to include some of the online books it indexes in my catalog as well. I hope you’ll see more about this in the not-too-distant future.
I don’t really see it as a head-to-head competitor against WorldCat, and at least so far it doesn’t seem to be set up for large-scale collaborative cataloging (that is, where lots of libraries are actively working on shared catalog data), as WorldCat is. But who knows whether it, or some other site, might emerge as such a collaborative hub in the future? As with other innovative domains, I don’t expect whatever eventually succeeds WorldCat to look just like its predecessor.