What a difference a few years can make. A few years ago, folks in the library world (myself included) were arguing about whether it was a good idea to let other people copy and build on their catalog records. Whether or not libraries could or should reuse and redistribute records from WorldCat, for example, was up in the air. Some of us were starting to take small steps towards putting catalog records under open licenses. For instance, I licensed the catalog records I created for The Online Books Page under the Creative Commons Attribution-ShareAlike license some years back. At the time, that was farther than many library projects were willing to go.
By now, though, there’s been a definite shift towards opening up bibliographic records more widely. Large libraries like the German National Library and Harvard have released millions of their MARC records into the public domain. OCLC has revised its data policies to give its blessing to member libraries releasing their catalog data under ODC-BY. (It has also released some of its own data sets, like VIAF, under that license.) And large online library collaborations like Europeana and the Digital Public Library of America have adopted a policy of public domain status (using the CC0 declaration) for their bibliographic metadata. Both of these projects are now supplying promising platforms for projects that can aggregate, reuse, and build on this data in interesting and useful ways.
Now The Online Books Page is joining the CC0 party as well. Yesterday I put a CC0 notice on the more than 50,000 catalog records I’ve created for The Online Books Page’s curated collection over the last 20 years. (Yes, the site turns 20 this summer; it’s hard for me to believe it’s been up this long.) The “curated collection” refers to all the records that I’ve personally edited, as opposed to records automatically imported from other projects. (Those records, which come from sites like HathiTrust and Project Gutenberg, account for well over 1 million more books, and make up what I call the “extended shelves” of the site. I replace extended shelves records with personally edited “curated collection” records on request.)
While 50,000 records isn’t a huge number compared to the massive metadata collections at places like Google Books and the Harvard Library, I think it is still a useful metadata compilation. It covers a lot of important free online books and serials that for one reason or another are not in the large electronic book archives. I’ve also made various efforts to enhance the value of these records beyond what many of the more industrial-scale projects provide, including collating multi-volume works and serials, applying Library of Congress subject headings that support interesting modes of subject browsing, standardizing personal names (which enables linking between libraries and Wikipedia), and various other improvements. And I continue to add new records and improve existing ones in response to reader requests and corrections, and make them CC0 when I do.
More details about what data’s been placed into the public domain, and how to get our data, can be found on the site’s Copyrights and Licenses page. I hope people find it useful. And I’m thankful to all the other people and organizations that are now openly sharing their bibliographic data, and to the people who are using it to make works easier to find and use online.