In my previous two posts, I discussed how open library metadata is becoming increasingly important for the future of library content, and how OCLC’s new catalog policy works against it. By asserting proprietary rights over the records in WorldCat, OCLC risks relegating libraries’ painstaking descriptions of their resources to the sidelines of Web-based innovation and aggregation. In the long term, this threatens to marginalize libraries, and their missions.
In this last post of the series, I suggest responses that may improve the situation. OCLC’s new policy won’t go into effect until February, and it’s evolved since it first hit the Net a few weeks ago. Commenting on the policy, OCLC vice-president Karen Calhoun has said:
We believe that libraries, the wider library-archives-museum community, and those they serve will benefit from the updated policy without placing our shared investment in WorldCat at peril.
I’ll take her word that OCLC believes that they’re acting in the best interests of the members of the WorldCat cooperative. I just don’t think they actually are acting in libraries’ best interests in their current proposal. But ultimately the libraries themselves are the most appropriate ones to determine that, not I, or OCLC’s officials. So my first suggestion is:
Get informed, and speak up. There’s been a lot of discussion about OCLC’s new policy, and you can find pointers to much of the public online debate here. So far, I’ve mostly just seen OCLC officials defending the policy, and various individuals inside and outside libraries criticizing it. The libraries in the WorldCat cooperative need to make their voices heard as well. Whether they’re for or against the policy, they should have a say on the fate of the metadata corpus they’ve worked to support and populate.
OCLC’s new policy doesn’t just affect traditional library catalog applications. The MARC records OCLC manages can hold lots of important additional information about library materials, including digitizations, copyright, and preservation. I’ve advocated cooperative information sharing in these areas (and participated in planning discussions for some of them), and I’ve been pleased to see OCLC’s leadership in these areas. I don’t want to see these initiatives held back by restrictive sharing policies attached to their records.
Know (and assert) your rights. The main legal justifications for restricting the reuse and sharing of catalog records are copyright law and contract law. Contract law may sometimes override rights that one would normally have under copyright (such as fair use and public domain), but generally requires some sort of specific agreement between parties. If you’re working with OCLC, you might want to make sure you don’t sign away rights you might otherwise have to your own cataloging, or to that of other libraries. I’m not a lawyer, and I can’t give you advice on how best to do this, but adding explicit statements of your rights, and modifying or striking out questionable concessions in OCLC contracts, might be in order.
It’s worth noting that OCLC’s policy, in its current draft, specifically says that it does not restrict simply associating an OCLC control number with an otherwise independent record, or a library reusing and redistributing WorldCat records marked as its own original cataloging. Libraries may want to explicitly affirm these prerogatives. (And third parties can freely attach OCLC numbers to their own records, and could make arrangements with libraries that have done original cataloging to free their records.)
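As a rough illustration of what "associating an OCLC control number with an otherwise independent record" can look like in practice, here is a minimal Python sketch. It assumes a simplified, hypothetical in-memory record shape (a dict with a list of fields) rather than any real MARC library, and it follows the common convention of carrying the number in an 035 field with an "(OCoLC)" prefix. The function name and the sample number are made up for the example.

```python
def with_oclc_number(record, oclc_number):
    """Return a copy of `record` with an 035 field carrying the OCLC number.

    `record` is a hypothetical simplified structure, not a real MARC object:
    {"fields": [{"tag": "245", "subfields": [("a", "...")]}, ...]}
    """
    digits = str(oclc_number).strip()
    if not digits.isdigit():
        raise ValueError("OCLC control number should be numeric")
    # Copy the field list so the caller's independent record is left untouched.
    fields = list(record["fields"])
    fields.append({"tag": "035", "subfields": [("a", f"(OCoLC){digits}")]})
    return {**record, "fields": fields}


# An independently created record (illustrative data only).
own_record = {"fields": [{"tag": "245", "subfields": [("a", "An example title")]}]}
linked = with_oclc_number(own_record, 12345678)
```

The descriptive content of the record stays your own; only the identifier is added, so other systems can match it to the corresponding WorldCat entry.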
The extent to which copyright can apply to catalog records is not a settled matter, to my knowledge, but the US Supreme Court’s Feist decision makes it clear that, under US law, simply compiling facts without “requisite originality” is not sufficient for copyright. It follows straightforwardly, in my opinion, that basic factual citation information in a catalog record (such as title, author, and edition) and other objective facts (such as page counts and physical dimensions) are no more copyrightable than the names, addresses and telephone numbers in the Feist telephone directory case were. In addition, works originally created by federal government agencies (like the Library of Congress) are not subject to US copyright. In other countries, though, copyright or other restrictions might apply to objective facts; and US copyright might apply to more subjective parts of a catalog record (like prose descriptions or subject analysis). If you want to remove doubt about the reusability of your metadata, and prevent it from being made proprietary, I suggest:
Attach a Creative Commons Attribution-ShareAlike license to your records. (Update: But see the discussion in the comments below about alternatives, and pros and cons of different approaches, before deciding to go this route. – JMO) As I’ve noted previously, the Attribution-ShareAlike license allows any sort of reuse of a record, but requires that its source be credited, and that any copies or derivatives of the record remain shareable. Creative Commons licenses specifically reaffirm all the normal rights that users have under copyright law, so it should be safe to add them even to records that might be public domain, without compromising their status. That’s the license I use for my Online Books Page records.
The license should cover the record itself, and records that derive from it. It should not apply to the entire contents of any database that contains the record (as some have suggested). A database-wide viral license has both philosophical and practical problems. (There are lots of reasons why I might want to have an ILS or union catalog that contains both proprietary and Share-Alike records. For instance, I might have various electronic collections for which I’ve previously bought proprietary records, completely independent from my open records.)
OCLC can use Attribution-ShareAlike records in WorldCat and in any of the services they sell, just as any other commercial or noncommercial organization can. (They could also include them alongside proprietary records.) But they can’t add their own restrictions to the records — which I’ve noted previously are incompatible with ShareAlike — without violating the Creative Commons terms, or implicitly admitting that the record is public domain. Thus, it’s possible to pre-emptively “inoculate” records contributed to WorldCat, or to any other aggregation that includes the records, against their being made proprietary.
OCLC could decide to refuse records with Attribution-ShareAlike licenses. But they’d have to go out of their way to do this, and they’d be acting against the stated wishes of the contributor. I think it’s worth seeing if they’d push back like this. If they did, it’d be worth bringing to the public’s attention, and may also be worth distributing the records outside of OCLC. Which brings me to my last suggestion:
Consider alternative methods for sharing library metadata. Just as OCLC is not the creator of most of the library metadata in WorldCat, it is not the only possible coordinator of library metadata. A number of other organizations are also aggregating descriptions of library resources: some for specific applications, like Google’s Book Search; some for social networking businesses, like LibraryThing; some for preservation, like Hathi Trust; and some for free-for-all sharing, like Open Library. Other data hubs may arise as well. (Indeed, an open WorldCat could remain a vital hub that enriches those other aggregations, and is in turn enriched from the information aggregated at the other hubs.) Libraries might contribute records to multiple aggregators.
It’s important to remember that any broad-based data cooperative is not likely to completely satisfy all its members. An alternative to OCLC will not necessarily be more open, unless its members hold it to that standard. Reliability, quality control, and seamless interaction are not easy to provide, and participants in alternative networks will have to put time and effort into getting them right.
New cataloging cooperatives could also provide places to experiment with better representations and workflows for library metadata. Implementors need to be careful about getting sidetracked here; past experiences with RDA and other proposed changes to cataloging infrastructure show that new initiatives can be argued over for years without much progress. But if new union catalogs are compatible with existing catalog systems (such as by providing and accepting standard MARC records), and support efficient workflows, they can potentially represent metadata internally in new structures that might be more informative and easier to maintain. This could improve library cooperation and sharing across the board (and maybe improve WorldCat itself in the process).
This ends my discussion, at least for now, of OCLC’s new policy and its conflict with open library metadata. I hope it’s helped inform and advise readers about the debate, and the issues at stake. And I hope it will help readers determine where they stand, and how they should respond.
A Creative Commons license is inappropriate for cataloging records, precisely because they are unlikely to be copyrightable. The whole legal premise of Creative Commons (and open source) licenses is that someone owns the copyright, and thus they have the right to license you to use it, and if you want a license, these are the terms. If you don’t own a copyright in the first place, there’s no way to license it under Creative Commons.
The Talis-initiated Open Data Commons project was motivated by this fact, to find a suitable open access license for _data_. This turns out to be a tricky thing to do, in part because intellectual property in data is treated fairly differently in different jurisdictions, AND is still an open legal question in many respects. Creating a license that would apply in all jurisdictions, regardless of how those open legal questions (in some jurisdictions) get decided, is hard.
The lawyers involved in the Creative Commons project actually decided that it was unworkable to create an enforceable license for data that was ‘some rights reserved’ like Creative Commons, one that says ‘you can do X, but not Y’ or ‘if you do X, then you have to do Y’ (e.g., ShareAlike). The only workable thing to do was to release data into the public domain, saying that whatever copyright or other intellectual property rights you may or may not have in it, you relinquish them, or if you can’t legally relinquish them for some reason, then you grant full and complete rights to the data to anyone, as much as the law will let you.
The result is the Open Data Commons Public Domain Dedication and License.
I actually think it’s a misstep to try licensing your data under Creative Commons. Actual lawyers looked into this, and the result was the Open Data Commons Public Domain Dedication and License. It’s what it was designed for. I recommend using it.
PS, a slightly expanded version of the prior comment, including citations to legal opinions on why CC is inappropriate for data, and a statement from CC itself that CC is inappropriate for data, can be found on my blog:
Thanks for your comments, which are food for thought. I’ll insert a pointer down to these comments in the license section of the main post.
I like the ODC PDDL, and think it would be a good long-term consensus equilibrium for library metadata. It’s not clear to me why a “ShareAlike applies only insofar as copyright/database restrictions *can* apply” license causes legal uncertainty– if copyright/database restrictions apply in a jurisdiction, it’s binding; if they don’t, it’s only advisory. (But IANAL.)
It looks to me that the more compelling reasons for Science Commons to have moved from Creative Commons licenses to a public domain dedication were social/workflow issues rather than legal issues. In particular, the overhead involved in keeping the attribution trails, and possible incompatibilities with other open data arrangements, made the license more trouble than it was worth in many scientific applications.
That’s something to consider here, and a good argument for going to ODC PDDL in the long term. In the short term, a lot of libraries are depending on OCLC for sharing cataloging, OCLC is now planning to put explicit restrictions on what we share, and there is not (yet) either a well-established alternative that does all that shared-catalogers in libraries need, or a clear legal holding on the status of catalog records as a whole.
In such a situation, where libraries are relying on WorldCat for data sharing, it’s not a choice between ShareAlike and public domain dedication; it’s a choice between ShareAlike and OCLC restrictions (since their restrictions can override a public domain dedication, but not a ShareAlike license). So a ShareAlike may make sense as a near-term defensive measure.
This situation may well change before long, with the rise of alternative sharing hubs. I don’t yet see one that can play the roles WorldCat plays for shared cataloging, though one might develop before long. (The biblios.net announcement I saw this morning looks promising, I admit, but it’s just started its beta phase now and we’ll need to see how it works out.)
It’s possible to shift after the fact from an Attribution-ShareAlike license to something like the ODC PDDL, if the folks mentioned in a record’s attribution gave their consent to this. As a practical matter, this may suffer from the same problems of tracking down rightsholders that we see in other copyright clearance issues (though one would hope that libraries are easier to reach than random authors).
Ideally, maybe you’d have a defensive license like ShareAlike when you needed it, but have it get out of the way without fuss when it was no longer needed. Perhaps, for instance, one could invent a license that specified ShareAlike for an initial term, but then automatically expired in favor of ODC PDDL after certain conditions held, or after a certain (fixed) date passed. I wouldn’t feel confident about drafting something like this myself without a lawyer (and other shared-cataloging initiatives might make it moot). But it may be worth considering further.
If you submit records to OCLC with an ODC PDDL, then they are public domain, and no matter what restrictions OCLC tries to put on them, anyone can get them from another source (like you directly) and do whatever they like with them. I think that’s good enough.
I think efforts to try to force OCLC to share records from their own infrastructure by putting in a subtle ‘back door’ license are not going to work, even if such a license WERE legally enforceable, which it’s probably not. The way to get OCLC to change their policy is for our libraries to apply pressure to OCLC through official channels.
But while that’s going on, it is important that the original cataloging that libraries _continue_ to do does not become de facto owned by OCLC. Making sure it’s all ODC PDDL seems to accomplish that. Include a 996 saying so in your own OPAC, even if OCLC removes it from your data once you contribute it to them.
It would be nice if there were a machine-readable indicator that your content in your own OPAC was ODC PDDL though. The URL to the ODC PDDL would serve, except that they ask you not to link to it, doh. But we should come up with a suitable one, or ask Talis to establish one; I’m sure they would.
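To make the 996 idea a bit more concrete, here is a minimal Python sketch of stamping a record with a dedication statement and then detecting it again. Everything specific here is an assumption for illustration: the simplified dict-based record shape (not a real MARC library), the use of subfield $a for the statement and $u for an optional identifier, the wording of the statement, and the function names. No agreed identifier exists yet, so the sketch leaves the URI optional rather than guessing one.

```python
LICENSE_TAG = "996"  # local-use field, as suggested above; the choice is an assumption


def add_license_note(record, statement, uri=None):
    """Return a copy of `record` with a local field stating its licensing.

    `record` is a hypothetical simplified structure:
    {"fields": [{"tag": "245", "subfields": [("a", "...")]}, ...]}
    """
    subfields = [("a", statement)]
    if uri:
        # $u for a machine-readable identifier is an assumed convention here.
        subfields.append(("u", uri))
    fields = list(record["fields"])
    fields.append({"tag": LICENSE_TAG, "subfields": subfields})
    return {**record, "fields": fields}


def license_notes(record):
    """Collect the $a statements from every license field in the record."""
    return [
        value
        for field in record["fields"]
        if field["tag"] == LICENSE_TAG
        for code, value in field["subfields"]
        if code == "a"
    ]


original = {"fields": [{"tag": "245", "subfields": [("a", "An example title")]}]}
stamped = add_license_note(
    original, "Dedicated to the public domain under the ODC PDDL"
)
```

A harvester could then call `license_notes(record)` on each record it pulls from an OPAC and treat any record with a matching statement as freely reusable, whatever happens to the copy held elsewhere.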