Copyright and Provenance: A paper and an example

I’m happy to announce the publication of my paper “Copyright and Provenance: Some Practical Problems” in the latest issue of the IEEE Data Engineering Bulletin. I’ve also placed a copy in our institutional repository.

[Provenance of the work: Created by John Mark Ockerbloom, 2007. First published in Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, Vol. 30, No. 4, Dec. 2007, pp. 51-58. No previous works included; however, it derives in part from a previous presentation by the author at the 2007 Principles of Provenance Workshop in Philadelphia.

Provenance of the rights: Copyright originally by the author. Copyright assigned to IEEE, with certain rights retained by the author, via an IEEE Copyright Transfer Form (version as of Dec. 3, 2007) modified by a Science Commons addendum (Immediate Access 1.0).

Provenance of the preceding information: Asserted by the author, Jan 4, 2008.]

The bracketed paragraphs above should give you a taste of some of the provenance issues relevant to copyright clearance that I discuss in the paper. I wrote it primarily for computer scientists, essentially to argue that copyright clearance is an interesting and important application domain for research and development of provenance-aware systems, and to describe some of the basic issues involved. But it may also interest librarians and others who are concerned about risk mitigation, efficiency, and value in clearing copyrights. It doesn’t go as deeply into clearance issues as other work in the legal and library literature, but I hope it provides a useful overview with a minimum of technical jargon.

For what it’s worth, the draft proposal for embedding copyright status information in MARC records that I mentioned in an earlier post has a number of subfields for encoding the basics of the work provenance given above, as well as the information provenance. It lacks structured ways to record the derivation from previous work noted above, or the rights assignment information, though these could conceivably go in unstructured notes fields.

Still, if MARC is to be used for encoding copyright information (with the accompanying tradeoff between its installed base in libraries and its structural limitations), the proposal looks to me like a good start, with some slight modifications I’ve suggested to the proposal committee. But to enable large-scale copyright clearance with automated assistance, we will eventually need more sophisticated data structures. Relatively speaking, the copyright situation of my paper is still far less complex than that of many other important works.
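As a minimal sketch of what a more structured representation might look like, the Python below models the three layers of provenance from the bracketed example at the top of this post: provenance of the work, provenance of the rights, and provenance of the assertion itself. The class and field names are hypothetical, invented for illustration; they are not taken from the MARC proposal or from any registry schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical schema for illustration only; not drawn from MARC or any registry.


@dataclass
class WorkProvenance:
    creator: str
    year_created: int
    first_publication: str
    # Earlier works incorporated into this one (none, in the example above).
    included_works: List[str] = field(default_factory=list)
    # Earlier works this one derives from in part.
    derived_from: List[str] = field(default_factory=list)


@dataclass
class RightsEvent:
    kind: str  # e.g. "original copyright" or "assignment"
    holder: str
    instrument: Optional[str] = None  # document effecting a transfer, if any


@dataclass
class Assertion:
    asserted_by: str
    date: str  # ISO 8601 date string


@dataclass
class CopyrightRecord:
    work: WorkProvenance
    rights: List[RightsEvent]
    assertion: Assertion

    def current_holder(self) -> str:
        # Assumes rights events are listed in chronological order.
        return self.rights[-1].holder


# The bracketed example from this post, expressed in the hypothetical schema.
paper = CopyrightRecord(
    work=WorkProvenance(
        creator="John Mark Ockerbloom",
        year_created=2007,
        first_publication=(
            "Bulletin of the IEEE Computer Society Technical Committee on "
            "Data Engineering, Vol. 30, No. 4, Dec. 2007, pp. 51-58"
        ),
        derived_from=[
            "Presentation at the 2007 Principles of Provenance Workshop, Philadelphia"
        ],
    ),
    rights=[
        RightsEvent(kind="original copyright", holder="John Mark Ockerbloom"),
        RightsEvent(
            kind="assignment, with certain rights retained by the author",
            holder="IEEE",
            instrument=(
                "IEEE Copyright Transfer Form (version as of Dec. 3, 2007), "
                "modified by Science Commons addendum (Immediate Access 1.0)"
            ),
        ),
    ],
    assertion=Assertion(asserted_by="John Mark Ockerbloom", date="2008-01-04"),
)

print(paper.current_holder())  # IEEE
```

Unlike free-text notes fields, a structure like this lets software follow the chain of rights events and the derivation links directly, and it keeps track of who asserted the information and when.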

I’m hoping that efforts like OCLC’s Registry of Copyright Evidence project will eventually provide ways of expressing more complex copyright issues in a structured manner. And if there’s any sort of global persistent identifier in a MARC record for a work (whether an ISBN, a DOI, a copyright registration number, or some other suitable identifier), it could be used as a key for linking bibliographic information in the MARC record with detailed copyright evidence in a registry.
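The linking idea can be sketched in a few lines: take the persistent identifiers carried in a bibliographic record and look each one up in a registry keyed on (scheme, identifier) pairs. Everything here is hypothetical. The dictionary stands in for a copyright evidence registry, the list of pairs stands in for identifiers pulled from a MARC record, and the identifier values are placeholders, not real ones.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical identifier-keyed registry: (scheme, value) -> evidence notes.
Registry = Dict[Tuple[str, str], List[str]]


def normalize(scheme: str, value: str) -> Tuple[str, str]:
    """Normalize an identifier so trivial formatting differences still match."""
    scheme = scheme.lower()
    value = value.strip()
    if scheme == "isbn":
        # Ignore hyphens and spaces in ISBNs.
        value = value.replace("-", "").replace(" ", "")
    elif scheme == "doi":
        # DOI names are case-insensitive.
        value = value.lower()
    return scheme, value


def find_evidence(
    identifiers: List[Tuple[str, str]], registry: Registry
) -> Optional[List[str]]:
    """Return evidence for the first identifier in the record that the registry knows."""
    for scheme, value in identifiers:
        key = normalize(scheme, value)
        if key in registry:
            return registry[key]
    return None


# Placeholder values only; these are not real identifiers.
registry: Registry = {
    normalize("DOI", "10.0000/PLACEHOLDER"): ["evidence note (placeholder)"],
}
bib_record_ids = [("ISBN", "000-0-00-000000-0"), ("DOI", "10.0000/placeholder")]

print(find_evidence(bib_record_ids, registry))  # ['evidence note (placeholder)']
```

Normalizing before lookup matters because the same identifier can be written in several ways. A real system would need fuller scheme-specific rules (for example, converting ISBN-10 to ISBN-13), which this sketch omits.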

Registries aren’t the only places this information can go, of course. Thanks to the standards efforts of projects like Creative Commons, detailed, machine-readable copyright information can also be embedded directly in a work. This can be quite useful, especially for folks who want to dedicate their work to public use in a simple manner and see no need to wait 14 years or more to do it.

About John Mark Ockerbloom

I'm a digital library strategist at the University of Pennsylvania, in Philadelphia.
