Don’t shade your eyes

Back in 2006, Paul Collins wrote an article in Slate asking “Will Google Book Search uncover long-buried literary crimes?” Now that we have large corpuses of texts searchable online, he argued, it will become much easier to find words lifted from other writers than it once was. (Collins reported on a few such cases found in early GBS searches.) The Net may make it easier for people to misappropriate other authors’ words, but it also makes such misappropriation easier to detect.

Indeed, it’s starting to finger some well-known contemporary authors. Last week, some bloggers used GBS to expose a prolific, and still publishing, romance author who was repeatedly plagiarizing the works of others. (The linked story is the first report of the news last Monday; its sidebar currently points to numerous followups, including more examples uncovered by the same blog and its readers.) Cassie Edwards may be one of the first well-known current authors to be caught out via online-library Googling, but she’s not likely to be the last. Here are some things mentioned in the ensuing discussions that may be worth remembering the next time this sort of thing comes to light:

Plagiarism is not the same as copyright infringement and is therefore not justifiable with a “fair use” excuse. Plagiarism and infringement both involve improper copying, but otherwise they have important differences. Copyright infringement is copying without proper authorization (either from the copyright holder, or from copyright law), and is a legal offense. Plagiarism is copying without proper attribution, and is an ethical offense. While they sometimes go together, it’s perfectly possible to plagiarize without violating copyright (such as by plagiarizing public domain sources), and violate copyright without plagiarizing (such as by putting the Harry Potter books on your web site, with J. K. Rowling’s name left on them).

Standards of proper attribution may vary by genre, but they exist for all genres. Formal scholarly writing standards are especially strict about attribution, with detailed citations generally required for words or even ideas taken from someone else. Popular fiction, music, preaching, and the like, may not usually include footnotes, and reuse may be more common in those genres (especially for things like standard chord progressions), but generally speaking, if you “quote” extensively from someone else, you’re expected to credit them. This can be in your acknowledgements, your liner notes, or wherever, but if it’s more than just a brief or obvious allusion (like the title of this post, taken from a Tom Lehrer song about plagiarism) you need to credit it.

Plagiarism may be forgivable, but not excused or justified, at least for anyone who expects professional respect. Being caught early on may ironically be a blessing; had Edwards been caught out by her editor or a reader on one of the first of her 100+ books, perhaps she could have apologized, changed her ways, and gone on to earn lasting plaudits on her original writing in the books that followed. For that matter, had Edwards’ plagiarism been limited just to nonfiction “infodumps”, and kept out of her story narratives, as it appeared in the first examples to be uncovered, it might have been easier to let go of. Unfortunately, it turns out she also lifted descriptive passages from fiction, which cuts closer to her main line of work as a storyteller.

On the other hand, I find it easier to forgive someone like Martin Luther King, who was (posthumously) found to have plagiarized in many of his academic writings, including his doctoral dissertation. Since Rev. King is best known and honored as a great civil rights leader, it’s easier to forgive this flaw than it would be if he were mainly remembered as an academic or an author. But it’s still sad, as it is when one uncovers the flaws of any of our American heroes (as you will for any of them, if you look closely enough). And now I can’t hear someone call him “Doctor Martin Luther King” without mentally interpolating an asterisk after the title.

More than ever, now that people’s words are increasingly available for searching and scanning in perpetuity, it’s important to take responsibility for what goes out under your name, whether you’re a storyteller, a scholar, or a politician. If you own up to your mistakes and sins early on, and do what you can to fix them, there may be good hope of redemption. Wait, and the damage can worsen both for those you’ve wronged and for yourself. I’ll try to remember that in my own writing, and I hope my readers will help hold me to it.

Posted in crimes and misdemeanors, online books | 1 Comment

Subjects are more than just facets (and an ALA talk plug)

The Library of Congress’ Working Group for the Future of Bibliographic Control announced its final report today. I haven’t yet read over the final version, but I read an earlier draft, and was particularly interested in what it had to say about subjects.

“How should we offer searching in library collections?” is a question that lots of libraries are asking. The answer heard a lot nowadays is “Facets!” Facets have been used in databases and e-commerce sites for some years now. Essentially, they define several (ideally independent) attributes for items, and then let users zero in on what they want by selecting and deselecting various attributes. For example, if you go to Amazon to buy shoes, you can select values from facets like brand, size, color, and price range. Try different selections, and you can quickly pick out the few pairs that best meet your needs out of the tens of thousands offered on the site. (Assuming you’re willing to buy shoes without trying them on.)
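As a rough illustration of the selecting-and-deselecting idea, here is a minimal Python sketch of faceted narrowing. The catalog entries, field names, and the helper functions `select` and `facet_counts` are all invented for this example; real e-commerce sites and library catalogs use dedicated search engines rather than list filtering.

```python
from collections import Counter

# Hypothetical catalog: every brand, size, color, and price here is made up.
SHOES = [
    {"brand": "Acme", "size": 9, "color": "black", "price": 60},
    {"brand": "Acme", "size": 10, "color": "brown", "price": 85},
    {"brand": "Zephyr", "size": 9, "color": "black", "price": 120},
    {"brand": "Zephyr", "size": 9, "color": "red", "price": 45},
]


def select(items, **chosen):
    """Keep only the items that match every selected facet value."""
    return [
        item for item in items
        if all(item.get(facet) == value for facet, value in chosen.items())
    ]


def facet_counts(items, facet):
    """Count how many of the given items fall under each value of one facet."""
    return Counter(item[facet] for item in items)


# Select two facet values, then see how the remaining items split by brand.
narrowed = select(SHOES, size=9, color="black")
print([item["brand"] for item in narrowed])
print(facet_counts(narrowed, "brand"))
```

Deselecting a facet is just a matter of calling `select` again without that keyword. The counts are what a faceted interface would typically show next to each remaining choice.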

The Endeca catalog at NC State applies the same idea to finding books in the library. When it came out two years ago, lots of library folks got excited. And when open source tools like Solr made it easy to code up your own faceted catalog, it came as no surprise that lots of folks set out to try facet-based discovery for their collections. These new catalogs are in many ways big improvements over existing catalogs. Though, as K. G. Schneider and others point out, that’s not a high bar to clear.

We too use facets in some new applications we’re building here at Penn. But they don’t entirely work well with subject headings. Kelley McGrath’s article “Facet-Based Search and Navigation: Problems and Opportunities” in the inaugural issue of the Code4lib Journal describes some of the practical problems involved.

Some have said that subject headings should change to be more facet-oriented. That’s the position of the Calhoun Report, commissioned by the Library of Congress and released in 2006, which recommended dismantling the Library of Congress Subject Headings (LCSH), now the most common subject headings vocabulary. The more recent report from the Future of Bibliographic Control doesn’t go that far, but it does recommend transforming LCSH, “de-coupling subject strings” and evaluating LCSH’s ability to “support faceted browsing and discovery”. The FAST system, which breaks up subjects into uncoordinated facets, is mentioned as an interesting technology to pursue.

LCSH indeed has several problems associated with it: people have a hard time finding the appropriate subject terms for what they’re looking for; catalogers have a hard time constructing terms that follow all the LCSH rules; terms are used inconsistently across collections; terms are slow to adapt to contemporary usage; and both “traditional” and faceted library catalogs have a hard time connecting related terms together using LCSH.

Should we, then, dismantle LCSH into a simple system of facet sets? Not so fast, I say. Subjects are inherently messy things, neither fully discrete nor hierarchical, and in a large collection it’s important to be able to zero in on specific subjects through relationships. Not only is there a large installed base of materials already described with LCSH, but LCSH and ontologies like it allow books to be described with greater precision, and with richer relationships, than pure facets allow. (See Thomas Mann’s “The Peloponnesian War and the Future of Reference, Cataloging, and Scholarship in Research Libraries” for a spirited argument for the power of LCSH-style subject headings.)

What we really need are better tools that allow readers and catalogers to take full advantage of rich subject headings and relationships, and make it easier for subject headings systems to evolve more quickly to meet the needs of users. A technology I’m experimenting with now, and calling subject maps, involves networks of related subjects, techniques for enriching those networks through automation and user input, and displays that let users and librarians browse large collections by navigating through complex subject areas. Subject maps can play well with facets and user-assigned tags, to produce discovery systems that offer the best features of all of these technologies.

Too good to be true? If you want to hear more, see a demo, or ask how this would actually work, come see and/or heckle me on Saturday at ALA. I’ll be presenting at the Catalog Form and Function Interest Group, at 10:30 AM in the Versailles Room of the Sofitel Philadelphia. For more info, and for other ALA forums that may be of interest to metadata librarians, see this post on the ALA blog.

Posted in architecture, libraries, subjects | Comments Off on Subjects are more than just facets (and an ALA talk plug)

Copyright and Provenance: A paper and an example

I’m happy to announce the publication of my paper “Copyright and Provenance: Some Practical Problems” in the latest issue of the IEEE Data Engineering Bulletin. I’ve also placed a copy in our institutional repository.

[Provenance of the work: Created by John Mark Ockerbloom, 2007. First published in Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, Vol. 30, No. 4, Dec. 2007, pp. 51-58. No previous works included; however, it derives in part from a previous presentation by the author at the 2007 Principles of Provenance Workshop in Philadelphia.

Provenance of the rights: Copyright originally by the author. Copyright assigned to IEEE, with certain rights retained by the author, via an IEEE Copyright Transfer Form (version as of Dec. 3, 2007) modified by a Science Commons addendum (Immediate Access 1.0).

Provenance of the preceding information: Asserted by the author, Jan 4, 2008.]

The bracketed paragraphs above should give you a taste of some of the provenance issues relevant to copyright clearance that I discuss in the paper. I wrote it primarily for computer scientists, essentially to argue that copyright clearance was an interesting and important application domain for research and development of provenance-aware systems, and to describe some of the basic issues involved. But it may also be of interest to librarians and others who are concerned about risk mitigation, efficiency, and value in clearing copyrights. It doesn’t go as deeply into clearance issues as other work in legal and library literature, but I hope that it provides a useful overview, with a minimum of technical jargon.

For what it’s worth, the draft proposal for embedding copyright status information in MARC records that I mentioned in an earlier post has a number of subfields for encoding the basics of the work-provenance I give above, as well as the information-provenance. It doesn’t have structured ways to express the derivation from previous work that I express above, or the rights assignment information, though these could conceivably go in unstructured notes fields.

Still, if it’s useful to use MARC (with the accompanying tradeoff between its installed base in libraries and its structural limitations) for encoding copyright information, the proposal looks to me like a good start (with some slight modifications I’ve suggested to the proposal committee.) But to enable large-scale copyright clearance with automated assistance, we’re going to eventually need more sophisticated data structures. Relatively speaking, the copyright of my paper is still a lot less complex than many other important examples.

I’m hoping that efforts like OCLC’s Registry of Copyright Evidence project will eventually provide ways of expressing more complex copyright issues in a structured manner. And if there’s any sort of global persistent identifier in a MARC record for a work (whether an ISBN, a DOI, a copyright registration number, or some other suitable identifier), it could be used as a key for linking bibliographic information in the MARC record with detailed copyright evidence in a registry.
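To make the linking idea concrete, here is a small Python sketch that uses a shared identifier as the key between a bibliographic record and a registry entry. The record layout, the made-up DOI, the registry contents, and the function name `copyright_evidence` are all assumptions for illustration; they do not reflect the MARC proposal’s actual fields or any real registry’s interface.

```python
# Hypothetical bibliographic records; the "identifier" field stands in for
# whatever global persistent identifier a catalog record might carry.
catalog = [
    {"title": "Example Paper", "identifier": "doi:10.9999/example.1"},
    {"title": "Example Pamphlet", "identifier": None},
]

# Hypothetical registry of copyright evidence, keyed by the same identifiers.
registry = {
    "doi:10.9999/example.1": {
        "rights_holder": "Example Publisher",
        "basis": "copyright transfer form with author addendum",
        "asserted_by": "author",
    },
}


def copyright_evidence(record, registry):
    """Return registry evidence for a record, or None if it cannot be linked."""
    identifier = record.get("identifier")
    if identifier is None:
        return None  # nothing to use as a key, so no link is possible
    return registry.get(identifier)


for record in catalog:
    print(record["title"], "->", copyright_evidence(record, registry))
```

The second record shows the limitation: without any shared identifier, the catalog entry and the registry have nothing to join on.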

Registries aren’t the only places this information can go, of course. Detailed, machine-readable copyright information can also be embedded directly in a work, thanks to the standards efforts of projects like Creative Commons. Which can be quite useful, especially for folks who want to dedicate their work to public use in a simple manner, and see no need to wait 14 years or more to do it.

Posted in copyright | 1 Comment

Public Domain Day gifts

Quite the festive day today! All over the world, people who use the Gregorian calendar are celebrating New Year’s Day. Here in Philadelphia, it’s Mummers Parade day. And in my church, it’s a special day dedicated to Mary, the mother of Jesus, and to world peace.

Much of the world gets to celebrate today as Public Domain Day as well, the day when a whole year’s worth of copyrights enter the public domain for anyone to copy or reuse as they like.

In countries that use the “life plus 50 years” minimum standard of the Berne Convention, works by authors who died in 1957 enter the public domain today. That includes writers, artists, and composers like Nikos Kazantzakis, Diego Rivera, Dorothy L. Sayers, Jean Sibelius, and Laura Ingalls Wilder.

In countries that use the “life plus 70 years” term, works by authors who died in 1937 enter the public domain, including works by J. M. Barrie, Jean de Brunhoff, H. P. Lovecraft, Maurice Ravel, and Edith Wharton. Since many countries with this term recently extended it due to trade agreements, they’re often seeing these works re-enter the public domain after being removed from it, but their return to the public is still appreciated.
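The arithmetic behind both cases is the same, and can be sketched in a few lines of Python. The function name `public_domain_year` is just for illustration, and the sketch assumes that a “life plus N years” term runs through the end of the Nth calendar year after the author’s death, so that works enter the public domain on the following January 1.

```python
def public_domain_year(death_year: int, term_years: int) -> int:
    """First year in which an author's works are in the public domain.

    Assumes the term runs to the end of the calendar year, so the works
    become free on January 1 of the year after the term ends.
    """
    return death_year + term_years + 1


print(public_domain_year(1957, 50))  # life plus 50, author died in 1957
print(public_domain_year(1937, 70))  # life plus 70, author died in 1937
```

Both calls give the same year, which is why authors who died twenty years apart show up on the same Public Domain Day in different countries.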

In countries like the US and Australia, which are under 20-year freezes of all or most of the public domain, it’s not quite as momentous a day. Here in the US, like Bill Murray in Groundhog Day, we’re once again waking up to a public domain 1922, as we have since 1998. Our next mass expiration of copyrighted published material is scheduled for New Year’s Day 2019, 11 years from now. That’s assuming that copyright isn’t again extended before then. Recent extensions here and abroad have often been pushed through in the name of “harmonization” (which seems to always lengthen rather than shorten copyrights), and with Mexico now having a life+100 years term, I would not at all be surprised to see that as the pretext for the next round of attempts to further extend copyright.

But this is not a foregone conclusion. Canada, notably, has held the line at life+50 years for its copyright terms, despite many of its trading partners extending their terms further. And a recent attempt to introduce stricter copyright-related technology controls in Canada was turned back due to public protest, much of it organized online. Similar sustained activism in the US and other countries can keep our copyright laws from getting more out of balance.

Let’s not just ask what the public domain can do for us; let’s ask what we can do for the public domain. In particular, as of this year more than 14 years have passed since the Web started to explode into public consciousness, with NCSA’s release of the Mosaic web browser in 1993. Many of us older Net users started creating web sites that year. And 14 years was the original term of copyright specified in the UK’s Statute of Anne, and the US’s first copyright law (with an optional renewal term).

As an advocate of more reasonable copyright terms, like those envisioned by our country’s founders, I am therefore today dedicating the copyrights of all 1993 versions of my web sites into the public domain. These sites include The Online Books Page, which is still in operation, and Catholic Resources on the Net, which I stopped maintaining in 1999.

Admittedly, this dedication is largely symbolic, since I don’t have 1993 copies of these sites close to hand, though they may still exist in backup copies somewhere in my files, or in unauthorized mirrors (which in some cases were created very early in the sites’ histories). And early on, these sites didn’t have much original prose on them, instead being mostly links that are now largely out of date. But I hope to keep dedicating to the public another year’s worth of these sites each Public Domain Day in the future, and eventually they’ll get more interesting and accessible. And perhaps others who also created content online in the early years of the Web will join in as well, and show that they’re happy releasing their material under shorter copyright terms.

In any case, I’m very interested in hearing about things that people are giving to or receiving from the public domain this year. Happy Public Domain Day!

Posted in copyright, sharing | 17 Comments

Quick! While there’s still time!

Folks interested in copyright information sharing may be interested in the following draft proposal for encoding copyright evidence in MARC records, the standard format used for library catalog records.

It was published on December 17, just a couple of days ago. The email I got asked for responses by January 7.

I’m hoping to comment, when I have the time to look it over carefully. Good thing I heard about it early (though so far I’ve only seen word of it on a private mailing list; I had to Google to verify that the recommendation was actually publicly linked from somewhere).

Bad thing for folks who don’t hear about it that quick, or who are busy, especially with the holidays right in the middle of the comment period. This doesn’t appear to be unusual practice in the library world, though. For instance, the Library of Congress recently released a very interesting-looking draft report on the future of bibliographic control, and allowed all of 16 days for comments. I knew this report was coming, and I’ve been hoping to read it in detail and comment on it. But the deadline for feedback to the committee that prepared it has already passed. Well, I can always comment to the world at large here, but it would have been nice to have more time to read it over carefully and reflect on it, and then comment to the authors. This isn’t just my sentiment; K. G. Schneider has suggestions along these lines to the authors of this and other reports.

I know this can be easier said than done, as deadlines loom. In the working group I’m leading now on interfaces to integrated library systems, we ended up scrambling to get a downloadable recommendation draft together a few days before the conference where we said we’d discuss the draft. But we had most of the material that went into the draft on the wiki well before then. The next iteration of the recommendation, which will get more specific about what we need, is now in preparation, and can be followed and commented on live on the site.

The face-to-face discussions of that next iteration, which will target developers and technical folks from ILS vendors and open source development efforts, are planned for February. Hopefully those that are interested in participating will have the chance to read and comment well before those face-to-face meetings.

Yes, folks are often in a rush to meet a deadline, and may be shy about showing half-finished work that may draw all kinds of criticism or premature conclusions from the audience. I’m personally vulnerable to both of those pitfalls. And as a result, we often get over-short comment periods. But it doesn’t have to be that way. Groups can plan ahead for reasonably long comment periods. And readers who are used to the blog and the wiki should know how to deal appropriately with half-finished work. As long as the work doesn’t have to be kept confidential, and there’s a reasonably transparent process for authoring and updating documents, open authoring and comment can give your group insight from a wider variety of knowledgeable people, when it’s still early enough to matter.

(PS: Any relation between the title of this post and this prank is completely, um, coincidental.)

Posted in copyright, libraries | Comments Off on Quick! While there’s still time!

Quick links of interest

Some resources I’ve recently heard about that look like they deserve some attention (which I haven’t given them yet, but find worth noting now):

  • The first issue of the Code4lib journal is out, with lots of interesting-looking articles on next-generation catalogs. I’m looking forward to reading the article on facet-based navigation in LCSH, since I’ve done some work with LCSH navigation, and think there’s more we can do with it than just facets. Some of the articles may also be relevant to the ILS discovery interface group I’m working with.
  • The Copyright Clearance Center is starting a Discover Works wiki that’s attempting to collect and disseminate information about copyright holders to different works. It’s a tricky thing to do, and I still have to see how well this will work as a platform, but they’ve sucked in a lot of initial data, and it will be interesting to look at it in more detail. (Thanks to Merrilee Proffitt for the tip.)
  • The University of Texas has officially launched its Free the Books blog, a place to “challenge ideas about creation and authorship and discuss copyright laws, the public domain and orphan works.” They’re planning to digitize over a million volumes at Texas, so this isn’t simply a theoretical concern for them.

All worth a look, in my opinion, if you have the time. I hope to get more of a look at these soon.

Posted in copyright, libraries, sharing | Comments Off on Quick links of interest

Kids on the lawn, and copynorms

There’s an interesting discussion over in John Scalzi’s blog about a new organization called the Organization for Transformative Works, which essentially aims to legitimize fan fiction as first-class expressions safe from copyright challenges. As I write this, there are over 200 comments in response to the opinion by Scalzi (who is a professional author, and whose take on the issue is similar to my own). The opinions run a fairly wide range, some claiming fan fiction is either inherently wrong or inherently protected, but most taking some sort of middle position.

The discussion made me think back to my childhood, growing up in a small town in Connecticut. We lived in a small, fairly new suburban development with farmland and woods around it. A lot of the houses had kids in them our age, and when we got together, we’d often go exploring or playing in yards. Other than the streets themselves, virtually all the land was privately owned. And while we’d often play in our own yards, sometimes we’d go onto someone else’s land, sometimes when the owners weren’t around. One neighbor had a woody area with some paths and little hills that we liked to ride bikes in. Another had a hill in the backyard that was great for sledding in winter. And sometimes you could play in a leftover sandpile or a fallow field in ways you couldn’t anywhere else.

I don’t remember anyone specifically asking if we could play in these places, but it was generally accepted or at least tolerated by the neighbors. There were certain rules: I knew as early as I can remember that some yards were off limits, because the owners didn’t want kids there. And we knew that we’d have to leave immediately if the owner came out and told us to, but that there would be no further consequences, assuming we hadn’t done any damage or otherwise made trouble for the owner. Some of the norms I had to learn by experience. After a scolding (and, I think, a call to my parents) following an “emergency” bathroom break, I learned that entering someone’s yard without asking was one thing, but entering someone’s house without asking was another thing entirely.

Essentially we had a vibrant neighborhood culture built on casual and tolerated trespassing. It was fundamentally a social compact, rather than a legal one. If we kids went too far in infringing on people’s properties, or the adults went too extreme in clamping down, the whole thing would have fallen apart. A homeowner brandishing a shotgun, or a kid defying a request to leave with a “we have a legitimate right to be here!”, or a parent threatening legal action if their kid was hurt in someone else’s yard, would have disrupted things pretty badly.

So I’m looking at the OTW effort with some interest and trepidation. Although I don’t have much interest in “fan fiction” as such, I recognize its value. (Especially since I met a number of friends, including my wife, in a kind of “fan fiction” venue– but that’s a story for another time.) And there’s a good argument to be made that, as long as fan writers keep their work to themselves or to a small, private circle, it’s fair use.

But once it’s moved online into the public sphere, it seems to me the equivalent of playing in other people’s yards. You hope most people will be fine with it, and it may well even help maintain the social fabric; fan communities, after all, often end up buying lots of the original author’s books. Getting commercial with fanfic, or interfering with an author’s ability to work and make money, would be the equivalent of entering their house or building a booth on their front lawn– Not Done. And ultimately, it’s the author’s right to tell the kids to get off their lawn if they choose. Hopefully, that’s all they’ll want and need to do, and neither they nor the fans will be motivated to raise the stakes.

If you maintain a library, you might want to watch the sort of interaction going on here, even if you don’t particularly care about fanfic. Collection building and public service functions in the digital age often have to negotiate similar gray areas that aren’t neatly covered in law, but have important social aspects. It can be useful to look and see what sorts of practices build up owner and user communities, and what tears them down.

Posted in copyright, sharing | 2 Comments

Notes (and Queries) about adopting serials

The other night, Mary was researching the authorship of a memoir of the Battle of Waterloo, originally published under the by-line “An Englishwoman”. After searching online, she found a link to an article published in an 1871 issue of Notes and Queries that looked promising. She clicked the link– and immediately hit a paywall.

Which was frustrating on multiple levels. First, our library has already bought Notes and Queries several times over. We have print copies– for most years, multiple print copies– of all the volumes from the start of the journal in 1849 up to the present. We also buy access to the online edition. But the regular online access only goes back to 1996– before that, it seems you have to buy an extra package or pay per article.

Okay. But, second, we’re dealing with an article from 1871, long since passed into the public domain. Yes, the publisher has spent money to digitize and store these old issues, and would understandably like some return for its investment. But this is the sort of resource that, with all the mass digitization now going on, should really be free online in some form.

In fact, it is, if you can find it. High-profile mass digitization projects are scanning serial volumes along with books. So far, they’re not giving serials particular attention or care, but they’re there. In order to be really useful, these serial volumes need to be consciously adopted. There are a number of ways one can do this:

  • First, one can digitize them. I’ve found at least three projects that have digitized various volumes of N&Q: the Internet Library of Early Journals (ILEJ), the Open Content Alliance (OCA), and Google. The first of these digitized systematically, but only up to 1869. The latter two don’t seem to have been as systematic, but between them they managed to digitize nearly all the later volumes up to 1922.
  • To make particular issues easily findable, though, one needs to organize them. I got worked up enough to do that for N&Q; the results are here. Except for the ILEJ range, I had to do it volume by volume; the OCA and Google collections didn’t neatly arrange them, or make it easy to find a particular volume.
  • To make them easier to use, it also helps to transform them. Project Gutenberg’s Distributed Proofreaders, for instance, has taken the digitized page images of many of the early issues and produced transcriptions that are considerably more compact, easy to search, and textually accurate than their initial scan-and-OCR digitizations.
  • Researching their copyrights may enable more journal issues to be scanned. Google is very conservative about public domain copyright determinations, particularly abroad, sometimes locking up content as far back as 1865 to some users. The OCA scanners were confident enough to go all the way to 1922. It might be possible to go further still: I’ve discovered that Notes and Queries copyrights were not renewed in the US. If post-1922 volumes were subject to US renewal requirements (which requires more research, into questions like whether US-based subscriptions counted as publication here) a number of them may now be out of copyright here.
  • But why stop with uncopyrighted material? Working with the authors of articles in the serial could yield still more. Notes and Queries, like many Oxford journals, appears to have a policy allowing author self-archiving of their articles, in this case once an issue’s been out for at least 2 years. So, conceivably, motivated readers could go through the tables of contents from issues for 2005 and before, try to reach authors, and persuade (or help) them to put the articles into their institutional or disciplinary repositories, assuming they have them. (And reader intervention could help; institutional repositories tend not to fill up on their own, and many libraries can’t or won’t commit the resources to fill them themselves.)

So, here are five ways one can adopt a serial: digitizing, organizing, transforming, copyright-clearing, and getting content from authors and rightsholders. (There are some interesting examples of many of these adoption strategies: consider, for instance, the Directory of Open Access Journals; and here’s a big Wiki-page organizing online pre-1930 German-language serials.) And more can be done: for instance, if enough readers supporting open access adopt various journals that allow author self-archiving, we could see lots more current research content openly findable online.
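The copyright-clearing step above involves a chain of conditions that can be sketched as a rough first-pass triage in Python. This is a deliberately simplified sketch and not a legal determination: the function name `us_status_guess` and its three labels are invented here, the 1922 cutoff reflects the US public domain as it stands at this writing, and the 1963 upper bound assumes the usual rule that only US copyrights from 1923 through 1963 depended on a renewal filing.

```python
def us_status_guess(year: int, renewal_found: bool,
                    subject_to_us_renewal: bool) -> str:
    """Rough first-pass guess at the US copyright status of a serial volume.

    year: publication year of the volume
    renewal_found: whether a US renewal record was found for it
    subject_to_us_renewal: whether US renewal requirements applied at all
        (for a foreign serial, this is the part that needs more research)
    """
    if year <= 1922:
        return "public domain"
    if year <= 1963 and subject_to_us_renewal and not renewal_found:
        return "possibly public domain"
    return "presume copyrighted"


print(us_status_guess(1871, False, True))
print(us_status_guess(1930, False, True))
print(us_status_guess(1930, False, False))
```

The third argument carries most of the uncertainty for a journal like Notes and Queries: the absence of a renewal only matters if the volume was subject to US renewal requirements in the first place.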

Those, then, are my notes. My queries: What further serial adoption efforts should we know about? And what should we work on?

Posted in open access, serials | Comments Off on Notes (and Queries) about adopting serials

We the mediators

Back in early 2006, Peter Brantley (now the director of the Digital Library Federation) got a lot of interesting folks in libraries and publishing together in one room to talk about issues related to reading in the digital age. While libraries and publishers have different focuses and priorities, we both serve as mediators between authors and audience, and both kinds of mediators are seeing dramatic upheavals and innovations in the ways we carry out our missions.

So the meeting touched off an interesting series of discussions. I’m having a hard time finding the “official” presentation pages from the original meeting, but here’s a short summary from me and a more detailed list of talk summaries from Tim O’Reilly. After the meeting, discussions continued on a mailing list of participants that over time added a number of other folks in publishing and libraries.

A number of the folks involved, mostly on the publishing side of things, have now started a group blog to take many of these conversations public. The blog is called Publishing Frontier, with the tagline “a raucous public discussion of the publishing revolution”. Its starting contributors include folks who’ve worked at trade publishers, scientific imprints, commercial research labs, and grassroots book digitizing.

The blog promises to be an interesting forum and chronicle of the digital revolutions in communication, largely from publishing perspectives (much as I hope this blog will be another such forum and chronicle, largely from librarianship perspectives). I encourage readers here to check it out.

Posted in meta, publishing

Copyright information sharing: An update

I regularly get mail about the web pages I have on copyright registrations and renewals and the inventory I did on the first renewals of periodicals. It turns out that a lot of folks, both inside and outside of libraries, are interested in reviving and repurposing old creative works, if they can just figure out whether the works are still under copyright, and how to reach the copyright holders if they are.

Here’s part of a not atypical query I received recently (posted with permission):

My anticipated enterprise regards short fiction published predominantly in monthly war-era periodicals; the “usual” specificity of interest – ’23-’63 periodicals which might not have been timely renewed.

It appears to me, after an exhaustive study of public domain law & online resources, that your work is the current state of the art: the closest thing – right now – to definitive.

My question, then…is there, so far as you are aware, any “quantum leaps” anticipated to come down the pike in some forseeable future re: a “definitive” means to check ’23 – ’63 renewals? Especially online/searchable?

Again, THANKS beyond measure for your work; it’s the closest-to-perfect tool yet for exasperated publishers seeking to simply ascertain whether the project they’re considering is “doing the right thing” where not violating someone else’s property is concerned!

It’s both gratifying and frustrating to receive email like this: gratifying because it’s always nice to hear that your work is benefiting people; frustrating because I know there’s so much more that could be done to share copyright information, especially when there are so many people interested in it.

And in fact, more is being done, and planned. I organized an open discussion at last spring’s Digital Library Federation (DLF) forum called “Sharing Copyright Information: Opportunities for Collaboration”. It was an interesting and wide-ranging conversation, involving people from a number of libraries and other organizations. Here are the notes from the session. For a good overview and background on many of the copyright issues discussed, see Stanford’s Copyright & Fair Use website.

There have been some notable developments since the spring. Carl Malamud and Peter Brantley have “liberated” recent copyright registration and renewal data from the Copyright Office’s database, making them available for analysis and indexing. Mimi Calter at Stanford has been refining and analyzing their database on book copyright renewals. Bill Carney at OCLC is planning a project for registering copyright information with WorldCat entries. You can read more about these and other initiatives in Peter Brantley’s “Checking Copyright” blog post from last month.

We’re still a long way from a one-stop shop for copyright research. But I hope to use the new data Peter and Carl have liberated to complete my inventory of periodical renewals (which now is complete only to about 1950). I’ve also heard from more than one group that would like to digitize all of the pre-1978 copyright registration and renewal records that are not in the Copyright Office’s online database. If we had good machine-readable data for new and old copyrights, we could construct powerful search engines for copyright registration research. I don’t know who’s actually going to supply this data, though, or how long it will be before it’s all available.

Of course, copyright registration searching is just one part of the problem of copyright clearance, which can involve complicated issues of provenance of works, rights, and information. I’ve recently made a presentation giving an overview of some of these questions (slides here) to an interested group of computer scientists, and a paper I wrote with more details on provenance issues in copyright research should be published later this month. (I’ll link to it when it comes out.)

I don’t want to have lots of people exerting redundant, expensive efforts to clear copyright, or to be deterred from reusing older works because clearing copyright is too difficult. It helps for those of us who are working in this area to keep each other informed about what we’re doing and finding out. So feel free to add a comment to this post if you have a question or useful information on copyright clearance. You can also email me (address in the “about” page) to suggest relevant items for future posts on copyright issues.

Posted in copyright, sharing