Everybody's Libraries

September 23, 2011

Early journals from JSTOR and others

Filed under: copyright,open access,serials,sharing — John Mark Ockerbloom @ 11:26 am

Earlier this month, JSTOR announced that it would provide open access to its earliest scholarly journal content, published before 1923.  All of this material should be old enough to be in the public domain.  (Or at least it is in the US.  Since copyrights can last longer elsewhere, JSTOR is only showing pre-1870 volumes openly outside the US.)  I was very pleased to hear they would be opening up this content; it’s something I’d asked them to consider ever since they ended a small trial of open, public domain volumes in their early years.

Lots of early  journal content now openly readable online

The time was ripe for opening up access at JSTOR.  (And not just because of growing discontent over limited access to public domain and publicly funded research.)  Thanks to mass-digitization initiatives and other projects, much of the early journal content found in JSTOR is now also available from other sources.  For instance, after Gregory Maxwell posted a torrent of pre-1923 JSTOR volumes of the Philosophical Transactions of the Royal Society of London, I surveyed various free digital text sites and found nearly all the same volumes, and more, available for free from Hathi Trust, Google, the Internet Archive, Gallica, PubMed Central, and the Royal Society itself.  The content needed to be organized to be usefully browsable across sites, but that required only a bit of basic librarianship and a bit of time.

Philosophical Transactions is not an anomaly.  After collating volumes of this journal, I looked at the first ten journals that signed on to JSTOR back in the mid-1990s.  (The list can be found below.)  I again found that nearly all of the pre-1923 content of these journals was also available from various free online sites.  Now, when you look them up on The Online Books Page, you’ll find links to both the JSTOR copies and the copies at other sites.

Comparing the sites that provide this content is enlightening.  In general, the JSTOR copies are better presented, with article-level tables of contents, cross-volume searching, article downloads, and consistently high scan quality.  But the copies at other sites are generally usable as well, and sometimes include interesting non-editorial material, such as advertisements, that might not be present in JSTOR’s archive.  By opening up access now, though, JSTOR will likely remain the preferred access point to this early content for most researchers — and that, hopefully, will help attract and sustain paid support for the larger body of scholarly content that JSTOR provides and preserves for its subscribers.

And there’s a lot more in the public domain

JSTOR currently provides open access only for volumes up to 1922 (or up to 1869, if you’re not in the US).  But there’s lots more public domain journal content that can be made available.  Looking again at the initial ten JSTOR journals, I found that all of them have additional public domain content that is currently not available as open access on JSTOR, or as of yet on other sites.  That’s because journals published in the US before 1964 had to renew their copyrights after 28 years or enter the public domain.  But most scholarly journals, including these 10, did not renew the copyrights to all their issues.  Here’s a list of the 10 journals and their first issue copyright renewals (a quick sketch of how the renewal rule translates into public domain status follows the list):

  1. The American Historical Review – began 1895; issues first renewed in 1931
  2. Econometrica – began 1933; issues first renewed in 1942
  3. The American Economic Review – began 1911; issues not renewed before 1964 (when renewal became automatic)
  4. Journal of Political Economy – began 1892; issues first renewed in 1953
  5. Journal of Modern History – began 1929; issues first renewed in 1953
  6. The William and Mary Quarterly – began 1892; issues first renewed in 1946
  7. The Quarterly Journal of Economics – began 1886; issues first renewed in 1934
  8. The Mississippi Valley Historical Review (now the Journal of American History) – began 1914; issues first renewed in 1939
  9. Speculum – began 1926; issues first renewed in 1934
  10. Review of Economic Statistics (now the Review of Economics and Statistics) – began 1919; issues first renewed in 1935

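As a quick illustration (and emphatically not legal advice), here is how the renewal rule above translates into a rough status check for a given issue, written as a small Python sketch.  It assumes the years in the list mark the earliest issues whose copyrights were renewed, and it ignores the separately renewed individual articles and other caveats discussed below.

    # Rough status of a US journal issue, given the year of the earliest issue
    # whose copyright was renewed (None if no issue renewals were filed).
    def us_issue_status(pub_year, first_renewed_issue_year=None):
        if pub_year < 1923:
            return "public domain: old enough that its copyright has expired"
        if pub_year >= 1964:
            return "presumed in copyright: renewal is automatic for 1964 and later"
        if first_renewed_issue_year is None or pub_year < first_renewed_issue_year:
            return "public domain: issue copyright not renewed after 28 years"
        return "check the renewal records for this particular issue"

    # Examples drawn from the list above:
    print(us_issue_status(1928, 1931))  # The American Historical Review, 1928
    print(us_issue_status(1950, None))  # The American Economic Review, 1950
    print(us_issue_status(1948, 1946))  # The William and Mary Quarterly, 1948
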
This list reflects more proactive renewal policies than were typical for scholarly journals. A few years ago, I did a survey of JSTOR journals (summarized in this presentation) that were publishing between 1923 and 1950, and found that only 49 out of 298, or about 1/6, renewed any of their issue copyrights for that time period.  (JSTOR has since added more journals covering this time period, so the numbers will be different now, but I suspect the renewal rate won’t be any higher now than it was then.)

Currently JSTOR has no plans to open up access to post-1922 journal volumes.  But many of those volumes have been digitized, and are in Google’s or Hathi Trust’s collections; or they could be digitized by contributors to the Internet Archive or similar text archives.

If someone does want to open up these volumes, they should re-check their copyright status.  In particular, I have not yet checked the copyright status of individual articles in these journals, which can in theory be renewed separately.  In practice, I’ve found that this is rarely done for scholarly articles, though it’s not completely unknown.  It might be feasible for me to do a “first article renewal” inventory for journals, like the one I’ve done for first issue renewals, which could speed up clearances.

Opportunities for open librarianship

JSTOR’s recent open access release of early journals, then, is just the beginning of the historic journal content that can be made openly available online.  JSTOR performs a valuable service for libraries by providing and preserving comprehensive digital back runs of major scholarly journals, both public domain and copyrighted.  But while our libraries pay for that service, let’s also remember our mission to provide access to knowledge for all whenever possible.  JSTOR’s opening of its pre-1923 journal volumes is a much-appreciated contribution to a high-quality open record of early scholarship.  We can build on that further, with copyright research, digitization, and some basic public librarianship.  (I’ve discussed the basics of journal liberation in previous posts.)

For my part, I plan to start by gradually incorporating the open access JSTOR offerings into the serial listings of The Online Books Page, as time permits.  I can also gather further copyright information on these and other journals as I bring them in.  And I’m happy to hear about more journals that are already online or could go online (whether they’re JSTOR journals or not); you can submit them via my suggestion interface.

How about you?  What would you like to see from the early scholarly record, and what can you do to help open it up?

June 15, 2011

A digital public library we still need, and could build now

Filed under: citizen librarians,copyright,libraries,people,sharing — John Mark Ockerbloom @ 12:39 pm

It’s been more than half a year since the Digital Public Library of America project was formally launched, and I’m still trying to figure out what the project organizers really want it to be.  The idea of “a digital library in service of the American public” is a good one, and many existing digital libraries already play that role in a variety of ways.  As I said when I christened this blog, I’m all for creating a multitude of libraries to serve a diversity of audiences and information needs.

At a certain point after an enthusiastic band of performers says “Let’s put on a show!”, though, someone has to decide what their show’s going to be about, and start focusing effort there.  So far, the DPLA seems to be taking an opportunistic approach.  Instead of promulgating a particular blueprint for what they’ll do, they’re asking the community for suggestions, in a “beta sprint” that ends today.   Whether this results in a clear distinctive direction for the project, or a mishmash of ideas from other digitization, aggregation, preservation, and public service initiatives, remains to be seen.

Just about every digital project I’ve seen is opportunistic to some extent.   In particular, most of the big ones are opportunistic when it comes to collection development.  We go after the books, documents, and other knowledge resources that are close to hand in our physical collections, or that we find people putting on the open web, or that our users suggest, or volunteer to provide on their own.

There are a number of good reasons for this sort of opportunism.  It lets us reuse work that we don’t have to redo ourselves.  It can inform us of audience interests and needs (at least as far as the interests of the producers we find align with the interests of the consumers we serve).  And it’s cheap, and that’s nothing to sneer at when budgets are tight.

But the public libraries that my family prefers to use don’t, on the whole, have opportunistically built collections.  Rather, they have collections shaped primarily by the needs of their patrons, and not primarily by the types of materials they can easily acquire.   The “opportunistic” community and school library collections I’ve seen tend to be the underfunded ones, where books in which we have yet to land on the Moon, the Soviet Union is still around, or Alaska is not yet a state may be more visible than books that reflect current knowledge or world events.  The better libraries may still have older titles in their research stacks, but they lead with books that have current relevance to their community, and they go out of their way to acquire reliable, readable resources for whatever information needs their users have.  In other words, their collections and services are driven by  demand, not supply.

In the digital realm, we have yet to see a library that freely provides such a digital collection at large scale for American public library users.   Which is not to say we don’t have large digital book collections– the one I maintain, for instance, has over a million freely readable titles, and Google Books and lots of other smaller digital projects have millions more.  But they function more as research or special-purpose collections than as collections for general public reference, education, or enjoyment.

The big reason for this, of course, is copyright.  In the US, anyone can freely digitize books and other resources published before 1923, but providing anything published after that requires copyright research and, usually, licensing, both of which tend to be complex and expensive.  So a lot of digital library projects tend to focus on the older, obviously free material, and have little current material.  But a generally useful digital public library needs to be different.

And it can be, with the right motivation, strategy, and support.  The key insight is that while a strong digital public library needs to have high-quality, current knowledge resources, it doesn’t need to have all such resources, or even the most popular or commercially successful ones.  It just needs to acquire and maintain a few high-quality resources for each of the significant needs and aptitudes of its audience. Mind you, that’s still a lot of ground to cover, especially when you consider all the ages, education levels, languages, physical and mental abilities, vocational needs, interests, and demographic backgrounds that even a midsized town’s public library serves.  But it’s still a substantially smaller problem, and involves a smaller cost, than the enticing but elusive idea of providing instant free online access to everything for everyone.

There are various ways public digital libraries could acquire suitable materials proactively.  The America.gov books collection provides one interesting example.  The US State Department wanted to create a library of easy-to-read books on civics and American culture and history for an international audience.  Some of these books were created in-house by government staff.  Others were commissioned to outside authors.  Still others were adapted from previously published works, for which the State Department acquired rights.

A public digital library could similarly create, commission, solicit, or acquire rights to books that meet unfilled information needs of its patrons.  Ideally it would aim to acquire rights not just to distribute a work as-is, but also to adapt and remix it into new works, as many Creative Commons licenses allow.  This can greatly increase the impact of any given work.  For instance, a compellingly written, beautifully illustrated book on dinosaurs might originally be written for 9-to-12-year-old English speakers, and become noticeably obsolete due to new discoveries after 5 or 10 years.  But if a library’s community has reuse and adaptation rights, library members can translate, adapt, and update the book, so it becomes useful to a larger audience over a longer period of time.

This sort of collection building can potentially be expensive; indeed, it’s sobering that America.gov has now ceased being updated, due to budget cuts.  But there’s a lot that can be produced relatively inexpensively.  Khan Academy, for example, contains thousands of short, simple educational videos, exercises, and assessments created largely by one person, with the eventual goal of systematically covering the entire standard K-12 curriculum.  While I think a good educational library will require the involvement of many more people, the Khan example shows how much one person can get accomplished with a small budget, and projects like Wikipedia show that there’s plenty of cognitive surplus to go around, that a public library effort might usefully tap into.

Moreover, the markets for rights to previously authored content can potentially be made much more efficient than they are now.  Most books, for instance, go out of print relatively quickly, with little or no commercial exploitation thereafter.  And as others have noted, just trying to get permission to use  a work digitally, even apart from any royalties, can be very expensive and time-consuming.  But new initiatives like Gluejar aim to make it easier to match up people who would be happy to share their book rights with people who want to reuse them. Authors can collect a small fee (which could easily be higher than the residual royalties on an out-of-print book); readers get to share and adapt books that are useful to them.   And that can potentially be much cheaper than acquiring the rights to a new work, or creating one from scratch.

As I’ve described above, then, a digital public library could proactively build an accessible collection of high-quality, up-to-date online books and other knowledge resources, by finding, soliciting, acquiring, creating, and adapting works in response to the information needs of its users.  It would build up its collection proactively and systematically, while still being opportunistic enough to spot and pursue fruitful new collection possibilities.  Such a digital library could be a very useful supplement to local public libraries, would be open any time anywhere online, and could provide more resources and accessibility options than a local public library could provide on its own.  Making it work would require a lot of people working together, including bibliographers, public service liaisons, authors, technical developers, and volunteers, both inside and outside existing libraries.  And it would require ongoing support, like other public libraries do, though a library that successfully serves a wide audience could also potentially tap into a wide base of funds and in-kind contributions.

Whether or not the DPLA plans to do it, I think a large-scale digital free public library with a proactively-built, high-quality, broad-audience general collection is something that a civilized society can and should build.  I’d be interested in hearing if others feel the same, or have suggestions, critiques, or alternatives to offer.

April 9, 2011

Opt in for open access

Filed under: copyright,libraries,online books,open access — John Mark Ockerbloom @ 8:40 am

There’s been much discussion online about Judge Chin’s long-awaited decision to reject the settlement proposed by Google and authors’ and publishers’ organizations over the Google Books service.  Settlement discussions continue (and the court has ordered a status conference for April 25).  But it’s clear that it will be a while before this case is fully settled or decided.

Don’t count on a settlement to produce a comprehensive library

When the suit is finally resolved, it will not enable the comprehensive retrospective digital library I had been hoping for.  That, Chin clearly indicated, was an over-reach.  The  proposed settlement would have allowed Google to sell access to most pre-2009 books published in the English-speaking world whose rightsholders had not opted out.   But, as Chin wrote, “the case was about the use of an indexing and searching tool, not the sale of complete copyrighted works.”  The changes in the American copyright regime that the proposed settlement entailed, he wrote, were too sweeping for a court to approve.

Unless Congress makes changes in copyright law, then, a rightsholder has to opt in for a copyrighted book to be made readable on Google (or on another book site).  Chin’s opinion ends with a strong recommendation for the parties to craft a settlement that would largely be based on “opt-in”.  Of course, an “opt in” requirement necessarily excludes orphan works, where one cannot find a rightsholder to opt in.  And as John Wilkin recently pointed out, it’s likely that a lot of the books held by research libraries are orphan works.

Don’t count on authors to step up spontaneously

Chin expects that many authors will naturally want to opt in to make their works widely available, perhaps even without payment.  “Academic authors, almost by definition, are committed to maximizing access to knowledge,” he writes.  Indeed, one of the reasons he gives for rejecting the settlement is the argument, advanced by Pamela Samuelson and some other objectors, that the interests of academic and other non-commercially motivated authors are different from those of the commercial organizations that largely drove the settlement negotiations.

I think that Chin is right that many authors, particularly academics, care more about having their work appreciated by readers than about making money off of it.  And even those who want to maximize their earnings on new releases may prefer freely sharing their out of print books to keeping them locked away, or making a pittance on paywall-mediated access.  But that doesn’t necessarily mean that we’ll see all, or even most, of these works “opted in” to a universally accessible library.  We’ve had plenty of experience with institutional repositories showing us that even when authors are fine in principle with making their work freely available, most will not go out of their way to put their work in open-access repositories, unless there are strong forces mandating or proactively encouraging it.

Don’t count on Congress to solve the problem

The closest analogue to a “mandate” for making older books generally available would be orphan works legislation.    If well crafted, such a law could make a lot of books available to the public that now have no claimants, revenue, or current audience, and I hope that a coalition can come together to get a good law passed. But an orphan works law could take years to adopt (indeed, it’s already been debated for years). There’s no guarantee on how useful or fair the law that eventually gets passed would be, after all the committees and interest groups are done with it.  And even the best law would not cover many books that could go into a universal digital library.

Libraries have what it takes, if they’re proactive

On the other hand, we have an unprecedented opportunity right now to proactively encourage authors (academic or otherwise) to make their works freely available online.  As Google and various other projects continue to scan books from library collections, we now have millions of these authors’ books deposited in “dark” digital archives.  All an interested author has to do is say the word, and the dark  copy can be lit up for open access.  And libraries are uniquely positioned to find and encourage the authors in their communities to do this.

It’s now pretty easy to do, in many cases.  Hathi Trust, a coalition of a growing number of research institutions, currently has over 8 million volumes digitized from member libraries.  Most of the books are currently inaccessible due to copyright.  But they’ve published a permission agreement form that an author or other rightsholder can fill out and send in if they want to make their book freely readable online.  The form could be made a bit clearer and more visible, but it’s workable as it is.  As editor of The Online Books Page, I not infrequently hear from people who want to share their out of print books, or those of their ancestors, with the world.  Previously, I had to worry about how the books would get online.  Now I can usually just verify that a book is in Hathi’s collection, and then refer them to the form.

Google Books also lets authors grant access rights through their partner program.  Joining the program is more complicated than sending in the Hathi form, and it’s more oriented towards selling books than sharing them.  But Google Books partners can declare their books freely readable in full if they wish, and can give them Creative Commons licenses (as they can with Hathi).  Google has even more digitized books in its archives than Hathi does.

So, all those who would love to see a wide-ranging (if not entirely comprehensive), globally accessible digital library now have a real opportunity to make it happen.  We don’t have to wait for Congress to act, or for some new utopian digital library to arise.  Thanks to mass digitization, library coalitions like Hathi’s, and the development of simplified, streamlined rights and permissions processes, it’s easier than ever for interested authors (and heirs, and publishers) to make their work freely available online.  If those of us involved in libraries, scholarship, and the open access movement work to open up our own books, and those of our colleagues, we can light up access to the large, universal digital library that’s now waiting for us online.

January 2, 2011

Public Domain Day 2011: Will the tide be turned?

Filed under: copyright — John Mark Ockerbloom @ 12:40 am

This year’s Public Domain Day, the day on which a year’s worth of copyrights expire in many countries, is getting particular attention in Europe, where events in various European cities commemorate authors who died in 1940, and whose works are now in the public domain there.

Or, to be more precise, they’ve returned to the public domain there.  Although the reigning international copyright standard, the Berne Convention, requires copyrights to run at least for the lifetime of the author plus 50 years, the European Union in 1993 mandated a retroactive copyright extension to life plus 70 years, to match the longest term in any of its member countries at the time.  Twenty years of the public domain were buried by this extension.  For at least the next 3 years, all we’ll be seeing in Europe is the old public domain re-emerging.

The public domain has seen losses and freezes in much of the rest of the world since.  In 1998, after years of lobbying by the entertainment industry, the US enacted its own 20-year copyright extension.  Thankfully, this extension only froze the public domain instead of rolling it back, but we will wait another 8 years before more publications enter the public domain here due to age.  The 1998 extension was just the latest of a series of copyright extensions in the United States.  In 1954, US copyrights ran a maximum of 56 years, so all of the works published before 1955 would now be in the public domain here were it not for later extensions.  (Instead, we still have copyrights in force as far back as 1923.)

There’s no clear end in sight to further extensions.  Since 1998 I’ve steadily been seeing country after country extend its terms, often pushed by trade negotiations with Europe or the United States.  “Life+50” may still be the global standard, but bilateral and region-specific trade agreements have pushed terms up to “life+70” in many countries around the world.  Some countries have gone even longer — Mexico, for instance, is now “life+100” — providing convenient targets for further rounds of copyright extensions in the name of international “harmony”.

There are some bright spots, though.  Many countries continue to hold the line at life+50 years, including Canada (despite years of pressure from its southern neighbor).  As of today, residents of “life+50” countries are now free to republish, adapt, reuse, and build upon works by authors who died in 1960 or before, in whatever way they see fit.  I hope to show some of what this means as I introduce listings from projects like Gutenberg Canada to The Online Books Page this year.
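
To make the term arithmetic above concrete, here is a minimal worked example in Python (my own illustration; these terms generally run to the end of the calendar year, so a work enters the public domain on the January 1 after the author’s death year plus the term):

    # Minimal illustration of "life plus N years" terms.
    def public_domain_year(death_year, extra_years):
        return death_year + extra_years + 1

    print(public_domain_year(1940, 70))  # 2011: this year's entries in "life+70" Europe
    print(public_domain_year(1960, 50))  # 2011: this year's entries in "life+50" countries
    print(public_domain_year(1940, 50))  # 1991: when those same works first entered a "life+50" public domain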

In the US, where many copyrights prior to 1964 didn’t run for their full length unless renewed, a number of digitization projects (most notably Hathi Trust) have been finding post-1922 works with unrenewed copyrights, and making them freely readable online.  These works tend not to be the best-sellers or popular backlist titles, but collectively they embody much of the knowledge and culture of the mid-20th century.  I’ve also been very happy to list many of these works over the past year.

At the same time, there’s been a growing awareness that copyright need not be “one size fits all”, particularly for works that no longer have much commercial value.  This insight helped lead various authors’ and publishers’ groups to negotiate a blanket license to Google to make out of print works generally available online.  The license, part of the Google Books Settlement, is not without its controversy or problems, and might or might not eventually get court approval.  But it suggests political feasibility for similar efforts to free older, more obscure cultural and scholarly works now languishing under exclusive copyright control.

We’ve even seen at least one entertainment industry spokesman speculate out loud that re-introducing simple formalities to maintain copyright might not be such a bad idea.  Such formalities are forbidden by the Berne Convention, so they could not be introduced across the board without re-negotiating that treaty.  That would be no easy task.

But the recent round of copyright extensions may at least provide an opening for international experimentation.  Now that copyright terms go past the Berne minimum in many countries, the post-Berne portion of the copyright term could potentially be made subject to requirements that Berne doesn’t allow (such as the renewal of copyrights in some suitable international registry system).  That could not only free many older “orphan works” for reuse, but if it works well it could also lead to negotiating a farther-reaching international registry system. Such a system could make it easier both to contact copyright holders for permissions, and to free works for the public domain whose owners no longer cared (or who never did want) to maintain exclusive rights.

I’ve been practicing a self-imposed system of “formalities” myself over the last few years.  On every Public Domain Day, I’ve been freeing published works of mine more than 14 years old, except for works where I explicitly opt to reserve copyright.  (Copyrights in the US originally ran for 14 years unless renewed for another 14.)  So: All works of mine published in 1996 for which I control the copyright are hereby released to the public domain.  (Legally, you can consider them all to be declared CC0.)  Much of the publication I did that year online can now be found through sites like the Internet Archive, which started crawling my web sites in late 1996.

I’d be very happy to hear about other gifts people are making to the public domain, as well as successes in bringing more of the public domain to light online, and in expanding the scope of the public domain as a whole.  Happy Public Domain Day to all!

October 15, 2010

Journal liberation: A community enterprise

Filed under: copyright,discovery,open access,publishing,serials,sharing — John Mark Ockerbloom @ 2:53 pm

The fourth annual Open Access Week begins on Monday.  If you follow the official OAW website, you’ll be seeing a lot of information about the benefits of free access to scholarly research.  The amount of open-access material grows every day, but much of the research published in scholarly journals through the years is still practically inaccessible to many, due to prohibitive cost or lack of an online copy.

That situation can change, though, sometimes more dramatically than one might expect.  A post I made back in June, “Journal liberation: A Primer”, discussed the various ways in which people can open access to journal content, past and present,  one article or scanned volume at a time.  But things can go much faster if you have a large group of interested liberators working towards a common goal.

Consider the New England Journal of Medicine (NEJM), for example.  It’s one of the most prominent journals in the world, valued both for its reports on groundbreaking new research, and for its documentation, in its back issues, of nearly 200 years of American medical history.  Many other journals of lesser value still cannot be read without paying for a subscription, or visiting a research library that has paid for one.  But you can find and read most of NEJM’s content freely online, both past and present.  Several groups of people made this possible.  Here are some of them.

The journal’s publisher has for a number of years provided open access to all research articles more than 6 months old, from 1993 onward.  (Articles less than 6 months old are also freely available to readers in certain developing countries, and in some cases for readers elsewhere as well.)  A registration requirement was dropped in 2007.

Funders of medical research, such as the National Institutes of Health, the Wellcome Trust, and the Howard Hughes Medical Institute, have encouraged publishers in the medical field to maintain or adopt such open access policies, by requiring their grantees (who publish many of the articles in journals like the NEJM) to make their articles openly accessible within months of publication.  Some of these funders also maintain their own repositories of scholarly articles that have appeared in NEJM and similar journals.

Google has digitized most of the back run of the NEJM and its predecessor publications as part of its Google Books database.  Many of these volumes are freely accessible to the public.  This is not the only digital archive of this material; there’s also one on NEJM’s own website, but access there requires either a subscription or a $15 payment per article.  Google’s scans, unlike the ones on the NEJM website, include the advertisements that appeared along with the articles.  These ads document important aspects of medical history that are not as easily seen in the articles, on subjects ranging from the evolving requirements and curricula of 19th-century medical schools to the early 20th-century marketing of heroin for patients as young as 3 years old.

It’s one thing to scan journal volumes, though; it’s another to make them easy to find and use– which is why NEJM’s for-pay archive got a fair bit of publicity when it was released this summer, while Google’s scans went largely unnoticed.  As I’ve noted before, it can be extremely difficult to find all of the volumes of a multi-volume work in Google Books; and it’s even more difficult in the case of NEJM, since issues prior to 1928 were published under different journal titles.  Fortunately, many of the libraries that supplied volumes for Google’s scanners have also organized links to the scanned volumes, making it easier to track down specific volumes.  The Harvard Libraries, for instance, have a chronologically ordered list of links to most of the volumes of the journal from 1828 to 1922, a period when it was known as the Boston Medical and Surgical Journal.

For many digitized journals, open access stops after 1922, because of uncertainty about copyright.  However, most scholarly journals have public domain content after that date, so it’s possible to go further if you research journal copyrights.  Thanks to records provided by the US Copyright Office and volunteers for The Online Books Page, we can determine that issues and articles of the NEJM prior to the 1950s did not have their copyrights renewed.  With this knowledge, Hathi Trust has been able and willing to open access to many volumes from the 1930s and 1940s.

We at The Online Books Page can then pull together these volumes and articles from various sources, and create a cover page that allows people to easily get to free versions of this journal and its predecessors all the way back to 1812.

Most of the content of the New England Journal of Medicine has thus been liberated by the combined efforts of several different organizations (and other interested people).  There’s still more that can be done, both in liberating more of the content, and in making the free content easier to find and use.  But I hope this shows how widespread journal liberation efforts of various sorts can free lots of scholarly research.  And I hope we’ll hear about many more free scholarly articles and journals being made available, or made more accessible and usable, during Open Access Week and beyond.

I’ve also had another liberation project in the works for a while, related to books, but I’ll wait until Open Access Week itself to announce it.  Watch this blog for more open access-related news, after the weekend.

June 11, 2010

Journal liberation: A primer

Filed under: copyright,libraries,open access,publishing,sharing — John Mark Ockerbloom @ 10:07 am

As Dorothea Salo recently noted, the problem of limited access to high-priced scholarly journals may be reaching a crisis point.  Researchers who are not at a university, or are at a not-so-wealthy one, have long been frustrated by journals that are too expensive for them to read (except via slow and cumbersome inter-library loan, or distant library visits).  Now, major universities are feeling the pain as well, as bad economic news has forced budget cuts in many research libraries, even as further price increases are expected for scholarly journals.  This has forced many libraries to consider dropping even the most prestigious journals, when their prices have risen too high to afford.

Recently, for instance, the University of California, which has been subject to significant budget cuts and furloughs, sent out a letter in protest of Nature Publishing Group’s proposal to raise its subscription fees by 400%.  The letter raised the possibility of cancelling all university subscriptions to NPG, and having scholars boycott the publisher.

Given that Nature is one of the most prestigious academic journals now publishing, one that has both groundbreaking current articles and a rich history of older articles, these are strong words.  But dropping subscriptions to journals like Nature might not be as much of a hardship for readers as it once might have been.  Increasingly, it’s possible to liberate the research content of academic journals, both new and old, for the world.  And, as I’ll explain below, now may be an especially opportune time to do that.

Liberating new content

While some of the content of journals like Nature is produced by the journal’s editorial staff or other writers for hire, the research papers are typically written by outside researchers, employed by universities and other research institutions.  These researchers hold the original copyright to their articles, and even if they sign an agreement with a journal to hand over rights (as they commonly do), they retain whatever rights they don’t sign over.  For many journals, including the ones published by Nature Publishing Group, researchers retain the right to post the submitted version of their paper (known as a “preprint”) in local repositories.  (According to the Romeo database, they can also eventually post the “postprint”, the final draft resulting after peer review but before actual publication in the journal, under certain conditions.)  These drafts aren’t necessarily identical to the version of record published in the journal itself, but they usually contain the same essential information.

So if you, as a reader, find a reference to a Nature paper that you can’t access, you can search to see if the authors have placed a free copy in an open access repository.  If they haven’t, you can contact one of them to encourage them to do so.  To find out more about providing open access to research papers, see this guide.

If a journal’s normal policies don’t allow authors to share their work freely in an open access repository, authors may still be able to retain their rights with a contract addendum or negotiation.  When that hasn’t worked, some academics have decided to publish in, or review for, other journals, as the California letter suggests.  (When pushed too far, some professors have even resigned en masse from editorial boards to start new journals that are friendlier to authors and readers.)

If nothing else, scholarly and copyright conventions generally respect the right of authors to send individual copies of their papers to colleagues that request them.  Some repository software includes features that make such copies extremely easy to request and send out.  So even if you can’t find a free copy of a paper online already, you can often get one if you ask an author for it.

Liberating historic content

Many journals, including Nature, are important not only for their current papers, but for the historic record of past research contained in their back issues.  Those issues may be difficult to get a hold of, especially as many libraries drop print subscriptions, deaccession old journal volumes, or place them in remote storage.  And electronic access to old content, when it’s available at all, can be surprisingly expensive.  For instance, if I want to read this 3-paragraph letter to the editor from 1872 on Nature’s web site, and I’m not signed in at a subscribing institution, the publisher asks me to pay them $32 to read it in full.

Fortunately, sufficiently old journals are in the public domain, and digitization projects are increasingly making them available for free.  At this point, nearly all volumes of Nature published before 1922 can now be read freely online, thanks to scans made available to the public by the University of Wisconsin, Google, and Hathi Trust.  I can therefore read the letters from that 1872 issue, on this page, without having to pay $32.

Mass digitization projects typically stop providing public access to content published after 1922, because copyright renewals after that year might still be in force.  However, most scholarly journals — including, as it turns out, Nature — did not file copyright renewals.  Because of this, Nature issues are actually in the public domain in the US all the way through 1963 (after which copyright renewal became automatic).  By researching copyrights for journals, we can potentially liberate lots of scholarly content that would otherwise be inaccessible to many.  You can read more about journal non-renewal in this presentation, and research copyright renewals via this site.

Those knowledgeable about copyright renewal requirements may worry that the renewal requirement doesn’t apply to Nature, since it originates in the UK, and renewal requirements currently only apply to material that was published in the US before, or around the same time as, it was published abroad.  However, offering to distribute copies in the US counts as US publication for the purposes of copyright law.  Nature did just that when they offered foreign subscriptions to journal issues and sent them to the US; and as one can see from the stamp of receipt on this page, American universities were receiving copies within 30 days of the issue date, which is soon enough to retain the US renewal requirement.  Using similar evidence, one can establish US renewal requirements for many other journals originating in other countries.
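
The timing test in that last step is simple enough to sketch in code; the check below is my own simplification of the rule just described, and real determinations involve more conditions than a 30-day comparison:

    from datetime import date

    # Simplified sketch: a foreign-origin issue keeps its US renewal
    # requirement if it was published in the US before, or within 30 days
    # after, its publication abroad.
    def us_renewal_requirement_applies(foreign_pub, us_pub):
        return (us_pub - foreign_pub).days <= 30

    # For example, an issue published abroad on January 4 and documented as
    # received in the US by January 25 of the same year:
    print(us_renewal_requirement_applies(date(1930, 1, 4), date(1930, 1, 25)))  # True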

Minding the gap

This still leaves a potential gap between the end of the public domain period and the present.  That gap is only going to grow wider over time, as copyright extensions continue to freeze the growth of the public domain in the US.

But the gap is not yet insurmountable, particularly for journals that are public domain into the 1960s.  If a paper published in 1964 included an author who was a graduate student or a young researcher, that author may well still be alive (and maybe even still working) today, 46 years later.  It’s not too late to try to track authors down (or their immediate heirs), and encourage and help them to liberate their old work.

Moreover, even if those authors signed away all their rights to journal publishers long ago, or don’t remember if they still have any rights over their own work, they (or their heirs) may have an opportunity to reclaim their rights.  For some journal contributions between 1964 and 1977, copyright may have reverted to authors (or their heirs) at the time of copyright renewal, 28 years after initial publication.  In other cases, authors or heirs can reclaim rights assigned to others, using a termination of transfer.  Once authors regain their rights over their articles, they are free to do whatever they like with them, including making them freely available.

The rules for reversion of authors’ rights are rather arcane, and I won’t attempt to explain them all here.  Terminations of transfer, though, involve various time windows when authors have the chance to give notice of termination, and reclaim their rights.  Some of the relevant windows are open right now.  In particular, if I’ve done the math correctly, 2010 marks the first year one can give notice to terminate the transfer of a paper copyrighted in 1964, the earliest year in which most journal papers are still under US copyright.  (The actual termination of a 1964 copyright’s transfer won’t take effect for another 10 years, though.)  There’s another window open now for copyright transfers from 1978 to 1985; some of those terminations can take effect as early as 2013.  In the future, additional years will become available for author recovery of copyrights assigned to someone else.  To find out more about taking back rights you, or researchers you know, may have signed away decades ago, see this tool from Creative Commons.
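
If it helps to see the arithmetic, here is a much-simplified sketch of the two timing regimes mentioned above, for pre-1978 copyrights and for grants made in 1978 or later.  It illustrates the date math only, leaves out the many statutory conditions and exceptions, and is certainly not legal advice:

    # Termination-of-transfer timing, greatly simplified: a five-year
    # termination window opens 56 years after a pre-1978 copyright was
    # secured, or 35 years after a 1978-or-later grant; notice must be
    # served 2 to 10 years before the chosen effective date.
    def termination_windows(year, pre_1978=True):
        start = year + (56 if pre_1978 else 35)
        effective = (start, start + 5)           # years termination can take effect
        notice = (start - 10, effective[1] - 2)  # years notice can be served
        return notice, effective

    print(termination_windows(1964))                  # notice from 2010, effect from 2020
    print(termination_windows(1978, pre_1978=False))  # effect as early as 2013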

Recognizing opportunity

To sum up, we have opportunities now to liberate scholarly research over the full course of scholarly history, if we act quickly and decisively.  New research can be made freely available through open access repositories and journals.  Older research can be made freely available by establishing its public domain status, and making digitizations freely available.  And much of the research in the not-so-distant past, still subject to copyright, can be made freely available by looking back through publication lists, tracking down researchers and rights information, and where appropriate reclaiming rights previously assigned to journals.

Journal publishing plays an important role in the certification, dissemination, and preservation of scholarly information.  The research content of journals, however, is ultimately the product of scholars themselves, for the benefit of scholars and other knowledge seekers everywhere.  However the current dispute between Nature Publishing Group and the University of California is ultimately resolved, we would do well to remember the opportunities we have to liberate journal content for all.

April 7, 2010

Copyright information is busting out all over

Filed under: copyright,sharing — John Mark Ockerbloom @ 3:43 pm

Like the crocuses and daffodils now coming up all over our front garden, new copyright registration information has been popping up all over the net lately.  As I’ve described in various previous posts, this information can be extremely useful for folks who want to revive, disseminate, or reuse works from the past.

Here’s a summary of the some of the recent highlights:

Copyright renewals for maps and commercial prints are now all online, and join what is now a complete set of renewals of active copyrights for still images.  The scanning was done here at the Penn Libraries by me and by the Schoenberg Center for Electronic Text and Image, from microfilms and volumes loaned by the Free Library of Philadelphia.  I thank all the folks who helped out with this project.

With the addition of this latest set of records, you can now find copyright renewals online for nearly anything you’d find in a book, if they’re recent enough to still be in force.  To my knowledge, the only active copyright renewals of any sort not yet online are renewals for most music prior to 1978, and a few small sets of pre-1978 renewals for film (about 2 years’ worth in all).

Original copyright registrations are also going online at a rapid rate.   The biggest publicly accessible set of original registrations from 1923 onward (the date of the oldest copyrights still in force) is at Hathi Trust, and consists of digitized volumes that have been scanned by Google for Hathi member libraries.  I’ve included them in a list of registration volumes organized by year and type of work on my Catalog of Copyright Entries Page, which has now been reorganized to combine all the original and renewal registrations known to be available online.  I’ve also added direct page links to renewal and other important sections of the volumes, so that researchers looking for those can go to them directly.  In many cases, the renewal sections can be downloaded for offline use.  I’ve also brought out statistics from the volumes, to help give readers a sense of the rate of registrations and renewals.

Google is making enhanced versions of book copyright registration volumes available online. Specifically, they’ve digitized the full set of original and renewal registrations for books from 1922-1977, in a set of scans that are of generally higher quality than the ones at Hathi Trust.  You can search the full text of the entire set at once, or search or browse individual volumes.

These scans were done specially for copyright research purposes, and seem to involve more careful scanning than the normal mass-book-digitization procedures Google used for the Hathi Trust volumes.  They aren’t entirely free of problems– I identify a few trouble spots in my listings– and they also don’t include registrations for other types of work, which has apparently confused some folks who have contacted me.  But they’re quite high quality overall, and could be a very good basis for structured data records of these copyright registrations.  Google has previously made such records available for book copyright renewals; I hope we’ll see a release of records based on these new scans before long as well.
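
As a purely hypothetical illustration of what such structured records might contain, here is one possible shape for an entry derived from a renewal listing.  The field names and sample values are my own inventions, not an official schema from Google or the Copyright Office:

    from dataclasses import dataclass
    from typing import Optional

    # One possible record shape for a book renewal entry (illustrative only).
    @dataclass
    class RenewalRecord:
        title: str
        author: Optional[str]
        original_reg_number: str   # number of the original registration
        original_reg_date: str     # date of the original registration
        renewal_reg_number: str    # number assigned at renewal
        renewal_date: str          # date the renewal was filed
        claimant: Optional[str] = None

    # The kind of entry a parser of the scanned volumes might emit (values invented):
    example = RenewalRecord(
        title="An Example Monograph",
        author="A. N. Author",
        original_reg_number="A123456",
        original_reg_date="1933-05-01",
        renewal_reg_number="R234567",
        renewal_date="1960-11-15",
        claimant="A. N. Author",
    )
    print(example)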

Also in the pipeline: Based on conversations I’ve had with others interested in copyright issues, we may well see a complete set of copyright registrations and renewals online (at least in the form of page images from the Catalog of Copyright Entries) by the end of this year.  And a number of projects are working on making this digitized information more useful for practical copyright clearance.  Today, for instance, I heard about the Durationer project, being presented at the Copyright@300 conference at Berkeley later this week.  The project is developing a tool to help people determine the copyright status of specific works in specific jurisdictions, based on copyright registrations and other relevant information.

Some possible future directions: As I described in more detail in a 2007 paper, a thorough determination of a work’s copyright status depends not just on registration information, but on various other kinds of information, much of which can be found in a work’s bibliographic records.  Copyright registration data can also be used to build new bibliographic data structures.  Therefore, the interests of copyright clearance and the interests of access to bibliographic data tend to converge.  I elaborate on this idea in a guest blog post for the Open Knowledge Foundation, who I’ve started to work with in these areas.  (For folks following the debate over OCLC’s WorldCat, this convergence is also worth keeping in mind when reading the just-released WorldCat Rights and Responsibilities draft, which I hope to comment on in the not-too-distant future.)

I hope you find this new copyright information useful.  And I’m very interested in hearing what you’re doing with it, or would like to do with it.

February 8, 2010

Shedding light on images in the public domain

Filed under: copyright,sharing — John Mark Ockerbloom @ 3:05 pm

For years, I’ve regularly gotten requests from authors and publishers for licenses to reproduce images in books listed on The Online Books Page, or included in the local collection of A Celebration of Women Writers.  Sometimes these requests relate to copyrighted books that I list but don’t control rights for; in those cases, I do my best to refer the request to the book’s copyright holder.  But often, they’re for images in our own collections, from books published over 100 years ago.  In those cases, I respond that the image is in the public domain (and our digitization, which adds no originality, is also in the public domain), so no license is necessary or appropriate.

Usually that response receives a thankful reply, sometimes with signs of surprise that an image can be reused without permission.  But sometimes I’ll get back a more alarmed reply.  “My publisher says I need a license for every image in my book, or I can’t use it,” it might say, followed by a plea for help in tracking down some long-defunct 19th century publisher.

I wish I could say this was an atypical anecdote.   But, if you look around the Web, you’ll find that there are huge numbers of historic images– paintings, photographs, figures, and the like– that are behind access barriers, or closed off altogether from online access, when they don’t have to be.  Artstor has over a million images of thousands of years of art that you can’t look at unless you’re at an institution that has a subscription.  The fine arts image catalog at my own library has over 100,000 digital images, none of which can be seen online by the public outside of Penn, except in thumbnails.  Neither Artstor nor Penn want to keep art away from the public; both are nonprofit educational institutions. But clearing images for free public access on a large scale has to date been impractical for these institutions.

Restrictions on images also create holes in other works.  For instance, under the proposed Google Books settlement, images in books that might be under copyright would be blanked out unless the rightsholder to the book also asserted they held the rights in the images.  These sorts of omissions can cut the heart out of many works.  In a recent New Republic article, “For the Love of Culture“, Lawrence Lessig described how a critical table was omitted in an otherwise free article about his daughter’s possible illness, due to rights-clearance issues.  “I could not believe that we were this far down the path to insanity already,” he wrote of the incident.

Part of the insanity is that many of these images from our cultural heritage are actually in the public domain.  Many people are aware that copyrights prior to 1923 have expired in the US.  But so have many copyrights from later in the 20th century.  Pre-1964 copyrights generally had to be renewed 28 years after the start of their term, or they would expire.   (Exceptions and further details are described here.)  But most copyrights were never renewed; and that’s especially true for images.

In 1923, there were copyright registrations for 3,059 works of art, 1,149 scientific and technical drawings, 7,533 photographs, and 11,289 prints and pictorial illustrations, making a total of  23,030 copyright registrations for these classes of image.  In 1951, 28 years later, there were 198 copyright renewals for all of these image classes combined.  This represents a renewal rate of less than 1%.
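
A quick check of the arithmetic behind that figure, using the numbers quoted above:

    # Image-class registrations in 1923 versus renewals for those classes in 1951
    registrations_1923 = 3059 + 1149 + 7533 + 11289   # art, drawings, photographs, prints
    renewals_1951 = 198
    print(registrations_1923)                                  # 23030
    print(round(100 * renewals_1951 / registrations_1923, 2))  # about 0.86 percent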

We have just completed posting scans that make all active copyright renewals for artwork viewable online.  In fact, once we finish scanning one last batch of renewals for maps and for commercial prints (meaning images created for product packaging and promotion), all active copyright renewals for any type of still image will be viewable online.  In later years, the number of image copyright renewals grows slightly, but the number of images published in those years grows substantially.

Images without a copyright registration of their own might still be under copyright if they were first published as part of a copyrighted book, newspaper, magazine, or other larger work.  Fortunately, we have complete online renewal records for those kinds of works too.  It becomes much easier to establish the public domain status of a newspaper photograph, for instance, if you know (as I previously revealed) that no newspaper outside New York renewed copyright for any issue published before the end of World War II.

Having copyright renewals online for artwork is an important step towards freeing the public domain in images.  But there’s more needed to make copyright clearance practical at a large scale.  Putting scanned renewal records into a searchable database (perhaps combined with fair use image thumbnails) will make it easier to find any copyright renewals that might exist for a particular image.  (A similar database for book renewals already exists, and there are more book renewals than image renewals.)  Making original copyright registrations available as well (as we now have for artwork through 1949, and soon will have for later years) lets us determine when the copyright for an image began, and whether it was renewed in time to prevent it from expiring.

Furthermore, establishing the history and provenance of images will let us determine when unregistered artworks enter the public domain.  Registered or not, the copyright to an image created before 1964 began no later than its first US publication, and the copyright for many such images therefore ended after 28 years due to a lack of renewal.  And the mostly-frozen American public domain still gains, each year, more works that were never published before 2003.  On Public Domain Day last month, all such work by artists who died in 1939 entered the public domain in the US.  (I won’t get now into the rather baroque rules for establishing “publication” of an artwork, but you can determine it if the history of the image is documented.)

So we have a rich treasure trove of images in the public domain that’s been largely buried under presumptions and uncertainties about copyright.  By finding and sharing information about their copyrights, we can protect and enjoy these images in the commons of the public domain, where they can be viewed freely, included in new works, and reused in any way we can imagine.  If you find this prospect intriguing, I hope you’ll help bring these images to light.

January 1, 2010

Public domain day 2010: Drawing up the lines

Filed under: copyright,online books,open access — John Mark Ockerbloom @ 12:01 am

As we celebrate the beginning of the New Year, we also mark Public Domain Day (a holiday I’ve been regularly celebrating on this blog.)  This is the day when a year’s worth of copyrights expire in many countries around the world, and the works they cover become free for anyone to use and adapt for any purpose.

In many countries, this is a bittersweet time for fans of the public domain.  For instance, this site notes the many authors whose works enter the public domain today in Europe, now that they’ve been dead for at least 70 years.  But for many European countries, this just represents reclaimed ground that had been previously lost.   Europe retroactively extended and revived copyrights from life+50 to life+70 years in 1993, so it’s still three more years before Europe’s public domain is back to what it was then.  Many other countries, including the United States, Australia, Russia, and Mexico, are in the midst of public domain freezes.  For instance, due to a 1998 copyright extension, no copyrights of published works will expire here in the US due to age for another 9 years, at least.

In the past, many people have had only a vague idea of what’s in the public domain and what isn’t.  But thanks to mass book digitization projects, the dividing line is becoming clearer.  Millions of books published before 1923 (the year of the oldest US copyrights) are now digitized, and can be found with a simple Google search and read in full online.  At the same time, millions more digitized books from 1923 and later can also be found with searches, but are not freely readable online.

Many of those works not freely readable online have languished in obscurity for a long time.   Some of them can be shown to be in the public domain after research, and groups like Hathi Trust are starting to clear and rescue many such works.  Some of them are still under copyright, but long out of print, and may have unknown or unreachable rightsholders.  The current debate over Google Books has raised the profile of these  works, so much so that the New York Times cited “orphan books”, a term used to describe such unclearable works, as one of the buzzwords of 2009.

The dividing line between the public domain and the world of copyright could well have been different.   In 1953, for instance, US copyrights ran for a maximum of 56 years, and the last of that year’s copyrights would have expired today, were it not for extensions.  Duke’s Center for the Study of the Public Domain has a page showing what could have been entering the public domain today– everything up to the close of the Korean War.  In contrast, if the current 95-year US terms had been in effect all of last century, the copyrights of 1914 would have only expired today.  Only now would we be able to start freely digitizing the first set of books from the start of World War I.
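
For anyone checking the math (terms run through the end of the calendar year, so expiration means entering the public domain on the following January 1):

    print(1953 + 56 + 1)   # 2010: the last 1953 copyrights, under the old 56-year maximum
    print(1914 + 95 + 1)   # 2010: what a 95-year maximum would only now be releasing
    print(1923 + 95 + 1)   # 2019: when the oldest copyrights still in force are due to expire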

With the dividing line better known nowadays, do we have hope of protecting the public domain against more expansions of copyright?  Many countries still stick to the life+50 years term of the Berne Convention, including Canada and New Zealand.  In those countries, works from authors who died in 1959 enter the public domain for the first time.  There’s pressure on some of these countries to increase their terms, so far resisted.  Efforts to extend copyrights on sound recordings continue in Europe, and recently succeeded in Argentina.  And secret ACTA treaty negotiations are also aimed at increasing the power of copyright holders over Internet and computer users.

But resistance to these expansions of copyright is on the rise, and public awareness of copyright extensions and their deleterious effects is quite a bit higher now than when Europe and the US extended their copyrights in the 1990s.  And with concerns expressed by a number of parties over a possible Google monopoly on orphan books, one can envision building up a critical mass of interest in freeing more of these books for all to use.

So today I celebrate the incremental expansion of the public domain, and hope to help increase it further. To that end, I have a few gifts of my own.  As in previous years, I’m freeing all the copyrights I control for publications (including public online postings) that are more than 14 years old today, so any such works published in 1995 and before are now dedicated to the public domain.  Unfortunately, I don’t control the copyright of the 1995 paper that is my most widely cited work, but at least there’s an early version openly accessible online.

I can also announce the completion of a full set of digitized active copyright renewal records for drama and works prepared for oral delivery, available from this page.  This should make it easier for people to verify the public domain status of plays, sermons, lectures, radio programs, and similar works from the mid-20th century that to date have not been clearable using online resources.  We’ve also put online many copyright renewal records for images, and hope to have a complete set of active records not too far into 2010.  Among other things, this will help enable the full digitization of book illustrations, newspaper photographs, and other important parts of the historical record that might be otherwise omitted or skipped by some mass digitization projects.

Happy Public Domain Day!  May we have much to enjoy this day, and on many more Public Domain Days to come.

(Edited later in the day January 1 to fix an inaccurately worded sentence.)
