Journal liberation: A community enterprise

The fourth annual Open Access Week begins on Monday.  If you follow the official OAW website, you’ll be seeing a lot of information about the benefits of free access to scholarly research.  The amount of open-access material grows every day, but much of the research published in scholarly journals through the years is still practically inaccessible to many, due to prohibitive cost or lack of an online copy.

That situation can change, though, sometimes more dramatically than one might expect.  A post I made back in June, “Journal liberation: A Primer”, discussed the various ways in which people can open access to journal content, past and present, one article or scanned volume at a time.  But things can go much faster if you have a large group of interested liberators working towards a common goal.

Consider the New England Journal of Medicine (NEJM), for example.  It’s one of the most prominent journals in the world, valued both for its reports on groundbreaking new research, and for its documentation, in its back issues, of nearly 200 years of American medical history.  Many other journals with lesser value still cannot be read without paying for a subscription, or visiting a research library that has paid for a subscription.  But you can find and read most of NEJM’s content freely online, both past and present. Several groups of people made this possible.  Here are some of them.

The journal’s publisher has for a number of years provided open access to all research articles more than 6 months old, from 1993 onward.  (Articles less than 6 months old are also freely available to readers in certain developing countries, and in some cases for readers elsewhere as well.)  A registration requirement was dropped in 2007.

Funders of medical research, such as the National Institutes of Health, the Wellcome Trust, and the Howard Hughes Medical Institute, have encouraged publishers in the medical field to maintain or adopt such open access policies, by requiring their grantees (who publish many of the articles in journals like the NEJM) to make their articles openly accessible within months of publication.  Some of these funders also maintain their own repositories of scholarly articles that have appeared in NEJM and similar journals.

Google has digitized most of the back run of the NEJM and its predecessor publications as part of its Google Books database.  Many of these volumes are freely accessible to the public.  This is not the only digital archive of this material; there’s also one on NEJM’s own website, but access there requires either a subscription or a $15 payment per article.  Google’s scans, unlike the ones on the NEJM website, include the advertisements that appeared along with the articles.  These ads document important aspects of medical history that are not as easily seen in the articles, on subjects ranging from the evolving requirements and curricula of 19th-century medical schools to the early 20th-century marketing of heroin for patients as young as 3 years old.

It’s one thing to scan journal volumes, though; it’s another to make them easy to find and use, which is why NEJM’s for-pay archive got a fair bit of publicity when it was released this summer, while Google’s scans went largely unnoticed.  As I’ve noted before, it can be extremely difficult to find all of the volumes of a multi-volume work in Google Books; and it’s even more difficult in the case of NEJM, since issues prior to 1928 were published under different journal titles.  Fortunately, many of the libraries that supplied volumes for Google’s scanners have also organized links to the scanned volumes, making it easier to track down specific volumes.  The Harvard Libraries, for instance, have a chronologically ordered list of links to most of the volumes of the journal from 1828 to 1922, a period when it was known as the Boston Medical and Surgical Journal.

For many digitized journals, open access stops after 1922, because of uncertainty about copyright.  However, most scholarly journals have public domain content after that date, so it’s possible to go further if you research journal copyrights.  Thanks to records provided by the US Copyright Office and volunteers for The Online Books Page, we can determine that issues and articles of the NEJM prior to the 1950s did not have their copyrights renewed.  With this knowledge, Hathi Trust has been able and willing to open access to many volumes from the 1930s and 1940s.

We at The Online Books Page can then pull together these volumes and articles from various sources, and create a cover page that allows people to easily get to free versions of this journal and its predecessors all the way back to 1812.

Most of the content of the New England Journal of Medicine has thus been liberated by the combined efforts of several different organizations (and other interested people).  There’s still more that can be done, both in liberating more of the content, and in making the free content easier to find and use.  But I hope this shows how widespread journal liberation efforts of various sorts can free lots of scholarly research.  And I hope we’ll hear about many more free scholarly articles and journals being made available, or more accessible and usable, during Open Access Week and beyond.

I’ve also had another liberation project in the works for a while, related to books, but I’ll wait until Open Access Week itself to announce it.  Watch this blog for more open access-related news, after the weekend.

As living arrows sent forth

It’s that time of year when offspring start to leave home and strike out on their own.  Young children may be starting kindergarten.  Older ones may be heading off to university.  And in between, children slowly gain a little more independence every year.  If parents are fortunate, and do our job well, we set our children going in good directions, but they then make paths for themselves.

Standards are a little like children that way.  You can invest lots of time, thought, and discussion into specifying how some set of interactions, expressions, or representations should work.  But, if you do well, what you specified will take on a life apart from you and its other parents, and make its own way in the world.  So it’s rather gratifying for me to see a couple of specifications that I’d helped parent move out into the world that way.

I’ve mentioned them both previously on this blog.  One was a fairly traditional committee effort: the DLF ILS-Discovery Interface recommendation.  After the original DLF group finished its work, a new group of folks affiliated with OCLC and the Code4lib community formed to implement the types of interfaces we’d recommended.  The new group has recently announced they’ll be supporting and contributing code to the Extensible Catalog NCIP toolkit.  This is an important step towards realizing the goal of standardized patron interaction with integrated library systems.  I’m looking forward to seeing how the project progresses, and hope I’ll hear more about it at the upcoming Digital Library Federation forum.

The other specification I’ve worked on that’s recently taken on a life of its own is the Free Decimal Correspondence (FDC).  This was a purely personal project of mine to develop a simple, freely reusable classification that was reasonably compatible with the Dewey Decimal System and the Library of Congress Subject Headings.  I created it for Public Domain Day last year, and did a few updates on it afterwards, but have largely left it on the shelf for the last while.  Now, however, it’s being used as one of the bases of the “Melvil Decimal System”, part of the Common Knowledge metadata maintained at LibraryThing.

It’s nice to see both of these efforts start to make their mark in the larger world.  I’ve seen the ILS-DI implementation work develop in good hands for a while, and I’m content at this point to watch its progress from a distance.  The Free Decimal Correspondence adoption was a bit more of a surprise, though one that was quite welcome.  (I put FDC in the public domain in part to encourage that sort of unexpected reuse.)  When the Melvil project’s use of FDC was announced, I quickly put out an update of the specification, so that recent additions and corrections I’d made could be easily reused by Melvil.

I’m still trying to figure out what further updating, if any, I should do for FDC.  Melvil already goes into more detail than FDC in many cases, and as a group project, it will most likely further outstrip FDC in size as time passes.  On the other hand, keeping in sync specifically with LC Subject Headings terminology is not necessarily a goal of Melvil’s, as it has been for FDC, though I’m not sure at this point whether that specific feature of FDC is important to any existing or planned project out there.  And as I stated in my FDC FAQ, I don’t intend to spend a whole lot of time maintaining or supporting FDC over the long term.

But since it is getting noticeable outside use, I’ll probably spend at least some time working up to a 1.0 release.  This might simply involve making a few corrections and then declaring it done.  Or it could involve incorporating some of the information from Melvil back into FDC, to the extent that I can do so while keeping FDC in the public domain.  Or it could involve some further independent development.  To help me decide, I’d be interested in hearing from anyone who’s interested in using or developing FDC further.

Projects are never really finished until you let them go.  I’m glad to see these particular ones take flight, and hope that we in the online library community will release lots of other creations in the years to come.