Sharing journals freely online

What are all the research journals that anyone can read freely online?  The answer is harder to determine than you might think.  Most research library catalogs can be searched for online serials (here’s what Penn Libraries gives access to, for instance), but it’s often hard for unaffiliated readers to determine what they can get access to, and what will throw up a paywall when they try following a link.

Current research

The best-known listing of current free research journals has been the Directory of Open Access Journals (DOAJ), a comprehensive listing of free-to-read research journals in all areas of scholarship. Given the ease with which anyone can throw up a web site and call it a “journal” regardless of its quality or its viability, some have worried that the directory might be a little too comprehensive to be useful.  A couple of years ago, though, DOAJ instituted more stringent criteria for what it accepts, and it recently weeded its listings of journals that did not reapply under its new criteria, or did not meet its requirements.   This week I am pleased to welcome over 8,000 of its journals to the extended-shelves listings of The Online Books Page.  The catalog entries are automatically derived from the data DOAJ provides; I’m also happy to create curated entries with more detailed cataloging on readers’ request.

Historic research

Scholarly journals go back centuries.  Many of these journals (and other periodicals) remain of interest to current scholars, whether they’re interested in the history of science and culture, the state of the natural world prior to recent environmental changes, or analyses and source documents that remain directly relevant to current scholarship.  Many older serials are also included in The Online Books Page’s extended shelves courtesy of HathiTrust, which currently offers over 130,000 serial records with at least some free-to-read content.  Many of these records are not for research journals, of course, and those that are can sometimes be fragmentary or hard to navigate.  I’m also happy to create organized, curated records for journals offered by HathiTrust and others at readers’ request.

It’s important work to organize and publicize these records, because many of these long-running journals don’t make their content freely available in the first place one might look.  Recently I indexed five journals founded over a century ago that are still used enough to be included in Harvard’s 250 most popular works: Isis, The Journal of Comparative Neurology, The Journal of Infectious Diseases, The Journal of Roman Studies, and The Philosophical Review.  All five offered public domain content behind paywalls, at their official journal sites or at JSTOR, with fees ranging from $10 to $42 per article, even though that same content was available for free elsewhere online.  I’d much rather have readers find the free content than be stymied by a paywall.  So I’m compiling free links for these and other journals with public domain runs, whether they can be found at HathiTrust, JSTOR (which does make some early journal content, including from some of these journals, freely available), or other sites.

For many of these journals, the public domain extends as late as the 1960s due to non-renewal of copyright, so I’m also tracking when copyright renewals actually start for these journals.  I’ve done a complete inventory of serials published until 1950 that renewed their own copyrights up to 1977.  Some scholarly journals are in this list, but most are not, and many that are did not renew copyrights for many years beyond 1922.  (For the five journals mentioned above, for instance, the first copyright-renewed issues were published in 1941, 1964, 1959, 1964, and 1964 respectively– 1964 being the first year for which renewals were automatic.)

Even so, major projects like HathiTrust and JSTOR have generally stopped opening journal content at 1922, partly out of a concern for the complexity of serial copyright research.  In particular, contributions to serials could have their own copyright renewals separate from renewals for the serials themselves.  Could this keep some unrenewed serials out of the public domain?  To answer this question, I’ve also started surveying information on contribution renewals, and adding information on those renewals to my inventory.  Having recently completed this survey for all 1920s serials, I can report that so far individual contributions to scholarly journals were almost never copyright-renewed on their own.  (Individual short stories, and articles for general-interest popular magazines, often were, but not articles intended for scientific or scholarly audiences.)  I’ll post an update if the situation changes in the 1930s or later. So far, though, it’s looking like, at least for research journals, serial digitization projects can start opening issues past 1922 with little risk.  There are some review requirements, but they’re comparable in complexity to the Copyright Review Management System that HathiTrust has used to successfully open access to hundreds of thousands of post-1922 public domain book volumes.
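The decision rules summarized in the last two paragraphs can be sketched as a small Python function. This is a deliberately simplified illustration, assuming a US serial published with a valid copyright notice and the law as it stood in 2015; it ignores foreign works with restored copyrights and other edge cases, and the function and parameter names are hypothetical.

```python
def us_serial_issue_status(pub_year: int, serial_renewed: bool,
                           renewed_contributions: int = 0) -> str:
    """Rough US copyright status (as of 2015) of one issue of a US serial.

    Simplified sketch: assumes US publication with a valid copyright notice
    and ignores foreign works with restored copyrights and other edge cases.
    """
    if pub_year < 1923:
        # Terms for works published before 1923 had expired by 2015.
        return "public domain"
    if pub_year >= 1964:
        # Renewal was automatic for works published from 1964 on.
        return "in copyright"
    if serial_renewed:
        return "in copyright"
    if renewed_contributions:
        # The issue as a whole lapsed, but separately renewed
        # contributions stay protected.
        return "public domain except renewed contributions"
    return "public domain"


print(us_serial_issue_status(1950, serial_renewed=False))  # public domain
print(us_serial_issue_status(1964, serial_renewed=False))  # in copyright
```

The contribution check is why the survey of contribution renewals matters: if scholarly contributions were almost never renewed on their own, the third branch rarely comes into play for research journals.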

Recent research

Let’s not forget that a lot more recent research is also available freely online, often from journal publishers themselves.  DOAJ only tracks journals that make their content open access immediately, but there are also many journals that make their content freely readable online a few months or years after initial publication.  This content can then be found in repositories like PubMed Central (see the journals noted as “Full” in the “participation” column), publishing platforms like Highwire Press (see the journals with entries in the “free back issues” column), or individual publishers’ programs such as Elsevier’s Open Archives.

Why are publishers leaving money on the table by making old but copyrighted content freely available instead of charging for it?  Often it’s because that’s what makes their supporters (scholars and their funders) happy.  NIH, which runs PubMed Central, already mandates open access to research it funds, and many of the journals that fully participate in PubMed Central’s free issue program are largely filled with NIH-backed research.  Similarly, I suspect that the high proportion of math journals in Elsevier’s Open Archives selection has something to do with the high proportion of mathematicians in the Cost of Knowledge protest against Elsevier.  When researchers, and their affiliated organizations, make their voices heard, publishers listen.

I’m happy to include listings for  significant free runs of significant research journals on The Online Books Page as well, whether they’re open access from the get-go or after a delay.  I won’t list journals that only make the occasional paid-for article available through a “hybrid” program, or those that only have sporadic “free sample” issues.  But if a journal you value has at least a continuous year’s worth of full-sized, complete issues permanently freely available, please let me know about it and I’ll be glad to check it out.

Sharing journal information

I’m not simply trying to build up my own website, though– I want to spread this information around, so that people can easily find free research journal content wherever they go.  Right now, I have a Dublin Core OAI feed for all curated Online Books Page listings as well as a monthly dump of my raw data file, both CC0-licensed.  But I think I could do more to get free journal information to libraries and other interested parties.  I don’t have MARC records for my listings at the moment, but I suspect that holdings information– what issues of which journals are freely available, and from whom– is more useful for me to provide than bibliographic descriptions of the journals (which can already be obtained from various other sources).  Would a KBART file, published online or made available to initiatives like the Global Open Knowledgebase, be useful?  Or would something else work better to get this free journal information more widely known and used?
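As a rough idea of what a KBART file would involve, here is a Python sketch that writes free-holdings records as tab-separated text with KBART-style column names. It uses only a subset of KBART columns, assumed here for illustration; the full, ordered column list should be checked against the NISO KBART recommended practice before publishing a real file. The sample record is hypothetical.

```python
import csv
import io

# Subset of KBART column names (assumed for illustration; a real file
# should follow the complete column list in the KBART recommendation).
KBART_COLUMNS = [
    "publication_title",
    "print_identifier",
    "online_identifier",
    "date_first_issue_online",
    "num_first_vol_online",
    "date_last_issue_online",
    "num_last_vol_online",
    "title_url",
    "publisher_name",
    "coverage_depth",
]


def kbart_tsv(holdings):
    """Serialize holdings records (dicts) as tab-separated KBART-style text.

    Missing fields are left empty; keys outside KBART_COLUMNS are ignored.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=KBART_COLUMNS,
        delimiter="\t",
        lineterminator="\n",
        restval="",
        extrasaction="ignore",
    )
    writer.writeheader()
    for record in holdings:
        writer.writerow(record)
    return buf.getvalue()


# Hypothetical record describing a free run of a journal.
example = {
    "publication_title": "Hypothetical Journal of Examples",
    "date_first_issue_online": "1900",
    "num_first_vol_online": "1",
    "date_last_issue_online": "1922",
    "num_last_vol_online": "23",
    "title_url": "https://example.org/hje",
    "coverage_depth": "fulltext",
}
print(kbart_tsv([example]))
```

The appeal of a holdings format like this is that it records which volumes and years are free, and where, without duplicating the bibliographic descriptions libraries can already get elsewhere.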

Issues and volumes vs. articles

Of course, many articles are made available online individually as well, as many journal publishers allow.  I don’t have the resources at this point to track articles at an individual level, but there are a growing number of other efforts that do, whether they’re proprietary but comprehensive search platforms like Google Scholar and Web of Science, disciplinary repositories like arXiv and SSRN, institutional repositories and their aggregators like SHARE and BASE, or outright bootleg sites like Sci-Hub.  We know from them that it’s possible to index and provide access to the scholarly knowledge exchange at a global scale, but doing it accurately, openly, comprehensively, sustainably, and ethically is a bigger challenge.   I think it’s a challenge that the academic community can solve if we make it a priority.  We created the research; let’s also make it easy for the world to access it, learn from it, and put it to work.  Let’s make open access to research articles the norm, not the exception.

And as part of that, if you’d like to help me highlight and share information on free, authorized sources for online journal content, please alert me to relevant journals, make suggestions in the comments here, or get in touch with me offline.

Public Domain Day 2015: Ending our own enclosures

It’s the start of the new year, which, as many of my readers know, marks another Public Domain Day, when a year’s worth of creative work becomes free for anyone to use in many countries.

In countries where copyrights have been extended to life plus 70 years, works by people like Piet Mondrian, Edith Durham, Glenn Miller, and Ethel Lina White enter the public domain.  In countries that have resisted ongoing efforts to extend copyrights past life + 50 years, 2015 sees works by people like Flannery O’Connor, E. J. Pratt, Ian Fleming, Rachel Carson, and T. H. White enter the public domain. And in the US, once again no published works enter the public domain due to an ongoing freeze in copyright expirations (though some well-known works might have if we still had the copyright laws in effect when they were created.)
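The arithmetic behind these lists is simple: under a life-plus-N term that runs to the end of the calendar year, an author’s works enter the public domain on January 1 of the year after the death year plus N. A short Python sketch (the function names are illustrative):

```python
def public_domain_year(death_year: int, term_after_death: int) -> int:
    """Year (as of January 1) in which an author's works enter the public
    domain under a life-plus-N term that runs to the end of the calendar year."""
    return death_year + term_after_death + 1


def death_year_entering(year: int, term_after_death: int) -> int:
    """Death year of authors whose works enter the public domain in `year`."""
    return year - term_after_death - 1


# Piet Mondrian died in 1944; Flannery O'Connor died in 1964.
print(public_domain_year(1944, 70))  # life + 70 countries: 2015
print(public_domain_year(1964, 50))  # life + 50 countries: 2015
```

This does not describe the US case, where older published works are governed by terms counted from publication rather than from the author’s death.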

But we’re actually getting something new worth noting this year.  Today we’re seeing scholarship-quality transcriptions of tens of thousands of early English books (the EEBO Text Creation Partnership Phase I texts) become available free of charge to the general public for the first time.  (As I write this, the books aren’t accessible yet, but I expect they will be once the folks in the project come back to work from the holiday.)  (Update: It looks like files and links are now on GitHub; hopefully more user-friendly access points are in the works as well.)

This isn’t a new addition to the public domain; the books being transcribed have been in the public domain for some time.  But it’s the first time many of them are generally available in a form that’s easily searchable and isn’t riddled with OCR errors.  For the rarer works, it’s the first time they’re available freely across the world in any form.  It’s important to recognize this milestone as well, because taking advantage of the public domain requires not just copyrights expiring or being waived, but also people dedicated to making the public domain available to the public.

And that is where we who work in institutions dedicated to learning, knowledge, and memory have unique opportunities and responsibilities.   Libraries, galleries, archives, and museums have collected and preserved much of the cultural heritage that is now in the public domain, and that is often not findable– and generally not shareable– anywhere else.  That heritage becomes much more useful and valuable when we share it freely with the whole world online than when we only give access to people who can get to our physical collections, or who can pay the fees and tolerate the usage restrictions of restricted digitized collections.

So whether or not we’re getting new works in the public domain this year, we have a lot of work to do this year, and the years to follow, in making that work available to the world.  Wherever and whenever possible, those of us whose mission focuses more on knowledge than commerce should commit to having that work be as openly accessible as possible, as soon as possible.

That doesn’t mean we shouldn’t work with the commercial sector, or respect their interests as well.  After all, we wouldn’t have seen nearly so many books become readable online in the early years of this century if it weren’t for companies like Google, Microsoft, and ProQuest digitizing them at much larger scale than libraries had previously done on their own.  As commercial firms, they’re naturally looking to make some money by doing so.  But they need us as much as we need them to digitize the materials we hold, so we have the power and duty to ensure that when we work with them, our agreements fulfill our missions to spread knowledge widely as well as their missions to earn a profit.

We’ve done better at this in some cases than in others.   I’m happy that many of the libraries who partnered with Google in their book scanning program retained the rights to preserve those scans themselves and make them available to the world in HathiTrust.   (Though it’d be nice if the Google-imposed restrictions on full-book downloads from there eventually expired.)  I’m happy that libraries who made deals with ProQuest in the 1990s to digitize old English books that no one else was then digitizing had the foresight to secure the right to make transcriptions of those books freely available to the world today.  I’m less happy that there’s no definite release date yet for some of the other books in the collection (the ones in Phase II, where the 5-year timer for public release doesn’t count down until that phase’s as-yet-unclear completion date), and that there appears to be no plan to make the page images freely available.

Working together, we in knowledge institutions can get around the more onerous commercial restrictions put on the public domain.  I have no issue with firms that make a reasonable profit by adding value: if, for instance, Melville House can quickly sell lots of printed and digitally transcribed copies of the US Senate torture report for under $20, more power to them.  People who want to pay for the convenience of those editions can do so, and free public domain copies from the Senate remain available for those who want to read and repurpose them.

But when I hear about firms like Taylor and Francis charging as much as $48 to nonsubscribers to download a 19th century public domain article from their website for the Philosophical Magazine, I’m going to be much more inclined to take the time to promote free alternatives scanned by others.  And we can make similar bypasses of not-for-profit gatekeepers when necessary.  I sympathize with Canadian institutions having to deal with drastic funding cuts, which seem to have prompted Early Canadiana Online to put many of their previously freely available digitized books behind paywalls– but I still switched my links as soon as I could to free copies of most of the same books posted at the Internet Archive.  (I expect that increasing numbers of free page scans of the titles represented in Early English Books Online will show up there and elsewhere over time as well, from independent scanning projects if not from ProQuest.)

Assuming we can hold off further extensions to copyright (which, as I noted last year, is a battle we need to show up for now), four years from now we’ll finally have more publication copyrights expiring into the public domain in the US.  But there’s a lot of work we in learning and memory institutions can do now in making our public domain works available to the world.  For that matter, there’s a lot we can do in making the many copyrighted works we create available to the world in free and open forms.  We saw a lot of progress in that respect in 2014: Scholars and funders are increasingly shifting from closed-access to open-access publication strategies.  A coalition of libraries has successfully crowdfunded open-access academic monographs at less cost to them than similar closed-access print books.  And a growing number of academic authors and nonprofit publishers are making open access versions of their works, particularly older works, freely available to the world while still sustaining themselves.  Today, for instance, I’ll be starting to list on The Online Books Page free copies of books that Ohio State University Press published in 2009, now that a 5-year-limited paywall has expired on those titles.  And, as usual, I’m also dedicating a year’s worth of 15-year-old copyrights I control (in this case, for work I made public in 2000) to the public domain today, since the 14-year initial copyright term that the founders of the United States first established is plenty long for most of what I do.

As we celebrate Public Domain Day today, let’s look to the works that we ourselves oversee, and resolve to bring down enclosures and provide access to as much of that work as we can.

Updates on library linking, Wikipedia, and what you can do

I’m gratified by the positive response I’ve been getting to the Forward To Libraries service I first introduced last month.  It really took off when I announced the templates for linking to libraries from Wikipedia a couple of weeks ago.   They’ve been written up in places like Boing Boing and in Wikipedia’s own Signpost newsletter.   The service now includes more than 150 libraries throughout the English-speaking world.  Wikipedia editors are also adding the link templates to articles: besides the handful I added myself, more than 450 have been added by other editors at this writing.  And I’ve heard from numerous librarians who now want to start editing Wikipedia themselves, both to add library links and to otherwise improve articles.  (Here’s how to become a Wikipedia editor.)

So far, I’ve largely provided this service on my own, with support from the University of Pennsylvania Libraries.   But I’d like to make the service more useful, and could use some help.  If you’re interested, here are some things you might want to know:

Some libraries are easier to link than others.   If you’re using one of many standard library catalogs or discovery systems, and you haven’t made substantial modifications to it, it’s easy for me to add your system. I basically just record what software you’re using and where on the Web the service runs, run some test searches to verify your system, and you’re good to go.  If you’re using a more customized, obscure, or home-grown system, I might still be able to add links to it, but it may take me more effort to figure out how to make useful search links into the system.  Any information you can provide would be helpful.  There are also certain off-the-shelf systems that I have problems with.  Many Polaris systems, for example, will give a “session timed out” message the first time you try to follow a search link into the system.   (Back up and try the link again, and everything will be fine for some time afterwards.)  Some other systems don’t seem to support deep search links in any consistent way that I’ve been able to determine.  These include not only some very old session-based systems but also EBSCO’s fairly new EDS discovery platform.
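Recording “what software you’re using and where on the Web the service runs” can be pictured as a table of search-URL patterns keyed by system type. Below is a Python sketch of that idea; the system types and URL patterns are hypothetical placeholders, since real patterns vary by product and version and have to be verified with test searches.

```python
from urllib.parse import quote_plus

# Hypothetical search-URL patterns keyed by catalog system type.
# Real patterns differ by product and version and must be verified
# with test searches against each library's live system.
SEARCH_TEMPLATES = {
    "catalog-type-a": {
        "subject": "{base}/search?index=subject&q={term}",
        "author": "{base}/search?index=author&q={term}",
        "keyword": "{base}/search?index=keyword&q={term}",
    },
    "catalog-type-b": {
        # This system type is assumed to support keyword searches only.
        "keyword": "{base}/find?query={term}",
    },
}


def search_link(library, search_type, term):
    """Build a deep search link into one library's catalog.

    `library` is a dict with the system type ("software") and base URL.
    Falls back to a keyword search if the system lacks the requested index.
    """
    templates = SEARCH_TEMPLATES[library["software"]]
    pattern = templates.get(search_type, templates["keyword"])
    return pattern.format(base=library["base"].rstrip("/"), term=quote_plus(term))


lib = {"software": "catalog-type-a", "base": "https://catalog.example.edu/"}
print(search_link(lib, "subject", "Women's history"))
```

With a table like this, adding a library that runs a supported system is just a matter of recording its type and base URL; customized systems are the ones that need a new pattern worked out.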

I’ve worked out ways to link into these systems by reading documentation files I’ve found on the public Internet, along with some reverse-engineering of public web sites.  If you know of better ways to link to some of these systems that I haven’t yet figured out myself, and this information can be made public, let me know.

For now, I’m declining to list libraries that don’t have many English-language subject or Library of Congress name headings, because the results of English searches in those libraries will be misleadingly incomplete.  But I’m considering ways to include translated searches, where the data to support this is available, for a wider range of countries.  (VIAF already provides much relevant data for names.)

The most popular new Wikipedia Library resource template is also controversial, and might be modified or deleted.   I provide a number of different templates for linking from Wikipedia to libraries, including the inlined text templates “Library resources about” and “Library resources by“, and the all-in-one sidebar template “Library resources box“. By far the most used of these templates has been the Library resources box.   It’s easy to spot in an article, it organizes links clearly, and it’s easy for editors to recognize as a template that they can add to articles they find of interest.  But some Wikipedians, including at least one Wikipedia admin, have objected to the template.  They cite style guidelines that say external link templates should not use boxes or other graphical elements, but only appear as inlined text.  I’ve defended the boxes, noted how other library-related external links commonly appear in boxes, and proposed ways to address various Wikipedian concerns.   But it’s ultimately up to the Wikipedia community to determine whether or how library links will appear in Wikipedia articles.  To find out more about the issues, see the Library resources box talk page.  And if you’re a Wikipedia editor or user, feel free to weigh in on that page or other relevant forums.

I’m exploring ways to make it easier for readers to get to our libraries.  For one, I’m starting to record IP ranges for some institutions, so that local network users can follow “resources in your library” links straight to the institution’s library, without having to first register a preference.  (Users can still register a different preference if they want.)  IP-based routing is an experimental service, initially being provided to a limited number of institutions, and I may modify or withdraw it in the future.  If you’d like me to consider it for your institution, you can submit a request, with the relevant IP ranges (preferably in CIDR format) in the “anything we should know?” field.  Note that the IP ranges you submit will be published as part of the library data I’m sharing for this project.
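As an illustration of how IP-based routing can work, here is a Python sketch using the standard `ipaddress` module to match a reader’s address against CIDR ranges, with an explicitly registered preference taking priority. The institution name and ranges are hypothetical (they are reserved documentation address blocks), and the service’s actual matching logic may differ.

```python
import ipaddress

# Hypothetical institutional ranges in CIDR notation. These are reserved
# documentation blocks (RFC 5737), standing in for real campus networks.
INSTITUTION_RANGES = {
    "example-university": ["192.0.2.0/24", "198.51.100.0/25"],
}

# Parse the CIDR strings once, up front.
NETWORKS = [
    (institution, ipaddress.ip_network(cidr))
    for institution, cidrs in INSTITUTION_RANGES.items()
    for cidr in cidrs
]


def library_for_request(client_ip, registered_preference=None):
    """Return the library to route a reader to, or None if unknown.

    A preference the reader registered explicitly always wins; otherwise
    the client address is matched against the institutional ranges.
    """
    if registered_preference:
        return registered_preference
    address = ipaddress.ip_address(client_ip)
    for institution, network in NETWORKS:
        # Membership is False (not an error) when IPv4/IPv6 versions differ.
        if address in network:
            return institution
    return None


print(library_for_request("192.0.2.17"))
```

Asking for ranges in CIDR form keeps this step simple: each range parses directly into a network object, and membership is a single containment test.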

I’m starting to share my work on Github.  There is now a Github repository with selected data and code for the FTL project.  In it, you’ll find the data I use to link to the libraries enrolled in the service, and you’ll also see the code for the main CGI script used to forward readers to those libraries.   You can’t yet run the service out of the box yourself with the code and data provided so far, but I hope that what’s there will help people understand how the service works, and possibly implement similar services themselves if they’re so inclined.  The data’s released under CC0, so you can reuse it however you like; and the code is open-source licensed under the Educational Community License 2.0.  I hope to add more data and code over time, and I’m happy to hear suggestions for enhancements and improvements.
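For readers curious about the general shape of a forwarding service like this, here is an independent Python sketch written as a WSGI app. It is not the FTL code, which is a separate CGI script: the query parameters (`library`, `term`), the library table, and the URL below are hypothetical.

```python
from urllib.parse import parse_qs, quote_plus

# Hypothetical library table: library id -> keyword-search URL pattern.
LIBRARIES = {
    "examplelib": "https://catalog.example.org/search?q={term}",
}


def forward_app(environ, start_response):
    """WSGI app: redirect a reader to a keyword search in the chosen library.

    Expects query parameters `library` (an id in LIBRARIES) and `term`.
    """
    params = parse_qs(environ.get("QUERY_STRING", ""))
    library = params.get("library", [""])[0]
    term = params.get("term", [""])[0].strip()
    pattern = LIBRARIES.get(library)
    if pattern is None or not term:
        start_response("404 Not Found",
                       [("Content-Type", "text/plain; charset=utf-8")])
        return [b"Unknown library or empty search term.\n"]
    # Re-encode the decoded term for the destination catalog's URL.
    location = pattern.format(term=quote_plus(term))
    start_response("302 Found", [("Location", location)])
    return [b""]
```

To try it locally, the standard library’s `wsgiref.simple_server.make_server` can serve `forward_app`; a real service would also need heading lookup and reader-preference handling.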

I’m hoping that as more people get involved, the service will improve, library resources will become more reachable online, and Wikipedia will become a more useful resource as well.  If you’d like to get involved yourself, I’d love to hear what you’re up to, and what suggestions you might have.

From Wikipedia to our libraries

I’ve heard the lament in more than one library discussion over the years.  “People aren’t coming to our library like they should,” librarians have told me.  “We’ve got a rich collection, and we’ve expended lots of resources on an online presence, but lots of our patrons just go to Google and Wikipedia without checking to see what we have.”  The pattern of quick online information-finding using search engines and Wikipedia is well-known enough that it has its own acronym: GWR, for Google -> Wikipedia -> References.  (David White gives a good description of that pattern in the linked article.)

Some people I’ve talked to think we should break this pattern.  With the right search tool or marketing plan, some say, we can get patrons to start with us first, instead of Google or Wikipedia.  This idea seems to me both futile and beside the point.  Between them, Google and Wikipedia cover a vast array of online information, more than librarians could hope to replicate or index ourselves in that medium.  Also, if we truly have better resources available in our libraries than can be found on the open Web, it’s less important that our researchers start from our libraries’ websites than that they end up finding the knowledge resources our libraries make available to them.

Looked at the right way, Wikipedia can be a big help in making online readers aware of their library’s offerings.  One of the things we spend a lot of time on in libraries is organizing information into distinct, conceptual categories.  That’s what Wikipedia does too: so far,  their English edition has over 4 million concepts identified, described, and often populated with reference links.  And Wikipedia has encouraged people to add links to relevant digital library collections on various topics, through programs like Wikipedia Loves Libraries and Wikipedian in Residence programs.  But while these programs help bring some library resources online, and direct people to those selected resources, there’s still a lot of other relevant library material that users can’t get to via Wikipedia, but can via the libraries that are near them.

So how do we get people from Wikipedia articles to the related offerings of our local libraries?  Essentially we need three things: First, we need ways to embed links in Wikipedia to the libraries that readers use.  (We can’t reasonably add individual links from an article to each library out there, because there are too many of them– there has to be a way that each Wikipedia reader can get to their own favored libraries via the same links.)  Second, we need ways to derive appropriate library concepts and local searches from the subjects of Wikipedia articles, so the links go somewhere useful.  Finally, we need good summaries of the resources a reader’s library makes available on those concepts, so the links end up showing something useful.  With all of these in place, it should be possible for researchers to get from a Wikipedia article on a topic straight to a guide to their local library’s offerings on that topic in a single click.

I’ve developed some tools to enable these one-click Wikipedia -> library transitions.  For the first thing we need, I’ve created a set of Wikipedia templates for adding library links. The documentation for the Library resources box template, for instance, describes how to use it to create a sidebar box with links to resources about (or by) the topic of  a Wikipedia article in a reader’s library, or in another library a reader might want to consult.  (There’s also an option for direct links to my Online Books Page, if there are relevant books online; it may be easier in some cases for readers to access those than to access their local library’s books.)

For the links to work, we need to know about the reader’s preferred library.  Users can register their preferred library (which will set a cookie in their browser recording that choice), or select it for each individual search.  We know how to link to several dozen libraries so far, and can add more libraries on request.  WorldCat.org, which includes holdings of thousands of libraries worldwide, is also an option.  Besides the “Library resources box” template, I’ve also provided templates for in-text links to library resources, if those work better in a given article.  Links to these templates can be found at the end of the “Library resources box” documentation.
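The preference logic in this paragraph can be sketched in Python with the standard `http.cookies` module. The cookie name, the precedence of a per-search choice over the stored preference, and the fallback to WorldCat.org when no preference is set are all assumptions made for this sketch.

```python
from http.cookies import SimpleCookie

# Hypothetical cookie name and fallback library id.
PREFERENCE_COOKIE = "library"
DEFAULT_LIBRARY = "worldcat"


def choose_library(cookie_header, per_search_choice=None):
    """Pick the library a search link should go to.

    A library chosen for this particular search wins; otherwise use the
    preference registered in the reader's cookie; otherwise the default.
    """
    if per_search_choice:
        return per_search_choice
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    if PREFERENCE_COOKIE in cookie:
        return cookie[PREFERENCE_COOKIE].value
    return DEFAULT_LIBRARY


print(choose_library("library=examplelib; theme=dark"))
print(choose_library(None))
```

Because the choice lives in each reader’s browser, the same link in a Wikipedia article can lead different readers to different libraries.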

For the second thing we need, I’ve created a library forwarding service (“Forward to Libraries”, or FTL– catchier name suggestions welcome) that transforms links from Wikipedia into searches for appropriate  headings or keywords in local libraries.  This is the same service I describe in my “From my library to yours” blog post from last month, but it now supports links from Wikipedia as well as to Wikipedia.

Thanks to information included in the Library of Congress’ Authorities and Vocabularies datasets, OCLC’s VIAF data feeds, Wikipedia’s database downloads, and my own metadata compiled at The Online Books Page, FTL already knows how to link directly to over 240,000 distinct authority-controlled headings known to the Library of Congress from their corresponding Wikipedia articles.   (Library of Congress headings are used in most sizable US libraries, and many English-language libraries outside the US also use similar headings.)

For other articles, FTL by default will try a general keyword search based on the Wikipedia article’s title, which will often turn up useful results at the destination library.  Alternatively, my templates allow Wikipedia editors to determine a specific Library of Congress heading to use in library links, if appropriate.  I’m hoping to incorporate suggested headings into FTL’s own knowledge base as I detect them showing up in Wikipedia articles.  I also plan to publish FTL’s data sets under open access terms, so that others can use and improve on them as well.
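The fallbacks described in the last two paragraphs might look like this as a Python sketch. The function name, the precedence of an editor-supplied heading over a known mapping, and the stripping of Wikipedia’s parenthetical disambiguators before a keyword search are assumptions for illustration, not a description of how FTL is implemented; the sample mapping is a single illustrative entry.

```python
import re


def library_search_for(article_title, known_headings, editor_heading=None):
    """Decide what to search for in a library, given a Wikipedia article title.

    Returns a (search_type, query) pair.
    """
    # Article titles in URLs use underscores for spaces.
    title = article_title.replace("_", " ").strip()
    if editor_heading:
        # A heading the editor specified in the template.
        return ("subject", editor_heading)
    heading = known_headings.get(title)
    if heading:
        # A heading already mapped to this article.
        return ("subject", heading)
    # Fallback: keyword search on the title, minus any trailing
    # parenthetical disambiguator such as "(journal)".
    return ("keyword", re.sub(r"\s*\([^)]*\)$", "", title))


# Illustrative mapping from article title to a Library of Congress heading.
KNOWN = {"Elizabeth Cady Stanton": "Stanton, Elizabeth Cady, 1815-1902"}
print(library_search_for("Elizabeth_Cady_Stanton", KNOWN))
print(library_search_for("Isis (journal)", KNOWN))
```

The known-heading table is where the authority data does its work: the better the mapping, the fewer readers fall through to the less precise keyword search.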

The third part of this solution, displaying relevant resources at the destination library, can be implemented differently at each library.  For most of the libraries in FTL’s current knowledge base, links go to searches in the library’s regular online catalog.  But with some libraries, I’ve linked to another discovery system, if it seems to be the main search promoted at that library, and it seems to produce useful results.  The Online Books Page’s subject map displays also have features that I think will be useful to Wikipedia subject researchers arriving at my site, such as also showing related subjects and books filed under those subjects.  I hope in future posts to talk more about other useful guideposts and contextual information we could be providing to readers arriving from Wikipedia.

But if you’ve read this far, you probably want to see how this all works in practice.  So I’ve added some example library resources boxes in a few Wikipedia articles that seemed particularly relevant this month, including those for Women’s history, Elizabeth Cady Stanton, and Flannery O’Connor.  Look down in the “External links” or “Further reading” sections of those articles for the boxes, and view the page source of the articles to see how those boxes are constructed.

As with most things related to Wikipedia, this service is experimental, and subject to change (and, hopefully, improvement) over time.  I’d love to hear thoughts and suggestions from users and maintainers of Wikipedia and libraries.  And if you find creating these sorts of links from Wikipedia useful, and need help getting started, I’d be happy to help you bring them to your favorite Wikipedia topics and local libraries, as time permits.