Sharing journals freely online

What are all the research journals that anyone can read freely online?  The answer is harder to determine than you might think.  Most research library catalogs can be searched for online serials (here’s what Penn Libraries gives access to, for instance), but it’s often hard for unaffiliated readers to determine what they can get access to, and what will throw up a paywall when they try following a link.

Current research

The best-known listing of current free research journals has been the Directory of Open Access Journals (DOAJ), a comprehensive listing of free-to-read research journals in all areas of scholarship. Given the ease with which anyone can throw up a web site and call it a “journal” regardless of its quality or its viability, some have worried that the directory might be a little too comprehensive to be useful.  A couple of years ago, though, DOAJ instituted more stringent criteria for what it accepts, and it recently weeded its listings of journals that did not reapply under its new criteria, or did not meet its requirements.   This week I am pleased to welcome over 8,000 of its journals to the extended-shelves listings of The Online Books Page.  The catalog entries are automatically derived from the data DOAJ provides; I’m also happy to create curated entries with more detailed cataloging on readers’ request.

Historic research

Scholarly journals go back centuries.  Many of these journals (and other periodicals) remain of interest to current scholars, whether they’re interested in the history of science and culture, the state of the natural world prior to recent environmental changes, or analyses and source documents that remain directly relevant to current scholarship.  Many older serials are also included in The Online Books Page’s extended shelves courtesy of HathiTrust, which currently offers over 130,000 serial records with at least some free-to-read content.  Many of these records are not for research journals, of course, and those that are can sometimes be fragmentary or hard to navigate.  I’m also happy to create organized, curated records for journals offered by HathiTrust and others at readers’ request.

It’s important work to organize and publicize these records, because many of these journals that go back a long way don’t make their content freely available in the first place one might look.  Recently I indexed five journals founded over a century ago that are still used enough to be included in Harvard’s 250 most popular works: Isis, The Journal of Comparative Neurology, The Journal of Infectious Diseases, The Journal of Roman Studies, and The Philosophical Review.  All five had public domain content that sat behind paywalls at their official journal site or at JSTOR (with fees for access ranging from $10 to $42 per article), even though the same content was available for free elsewhere online.  I’d much rather have readers find the free content than be stymied by a paywall.  So I’m compiling free links for these and other journals with public domain runs, whether they can be found at HathiTrust, JSTOR (which does make some early journal content, including from some of these journals, freely available), or other sites.

For many of these journals, the public domain extends as late as the 1960s due to non-renewal of copyright, so I’m also tracking when copyright renewals actually start for these journals.  I’ve done a complete inventory of serials published until 1950 that renewed their own copyrights up to 1977.  Some scholarly journals are in this list, but most are not, and many that are did not renew copyrights for many years beyond 1922.  (For the five journals mentioned above, for instance, the first copyright-renewed issues were published in 1941, 1964, 1959, 1964, and 1964 respectively– 1964 being the first year for which renewals were automatic.)

Even so, major projects like HathiTrust and JSTOR have generally stopped opening journal content at 1922, partly out of a concern for the complexity of serial copyright research.  In particular, contributions to serials could have their own copyright renewals separate from renewals for the serials themselves.  Could this keep some unrenewed serials out of the public domain?  To answer this question, I’ve also started surveying information on contribution renewals, and adding information on those renewals to my inventory.  Having recently completed this survey for all 1920s serials, I can report that so far individual contributions to scholarly journals were almost never copyright-renewed on their own.  (Individual short stories, and articles for general-interest popular magazines, often were, but not articles intended for scientific or scholarly audiences.)  I’ll post an update if the situation changes in the 1930s or later. So far, though, it’s looking like, at least for research journals, serial digitization projects can start opening issues past 1922 with little risk.  There are some review requirements, but they’re comparable in complexity to the Copyright Review Management System that HathiTrust has used to successfully open access to hundreds of thousands of post-1922 public domain book volumes.
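As a rough illustration of the year-based reasoning above, here is a minimal Python sketch of a first-pass triage for one issue of a serial. The function name, constant names, and status labels are hypothetical; only the years 1922 and 1964 come from the discussion above. It is not legal advice, and it deliberately ignores separately renewed contributions and the other review steps a real project would need.

```python
# Years taken from the discussion above; all names and labels are illustrative.
LAST_YEAR_OPEN_WITHOUT_REVIEW = 1922   # cutoff most projects currently use
FIRST_AUTOMATIC_RENEWAL_YEAR = 1964    # renewals were automatic from this year on


def first_pass_status(issue_year, first_renewed_issue_year=None):
    """Give a rough first-pass label for one issue of a serial.

    first_renewed_issue_year is the publication year of the earliest issue
    with a copyright renewal on record, or None if no renewal was found.
    """
    if issue_year <= LAST_YEAR_OPEN_WITHOUT_REVIEW:
        return "public domain"
    if issue_year >= FIRST_AUTOMATIC_RENEWAL_YEAR:
        return "renewed automatically"
    if first_renewed_issue_year is None or issue_year < first_renewed_issue_year:
        return "candidate for review: no issue renewal on record"
    # After the first renewal, later issues may or may not have been renewed.
    return "check renewal records for this issue"


# Example: a journal whose first copyright-renewed issue appeared in 1941.
print(first_pass_status(1935, 1941))
print(first_pass_status(1945, 1941))
```

The point of the sketch is only that the first renewed year splits a journal's run into an easy part and a part that needs issue-by-issue checking.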

Recent research

Let’s not forget that a lot more recent research is also available freely online, often from journal publishers themselves.  DOAJ only tracks journals that make their content open access immediately, but there are also many journals that make their content freely readable online a few months or years after initial publication.  This content can then be found in repositories like PubMedCentral (see the journals noted as “Full” in the “participation” column), publishing platforms like Highwire Press (see the journals with entries in the “free back issues” column), or individual publishers’ programs such as Elsevier’s Open Archives.

Why are publishers leaving money on the table by making old but copyrighted content freely available instead of charging for it?  Often it’s because it’s what makes their supporters– scholars and their funders– happy.  NIH, which runs PubMedCentral, already mandates open access to research it funds, and many of the journals that fully participate in PubMedCentral’s free issue program are largely filled with NIH-backed research.  Similarly, I suspect that the high proportion of math journals in Elsevier’s Open Archives selection has something to do with the high proportion of mathematicians in the Cost of Knowledge protest against Elsevier.  When researchers, and their affiliated organizations, make their voices heard, publishers listen.

I’m happy to include listings for substantial free runs of significant research journals on The Online Books Page as well, whether they’re open access from the get-go or after a delay.  I won’t list journals that only make the occasional paid-for article available through a “hybrid” program, or those that only have sporadic “free sample” issues.  But if a journal you value has at least a continuous year’s worth of full-sized, complete issues permanently freely available, please let me know about it and I’ll be glad to check it out.

Sharing journal information

I’m not simply trying to build up my own website, though– I want to spread this information around, so that people can easily find free research journal content wherever they go.  Right now, I have a Dublin Core OAI feed for all curated Online Books Page listings as well as a monthly dump of my raw data file, both CC0-licensed.  But I think I could do more to get free journal information to libraries and other interested parties.  I don’t have MARC records for my listings at the moment, but I suspect that holdings information– what issues of which journals are freely available, and from whom– is more useful for me to provide than bibliographic descriptions of the journals (which can already be obtained from various other sources).  Would a KBART file, published online or made available to initiatives like the Global Open Knowledgebase, be useful?  Or would something else work better to get this free journal information more widely known and used?
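To make the KBART idea concrete, here is a minimal Python sketch of what exporting holdings information as a tab-separated file might look like. The column names are a subset of the ones I understand the KBART recommended practice to define; a complete file has more columns, and the names should be checked against the current NISO document. The journal title, dates, and URL are placeholders, not real data.

```python
import csv
import io

# A subset of KBART-style column names (an assumption; verify against the
# current KBART recommended practice before relying on them).
KBART_FIELDS = [
    "publication_title",
    "print_identifier",
    "online_identifier",
    "date_first_issue_online",
    "num_first_vol_online",
    "date_last_issue_online",
    "num_last_vol_online",
    "title_url",
    "coverage_depth",
    "publisher_name",
]


def write_kbart(rows, out):
    """Write holdings rows as tab-separated text with a header line."""
    writer = csv.DictWriter(out, fieldnames=KBART_FIELDS, delimiter="\t",
                            lineterminator="\n", restval="")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


def kbart_text(rows):
    """Return the tab-separated file content as a string."""
    buffer = io.StringIO()
    write_kbart(rows, buffer)
    return buffer.getvalue()


# Hypothetical holdings record for a free run of volumes 1-36.
example_rows = [{
    "publication_title": "Example Journal of Hypothetical Studies",
    "date_first_issue_online": "1905",
    "num_first_vol_online": "1",
    "date_last_issue_online": "1940",
    "num_last_vol_online": "36",
    "title_url": "https://example.org/journals/example-journal",
    "coverage_depth": "fulltext",
}]

print(kbart_text(example_rows))
```

Each row says which run of a journal is freely available and where, which is the holdings information described above, as opposed to a full bibliographic description.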

Issues and volumes vs. articles

Of course, many articles are made available online individually as well, as many journal publishers allow.  I don’t have the resources at this point to track articles at an individual level, but there is a growing number of other efforts that do, whether they’re proprietary but comprehensive search platforms like Google Scholar and Web of Science, disciplinary repositories like arXiv and SSRN, institutional repositories and their aggregators like SHARE and BASE, or outright bootleg sites like Sci-Hub.  We know from them that it’s possible to index and provide access to the scholarly knowledge exchange at a global scale, but doing it accurately, openly, comprehensively, sustainably, and ethically is a bigger challenge.   I think it’s a challenge that the academic community can solve if we make it a priority.  We created the research; let’s also make it easy for the world to access it, learn from it, and put it to work.  Let’s make open access to research articles the norm, not the exception.

And as part of that, if you’d like to help me highlight and share information on free, authorized sources for online journal content, please alert me to relevant journals, make suggestions in the comments here, or get in touch with me offline.

Updates on library linking, Wikipedia, and what you can do

I’m gratified by the positive response I’ve been getting to the Forward To Libraries service I first introduced last month.  It really took off when I announced the templates for linking to libraries from Wikipedia a couple of weeks ago.   They’ve been written up in places like Boing Boing and in Wikipedia’s own Signpost newsletter.   The service now includes more than 150 libraries throughout the English-speaking world.  Wikipedia editors are also adding the link templates to articles: besides the handful I added myself, more than 450 have been added by other editors at this writing.  And I’ve heard from numerous librarians who now want to start editing Wikipedia themselves, both to add library links and to otherwise improve articles.  (Here’s how to become a Wikipedia editor.)

So far, I’ve largely provided this service on my own, with support from the University of Pennsylvania Libraries.   But I’d like to make the service more useful, and could use some help.  If you’re interested, here are some things you might want to know:

Some libraries are easier to link than others.   If you’re using one of many standard library catalogs or discovery systems, and you haven’t made substantial modifications to it, it’s easy for me to add your system. I basically just record what software you’re using and where on the Web the service runs, run some test searches to verify your system, and you’re good to go.  If you’re using a more customized, obscure, or home-grown system, I might still be able to add links to it, but it may take me more effort to figure out how to make useful search links into the system.  Any information you can provide would be helpful.  There are also certain off-the-shelf systems that I have problems with.  Many Polaris systems, for example, will give a “session timed out” message the first time you try to follow a search link into the system.   (Back up and try the link again, and everything will be fine for some time afterwards.)  Some other systems don’t seem to support deep search links in any consistent way that I’ve been able to determine, and not just some very old session-based systems, but also EBSCO’s fairly new EDS discovery platform.
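The idea of recording a library's software and web location, then deriving search links from them, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the software names and URL patterns below are made up, since each real catalog system has its own link syntax, and this is not the actual FTL code.

```python
from urllib.parse import quote_plus

# Hypothetical URL patterns keyed by a made-up software name.
SEARCH_TEMPLATES = {
    "examplecat": "{base}/search?type=subject&q={query}",
    "otheropac": "{base}/opac/find?index=su&term={query}",
}


def search_link(software, base_url, heading):
    """Build a deep search link into a library's catalog."""
    template = SEARCH_TEMPLATES.get(software)
    if template is None:
        # Customized or home-grown systems need individual attention.
        raise KeyError(f"no link template recorded for {software!r}")
    return template.format(base=base_url.rstrip("/"),
                           query=quote_plus(heading))


# Hypothetical library running the made-up "examplecat" software.
print(search_link("examplecat", "https://library.example.edu/",
                  "Stanton, Elizabeth Cady"))
```

With a table like this, adding a library that runs an unmodified, known system is just a matter of recording two values, while systems with no consistent deep-link syntax have no template to record at all.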

I’ve determined ways to link into these various systems from reading various documentation files I’ve found on the public Internet, along with some reverse-engineering of public web sites.  If you know of better ways to link to some of these systems that I haven’t yet figured out myself, and this information can be made public, let me know.

For now, I’m declining to list libraries that don’t have many English-language subject or Library of Congress name headings, because the results of English searches in those libraries will be misleadingly incomplete.  But I’m considering ways to include translated searches, where the data to support this is available, for a wider range of countries.  (VIAF already provides much relevant data for names.)

The most popular new Wikipedia Library resource template is also controversial, and might be modified or deleted.   I provide a number of different templates for linking from Wikipedia to libraries, including the inlined text templates “Library resources about” and “Library resources by“, and the all-in-one sidebar template “Library resources box“. By far the most used of these templates has been the Library resources box.   It’s easy to spot in an article, it organizes links clearly, and it’s easy for editors to recognize as a template that they can add to articles they find of interest.  But some Wikipedians, including at least one Wikipedia admin, have objected to the template.  They cite style guidelines that say external link templates should not use boxes or other graphical elements, but only appear as inlined text.  I’ve defended the boxes, noted how other library-related external links commonly appear in boxes, and proposed ways to address various Wikipedian concerns.   But it’s ultimately up to the Wikipedia community to determine whether or how library links will appear in Wikipedia articles.  To find out more about the issues, see the Library resources box talk page.  And if you’re a Wikipedia editor or user, feel free to weigh in on that page or other relevant forums.

I’m exploring ways to make it easier for readers to get to our libraries.  For one, I’m starting to record IP ranges for some institutions, so that local network users can follow “resources in your library” links straight to the institution’s library, without having to first register a preference.  (Users can still register a different preference if they want.)  IP-based routing is an experimental service, initially being provided to a limited number of institutions, and I may modify or withdraw it in the future.  If you’d like me to consider it for your institution, you can submit a request, with the relevant IP ranges (preferably in CIDR format) in the “anything we should know?” field.  Note that the IP ranges you submit will be published as part of the library data I’m sharing for this project.
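Here is a minimal Python sketch of how IP-based routing along these lines might work, using the standard `ipaddress` module to match a reader's address against CIDR ranges. The ranges are reserved documentation addresses and the library identifiers are invented; the rule that a registered preference wins over the IP match follows the description above, but the rest is an assumption rather than the service's real logic.

```python
import ipaddress

# Hypothetical data: documentation-only address ranges and made-up library IDs.
IP_RANGES = [
    (ipaddress.ip_network("192.0.2.0/24"), "EXAMPLE-U"),
    (ipaddress.ip_network("198.51.100.0/25"), "EXAMPLE-PUBLIC"),
]


def choose_library(client_ip, registered_preference=None):
    """Pick a library for a reader, or return None if none can be inferred."""
    # A preference the reader registered always takes priority.
    if registered_preference:
        return registered_preference
    address = ipaddress.ip_address(client_ip)
    for network, library_id in IP_RANGES:
        if address in network:
            return library_id
    return None


print(choose_library("192.0.2.17"))
print(choose_library("192.0.2.17", registered_preference="OTHER-LIBRARY"))
```

CIDR notation keeps each range to a single short string, which is one reason it is a convenient format for submitting ranges.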

I’m starting to share my work on GitHub.  There is now a GitHub repository with selected data and code for the FTL project.  In it, you’ll find the data I use to link to the libraries enrolled in the service, and you’ll also see the code for the main CGI script used to forward readers to those libraries.   You can’t yet run the service out of the box yourself with the code and data provided so far, but I hope that what’s there will help people understand how the service works, and possibly implement similar services themselves if they’re so inclined.  The data’s released under CC0, so you can reuse it however you like; and the code is open-source licensed under the Educational Community License 2.0.  I hope to add more data and code over time, and I’m happy to hear suggestions for enhancements and improvements.

I’m hoping that as more people get involved, the service will improve, library resources will become more reachable online, and Wikipedia will become a more useful resource as well.  If you’d like to get involved yourself, I’d love to hear what you’re up to, and what suggestions you might have.

From Wikipedia to our libraries

I’ve heard the lament in more than one library discussion over the years.  “People aren’t coming to our library like they should,” librarians have told me.  “We’ve got a rich collection, and we’ve expended lots of resources on an online presence, but lots of our patrons just go to Google and Wikipedia without checking to see what we have.”  The pattern of quick online information-finding using search engines and Wikipedia is well-known enough that it has its own acronym: GWR, for Google -> Wikipedia -> References.  (David White gives a good description of that pattern in the linked article.)

Some people I’ve talked to think we should break this pattern.  With the right search tool or marketing plan, some say, we can get patrons to start with us first, instead of Google or Wikipedia.  This idea seems to me both futile and beside the point.  Between them, Google and Wikipedia cover a vast array of online information, more than we librarians could hope to replicate or index ourselves in that medium.  Also, if we truly have better resources available in our libraries than can be found on the open Web, it’s less important that our researchers start from our libraries’ websites than that they end up finding the knowledge resources our libraries make available to them.

Looked at the right way, Wikipedia can be a big help in making online readers aware of their library’s offerings.  One of the things we spend a lot of time on in libraries is organizing information into distinct, conceptual categories.  That’s what Wikipedia does too: so far, their English edition has over 4 million concepts identified, described, and often populated with reference links.  And Wikipedia has encouraged people to add links to relevant digital library collections on various topics, through programs like Wikipedia Loves Libraries and Wikipedian in Residence.  But while these programs help bring some library resources online, and direct people to those selected resources, there’s still a lot of other relevant library material that users can’t get to via Wikipedia, but can via the libraries that are near them.

So how do we get people from Wikipedia articles to the related offerings of our local libraries?  Essentially we need three things: First, we need ways to embed links in Wikipedia to the libraries that readers use.  (We can’t reasonably add individual links from an article to each library out there, because there are too many of them– there has to be a way that each Wikipedia reader can get to their own favored libraries via the same links.)  Second, we need ways to derive appropriate library concepts and local searches from the subjects of Wikipedia articles, so the links go somewhere useful.  Finally, we need good summaries of the resources a reader’s library makes available on those concepts, so the links end up showing something useful.  With all of these in place, it should be possible for researchers to get from a Wikipedia article on a topic straight to a guide to their local library’s offerings on that topic in a single click.

I’ve developed some tools to enable these one-click Wikipedia -> library transitions.  For the first thing we need, I’ve created a set of Wikipedia templates for adding library links. The documentation for the Library resources box template, for instance, describes how to use it to create a sidebar box with links to resources about (or by) the topic of  a Wikipedia article in a reader’s library, or in another library a reader might want to consult.  (There’s also an option for direct links to my Online Books Page, if there are relevant books online; it may be easier in some cases for readers to access those than to access their local library’s books.)

For the links to work, we need to know about the reader’s preferred library.  Users can register their preferred library (which will set a cookie in their browser recording that choice), or select it for each individual search.  We know how to link to several dozen libraries so far, and can add more libraries on request.  Worldcat.org, which includes holdings of thousands of libraries worldwide, is also an option.  Besides the “Library resources box” template, I’ve also provided templates for in-text links to library resources, if those work better in a given article.  Links to these templates can be found at the end of the “Library resources box” documentation.

For the second thing we need, I’ve created a library forwarding service (“Forward to Libraries”, or FTL– catchier name suggestions welcome) that transforms links from Wikipedia into searches for appropriate  headings or keywords in local libraries.  This is the same service I describe in my “From my library to yours” blog post from last month, but it now supports links from Wikipedia as well as to Wikipedia.

Thanks to information included in the Library of Congress’ Authorities and Vocabularies datasets, OCLC’s VIAF data feeds, Wikipedia’s database downloads, and my own metadata compiled at The Online Books Page, FTL already knows how to link directly to over 240,000 distinct authority-controlled headings known to the Library of Congress from their corresponding Wikipedia articles.   (Library of Congress headings are used in most sizable US libraries, and many English-language libraries outside the US also use similar headings.)

For other articles, FTL by default will try a general keyword search based on the Wikipedia article’s title, which will often turn up useful results at the destination library.  Alternatively, my templates allow Wikipedia editors to determine a specific Library of Congress heading to use in library links, if appropriate.  I’m hoping to incorporate suggested headings into FTL’s own knowledge base as I detect them showing up in Wikipedia articles.  I also plan to publish FTL’s data sets under open access terms, so that others can use and improve on them as well.
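The lookup-then-fallback behavior described above can be sketched roughly as follows in Python. The mapping is a tiny illustrative stand-in for the real knowledge base, the function name is hypothetical, and giving an editor-supplied heading priority over the stored mapping is an assumption the description above does not settle.

```python
# Illustrative stand-in for the article-title-to-heading knowledge base.
KNOWN_HEADINGS = {
    "Elizabeth Cady Stanton": "Stanton, Elizabeth Cady, 1815-1902",
    "Women's history": "Women--History",
}


def resolve_search(article_title, editor_heading=None):
    """Return (search_type, search_terms) for a Wikipedia article."""
    # Assumption: a heading chosen by a Wikipedia editor wins.
    if editor_heading:
        return ("heading", editor_heading)
    heading = KNOWN_HEADINGS.get(article_title)
    if heading:
        return ("heading", heading)
    # Default: fall back to a keyword search on the article title.
    return ("keyword", article_title)


print(resolve_search("Elizabeth Cady Stanton"))
print(resolve_search("Seneca Falls Convention"))
```

Returning the search type alongside the terms lets the destination link use a heading browse where one is known and a plain keyword search otherwise.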

The third part of this solution, displaying relevant resources at the destination library, can be implemented differently at each library.  For most of the libraries in FTL’s current knowledge base, links go to searches in the library’s regular online catalog.  But with some libraries, I’ve linked to another discovery system, if it seems to be the main search promoted at that library, and it seems to produce useful results.  The Online Books Page’s subject map displays also have features that I think will be useful to Wikipedia subject researchers arriving at my site, such as showing related subjects and books filed under those subjects.  I hope in future posts to talk more about other useful guideposts and contextual information we could be providing to readers arriving from Wikipedia.

But if you’ve read this far, you probably want to see how this all works in practice.  So I’ve added some example library resources boxes in a few Wikipedia articles that seemed particularly relevant this month, including those for Women’s history, Elizabeth Cady Stanton, and Flannery O’Connor.  Look down in the “External links” or “Further reading” sections of those articles for the boxes, and view the page source of the articles to see how those boxes are constructed.

As with most things related to Wikipedia, this service is experimental, and subject to change (and, hopefully, improvement) over time.  I’d love to hear thoughts and suggestions from users and maintainers of Wikipedia and libraries.  And if you find creating these sorts of links from Wikipedia useful, and need help getting started, I’d be happy to help you bring them to your favorite Wikipedia topics and local libraries, as time permits.