Why The Online Books Page is black for January 18

As I mentioned in my last post, the US Congress is currently considering two bills, the Stop Online Piracy Act (SOPA) and the PROTECT IP Act (PIPA), that would make it easy for copyright infringement complaints (whether ultimately justified or not) to wipe entire sites off the Net by various means, with little recourse or due process for site owners.

As the Electronic Frontier Foundation points out, these bills, if enacted, threaten censorship of a wide variety of sites that host controversial content or unfiltered public discourse, not just flagrant bootleg sites.   Sites hosting online books, in particular, could be cut off in various ways if they host a book that someone says infringes copyright in some way. (Even the threat of wholesale cutoff could cause them to take the book down, without any sort of judicial hearing.)  Even linking to a site that has content that’s the subject of a complaint could put a site at risk.

Many sites are “going dark” in various ways on the 18th, to raise awareness of these bills and show what it could be like if they became targets of SOPA or PIPA-enabled censorship.  This includes a number of the sites linked to from The Online Books Page.  For example, the Internet Archive, which hosts 2 million volumes, is out of service for 12 hours on the 18th.

The Online Books Page will not go offline, but we will turn many of our pages black for the 18th, as a warning both that some of the links on the site may be out of service, and that the site itself, which links to more than 1.4 million books on thousands of sites around the world, could be at risk if the bills currently under consideration in Congress pass.

My objection to the bills is not an objection to opposing copyright violations.  As the US Constitution recognizes, appropriately bounded copyrights serve a useful purpose in “promoting the progress of science and arts”, and a fair bit of the time I spend on The Online Books Page is devoted to making sure the online books I curate do in fact comply with applicable copyright law.   Without clear and reasonable boundaries, though, copyright and its enforcement can inhibit rather than promote the progress of knowledge and the arts, by becoming tools of censorship and chilled speech.  I believe the current bills in Congress unfortunately do that.  If you are concerned about them as well, I encourage you to contact members of Congress to make your concerns known.

Public Domain Day 2012: Five things we can do in the US

It’s New Year’s Day again, and in much of the world, this means another year’s worth of works enter the public domain.  That’s a cause for celebration, as Europe and many other countries that have “life+70 years” copyright terms welcome works by James Joyce, Virginia Woolf, Jelly Roll Morton, and Elizabeth von Arnim into the public domain.  The Communia Project’s Public Domain Day website focuses on works by these and many other authors that are entering (in many cases, re-entering) the public domain in “life+70 years” countries.  Meanwhile, folks in Canada, New Zealand, and other countries that have held the line at the “life+50 years” terms of the Berne Convention can now freely enjoy the works of people like James Thurber, Ernest Hemingway, and H.D.
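The arithmetic behind these dates can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and it assumes the common rule that a term runs through the end of the calendar year, so that works enter the public domain on the following January 1. It ignores wartime extensions and other national wrinkles.

```python
def public_domain_year(death_year: int, term_years: int) -> int:
    """Year whose January 1 brings an author's works into the public domain.

    Assumes the term runs through the end of the calendar year that falls
    `term_years` after the author's death (a simplification).
    """
    return death_year + term_years + 1


# Joyce and Woolf died in 1941; "life+70 years" countries
print(public_domain_year(1941, 70))  # 2012

# Hemingway and Thurber died in 1961; "life+50 years" countries
print(public_domain_year(1961, 50))  # 2012
```

Both groups of authors arrive in the public domain on the same day, just in different countries.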

There’s not so much excitement about Public Domain Day in the US, where no published works are scheduled to enter the public domain for another 7 years, due to a 20-year copyright extension enacted in 1998.  But Americans don’t have to simply sigh and contemplate what might have been if our copyright terms hadn’t been extended.  The new year still provides a number of important opportunities for Americans to improve access to the public domain.

1. Find and free newly public domain unpublished works

Some works are going into the public domain in the US today: works never published prior to 2003 (or copyrighted under US law prior to 1978) by authors who died in 1941– the same authors whose published works go into the public domain in Europe today.

But who would care about such obscure works? one might ask.  Well, if you’re at all interested in understanding the dense, allusion-laden fiction of Joyce, or the psychology of Woolf, or the jurisprudential thinking of Louis Brandeis, or the inner lives of any of the rest of the “class of 1941”, having the right to freely access, publish, and build on their unpublished works can be crucial.

Up until now, for instance, scholars studying James Joyce have often been frustrated by sharp restrictions and legal threats made by the administrator of Joyce’s literary estate.  In 2008, Rebecca Ganz characterized the administrator thus: “[His] primary purpose is to quell any scholarship that he finds distasteful or an invasion of his family’s privacy. He has a history of harassing authors and artists until they buckle under the strain of trying to obtain legal rights to quote from the late author’s writings.”  Scholars wishing to invoke Joyce’s unpublished works in their work have either had to undertake multi-year legal battles, or cut back on the lines of inquiry they might otherwise pursue.

American libraries and archives have many illuminating papers by authors who died in 1941– even non-US authors like Joyce and Woolf.  US digitizers, librarians, and archivists can open up and publicize these works.   In some cases, we’re uniquely positioned to do so, since their unpublished works may still be under copyright in some other countries.

2. Increase worldwide availability of public domain works

Many of the millions of digitized books on the Internet are hosted in the US, in large-scale repositories like Google Books, HathiTrust, and the Internet Archive.  Many of these services give limited access to non-US readers or materials.  Google and HathiTrust, for instance, limit non-US access by default to books published as long as 140 years ago, to avoid falling afoul of “life+70 years” copyright terms abroad.  JSTOR likewise limits access to non-US journal volumes published in 1870 or later.

With another year’s worth of copyrights expiring in “life+70 years” countries, it should be safe for these US-based services to also open up worldwide access to another year’s worth of works, further freeing up the public domain.  HathiTrust is also willing to manually review copyrights on specific books to open up access.  If you come across any books in HathiTrust solely by authors who died in 1941 (or before) that are currently labeled only as “public domain in the United States”, you can request that they review it for opening up access worldwide.  Just use the “Feedback” button at the bottom of the book’s HathiTrust page, or the suggestion form on my Online Books Page; and make sure you ask specifically for non-US access.

3. Restore access to obscure copyrighted works from 1936 (and earlier)

After libraries and archives expressed concerns about the fate of obscure works under longer copyright terms, Congress included a special exemption in its 1998 copyright term extension.  The exemption, codified as section 108(h) of the copyright law, states that “during the last 20 years of any term of copyright of a published work, a library or archives, including a nonprofit educational institution that functions as such, may reproduce, distribute, display, or perform in facsimile or digital form a copy or phonorecord of such work, or portions thereof, for purposes of preservation, scholarship, or research”, under certain conditions.  In particular, if the institution finds, after a reasonable investigation, that such a work is not “subject to normal commercial exploitation” (such as by being in print) and cannot “be obtained at a reasonable price”, and no rightsholder has filed a claim otherwise, the work qualifies for this special exemption.  As of this year’s Public Domain Day, qualifying publications from 1936 join what is now 14 years of works in this category.
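The "14 years" figure can be checked with a short Python sketch. The names here are hypothetical, and the sketch assumes a flat 95-year term for these older published works, running through December 31 of the final year. It checks only the timing; the other conditions of the exemption (commercial exploitation, reasonable price, rightsholder claims) still need a real investigation.

```python
FIRST_COPYRIGHTED_YEAR = 1923  # earlier US publications are already public domain
TERM_YEARS = 95                # assumed term for these older published works
WINDOW_YEARS = 20              # section 108(h) covers the last 20 years of the term


def years_in_last_20(current_year: int) -> list[int]:
    """Publication years whose copyrights are in their final 20 years."""
    years = []
    for pub_year in range(FIRST_COPYRIGHTED_YEAR, current_year):
        last_year_of_term = pub_year + TERM_YEARS  # term ends December 31 of this year
        first_year_of_window = last_year_of_term - WINDOW_YEARS + 1
        if first_year_of_window <= current_year <= last_year_of_term:
            years.append(pub_year)
    return years


eligible = years_in_last_20(2012)
print(eligible[0], eligible[-1], len(eligible))  # 1923 1936 14
```

Under the same assumptions, the 1923 works drop off the list in 2019, when their copyrights expire altogether.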

So far, I have found very little digitized content online where this exemption is explicitly invoked.  (There are advantages to explicitly doing so, both because it helps clarify the right to use the material, and helps prevent inadvertent unauthorized propagation of the works, such as the commercial reprints of digitized books that are now common on many large bookselling sites.)  Yet many of the works in HathiTrust’s (currently suspended) orphan works initiative, and in the Internet Archive’s lending library, and more besides, could well qualify for this treatment– and unlike orphan works, where legislation has yet to be passed, the exemption for these materials is already explicitly authorized by statute.

Providing online access for these works is not without controversy.  A 2002 article by lawyer Mary Minow details some of the possibilities and risks.   While she concludes that libraries can put such works on the Web, the recent Authors Guild complaint in its lawsuit against HathiTrust includes some push-back against this idea. But as the public domain in the US recedes further into history, and digital library projects increasingly look for ways to make our cultural heritage available online, American libraries would do well to proactively establish and exercise these rights for older works now languishing in obscurity.

4. Strengthen and sustain coalitions for reasonable copyright limits

The curtailment of the public domain is just one aspect of the overreach of copyright law in the US and elsewhere.  Right now, Congress is considering two bills, the Stop Online Piracy Act (SOPA) and the PROTECT IP Act (PIPA), whose enforcement provisions threaten to disrupt the core structures of the Internet and enable far-reaching censorship, in the name of stopping piracy.  Supporters of these bills hoped to have them passed by Christmas, but opposition from both “left” and “right” sides of the political spectrum has slowed the process down, caused some companies to withdraw support, and led to the proposal of less harmful alternatives for fighting piracy.

It’s still quite possible that SOPA and PIPA will pass, though.   Public Domain Day provides an opportunity for Americans to reflect on some of the good reasons for limiting the power and scope of copyright enforcement, and to redouble efforts to keep those limits reasonable.  Moreover, a coalition that can stop SOPA and PIPA can also work to prevent further extensions of copyright terms.  This can ensure that Americans will have more to celebrate in Public Domain Days to come– especially starting in 2019, when the remaining 1923 copyrights should finally expire in the US.

5. Give copyrights of your own to the public domain

Of course, those wishing to maximize public access and use of their works don’t have to wait for their copyrights to expire on their own.  They can dedicate them to the public domain any time they want.  Public Domain Day is a particularly auspicious time to make such gifts, no matter what country you’re in.  And with tools like the CC0 declaration, it’s easier than ever to do so.

A few years ago, I started an annual personal tradition of reviewing copyrights to works I’d created more than 14 years ago (the original initial term of copyright enacted by the founders of the US, and also approximately the ideal copyright term given in a recent economic analysis) and dedicating works to the public domain that I didn’t feel needed further copyright.  Accordingly, today I dedicate all the work of my creation that I published in 1997, and for which I still control rights, to the public domain.  For me, this consists primarily of websites like The Online Books Page as of that year, and other online writings.  But others have dedicated more high-profile material to the public domain after the same term.   And I’d be very happy to hear from others who are making similar dedications today (whether or not it’s after 14 years).

So, happy Public Domain Day to everyone in the US and elsewhere!  We all have things to celebrate, and things we can do, in the name of the public domain.

My mother’s orphan

Before my mother was pregnant with me, she was working on a book.

The book had begun its gestation at least a year before. She had been teaching math in Massachusetts, and was involved with the Madison Project, one of the initiatives that arose from the “new math” movement of the 1960s.  What excited her, and what I caught from her not long after I was born, was the sense of discovery and play that was encouraged in the Madison teaching style.  The primary focus wasn’t so much on imparting and drilling facts and rules, or on mundane applications, but on finding patterns, solving puzzles, and figuring out the secrets of numbers and geometry and the other mathematical constructs that underlie our world. Some project participants planned a series of books that would help bring out this sense of discovery and exploration in math classes.

Two small children in the house may have delayed my mother’s ambitions, but we didn’t stop her.  When I was in kindergarten, the piles of papers in my parents’ bedroom went away, and my mother proudly showed me her new book.  The book, Discoveries in Essential Mathematics, was co-written with Ramon Steinen, and published by Charles E. Merrill. Though the textbook was written for middle schoolers, I remember reading through the book after my mother showed it to me, solving the simpler problems, and smiling when I saw my name or my sister’s in an example.

She got small royalty checks for a few years, but the book was out of print by the late 1970s, never reaching a second edition.  We kept some copies in our basement, but I didn’t know of any library that held it.  When I visited the Library of Congress as a middle schooler, wrongly convinced that they had every book ever published, I remember my disappointment when I couldn’t find Mom’s book in their card catalog.

My mother eventually retired from teaching, and the enthusiasm and talent I’d gotten from Mom for math shifted into computing, and then into digital libraries.  And when my kids reached school age, I decided to try putting her book online.  In an era of large classes, detailed state standards, and high-stakes standardized tests, it might not be a viable standard textbook any more, but I think it’s still great for curious kids who show an interest in math.

Mom thought that was a great idea.  But she didn’t know if she could grant permission on her own.  Although long out of print, the book’s copyright had automatically renewed in 2000 under US copyright law, and she wasn’t sure if she had to get the consent of her publisher or co-author before she could give me the go-ahead. She didn’t know how to reach her co-author, and her old imprint was long gone.  Even its acquirer had itself been acquired by a large conglomerate some time ago.  So I let the idea drop, thinking I’d come back to it later when I had a little time to research the copyright.

But not long after, she started a long slide into dementia, and was soon in no position to give permission to anyone.  If her book had been practically an “orphan work” before, due to uncertainty over rights, it was even more so now.  There was no trouble locating the author; but no way of getting valid permission from someone definitely known to hold the rights.

Mom died this past winter, four years after my Dad had reluctantly moved her into the nursing home for good, and four weeks after he’d made his usual daily visit, gone back home, and had a fatal heart attack.  After we paid the last of the bills, and threw out the contents of the basement (where a burst pipe ruined all the books, papers, and other things they kept down there), what remained of what they had would now go to me and my siblings.

I still had a copy at home of the teacher’s edition of Mom’s book that she had once given to Grandma.  And between my mother’s funeral and the burst pipe, I’d taken a student edition out of their basement for my kids to read.  But any faint hope of finding publishing contracts or rights assignment documents was obliterated after the pipe burst.  The basic questions were: had Mom signed her rights to the book away, as many academic authors do? If so, had she gotten them back at some point?  Or had she never had the rights in the first place, as sometimes happens with textbook authors under “work for hire” contracts?

The copyright page of the book, and the record in the 1972 Catalog of Copyright Entries, show the publisher as the copyright claimant, so I couldn’t assume she had the rights.   But I also doubted whether I could get a clear answer, or reasonable licensing terms, from the company that had eventually acquired the assets of Mom’s original publisher.

I eventually found what I needed to know on a trip to Washington, DC.  While attending a meeting on digital format registries, I realized that I was in the same building as the Copyright Office.   So after the meeting, I got a reader’s card, went upstairs, and consulted the librarians there.  We confirmed that, under the automatic renewal laws of the time, the copyright to Mom’s book would have reverted in 2000 to whoever had been declared the “author” in the book in the original registration record.   Moreover, in the absence of any contrary arrangement, any co-owner of a copyright can authorize publication, as long as they split any proceeds with the other copyright owners.

Since I was planning just to put the book online for free, the only question remaining was: who was listed as the author on the original registration: the publisher who claimed the copyright, or my mother and Dr. Steinen?  It’s not clear from the Catalog of Copyright Entries, but the original registration certificate would state it.  And the one copy known to exist of that certificate was in the archives of the Copyright Office where I was sitting.

Twenty minutes later, I had the certificate in front of me.  The name on the “claimant” line was indeed the publisher’s, but the names on the “author” line were Steinen and Ockerbloom.  My mother’s orphan was mine to claim.

There are a lot more books out there like hers.  Since I added records for Hathi Trust’s public domain books to The Online Books Page, I’ve gotten requests to curate hundreds of out of print, largely forgotten books that are still meaningful to readers online.  Many of the people who opt to leave contact information live in places where books tend to be hard to get or pay for. Many others, judging from their names, seem to be related to the authors of the books they suggest. These readers have found the books after Hathi, or Google, or the Internet Archive, has resurfaced them online, and the readers want these books to live on.  If there were an easy, inexpensive, uncontroversially legal way to also bring back books that are still in copyright, but no longer commercially exploited, I’m sure I could fulfill a lot of requests for those books too.

For now, though, I’ll bring back the one orphan book I’ve been given. And I thank my mother for writing it, and the other women and men who have poured so much of their energy and teaching into their books, and the librarians of all kinds who help ensure those books stay accessible to readers who value them.  I’ll try my best to keep your legacies alive.

Libraries: Be careful what your web sites “Like”

Imagine you’re working in a library, and someone with a suit and a buzz cut comes up to you, gestures towards a patron who’s leaving the building, and says “That guy you were just helping out; can you tell me what books he was looking at?”

Many librarians would react to this request with alarm.  The code of ethics adopted by the American Library Association states “We protect each library user’s right to privacy and confidentiality with respect to information sought or received and resources consulted, borrowed, acquired or transmitted.”  Librarians will typically refuse to give such information without a carefully-verified search warrant, and many are also campaigning against the particularly intrusive search demands authorized by the PATRIOT Act.

Yet it’s possible that the library in this scenario is routinely giving out that kind of information, without the knowledge or consent of librarians or patrons, via its web site.  These days, many sites, including those of libraries, invoke a variety of third-party services to construct their web pages.  For instance, some library sites use Google services to analyze site usage trends or to display book covers.  Those third party services often know what web page has been visited when they’re invoked, either through an identifier in the HTML or Javascript code used to invoke the service, or simply through the Referer information passed from the user’s web browser.
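To make the Referer point concrete, here is a minimal Python sketch of what a third-party service could reconstruct from a single request for one of its embedded resources. The function name, the example catalog URL, and the cookie value are all hypothetical, and browsers or site policies can limit what is actually sent, so treat this as the worst case rather than a description of any particular service.

```python
from urllib.parse import urlsplit, parse_qs


def what_a_third_party_sees(request_headers: dict[str, str]) -> dict:
    """Summarize what an embedded service can infer from one request."""
    parts = urlsplit(request_headers.get("Referer", ""))
    return {
        "site": parts.netloc,                       # which library site the user is on
        "page": parts.path,                         # which page they are viewing
        "query": parse_qs(parts.query),             # search terms, record IDs, etc.
        "has_cookie": "Cookie" in request_headers,  # can the visit be tied to an account?
    }


# Hypothetical request triggered by a widget on a catalog search page
example = {
    "Referer": "https://catalog.example.org/search?q=bankruptcy+law",
    "Cookie": "session=abc123",
}
info = what_a_third_party_sees(example)
print(info["site"], info["query"])
# catalog.example.org {'q': ['bankruptcy law']}
```

The patron never clicked anything on the third party's widget; loading the page was enough to send these headers.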

Patron privacy is particularly at risk when the third party also knows the identity of users visiting sensitive pages (like pages disclosing books they’re interested in).  The social networking sites that many library patrons use, for instance, can often track where their users go on the Web, even after they’ve left the social sites themselves.

For instance, if you go to the website of the Farmington Public Library (a library I used a lot when growing up in Connecticut), and search through their catalog, you may see Facebook “Like” buttons on the results.  On this page, for example, you may see that four people (possibly more by the time you read this) have told Facebook they Liked the book Indistinguishable from Magic.  Now, you can probably easily guess that if you click the Like button, and have a Facebook account, then Facebook will know that you liked the book too.  No big surprise there.

But what you can’t easily tell is that  Facebook is informed you’ve looked at this book page, even if you don’t click on anything.  If you’re a Facebook user and haven’t logged out– and for a while recently, even if you have logged out– Facebook knows your identity.  And if Facebook knows who you are and what you’re looking at, it has the power to pass along this information. It might do it through a “frictionless sharing” app you decided to try.  Or it might quietly provide it to organizations that it can sell your data to as permitted in its frequently changing data use policies.  (Which for a while even included tracking non-members.)

For some users, it might not be a big deal if it’s generally known what books they’re looking at online. But for others it definitely is a big deal, at least some of the time.  The problem with third-party inclusions like the Facebook “Like” button in catalogs is that library patrons may be denied the opportunity to give informed consent to sharing their browsing with others.  Libraries committed to protecting their patrons’ privacy as part of their freedom to read need to carefully consider what third party services they invite to “tag along” when patrons browse their sites.

This isn’t just a Facebook issue.  Similar issues come up with other third-party services that also track individuals, as for instance Google does.  Libraries also have good reasons to partner with third party sites for various purposes.  For some of these purposes, like ebook provision, privacy concerns are fairly well understood and carefully considered by most libraries.  But librarians might not keep as close track of the development of their own web sites, where privacy leaks can spring up unnoticed.

So if any of your web sites (especially your online catalogs or other discovery and delivery services) use third party web services, consider carefully where and how they’re being invoked.  For each third party, you should ask what information they can get from users browsing your web site, what other information they have from other sources (like the “real names” and exact birthdates that sites like Facebook and Google+ demand), and what real guarantees, if any, they make about the privacy of the information.  If you can’t easily get satisfactory answers to these questions, then reconsider your use of these services.
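One way to start such a review is to list the outside hosts a page pulls resources from. The Python sketch below uses only the standard library; the class name, the example hosts, and the sample page are hypothetical. It looks only at script, img, and iframe tags in the static HTML, so it will miss anything a script loads later, and a real audit would also need to watch the browser's network traffic.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class ThirdPartyFinder(HTMLParser):
    """Collect hosts other than our own that a page fetches resources from."""

    TAGS = {"script", "img", "iframe"}  # tags whose src triggers a fetch

    def __init__(self, own_host: str):
        super().__init__()
        self.own_host = own_host
        self.third_party_hosts: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag not in self.TAGS:
            return
        for name, value in attrs:
            if name == "src" and value:
                host = urlsplit(value).netloc  # empty for relative URLs
                if host and host != self.own_host:
                    self.third_party_hosts.add(host)


# Hypothetical catalog record page
page = """
<html><body>
<img src="/covers/12345.jpg">
<script src="https://analytics.example.net/track.js"></script>
<iframe src="//social.example.com/like?href=https://catalog.example.org/book/12345"></iframe>
</body></html>
"""

finder = ThirdPartyFinder("catalog.example.org")
finder.feed(page)
print(sorted(finder.third_party_hosts))
# ['analytics.example.net', 'social.example.com']
```

Each host on the resulting list is a party to ask the questions above about.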

Early journals from JSTOR and others

Earlier this month, JSTOR announced that it would provide free open access to its earliest scholarly journal content, published before 1923.  All of this material should be old enough to be in the public domain.  (Or at least it is in the US.  Since copyrights can last longer elsewhere, JSTOR is only showing pre-1870 volumes openly outside the US.)  I was very pleased to hear they would be opening up this content; it’s something I’d asked them to consider ever since they ended a small trial of open, public domain volumes in their early years.

Lots of early journal content now openly readable online

The time was ripe to open access at JSTOR.  (And not just because of growing discontent over limited access to public domain and publicly funded research.) Thanks to mass-digitization initiatives and other projects, much of the early journal content found in JSTOR is now also available from other sources.  For instance, after Gregory Maxwell posted a torrent of pre-1923 JSTOR volumes of the Philosophical Transactions of the Royal Society of London, I surveyed various free digital text sites and found nearly all the same volumes, and more, available for free from Hathi Trust, Google, the Internet Archive, Gallica, PubMed Central, and the Royal Society itself.  The content needed to be organized to be usefully browsable across sites, but that required a bit of basic librarianship and a bit of time.

Philosophical Transactions is not an anomaly.  After collating volumes of this journal, I looked at the first ten journals that signed on to JSTOR back in the mid-1990s.  (The list can be found below.)  I again found that nearly all of the pre-1923 content of these journals was also available from various free online sites.  Now, when you look them up on The Online Books Page, you’ll find links to both the JSTOR copies and the copies at other sites.

Comparing the sites that provide this content is enlightening.  In general, the JSTOR copies are better presented,  with article-level tables of contents, cross-volume searching, article downloads, and consistently high scan quality.  But the copies at other sites are generally usable as well, and sometimes include interesting non-editorial material, such as advertisements, that might not be present in JSTOR’s archive.  By opening up access to its early content now, though, JSTOR will remain the preferred access point to this early content for most researchers — and that, hopefully, will help attract and sustain paid support for the larger body of scholarly content that JSTOR provides and preserves for its subscribers.

And there’s a lot more in the public domain

JSTOR currently only provides open access for volumes up to 1922 (or up to 1869, if you’re not in the US).   But there’s lots more public domain journal content that can be made available.  Looking again at the initial ten JSTOR journals, I found that all of them have additional public domain content that is currently not available as open access on JSTOR, or as of yet on other sites.  That’s because journals published in the US before 1964 had to renew their copyrights after 28 years or enter the public domain.  But most scholarly journals, including these 10, did not renew the copyrights to all their issues.  Here’s a list of the 10 journals, and their first issue copyright renewals:

  1. The American Historical Review – began 1895; issues first renewed in 1931
  2. Econometrica – began 1933; issues first renewed in 1942
  3. The American Economic Review – began 1911; issues not renewed before 1964 (when renewal became automatic)
  4. Journal of Political Economy – began 1892; issues first renewed in 1953
  5. Journal of Modern History – began 1929; issues first renewed in 1953
  6. The William and Mary Quarterly – began 1892; issues first renewed in 1946
  7. The Quarterly Journal of Economics – began 1886; issues first renewed in 1934
  8. The Mississippi Valley Historical Review (now the Journal of American History) – began 1914; issues first renewed in 1939
  9. Speculum – began 1926; issues first renewed in 1934
  10. Review of Economic Statistics (now the Review of Economics and Statistics) – began 1919; issues first renewed in 1935

This list reflects more proactive renewal policies than were typical for scholarly journals. A few years ago, I did a survey of JSTOR journals (summarized in this presentation) that were publishing between 1923 and 1950, and found that only 49 out of 298, or about 1/6, renewed any of their issue copyrights for that time period.  (JSTOR has since added more journals covering this time period, so the numbers will be different now, but I suspect the renewal rate won’t be any higher now than it was then.)
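The list above can be turned into rough year ranges with a short Python sketch. Everything here rests on stated assumptions: it reads "issues first renewed in" as the publication year of the earliest renewed issue, it treats every issue before that year as unrenewed, and it ignores separately renewed articles. The names are hypothetical, and only three of the ten journals are shown.

```python
FIRST_YEAR_NEEDING_RENEWAL = 1923  # earlier US issues are already public domain
LAST_YEAR_NEEDING_RENEWAL = 1963   # from 1964 on, renewal was automatic

# (journal, year it began, publication year of first renewed issue, or None)
JOURNALS = [
    ("The American Historical Review", 1895, 1931),
    ("Econometrica", 1933, 1942),
    ("The American Economic Review", 1911, None),
]


def unrenewed_span(began, first_renewed):
    """Years of post-1922 issues that appear to have no renewal, or None."""
    start = max(began, FIRST_YEAR_NEEDING_RENEWAL)
    end = first_renewed - 1 if first_renewed else LAST_YEAR_NEEDING_RENEWAL
    return (start, end) if start <= end else None


for title, began, first_renewed in JOURNALS:
    print(title, unrenewed_span(began, first_renewed))
# The American Historical Review (1923, 1930)
# Econometrica (1933, 1941)
# The American Economic Review (1923, 1963)
```

A range like this is only a starting point for clearance, not a substitute for checking the renewal records themselves.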

Currently JSTOR has no plans to open up access to post-1922 journal volumes.  But many of those volumes have been digitized, and are in Google’s or Hathi Trust’s collections; or they could be digitized by contributors to the Internet Archive or similar text archives.

If someone does want to open up these volumes, they should re-check their copyright status.   In particular, I have not yet checked the copyright status of individual articles in these journals, which can in theory be renewed separately.  In practice, I’ve found this rarely done for scholarly articles, but not completely unknown.  It might be feasible for me to do a “first article renewal” inventory for journals, like I’ve done for first issue renewal, which could speed up clearances.

Opportunities for open librarianship

JSTOR’s recent open access release of early journals, then, is just the beginning of the open access historic journal content that can be available online.  JSTOR provides a valuable service to libraries in providing and preserving comprehensive digital back runs of major scholarly journals, both public domain and copyrighted.  But while our libraries pay for that service, let’s also remember our mission to provide access to knowledge for all whenever possible.  JSTOR’s opening of its pre-1923 journal volumes is a much-appreciated contribution to a high-quality open record of early scholarship.  We can build on that further, with copyright research, digitization, and some basic public librarianship.  (I’ve discussed the basics of journal liberation in previous posts.)

For my part, I plan to start by gradually incorporating the open access JSTOR offerings into the serial listings of the Online Books Page, as time permits.  I can also gather further copyright information on these and other journals as I bring them in.  I’m also happy to hear about more journals that are online or could go online (whether they’re JSTOR journals or not); you can submit them via my suggestion interface.

How about you?  What would you like to see from the early scholarly record, and what can you do to help open it up?

A digital public library we still need, and could build now

It’s been more than half a year since the Digital Public Library of America project was formally launched, and I’m still trying to figure out what the project organizers really want it to be.  The idea of “a digital library in service of the American public” is a good one, and many existing digital libraries already play that role in a variety of ways.  As I said when I christened this blog, I’m all for creating a multitude of libraries to serve a diversity of audiences and information needs.

At a certain point after an enthusiastic band of performers says “Let’s put on a show!”, though, someone has to decide what their show’s going to be about, and start focusing effort there.  So far, the DPLA seems to be taking an opportunistic approach.  Instead of promulgating a particular blueprint for what they’ll do, they’re asking the community for suggestions, in a “beta sprint” that ends today.   Whether this results in a clear distinctive direction for the project, or a mishmash of ideas from other digitization, aggregation, preservation, and public service initiatives, remains to be seen.

Just about every digital project I’ve seen is opportunistic to some extent.   In particular, most of the big ones are opportunistic when it comes to collection development.  We go after the books, documents, and other knowledge resources that are close to hand in our physical collections, or that we find people putting on the open web, or that our users suggest, or volunteer to provide on their own.

There are a number of good reasons for this sort of opportunism.  It lets us reuse work that we don’t have to redo ourselves.  It can inform us of audience interests and needs (at least as far as the interests of the producers we find align with the interests of the consumers we serve).  And it’s cheap, and that’s nothing to sneer at when budgets are tight.

But the public libraries that my family prefers to use don’t, on the whole, have opportunistically built collections.  Rather, they have collections shaped primarily by the needs of their patrons, and not primarily by the types of materials they can easily acquire.   The “opportunistic” community and school library collections I’ve seen tend to be the underfunded ones, where books in which we have yet to land on the Moon, the Soviet Union is still around, or Alaska is not yet a state may be more visible than books that reflect current knowledge or world events.  The better libraries may still have older titles in their research stacks, but they lead with books that have current relevance to their community, and they go out of their way to acquire reliable, readable resources for whatever information needs their users have.  In other words, their collections and services are driven by  demand, not supply.

In the digital realm, we have yet to see a library that freely provides such a digital collection at large scale for American public library users.   Which is not to say we don’t have large digital book collections: the one I maintain, for instance, has over a million freely readable titles, and Google Books and lots of other smaller digital projects have millions more.  But they function more as research or special-purpose collections than as collections for general public reference, education, or enjoyment.

The big reason for this, of course, is copyright.  In the US, anyone can freely digitize books and other resources published before 1923, but providing anything published after that requires copyright research and, usually, licensing, both of which tend to be complex and expensive.  So the tendency of a lot of digital library projects is to focus on the older, obviously free material, and have little current material.  But a generally useful digital public library needs to be different.

And it can be, with the right motivation, strategy, and support.  The key insight is that while a strong digital public library needs to have high-quality, current knowledge resources, it doesn’t need to have all such resources, or even the most popular or commercially successful ones.  It just needs to acquire and maintain a few high-quality resources for each of the significant needs and aptitudes of its audience. Mind you, that’s still a lot of ground to cover, especially when you consider all the ages, education levels, languages, physical and mental abilities, vocational needs, interests, and demographic backgrounds that even a midsized town’s public library serves.  But it’s still a substantially smaller problem, and involves a smaller cost, than the enticing but elusive idea of providing instant free online access to everything for everyone.

There are various ways public digital libraries could acquire suitable materials proactively.  The America.gov books collection provides one interesting example.  The US State Department wanted to create a library of easy-to-read books on civics and American culture and history for an international audience.  Some of these books were created in-house by government staff.  Others were commissioned from outside authors.  Still others were adapted from previously published works, for which the State Department acquired rights.

A public digital library could similarly create, commission, solicit, or acquire rights to books that meet unfilled information needs of its patrons.  Ideally it would aim to acquire rights not just to distribute a work as-is, but also to adapt and remix it into new works, as many Creative Commons licenses allow.  This can potentially greatly increase the impact of any given work.  For instance, a compellingly written, beautifully illustrated book on dinosaurs might be originally written for 9-to-12-year-old English speakers, and be noticeably obsolete due to new discoveries after 5 or 10 years.  But if a library’s community has reuse and adaptation rights, library members can translate, adapt, and update the book, so it becomes useful to a larger audience over a longer period of time.

This sort of collection building can potentially be expensive; indeed, it’s sobering that America.gov has now ceased being updated, due to budget cuts.  But there’s a lot that can be produced relatively inexpensively.  Khan Academy, for example, contains thousands of short, simple educational videos, exercises, and assessments created largely by one person, with the eventual goal of systematically covering the entire standard K-12 curriculum.  While I think a good educational library will require the involvement of many more people, the Khan example shows how much one person can accomplish with a small budget, and projects like Wikipedia show that there’s plenty of cognitive surplus to go around that a public library effort might usefully tap into.

Moreover, the markets for rights to previously authored content can potentially be made much more efficient than they are now.  Most books, for instance, go out of print relatively quickly, with little or no commercial exploitation thereafter.  And as others have noted, just trying to get permission to use  a work digitally, even apart from any royalties, can be very expensive and time-consuming.  But new initiatives like Gluejar aim to make it easier to match up people who would be happy to share their book rights with people who want to reuse them. Authors can collect a small fee (which could easily be higher than the residual royalties on an out-of-print book); readers get to share and adapt books that are useful to them.   And that can potentially be much cheaper than acquiring the rights to a new work, or creating one from scratch.

As I’ve described above, then, a digital public library could proactively build an accessible collection of high-quality, up-to-date online books and other knowledge resources, by finding, soliciting, acquiring, creating, and adapting works in response to the information needs of its users.  It would build up its collection proactively and systematically, while still being opportunistic enough to spot and pursue fruitful new collection possibilities.  Such a digital library could be a very useful supplement to local public libraries, would be open any time anywhere online, and could provide more resources and accessibility options than a local public library could provide on its own.  It would take a lot of people working together to make it work, including bibliographers, public service liaisons, authors, technical developers, and volunteers, both inside and outside existing libraries.  And it would require ongoing support, as other public libraries do, though a library that successfully serves a wide audience could also potentially tap into a wide base of funds and in-kind contributions.

Whether or not the DPLA plans to do it, I think a large-scale digital free public library with a proactively built, high-quality, broad-audience general collection is something that a civilized society can and should build.  I’d be interested in hearing if others feel the same, or have suggestions, critiques, or alternatives to offer.