A digital public library we still need, and could build now

It’s been more than half a year since the Digital Public Library of America project was formally launched, and I’m still trying to figure out what the project organizers really want it to be.  The idea of “a digital library in service of the American public” is a good one, and many existing digital libraries already play that role in a variety of ways.  As I said when I christened this blog, I’m all for creating a multitude of libraries to serve a diversity of audiences and information needs.

At a certain point after an enthusiastic band of performers says “Let’s put on a show!”, though, someone has to decide what their show’s going to be about, and start focusing effort there.  So far, the DPLA seems to be taking an opportunistic approach.  Instead of promulgating a particular blueprint for what they’ll do, they’re asking the community for suggestions, in a “beta sprint” that ends today.   Whether this results in a clear distinctive direction for the project, or a mishmash of ideas from other digitization, aggregation, preservation, and public service initiatives, remains to be seen.

Just about every digital project I’ve seen is opportunistic to some extent.   In particular, most of the big ones are opportunistic when it comes to collection development.  We go after the books, documents, and other knowledge resources that are close to hand in our physical collections, or that we find people putting on the open web, or that our users suggest, or volunteer to provide on their own.

There are a number of good reasons for this sort of opportunism.  It lets us reuse work that we don’t have to redo ourselves.  It can inform us of audience interests and needs (at least as far as the interests of the producers we find align with the interests of the consumers we serve).  And it’s cheap, and that’s nothing to sneer at when budgets are tight.

But the public libraries that my family prefers to use don’t, on the whole, have opportunistically built collections.  Rather, they have collections shaped primarily by the needs of their patrons, and not primarily by the types of materials they can easily acquire.   The “opportunistic” community and school library collections I’ve seen tend to be the underfunded ones, where books in which we have yet to land on the Moon, the Soviet Union is still around, or Alaska is not yet a state may be more visible than books that reflect current knowledge or world events.  The better libraries may still have older titles in their research stacks, but they lead with books that have current relevance to their community, and they go out of their way to acquire reliable, readable resources for whatever information needs their users have.  In other words, their collections and services are driven by  demand, not supply.

In the digital realm, we have yet to see a library that freely provides such a digital collection at large scale for American public library users.   Which is not to say we don’t have large digital book collections– the one I maintain, for instance, has over a million freely readable titles, and Google Books and lots of other smaller digital projects have millions more.  But they function more as research or special-purpose collections than as collections for general public reference, education, or enjoyment.

The big reason for this, of course, is copyright.  In the US, anyone can freely digitize books and other resources published before 1923, but providing anything published after that requires copyright research and, usually, licensing, both of which tend to be complex and expensive.  So the tendency of a lot of digital library projects is to focus on the older, obviously free material, and have little current material.  But a generally useful digital public library needs to be different.

And it can be, with the right motivation, strategy, and support.  The key insight is that while a strong digital public library needs to have high-quality, current knowledge resources, it doesn’t need to have all such resources, or even the most popular or commercially successful ones.  It just needs to acquire and maintain a few high-quality resources for each of the significant needs and aptitudes of its audience. Mind you, that’s still a lot of ground to cover, especially when you consider all the ages, education levels, languages, physical and mental abilities, vocational needs, interests, and demographic backgrounds that even a midsized town’s public library serves.  But it’s still a substantially smaller problem, and involves a smaller cost, than the enticing but elusive idea of providing instant free online access to everything for everyone.

There are various ways public digital libraries could acquire suitable materials proactively.  The America.gov books collection provides one interesting example.  The US State Department wanted to create a library of easy-to-read books on civics and American culture and history for an international audience.  Some of these books were created in-house by government staff.  Others were commissioned from outside authors.  Still others were adapted from previously published works, for which the State Department acquired rights.

A public digital library could similarly create, commission, solicit, or acquire rights to books that meet unfilled information needs of its patrons.  Ideally it would aim to acquire rights not just to distribute a work as-is, but also to adapt and remix into new works, as many Creative Commons licenses allow.  This can potentially greatly increase the impact of any given work.  For instance, a compellingly written,  beautifully illustrated book on dinosaurs might be originally written for 9-12 year old English speakers, and be noticeably obsolete due to new discoveries after 5 or 10 years.  But if a library’s community has reuse and adaptation rights, library members can translate, adapt, and update the book, so it becomes useful to a larger audience over a longer period of time.

This sort of collection building can potentially be expensive; indeed, it’s sobering that America.gov has now ceased being updated, due to budget cuts.  But there’s a lot that can be produced relatively inexpensively.  Khan Academy, for example, contains thousands of short, simple educational videos, exercises, and assessments created largely by one person, with the eventual goal of systematically covering the entire standard K-12 curriculum.  While I think a good educational library will require the involvement of many more people, the Khan example shows how much one person can get accomplished with a small budget, and projects like Wikipedia show that there’s plenty of cognitive surplus to go around, that a public library effort might usefully tap into.

Moreover, the markets for rights to previously authored content can potentially be made much more efficient than they are now.  Most books, for instance, go out of print relatively quickly, with little or no commercial exploitation thereafter.  And as others have noted, just trying to get permission to use  a work digitally, even apart from any royalties, can be very expensive and time-consuming.  But new initiatives like Gluejar aim to make it easier to match up people who would be happy to share their book rights with people who want to reuse them. Authors can collect a small fee (which could easily be higher than the residual royalties on an out-of-print book); readers get to share and adapt books that are useful to them.   And that can potentially be much cheaper than acquiring the rights to a new work, or creating one from scratch.

As I’ve described above, then, a digital public library could proactively build an accessible collection of high-quality, up to date online books and other knowledge resources, by finding, soliciting, acquiring, creating, and adapting works in response to the information needs of its users.  It would build up its collection proactively and systematically, while still being opportunistic enough to spot and pursue fruitful new collection possibilities.  Such a digital library could be a very useful supplement to local public libraries, would be open any time anywhere online, and could provide more resources and accessibility options than a local public library could provide on its own.  It would require a lot of people working together to make it work, including bibliographers, public service liaisons, authors, technical developers, and volunteers, both inside and outside existing libraries.  And it would require ongoing support, like other public libraries do, though a library that successfully serves a wide audience could also potentially tap into a wide base of funds and in-kind contributions.

Whether or not the DPLA plans to do it, I think a large-scale digital free public library with a proactively-built, high-quality, broad-audience general collection is something that a civilized society can and should build.  I’d be interested in hearing if others feel the same, or have suggestions, critiques, or alternatives to offer.

About John Mark Ockerbloom

I'm a digital library strategist at the University of Pennsylvania, in Philadelphia.
This entry was posted in citizen librarians, copyright, libraries, people, sharing.

11 Responses to A digital public library we still need, and could build now

  1. Nate says:

    I’m with you…

    Do you think that one can begin developing something from the opportunistic standpoint, with the intention of adding the user-centered acquisition model you describe? I definitely think you are right on here, but I’m not sure that one has to resolve and apply the gluejars and other copyright solutions *before* actually making something. content is content and elements of a platform architecture can be agreed upon and built before solving all of these problems.

    Or am I wrong? Happy to be told I’m wrong about this, explain…

  2. jrochkind says:

    I think you’ve got it exactly right, I have nothing to add or disagree with, well said.

  3. John Mark Ockerbloom says:

    Nate, Jonathan: Glad to hear from both of you!

    Nate asks: “Do you think that one can begin developing something from the opportunistic standpoint, with the intention of adding the user-centered acquisition model you describe?”

    In a word: No.

    Now, don’t misunderstand me: I think that opportunistically acquired materials play an important role. There are various high-quality free online resources already out there, and collecting, adapting, and reusing appropriate ones will be an important part of building up a digital public library collection. (Indeed, insisting on starting from scratch could easily condemn a digital public library project to much the same fate as Nupedia or Citizendium, two projects that aimed for crowd-supported high-quality but completely new content, and failed to thrive.) And all those digitized books from the past can fill out a digital library’s research stacks quite nicely.

    But I really think that if your aim is to create a truly useful and viable digital library for the American (or world) public, you have to commit to and invest in the collections that implies from the start. The reason is that the people you’re going to need, the investments you’re going to make, the strategies you employ to develop a collection that meets your users’ needs on a reasonable budget, and the funding models you’re going to depend on, are going to have to be quite different than what they would be if you focus primarily on building platforms and just using or digitizing whatever content is to hand.

    We can see this with the National Science Digital Library, an NSF-funded program that talked for a long time about building a rich public library of K-12 science education materials, but which was largely driven by technology research groups and funded by project grants. As a research program, it produced some useful technology, findings, and collaborations. (And, to be fair, it also produced some content that may well be worth preserving or adapting.) But as a public library, it failed: 10 years and $175 million in, most science teachers and students have rarely or never used it, and it’s about to lose its funding.

    I don’t think the main things holding back an effective public digital library are a lack of platform architectures, or the fact that we haven’t finished digitizing all the public domain volumes in the Harvard libraries. The main things we need to support viable public libraries, whether digital or local, are well-developed, well-funded, user-responsive and cost-effective collections and services. We can build those now, if we want to, and invest the right resources and expertise.

    Don’t get me wrong: I think developing new digital library technologies is important; that’s primarily what I do in my job, after all. And I’m always happy to see and promote newly digitized resources, as I’ve done for years on The Online Books Page. But an initiative that mainly focuses on those isn’t going to build a digital public library at the level of adequacy that Americans have enjoyed in our long-standing local public libraries. If the DPLA wants to focus on technology and digitization, fine: they can rename the project to something like the “Digital Research Library of America”, and I’ll still cheer them on and maybe even pitch in. But billing a project as a “digital library in service of the American public” without investing in what it really takes to become one risks distraction and frustration on the part of the participants and funders, and it can suck the oxygen away from other possible projects that could actually fulfill the promise.

    I’ve gone ahead and submitted a summary of my idea (with a link to this page) to the DPLA beta sprint. If there are groups (involved with the DPLA or not) who are interested in pursuing this proactive collection-building strategy, and perhaps fleshing out a beta plan, I’d love to hear from them. Or, if anyone reading this thinks I’m wrong here, I’d be very interested in hearing your case.

  4. natehill says:

    I’m hoping that someone will put together a realistic beta sprint project that addresses these issues, John.

    I guess one version of a DPLA would do exactly what you speak of here. It seems that people want the DPLA to be everything all at once, likely because of its name. Right now, my library gets different kinds of electronic content from countless vendors… Overdrive, Ebsco, bla bla bla. You are describing a DPLA that seeks to offer the same collections that are popular in print in an electronic format. Nothing wrong with that, sounds like a good plan… but it’s worth noting that this is **not** the only content people want from public libraries.

    I was suggesting that DPLA platform development could start before having some kind of collection development policy because a DPLA could begin by enabling localized conversation and context-creation around whatever opportunistic content exists- and that would be a big win for public library users. It would be great for a small library in Arkansas to offer web users the ability to have conversations about and draw relationships between a painting in the Smithsonian, a copy of Huck Finn, and a photo from their local history collection. If a collection development policy evolved that enabled a digitally lendable copy of a Grisham novel to be included in that conversation, all the better.

  5. John Mark Ockerbloom says:

    Nate: Thanks for a thoughtful reply. I agree “people want[ing] the DPLA to be everything at once” seems to lie behind a lot of the current arguments over the project. It will help when the project itself makes a clearer statement about what it wants to be. For now, my working assumption is that it aspires to be what its name implies: a library with collections and services that broadly serve the American public at a level that compares well to our traditional public libraries.

    Some of these services will indeed be largely technology-driven. Your example of a discussion service built around cultural works, for instance, could be one useful such service. (Though there are lots of other discussion services, some also built around works, that are already out there, and many more that have been proposed or launched but that failed to thrive.) I’m very happy to have people try developing these and other services in a digital public library project, but a robust digital library also needs collections that meet its users’ needs, and that side of things, and the support requirements it implies, seem to be relatively neglected in the DPLA discussions I’ve seen so far.

    You’re also right that “collections that are popular in print” are not the only content people want from public libraries. I’m not so much looking for collections that are popular, but collections that meet the public’s main information needs in an accessible, appealing way. Not everything that’s useful is popular in the marketplace; and not everything that’s popular needs to be included. The point is to be proactive about determining what sorts of things are most needed in a digital collection that serves a broad public, and to make a serious effort to fill in holes in coverage (particularly in areas where current and reliable content is critical) by whatever means necessary.

    I also want to be clear that a “realistic beta sprint project” cannot simply consist of me. (Nor would I be the appropriate leader for the sort of sustained collection development program that I describe above.) What I hope to do by registering a statement of interest for the DPLA sprint is to see if there are institutions and librarians with interest and expertise in planning a proactive, sustainable collection development program within the DPLA framework. If there are, I’ll be happy to help shape a planning agenda for such a program, insofar as my time and expertise (both limited) allow. If someone else has already put in a statement of interest along these lines with more heft behind it, I’ll be glad to defer to theirs. And if there isn’t sufficient interest within the DPLA community to get a viable effort going– well, that’ll tell us something too.

    To give the proposal a fair shot, I’ll be raising it in DPLA discussions over the next few days. But I’m also happy to see further comments here, and appreciate the ones you’ve made thus far.

  6. Nate says:

    John, sorry I’ve kind of dropped the ball by not commenting here. Busy. I’m curious if you’ve got people joining you in this effort yet?

    A colleague sent me this blog post the other day:

    In it, it says “People want libraries to provide the expensive eBooks versions of print books. And the demand is not just for fiction, over 40% want good non-fiction. People care less about libraries providing cheaper indie eBooks and care almost nothing about eBook versions of free public domain books.”

    As much as I sympathize with this call from the users, there’s a little part of me that says “oh yeah? well too bad.”
    New and popular media will always be ‘premium’ and carry a ‘premium’ price tag. That’s capitalism.
    I want a ‘free’ copy of the newest everything too, but I don’t actually expect to get it from my library.

    What do you think of that? I’m not even sure that I know what I think of that…

  7. Ecological Humanist says:

    Hi John, I’m also involved in the Beta Sprint, and linked here from the list-serve. I think you capture in this post the dilemma of balancing the sort of opportunistic, disorganized collaboration that the internet enables with the sense of purpose and internal logic that usually comes with physical libraries and collections. The challenge, in other words, is having an online library be both open and meaningful, democratic but not haphazard or opaque. My own project involves more a method of inviting and sorting collaborative annotations to texts (notes, illustrations, audio readings, etc) rather than texts themselves, but I feel like we are thinking along similar lines. At any rate, I’m wishing you luck and success!

  8. John Mark Ockerbloom says:

    Nate, Ecological Humanist: A belated thank-you for your thoughtful comments. As you can probably tell, I’ve been a bit swamped myself. There have been some other general expressions of interest in my idea, though no one (including myself) is really taking the ball and running with it for now.

    I’m thinking the best shot of getting this to go forward is to wiki-fy it, so that any interested DPLA participant can contribute their ideas and thoughts without me or others getting in the way as a gatekeeper. Toward that end, I’ve created a Wiki page for what I’ve called Digital Library Core Collections which I hope I and others can fill in to flesh out the idea and see if it’s viable in the DPLA context.

    Regarding your note about what people want to read, I’m not a public librarian, but I have heard that lots of people go into libraries looking for a specific title, often a popular best-seller. As I note above, I don’t think it’ll be viable to get most of those into the DPLA’s digital collection any time soon. But I wonder whether this could be a point of possible synergy between the DPLA and local libraries; that is, someone going to the DPLA with a specific hot title in mind gets referred to local libraries where they can borrow a print copy, but also gets suggestions of related digital titles they can read online. Or, someone visiting a local library looking for a broader range can get referred, among other things, to relevant, high-quality titles in the DPLA’s collection. (It does seem quite common in my experience for someone going into a library looking for one thing to come up with other things as well.)

  9. sylvia says:

    I read the intire thing im 11 and i love to read and write i loved reading this thank you and ur welcome bye

  10. donnainthesouth says:

    when my son was doing his term paper on the technical aspects of the internal combustion engine the only book we could find was an e-book from our local public library.
