Hurry, hurry! Free books, going fast! (And new site feature)

Okay, it’s a trend:

The news here isn’t so much that people are putting their books online for free. Some folks have been doing that for years, and I’ve been listing recent permanent, no-strings-attached free online books on The Online Books Page since the 1990s. (See, for instance, Daniel Solove’s The Future of Reputation, a current book posted last week, or Baen Books’ long-running free library.) What’s new is the number of large trade publishers who have almost simultaneously decided to try offering complete, free, in-print books online for the first time, with the expectation that this could well improve sales. The “limited time” offers let them be careful about it to start with, metaphorically dipping their toes in the water before diving in. They can also potentially compare the effects of short-term free ebook offers to either no free ebook offers or permanent free ebook offers. If the experiments work out well, this may be the first of a lot more current literature that becomes available online for free.

(Or it could just be this year’s publishing fad, forgotten or laughed about by next year. We’ll see, but some of the early reports above sound promising.)

I’ve read and enjoyed a number of the books that I mention above. I’m not listing them on The Online Books Page at present; that site is really designed for permanent titles rather than books that are here today and gone tomorrow. (Though many libraries have “current bestsellers” sections that work that way, using rental programs like McNaughton.) I do find a number of these book offers worth noting somewhere. I don’t necessarily want to devote a full post to each one I hear about, though.

So I’ve introduced a new feature to this site that can hold quick links to temporarily-free ebook offers I find of interest, as well as links to other news and stories that are interesting and relevant enough to mention here, but not in a full post. You can find these links in the right column, under the heading “Everybody’s Library Tags”. (I’m using PennTags, our local social tagging system, to collect and manage these links.) The Tags feature has its own RSS feed separate from that of the blog, in case you’d like to put it in your aggregator. If you’d like to limit the feed to just the free books tags, and not the other links I post there, use this feed URL instead.
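If you want to do the kind of feed filtering described above on the client side instead, a short sketch shows the idea. This is not the actual PennTags feed format; the sample RSS fragment and the category name “freebooks” are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 fragment of the sort a tag feed might produce;
# the feed title and item categories here are made up for illustration.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>Everybody's Library Tags</title>
  <item><title>Free novel this week</title><category>freebooks</category></item>
  <item><title>Library news story</title><category>news</category></item>
</channel></rss>"""

def items_with_category(feed_xml, wanted):
    """Return titles of feed items tagged with the wanted category."""
    root = ET.fromstring(feed_xml)
    return [item.findtext("title")
            for item in root.iter("item")
            if wanted in [c.text for c in item.findall("category")]]

print(items_with_category(SAMPLE_FEED, "freebooks"))  # ['Free novel this week']
```

An aggregator that only wants the free-book items could apply a filter like this to the combined feed, though subscribing to a dedicated per-tag feed URL, where the service offers one, is simpler.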

I haven’t tried this feature before (though I’ve seen it used to good effect on other blogs), and I’m not positive at this point whether I’ll keep it up. (If I find it too hard to keep reasonably current stuff in there, I’ll just discontinue it.) But, as the publishers above are doing with their free book offers, I’ll try it out, and see what happens.

The right to read, circa 1906

For a few years in the early 1900s, some American book publishers came up with a brave new marketing paradigm. Instead of offering books for sale the old-fashioned way, they essentially decided to license them. Purchasers were warned of dire legal consequences if they didn’t go along with the licenses attached to the books. If, for instance, you bought Kate Meredith, Financier, published in 1906, you would be greeted by this text on the first page:


“This copyright volume is offered for sale to the public only through the authorized agents of the publishers, who are permitted to sell it only at retail and at fifty cents per copy, and with the express condition and reservation that it shall not, prior to August 1st, 1907, be resold, or offered or advertised for resale. The purchaser from them agrees to this condition and reservation by the acceptance of this copy. In case of any breach thereof, the title to this book immediately reverts to the publishers. Any defacement, alteration or removal of this notice will be prosecuted by the publishers to the full extent of the law.”


It wasn’t just done with books, either. Here’s a similar license from a 1907 Edison cylinder record.

In 1908, however, the Supreme Court would put an end to these kinds of licenses, in Bobbs-Merrill v. Straus. That case helped establish the first sale doctrine, which basically says that a buyer of a book really does own it, and has the right to keep it, share it, lend it, resell it, give it away, or otherwise dispose of it as they see fit, just like they can with other things they buy. Congress would eventually codify this doctrine, with various qualifications, in the copyright statutes; it’s now section 109 of the copyright code.

So thanks to the courts and Congress of a century ago, you can pick up a book in a bookstore and buy it with confidence. You don’t have to carefully look it over in the store or show it to an expert to figure out what you’re allowed to do with it when you’re through reading it yourself.

At least, you don’t if it’s a print book. When you pay for an ebook from the bookstores for the best-known current “reading devices”, however, it’s a different story.

Here, have some more Punch

Punch was a British institution for well over a century. Founded in 1841, it was an irreverent weekly magazine of quips, cartoons, essays, stories and poetry, often on the politics and events of the day. Writers and artists like W. M. Thackeray, A. A. Milne, P. G. Wodehouse, Kingsley Amis, Arthur Rackham, and Ernest Shepard contributed to it. Punch folded in 1992, with its best days long past, but in its heyday it enjoyed great popular and critical acclaim. If the Daily Show writers had lived in 19th century London instead of 21st century America, they might have created something like it.

Reading it now can be a bit disorienting, partly because the writers often assume the readers are already very familiar with the contemporary headlines and goings on, and partly because the sense of what’s funny is so mercurial. What makes an 1860s London reader of Punch break out into uncontrollable laughter may be very different from what has the same effect on a 2008 xkcd fan.

But all sorts of folks still find it of interest, whether they’re researching English history and culture, looking for long out of print literature and drawings from writers and artists they fancy, or just wanting a good read. The first 80 years or so of issues are now in the public domain. Project Gutenberg started transcribing them (including the cartoons) a few years back. Now the mass digitizers have gotten involved too, and in response to a reader’s request I’ve found and organized online copies of most of the issues up to 1922. (After that point, copyright issues get sticky.)

Here’s my listing. Enjoy. (And do tell me if you find any issues I couldn’t.)

We call dibs! (or, the genius of the Harvard mandate)

The Harvard Arts and Sciences faculty recently approved a resolution giving the University permission to make their scholarly articles available to the world at no charge. Here’s the official press release from Harvard, and here’s the text of the resolution, as given in the official faculty council agenda. (The resolution text is on the second page. It could have been amended before the vote, but I haven’t heard of any amendments.)

This is the first university-level open access mandate in the US, from the most prominent university in the US, and as many have noted, this is a huge step forward for open access to research. There are two aspects to the mandate: the familiar aspect directs faculty to supply Harvard with copies of their papers to post; the more novel aspect stipulates that Harvard automatically get the rights to post their faculty papers for free. Harvard allows faculty members to exempt papers from these requirements, but it must be done in writing, with reason, separately for each paper that a faculty member wants to exempt.

I find this approach ingenious. As people maintaining institutional repositories have come to know, there are two main barriers to distributing one’s faculty’s work in one’s repository: getting hold of the work, and getting the right to publish the work. The first of these can be handled in various ways; whether the faculty, the departmental administrators, or the librarians get the content to the right place, it’s all purely a matter of local negotiation. But that’s not the case with rights. By the time we repository maintainers get content from authors, the authors have often signed their rights away to the journals that published the papers. The publishers have effectively called dibs on redistribution rights, and we can’t distribute unless they agree to it. A faculty member who wants us to distribute her work may no longer have the power to let us– she’s already signed that right away to someone else.

By requiring (non-exclusive) rights to free, open access distribution of any new paper created under its employ, Harvard is effectively calling dibs before the publishers can. So if I’m running a repository at Harvard (or another institution with a similar policy), copyright clearance becomes much easier. I don’t have to look up and carefully parse a journal’s self-archiving policy, try negotiating with publishers, or verify that I have the permitted version of a paper to archive and the proper embargo period. As long as the paper is dated after the mandate went into place, and the paper’s not on my institution’s exception list, I can just grab and go. Or, I can accept my faculty’s and department’s self-deposits without having to go back and forth with them about whether they have the right permissions and are following the right procedures for that publisher. Publishers may want their authors to sign away the rights that they’ve given us, but they can’t, at least not without going out of their way to do so, because we already have those rights. And as Dorothea Salo points out, there are disincentives for both publishers and faculty to rock the boat here. Under this arrangement, the norms have changed– from restricted access as a default, with the onus for exceptions placed on the library or the scholar, to open access as a default, with the onus for exceptions placed on the publisher or the scholar.

Some open access advocates have argued that one could design a mandate that was even more open-access-friendly. That may be, but to judge this mandate a failure (as the linked post above appears to at one point) seems to me an example of the “perfect” being the enemy of the good. This mandate is faculty-friendly as well as being open-access friendly, in that it minimizes the extra work faculty have to do and assures them the last word in access control, should they decide to exercise it. And that, I believe, is crucial to its having been adopted at all, and to its subsequent acceptance by faculty. (Remember, this is the first open access mandate that a US university faculty, let alone one with the clout of Harvard, has adopted on its own.)

In the future, perhaps universities will adopt even more effective policies to share their research output with the world. Right now, though, I’d say this is a big step forward, and one that I hope that my university and many others will consider emulating.

Improving the millions

Michigan’s announcement earlier this month that they had over one million volumes from their collection digitized was widely hailed online. That million books includes both copyrighted and public domain content. According to Michigan’s John Wilkin, who I talked with shortly after the announcement, about 15-20% of what’s digitized is certified as public domain, and made freely accessible.

Those familiar with the OAI-PMH protocol can download catalog records for the public-domain-certified books; as of today, there are over 116,000 such records. (According to John, there are about 3 catalog items for every 4 volumes, due to things like multi-volume works, so one million volumes translates to about 750,000 works, and about 15% of those are in the public domain OAI feed.)
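For readers who haven’t harvested an OAI-PMH feed before, the protocol is just paged XML over HTTP: a harvester requests a page of records, reads the resumptionToken at the end, and keeps requesting until no token remains. Here’s a minimal parsing sketch; the repository identifiers and token value in the sample response are made up, and a real harvester would fetch such pages from the repository’s OAI-PMH endpoint rather than from a string.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment of an OAI-PMH ListIdentifiers response; the record
# identifiers and token below are invented, not Michigan's actual data.
SAMPLE_RESPONSE = """<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header>
      <identifier>oai:example.lib:record-001</identifier>
      <datestamp>2008-02-01</datestamp>
    </header>
    <header>
      <identifier>oai:example.lib:record-002</identifier>
      <datestamp>2008-02-02</datestamp>
    </header>
    <resumptionToken>page-2-token</resumptionToken>
  </ListIdentifiers>
</OAI-PMH>"""

# OAI-PMH elements live in this XML namespace.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

def parse_page(xml_text):
    """Return (identifiers, resumption_token) for one response page."""
    root = ET.fromstring(xml_text)
    ids = [el.text for el in root.iter(OAI_NS + "identifier")]
    token_el = root.find(".//" + OAI_NS + "resumptionToken")
    token = token_el.text if token_el is not None else None
    return ids, token

ids, token = parse_page(SAMPLE_RESPONSE)
print(ids)    # ['oai:example.lib:record-001', 'oai:example.lib:record-002']
print(token)  # page-2-token
```

When `parse_page` returns a token of `None`, the harvest is complete; until then, the harvester resubmits the request with the token to get the next page.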

Most of Michigan’s million volumes were digitized by Google. Jessamyn West’s post on the milestone included a link to an interesting article in Campus Technology about the workflow of the Google project. Jessamyn pulls out an interesting quote from the article:

When it comes down to it, then, this brave new world of book search probably needs to be understood as Book Search 1.0. And maybe participants should not get so hung up on quality that they obstruct the flow of an astounding amount of information. Right now, say many, the conveyor belt is running and the goal is to manage quantity…

Quantity is certainly an important aspect of Google Books, but even Google knows it can’t ignore quality. Indeed, it’s worth remembering that Michigan’s announcement wasn’t the first million-book milestone to be announced. Back in November, the Universal Digital Library announced that it had over 1.5 million volumes available digitally. But that project doesn’t seem to have made as big a splash as Google or Michigan. And part of the reason, I think, is quality. Though it pains me to say it (I’ve worked with them in the past, and know and like many of the folks involved) their digitized editions are so often unusable or unreliable that I don’t regularly list them. Google’s not perfect either– they have their share of cut-off, missing and illegible pages– but in my experience, their books are often good enough to be usable (though some others disagree). Michigan also reportedly has its own review process to weed out or improve the worst scans. (Google also has a bad-page-reporting feature, though I don’t have a sense yet of how responsive they are in fixing reported errors.)

The mass digitization I’ve seen with the highest consistent quality is in the American Libraries and Canadian Libraries collections of the Internet Archive text repository, which between them now provide nearly a quarter million volumes online. Since these are all publicly accessible, this represents even more books freely readable by the public than Michigan’s scanned million– and they tend to be of considerably higher quality than Google’s offerings. If you’ve been following the new books listings of The Online Books Page, you may notice their editions are often the ones we’ve been picking to fill open reader requests. These books were produced through the Open Content Alliance, often with help from Microsoft or Yahoo.

There’s still room for improvement. The Internet Archive doesn’t seem to be able to handle heavy loads as well as Google does. They don’t have very good ways of handling multi-volume works (whether monographs or serials), though they at least usually make volume numbers more visible than Google does. And no one yet in the mass-digitization projects seems to be doing a good job at consistently providing readable transcribed text along with the page images. (The text is often good enough to search, but not to read.)

I’m hopeful, though, that we’ll continue to see the quality standard rise over time, as the UDL, Google, Michigan, the OCA, and others all digitize free content, and have to compete for the attention of readers seeking the best online books. In the meantime, there’s much that individuals can do to improve on the scans the giants are providing, whether it’s organizing disparate volumes, putting works into context, producing high quality transcriptions, repackaging them into convenient reader formats, or providing tags and reviews to help people find the most suitable books among the millions.

In libraries, size of your collection is important, but even more important is what your readers can do with the collection. In the early going of mass digitization, quantity makes the big headlines; in the long run, improving quality may well have the greatest impact.

And now, your turn to have a say in ILS interfaces…

I’ve had my head down for the past couple of weeks for various reasons, but I’m happy to surface again and announce that the ILS discovery interface group that I discussed in my last post has produced a new draft of our recommendations, which you can download from our wiki. (MS Word format, 43 pages.) This is the version we’ll be taking to Code4lib, to our March meeting with vendors and developers (which I’m happy to report has broader attendance now), and possibly to other venues as well. (I’ll announce any further public presentations and discussions here.) We’ll use the feedback and discussion we hear at those venues to shape the final draft.

But you don’t have to travel anywhere to give us feedback. If you’re interested in standard interfaces for building cool discovery applications on top of the ILS (either as an ILS provider or an ILS client), we want to hear from you! Some relevant questions:

  • Which of the functions mentioned are most important to you?
  • What have we missed that’s important to you?
  • Do you know of, or can you recommend, specific implementations and/or standards for these functions that we haven’t mentioned in the draft?
  • Are you working on, or interested in working on, more specific bindings, or reference implementations, of these functions? (And if you are, can we see?)
  • Do you have an interesting discovery application we should know about that could make good use of ILS data and services? (And again, can we see?)
  • What are the best ways to cultivate partnerships and developer communities to move these recommendations forward, and evolve them appropriately?

I’m very interested in hearing from you, whether here, on our Wiki, or in our meetings.

Blowing the lid off the ILS (and the providers’ chance to have a say)

It’s now hardly a secret that many large research libraries are increasingly chafing at their traditional integrated library systems (ILSs). Duke University recently announced that they were planning to design an all-new, open source ILS, presumably to replace their vendor-supplied ILS. This was just the latest example of a litany of impatience I have heard from various folks involved with organizations like the Digital Library Federation (DLF), which includes many larger North American research libraries doing innovative work in the digital realm. Library polls like the recent Perceptions 2007 survey give the lowest system satisfaction scores for systems like Voyager and Aleph, which are commonly used at large research libraries. Those low scores may have more to do with the heightened expectations of their users than with relative inadequacies of those systems. But heightened expectations have a way of spreading out to the general population over time.

The responses to a survey conducted by a DLF task force that I’m chairing cited numerous inadequacies with the public access catalog component of the ILS. The vast majority of respondents are using– and in many cases developed themselves– a wide variety of discovery tools that go beyond what the ILS itself offers. A number of libraries are building their own overlay or alternative catalogs, such as VuFind, Extensible Catalog (XC), and NCSU Endeca, to provide better information discovery in different ways. At the library where I work, we’ve hacked into our ILS to provide various enhanced discovery services like a video catalog, a social tagging system, and subject map browsers I’ve discussed in earlier posts.

What’s become increasingly clear to those of us trying to move information discovery forward is that we can no longer expect a single “integrated library system” to satisfy our current and emerging collection discovery needs by itself. And it’s inefficient and frustrating for each of us to hack custom interfaces onto each ILS we have in order to build on top of it. Instead, we need a set of standard machine interfaces that allow us to build and provide new discovery systems on top of whatever ILS we have, using its data and services in whatever ways best help our users make the most of our extensive library collections and resources. (The various Web 2.0 initiatives suggest plenty of possibilities for developing and integrating such systems, and at the same time raise our own users’ discovery expectations.) There’s already been some work in the library world in developing protocols for interoperability, such as Z39.50, SRU, OAI-PMH, and NCIP. What’s needed now are some standard profiles for a complete suite of functions that support catalog alternatives and supplements, and that can be widely implemented and supported.
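To give a concrete flavor of what “standard machine interfaces” means in practice, here’s a sketch of building an SRU searchRetrieve request, which is just an HTTP GET with a CQL query in the parameters. The base URL below is hypothetical; any real ILS exposing SRU would publish its own endpoint, and its supported record schemas may differ.

```python
from urllib.parse import urlencode

# Hypothetical SRU endpoint; a real ILS would publish its own base URL.
SRU_BASE = "https://catalog.example.edu/sru"

def sru_search_url(cql_query, start=1, maximum=10):
    """Build an SRU 1.1 searchRetrieve GET request URL for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "startRecord": start,
        "maximumRecords": maximum,
        "recordSchema": "marcxml",  # request MARCXML records in the response
    }
    return SRU_BASE + "?" + urlencode(params)

url = sru_search_url('dc.title = "Punch"')
print(url)
```

The appeal of a standard profile is exactly this: a discovery application can construct one request like the above and point it at any compliant catalog, instead of hacking a new custom interface for each ILS.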

The DLF convened a group to recommend such technical profiles last year. In the fall, we compiled a set of functional requirements and presented and discussed them with library folks at the last DLF Forum. Now we’re detailing and refining the technical requirements, and preparing to discuss them with developers, vendors, and other service providers, to see what they are able and willing to implement, and to solicit suggestions. I made a general invitation to ILS vendor representatives at the recent ILS “president’s seminar” at ALA Midwinter. Others in our group are going out to the upcoming Code4lib conference to talk with open source and library-based development projects. And Peter Brantley, executive director of the DLF, recently sent out invitations to selected ILS vendors and developers to a workshop in March to discuss and help shape our recommendations.

We’ve heard back from some of our invitees, but there are a number we haven’t heard from yet. I don’t know if this is because the recipients just haven’t gotten around to replying, or they’re not sure how serious we are about these standards, or they’re worried about whether more interoperable and more interchangeable ILS’s might threaten their markets. I’ve heard one respondent wonder whether it would be smarter for some companies not to participate, in hopes of thereby killing off the initiative and protecting vendor lock-in with their sales base.

Well, that’s not going to happen. Whether or not particular vendors are on board, we’re going forward with this initiative. There’s already significant interest in the library community for more open, more flexible interfaces to our acquisitions, catalog, and circulation services. And there’s substantial existing work and interest in developing overlays to existing ILS’s, and in building new ones. (Besides the Duke proposal, and the various ILS overlays and catalogs mentioned above, there are already two complete open source ILS’s, Koha and Evergreen, in production in some libraries.) We can move forward just with the existing library and open-interfaces development community if we need to. That said, having more ILS vendors participating will make it easier and quicker for suitable interfaces to emerge, and make them available to a wider set of libraries. ILS vendors who participate can help shape the recommendations, gain understanding that can help them get a jump on development, highlight competitive advantages for libraries concerned about discovery API support in their next ILS, and tap new markets for discovery applications compatible with the standard interfaces.

So I encourage those developers and vendors that Peter and I have invited to get in touch with us soon. And I would also like to invite folks who haven’t heard from us, but are developing in the ILS and information-discovery domain, to contact me as well, to see how you can get involved. There may be room for you at the table as well, if you respond soon.