The right to read, circa 1906

For a few years in the early 1900s, some American book publishers came up with a brave new marketing paradigm. Instead of offering books for sale the old-fashioned way, they essentially decided to license them. Purchasers were warned of dire legal consequences if they didn’t go along with the licenses attached to the books. If, for instance, you bought Kate Meredith, Financier, published in 1906, you would be greeted by this text on the first page:

“NOTICE TO PURCHASER”

“This copyright volume is offered for sale to the public only through the authorized agents of the publishers, who are permitted to sell it only at retail and at fifty cents per copy, and with the express condition and reservation that it shall not, prior to August 1st, 1907, be resold, or offered or advertised for resale. The purchaser from them agrees to this condition and reservation by the acceptance of this copy. In case of any breach thereof, the title to this book immediately reverts to the publishers. Any defacement, alteration or removal of this notice will be prosecuted by the publishers to the full extent of the law.”

THE AUTHORS AND NEWSPAPERS ASSOCIATION

It wasn’t just done with books, either. Here’s a similar license from a 1907 Edison cylinder record.

In 1908, however, the Supreme Court would put an end to these kinds of licenses, in Bobbs-Merrill v. Straus. That case helped establish the first sale doctrine, which basically says that a buyer of a book really does own it, and has the right to keep it, share it, lend it, resell it, give it away, or otherwise dispose of it as they see fit, just like they can with other things they buy. Congress would eventually codify this doctrine, with various qualifications, in the copyright statutes; it’s now section 109 of the copyright code.

So thanks to the courts and Congress of a century ago, you can pick up a book in a bookstore and buy it with confidence. You don’t have to carefully look it over in the store or show it to an expert to figure out what you’re allowed to do with it when you’re through reading it yourself.

At least, you don’t if it’s a print book. When you pay for an ebook from the bookstores for the best-known current “reading devices”, however, it’s a different story.

Posted in copyright, online books | 1 Comment

Here, have some more Punch

Punch was a British institution for well over a century. Founded in 1841, it was an irreverent weekly magazine of quips, cartoons, essays, stories and poetry, often on the politics and events of the day. Writers and artists like W. M. Thackeray, A. A. Milne, P. G. Wodehouse, Kingsley Amis, Arthur Rackham, and Ernest Shepard contributed to it. Punch folded in 1992, with its best days long past, but in its heyday it enjoyed great popular and critical acclaim. If the Daily Show writers had lived in 19th century London instead of 21st century America, they might have created something like it.

Reading it now can be a bit disorienting, partly because the writers often assume the readers are already very familiar with the contemporary headlines and goings-on, and partly because the sense of what’s funny is so mercurial. What makes an 1860s London reader of Punch break out into uncontrollable laughter may be very different from what has the same effect on a 2008 xkcd fan.

But all sorts of folks still find it of interest, whether they’re researching English history and culture, looking for long out of print literature and drawings from writers and artists they fancy, or just wanting a good read. The first 80 years or so of issues are now in the public domain. Project Gutenberg started transcribing them (including the cartoons) a few years back. Now the mass digitizers have gotten involved too, and in response to a reader’s request I’ve found and organized online copies of most of the issues up to 1922. (After that point, copyright issues get sticky.)

Here’s my listing. Enjoy. (And do tell me if you find any issues I couldn’t.)

Posted in online books, serials | Comments Off on Here, have some more Punch

We call dibs! (or, the genius of the Harvard mandate)

The Harvard Arts and Sciences faculty recently approved a resolution giving the University permission to make their scholarly articles available to the world at no charge. Here’s the official press release from Harvard, and here’s the text of the resolution, as given in the official faculty council agenda. (The resolution text is on the second page. It could have been amended before the vote, but I haven’t heard of any amendments.)

This is the first university-level open access mandate in the US, from the most prominent university in the US, and as many have noted, this is a huge step forward for open access to research. There are two aspects to the mandate: the familiar aspect directs faculty to supply Harvard with copies of their papers to post; the more novel aspect stipulates that Harvard automatically get the rights to post their faculty papers for free. Harvard allows faculty members to exempt papers from these requirements, but it must be done in writing, with reason, separately for each paper that a faculty member wants to exempt.

I find this approach ingenious. As people maintaining institutional repositories have come to know, there are two main barriers to distributing one’s faculty’s work in one’s repository: getting hold of the work, and getting the right to publish the work. The first of these can be handled in various ways; whether the faculty, the departmental administrators, or the librarians get the content to the right place, it’s all purely a matter of local negotiation. But that’s not the case with rights. By the time we repository maintainers get content from authors, the authors have often signed their rights away to the journals that published the papers. The publishers have effectively called dibs on redistribution rights, and we can’t distribute unless they agree to it. A faculty member who wants us to distribute her work may no longer have the power to let us– she’s already signed that right away to someone else.

By requiring (non-exclusive) rights to free, open access distribution of any new paper created under its employ, Harvard is effectively calling dibs before the publishers can. So if I’m running a repository at Harvard (or another institution with a similar policy), copyright clearance becomes much easier. I don’t have to look up and carefully parse a journal’s self-archiving policy, try negotiating with publishers, or verify that I have the permitted version of a paper to archive and the proper embargo period. As long as the paper is dated after the mandate went into place, and the paper’s not on my institution’s exception list, I can just grab and go. Or, I can accept my faculty’s and department’s self-deposits without having to go back and forth with them about whether they have the right permissions and are following the right procedures for that publisher. Publishers may want their authors to sign away the rights that they’ve given us, but they can’t, at least not without going out of their way to do so, because we already have those rights. And as Dorothea Salo points out, there are disincentives for both publishers and faculty to rock the boat here. Under this arrangement, the norms have changed– from restricted access as a default, with the onus for exceptions placed on the library or the scholar, to open access as a default, with the onus for exceptions placed on the publisher or the scholar.

Some open access advocates have argued that one could design a mandate that was even more open-access-friendly. That may be, but to judge this mandate a failure (as the post linked above appears to do at one point) seems to me an example of the “perfect” being the enemy of the good. This mandate is faculty-friendly as well as being open-access friendly, in that it minimizes the extra work faculty have to do and assures them the last word in access control, should they decide to exercise it. And that, I believe, is crucial to its having been adopted at all, and to its subsequent acceptance by faculty. (Remember, this is the first open access mandate that a US university faculty, let alone one with the clout of Harvard, has adopted on its own.)

In the future, perhaps universities will adopt even more effective policies to share their research output with the world. Right now, though, I’d say this is a big step forward, and one that I hope that my university and many others will consider emulating.

Posted in copyright, open access, publishing, sharing | 2 Comments

Improving the millions

Michigan’s announcement earlier this month that they had over one million volumes from their collection digitized was widely hailed online. That million books includes both copyrighted and public domain content. According to Michigan’s John Wilkin, who I talked with shortly after the announcement, about 15-20% of what’s digitized is certified as public domain, and made freely accessible.

Those familiar with the OAI-PMH protocol can download catalog records for the public-domain-certified books; as of today, there were over 116,000 such records. (According to John, there are about 3 catalog items for every 4 volumes, due to things like multi-volume works, so one million volumes translates to about 750,000 works, and about 15% of those are in the public domain OAI feed.)
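For readers curious what harvesting those records actually involves, here’s a minimal Python sketch of the OAI-PMH pattern: build a ListRecords request, then parse records and any resumption token out of the response. The endpoint URL, set name, and sample record below are all hypothetical placeholders, not the details of Michigan’s actual feed.

```python
import urllib.parse
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if token:
        # a resumptionToken is an exclusive argument: no other filters allowed
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return base_url + "?" + urllib.parse.urlencode(params)

def parse_records(xml_text):
    """Pull (identifier, title) pairs and any resumption token from a response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(OAI + "record"):
        ident = rec.findtext(OAI + "header/" + OAI + "identifier")
        title = rec.findtext(".//" + DC + "title")
        records.append((ident, title))
    tok = root.find(".//" + OAI + "resumptionToken")
    return records, (tok.text if tok is not None and tok.text else None)

# A tiny made-up sample response, just to show the shape of the data:
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
 <ListRecords>
  <record>
   <header><identifier>oai:example:1</identifier></header>
   <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
     <dc:title>Sample Book</dc:title>
    </oai_dc:dc>
   </metadata>
  </record>
  <resumptionToken>tok123</resumptionToken>
 </ListRecords>
</OAI-PMH>"""
```

A real harvester would fetch `list_records_url(...)` over HTTP, then keep re-requesting with each returned resumption token until the token comes back empty.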

Most of Michigan’s million volumes were digitized by Google. Jessamyn West’s post on the milestone included a link to an interesting article in Campus Technology about the workflow of the Google project. Jessamyn pulls out an interesting quote from the article:

When it comes down to it, then, this brave new world of book search probably needs to be understood as Book Search 1.0. And maybe participants should not get so hung up on quality that they obstruct the flow of an astounding amount of information. Right now, say many, the conveyor belt is running and the goal is to manage quantity…

Quantity is certainly an important aspect of Google Books, but even Google knows it can’t ignore quality. Indeed, it’s worth remembering that Michigan’s announcement wasn’t the first million-book milestone to be announced. Back in November, the Universal Digital Library announced that it had over 1.5 million volumes available digitally. But that project doesn’t seem to have made as big a splash as Google or Michigan. And part of the reason, I think, is quality. Though it pains me to say it (I’ve worked with them in the past, and know and like many of the folks involved) their digitized editions are so often unusable or unreliable that I don’t regularly list them. Google’s not perfect either– they have their share of cut-off, missing and illegible pages– but in my experience, their books are often good enough to be usable (though some others disagree). Michigan also reportedly has its own review process to weed out or improve the worst scans. (Google also has a bad-page-reporting feature, though I don’t yet have a sense of how responsive they are in fixing reported errors.)

The mass digitization I’ve seen with the highest consistent quality is in the American Libraries and Canadian Libraries collections of the Internet Archive text repository, which between them now provide nearly a quarter million volumes online. Since these are all publicly accessible, this represents even more books freely readable by the public than Michigan’s scanned million– and they tend to be of considerably higher quality than Google’s offerings. If you’ve been following the new books listings of The Online Books Page, you may notice their editions are often the ones we’ve been picking to fill open reader requests. These books were produced through the Open Content Alliance, often with help from Microsoft or Yahoo.

There’s still room for improvement. The Internet Archive doesn’t seem to be able to handle heavy loads as well as Google does. They don’t have very good ways of handling multi-volume works (whether monographs or serials), though they at least usually make volume numbers more visible than Google does. And no one yet in the mass-digitization projects seems to be doing a good job at consistently providing readable transcribed text along with the page images. (The text is often good enough to search, but not to read.)

I’m hopeful, though, that we’ll continue to see the quality standard rise over time, as the UDL, Google, Michigan, the OCA, and others all digitize free content, and have to compete for the attention of readers seeking the best online books. In the meantime, there’s much that individuals can do to improve on the scans the giants are providing, whether it’s organizing disparate volumes, putting works into context, producing high quality transcriptions, repackaging them into convenient reader formats, or providing tags and reviews to help people find the most suitable books among the millions.

In libraries, size of your collection is important, but even more important is what your readers can do with the collection. In the early going of mass digitization, quantity makes the big headlines; in the long run, improving quality may well have the greatest impact.

Posted in copyright, online books, open access | Comments Off on Improving the millions

And now, your turn to have a say in ILS interfaces…

I’ve had my head down for the past couple of weeks for various reasons, but I’m happy to surface again and announce that the ILS discovery interface group that I discussed in my last post has produced a new draft of our recommendations, which you can download from our wiki. (MS Word format, 43 pages.) This is the version we’ll be taking to Code4lib, to our March meeting with vendors and developers (which I’m happy to report has broader attendance now), and possibly to other venues as well. (I’ll announce any further public presentations and discussions here.) We’ll use the feedback and discussion we hear at those venues to shape the final draft.

But you don’t have to travel anywhere to give us feedback. If you’re interested in standard interfaces for building cool discovery applications on top of the ILS (either as an ILS provider or an ILS client), we want to hear from you! Some relevant questions:

  • Which of the functions mentioned are most important to you?
  • What have we missed that’s important to you?
  • Do you know of, or can you recommend, specific implementations and/or standards for these functions that we haven’t mentioned in the draft?
  • Are you working on, or interested in working on, more specific bindings, or reference implementations, of these functions? (And if you are, can we see?)
  • Do you have an interesting discovery application we should know about that could make good use of ILS data and services? (And again, can we see?)
  • What are the best ways to cultivate partnerships and developer communities to move these recommendations forward, and evolve them appropriately?

I’m very interested in hearing from you, whether here, on our Wiki, or in our meetings.

Posted in architecture, libraries | Comments Off on And now, your turn to have a say in ILS interfaces…

Blowing the lid off the ILS (and the providers’ chance to have a say)

It’s now hardly a secret that many large research libraries are increasingly chafing at their traditional integrated library systems (ILSs). Duke University recently announced that they were planning to design an all-new, open source ILS, presumably to replace their vendor-supplied ILS. This was just the latest in a litany of impatient comments I have heard from various folks involved with organizations like the Digital Library Federation (DLF), which includes many larger North American research libraries doing innovative work in the digital realm. Library polls like the recent Perceptions 2007 survey give the lowest system satisfaction scores for systems like Voyager and Aleph, which are commonly used at large research libraries. Those low scores may have more to do with heightened expectations of their users than relative inadequacies of those systems. But heightened expectations have a way of spreading out to the general population over time.

The responses to a survey conducted by a DLF task force that I’m chairing cited numerous inadequacies with the public access catalog component of the ILS. The vast majority of respondents are using– and in many cases developed themselves– a wide variety of discovery tools that go beyond what the ILS itself offers. A number of libraries are building their own overlay or alternative catalogs, such as VuFind, Extensible Catalog (XC), and NCSU Endeca, to provide better information discovery in different ways. At the library where I work, we’ve hacked into our ILS to provide various enhanced discovery services like a video catalog, a social tagging system, and subject map browsers I’ve discussed in earlier posts.

What’s become increasingly clear to those of us trying to move information discovery forward is that we can no longer expect a single “integrated library system” to satisfy our current and emerging collection discovery needs by itself. And it’s inefficient and frustrating for each of us to hack custom interfaces to each ILS we have to build on top of it. Instead, we need a set of standard machine interfaces that allow us to build and provide new discovery systems on top of whatever ILS we have, using its data and services in whatever ways best help our users make the most of our extensive library collections and resources. (The various Web 2.0 initiatives suggest plenty of possibilities for developing and integrating such systems, and at the same time raise our own users’ discovery expectations.) There’s already been some work in the library world in developing protocols for interoperability, such as Z39.50, SRU, OAI-PMH, and NCIP. What’s needed now are some standard profiles for a complete suite of functions that support catalog alternatives and supplements, and that can be widely implemented and supported.
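To make the “standard machine interfaces” idea concrete, here’s a rough sketch of forming an SRU searchRetrieve request with a CQL query, one of the existing protocols mentioned above. The base URL and the index name are placeholders for illustration, not any particular vendor’s endpoint.

```python
import urllib.parse

def sru_search_url(base_url, cql_query, start=1, maximum=10, schema="marcxml"):
    """Build an SRU 1.1 searchRetrieve request URL for a CQL query."""
    params = {
        "version": "1.1",
        "operation": "searchRetrieve",
        "query": cql_query,
        "startRecord": str(start),       # 1-based offset into the result set
        "maximumRecords": str(maximum),  # page size
        "recordSchema": schema,          # e.g. MARCXML or Dublin Core
    }
    return base_url + "?" + urllib.parse.urlencode(params)

# e.g. ask a catalog for MARCXML records whose title mentions "medieval"
url = sru_search_url("https://opac.example.edu/sru", 'dc.title = "medieval"')
```

The appeal of a profile built on pieces like this is that a discovery application need only know the base URL and the agreed-on indexes, not the internals of whichever ILS sits behind them.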

The DLF convened a group to recommend such technical profiles last year. In the fall, we compiled a set of functional requirements and presented and discussed them with library folks at the last DLF Forum. Now we’re detailing and refining the technical requirements, and preparing to discuss them with developers, vendors, and other service providers, to see what they are able and willing to implement, and to solicit their suggestions. I made a general invitation to ILS vendor representatives at the recent ILS “president’s seminar” at ALA Midwinter. Others in our group are going out to the upcoming Code4lib conference to talk with open source and library-based development projects. And Peter Brantley, executive director of the DLF, recently sent out invitations to selected ILS vendors and developers to a workshop in March to discuss and help shape our recommendations.

We’ve heard back from some of our invitees, but there are a number we haven’t heard from yet. I don’t know if this is because the recipients just haven’t gotten around to replying, or they’re not sure how serious we are about these standards, or they’re worried about whether more interoperable and more interchangeable ILS’s might threaten their markets. I’ve even heard one respondent wonder whether it would be smarter for some companies not to participate, in hopes of killing off the initiative and protecting vendor lock-in with their sales base.

Well, that’s not going to happen. Whether or not particular vendors are on board, we’re going forward with this initiative. There’s already significant interest in the library community for more open, more flexible interfaces to our acquisitions, catalog, and circulation services. And there’s substantial existing work and interest in developing overlays to existing ILS’s, and in building new ones. (Besides the Duke proposal, and the various ILS overlays and catalogs mentioned above, there are already two complete open source ILS’s, Koha and Evergreen, in production in some libraries.) We can move forward just with the existing library and open-interfaces development community if we need to. That said, having more ILS vendors participating will make it easier and quicker for suitable interfaces to emerge, and make them available to a wider set of libraries. ILS vendors who participate can help shape the recommendations, gain understanding that can help them get a jump on development, highlight competitive advantages for libraries concerned about discovery API support in their next ILS, and tap new markets for discovery applications compatible with the standard interfaces.

So I encourage those developers and vendors that Peter and I have invited to get in touch with us soon. And I would also like to invite folks who haven’t heard from us, but are developing in the ILS and information-discovery domain, to contact me as well, to see how you can get involved. There may be room for you at the table as well, if you respond soon.

Posted in architecture, libraries | 3 Comments

Close readers

There’s been a lot of public fretting lately over the state of reading. People don’t read as much as they once did, we’re told. When it’s pointed out that in fact lots of people are reading online, we’re sometimes told it’s the wrong kind of reading– “inanities” of “blogging and blugging”, to quote a recent Nobel laureate. I get the impression from some essays that on the one hand there’s offline reading, deep, solitary, and contemplative, and on the other hand online reading, shallow, social, and mercurial. The two seem to have little in common, by this sort of account.

Of course, it’s not that simple. And I’ve recently encountered a few initiatives that cut right across that dichotomy, gathering people together to closely read and discuss texts with each other.

At a recent lunchtime get-together, JT Waldman told me about the Jewish Publication Society’s new Yavnet web site, now in alpha. It’s a project to create “a living and breathing commentary” on the Torah, by encouraging readers to look at particular passages, and join online, moderated discussions on them. Drawing on the JPS’s Tanakh translations, its published, scholarly commentaries, and online discussions, readers will be able to read and participate in conversations that help bring out the meaning of Biblical passages. The basic idea isn’t new– the Talmud, after all, is a centuries-old multi-layered commentary on the scriptures– but the Yavnet folks hope to use the Internet to grow and propagate fresh understandings and appreciations of the Torah in online communities.

Another recently announced site is Book Glutton (this one says it’s in beta), which aims to bring groups of people together to discuss a book as they read it. Their reader software is also designed for close reading; instead of just reviewing or discussing a book in general, readers can attach comments to specific passages of a book, and have live chats with people reading the same sections. They appear to be built largely on public domain texts, which lend themselves well to new interfaces and purposes.

Mind you, with openly accessible texts, you don’t have to limit your discussion to a single site. Jon Udell recently wrote about how discussions of scientific articles are often widely distributed over the blog network. An active discussion ensued, and just a couple of days later, he made a followup post showing some of the tools now available to track such discussions online. That was quick!

Close reading, whether of books or other text, may sometimes be solitary, but it doesn’t have to be. And the Net can bring together close discussions of text from far-flung participants, discussions that were not practical to convene offline. As Ursula Le Guin put it in the new (February) issue of Harper’s, “Books are social vectors, but publishers have been slow to see it.” (You’ll have to subscribe online or find a copy at your library or news-stand to read the full article, but it’s already being discussed online in various places. I first saw her quote in this Mediabistro post, which also points to some related discussions elsewhere.)

I look forward to seeing the new directions that the social vectors of books and texts take online, as the sites above and others like them develop further.

Posted in open access, reading, sharing | Comments Off on Close readers

More on subject maps

The slides for my ALA Midwinter presentation on subject maps (which I described in a previous post) are now online. (Yes, I’m playing around with BePress’s Selected Works.) You can also find links to the presentation, a white paper, and demos, from our library lab’s subject maps page.

I’m afraid it’s only the slides (no notes) but I’ll be happy to answer questions or take suggestions. I’m hoping to spend a fair bit of time this semester on subject maps applications for our catalog and digital library architecture, and I’m very interested in seeing how much we can do with them.

During the presentation, someone asked “What do you do when you have more than one subject ontology/thesaurus?” I addressed this question in a talk I gave at last spring’s DLF forum (slides here, though some pictures show up dark). In short, I see four basic strategies one can take. (Warning: What follows has more technical abbreviations and shorthand than my usual posts here, but I’ve provided some explanatory links, and can go into more detail and explanation in later posts if there’s interest.) The four strategies are:

  • Throw all the terms in together, and hope for the best. Sounds like a recipe for chaos, but it is cheap, and may work when users look for both kinds of terms. Ultimately, you’d like to relate the terms somehow (and automated tools may be able to help you find popular terms that are isolated from other terms, so you can relate them to other terms).
  • Normalize to a preferred ontology. This may be a good way to go, for instance, when you have a big, controlled ontology you want to use, and a small, less controlled one that doesn’t get much use. The terms in the less controlled ontology can be added as aliases for terms in the controlled ontology, or can be rewritten to fit in with the main ontology.
  • Make multiple subject maps, link them in appropriate places. This may be appropriate when your maps tend to support different research communities (e.g. MeSH for medical research vs. LCSH for general scholarship), or different kinds of search activities (e.g. folksonomies for current peer awareness vs. LCSH for back-literature searches). In some cases, such as for MeSH and LCSH, existing crosswalks exist that can be used for links between maps.
  • Build a multi-ontology subject map. This can be challenging, but may be appropriate where you have different collections described through different ontologies that people are likely to want to explore in tandem; for example, a set of books on a particular era of history described in LCSH along with a set of photographs of the same era described in TGM. It can get a bit tricky when the two ontologies have different names for the same concept, though, or identical names that refer to two different concepts.
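The second strategy, folding a small uncontrolled vocabulary into a preferred ontology via aliases, is simple enough to sketch in a few lines of Python. The alias table and record data below are made-up examples, not a real crosswalk (though the target heading is a genuine LCSH form):

```python
# Hypothetical alias table: uncontrolled tags -> preferred LCSH headings
ALIASES = {
    "ww2": "World War, 1939-1945",
    "world war ii": "World War, 1939-1945",
    "med school": "Medical education",
}

def normalize(term, aliases=ALIASES):
    """Map a term to its preferred heading.
    Unknown terms pass through unchanged, flagged for later review."""
    key = term.strip().lower()
    if key in aliases:
        return aliases[key], True
    return term, False

def merge_subjects(records):
    """Group record ids under normalized headings; collect unmatched terms."""
    merged, unmatched = {}, set()
    for rec_id, terms in records.items():
        for t in terms:
            heading, known = normalize(t)
            merged.setdefault(heading, set()).add(rec_id)
            if not known:
                unmatched.add(t)
    return merged, unmatched
```

The `unmatched` set is where the automated-isolation idea from the first strategy comes back in: those are the terms a cataloger (or a frequency-based tool) would review and either alias into the main ontology or leave as local color.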

I hope to try out some of these strategies as we develop subject maps at Penn. (Our Franklin catalog has many subject entries in LCSH and MeSH, for example; and we also have uncontrolled tags that PennTags users assign to items in our catalog. I’d like to see if we can let users browse in useful ways across all three topical spaces.) I’ll post here about any interesting new visible developments and demos.

Posted in sharing | Comments Off on More on subject maps

Newbery book quick update

Mary now has a list of the apparently unrenewed-copyright Newbery winners, plus a description of The Windy Hill, the Newbery honor book she put online, on her blog.

Posted in copyright, online books | Comments Off on Newbery book quick update

New Newbery and Caldecott winners announced; Old Newbery winners go online

One of the highlights of the American Library Association’s Midwinter meeting (which just concluded here in Philadelphia) is the announcement of the winners of the Newbery Medal, the Caldecott Medal, and ALA’s other book prizes. The Newbery is one of the oldest and best-known awards for children’s literature, and winning it can guarantee substantial sales, and keep a book in print for decades.

They’ve made some interesting picks this year. This year’s Newbery Medalist is Good Masters! Sweet Ladies! Voices From a Medieval Village. It’s not just one story; it’s 22: a connected set of vignettes told by characters who inhabit an English village in the year 1255. Author Laura Amy Schlitz (who’s also a librarian) wrote them to be performed by 5th graders at her school. Newbery committee chair Nina Lindsay calls the end result “a pageant that transports readers to a different time and place” through “varied poetic forms and styles offer[ing] humor, pathos and true insight into the human condition.”

The Newbery Honor books this year are:

The Caldecott medal usually goes to short picture books for young readers, but this year’s winner is different. It’s a 500+ page graphic novel by Brian Selznick called The Invention of Hugo Cabret, which tells the story of an orphan in early-20th-century Paris living inside the walls of a train station, trying to finish an invention left by his father. The ALA Caldecott site says “the suspenseful text and wordless double-page spreads narrate the tale… which is filled with cinematic intrigue.” Sounds intriguingly steampunk to me.

The Caldecott Honor books this year are:

I’m also happy to report that some of the earliest Newbery awardees have been put online. The early medal-winners have been up for a while, but the honor books can be just as interesting, though often much harder to find. Mary just posted Cornelia Meigs’ 1922 Newbery Honor Book The Windy Hill today to her Celebration of Women Writers, and the Open Content Alliance has recently scanned most of the other honor books from that year. (I’m still seeking a copy of Cedric the Forester, but all the others are now online.)

While Newbery medalists often stay in print indefinitely, lots of other good children’s books are published every year that quickly fade into obscurity. Even most of the early Newbery Honor Books are now out of print and hard to find. But perhaps not for long. Might some of the rightsholders to the older books, particularly the out of print titles, be willing to let them go online so kids can read them again? Or are some of them already fair game? In some initial investigations, Mary’s found more than a dozen post-1923 Newbery Honor Books, and two post-1923 Newbery medalists, whose copyrights appear not to have been renewed at all, and would therefore now be in the public domain.

You can read the online Newbery winners (and early winners of the Nobel and Pulitzer prizes as well) from the Prize-Winning Books Online exhibit of The Online Books Page. We hope to add more titles to this exhibit in the near future. If you’re interested in clearing copyrights or digitizing any of these books, I’d be very interested in hearing from you.

I hope you’ll enjoy reading these newly honored and newly digitized books!

Posted in awards, copyright, online books | 3 Comments