Everybody's Libraries

What you’re asked to give away

Posted on May 8, 2009 by John Mark Ockerbloom

If you’ve published an article in an Elsevier journal, you might have missed an interesting aspect of the contract you signed with them to get published. It goes something like this:

I grant Elsevier the exclusive right to select and reproduce any portions they choose from my research article to market drugs, medical devices, or any other commercial product, regardless of whether I approve of the product or the marketing.

What, you don’t remember agreeing to that? Actually, the words above are mine. But while it isn’t explicitly stated in author agreements, Elsevier authors usually grant that right implicitly. Elsevier’s typical author agreement requires you to sign over your entire copyright to them. Why ask for the whole copyright, instead of just, say, first serial rights, and whatever else suffices for them to include the article in their journal and article databases? Elsevier explains:

Elsevier wants to ensure that it has the exclusive distribution rights for all media. Copyright transfer eliminates any ambiguity or uncertainty about Elsevier’s ability to distribute, sub-license and protect the article from unauthorized copying or alteration.

That “unauthorized” would be “unauthorized by them”. Not “unauthorized by you”. Once you sign, you’ve given up the right to authorize copying or alteration, or any other rights in the copyright, except for rights they offer back to you. For instance, you can’t “sub-license” your article for anything Elsevier deems “commercial purposes”. But they can, and do.

And sometimes those commercial purposes have had questionable ethics. The Scientist reported about a week ago that “Merck published [a] fake journal” with Elsevier. (Free registration may be required to read the article.) As they report:

Merck paid an undisclosed sum to Elsevier to produce several volumes of a publication that had the look of a peer-reviewed medical journal, but contained only reprinted or summarized articles–most of which presented data favorable to Merck products–that appeared to act solely as marketing tools with no disclosure of company sponsorship.

The publication, Australasian Journal of Bone and Joint Medicine, was published by an Elsevier subsidiary called Excerpta Medica. As that subsidiary explains on their web site, “We partner with our clients in the pharmaceutical and biotech communities to educate the global health care community and enable them to make well-informed decisions regarding treatment options.” In other words, they’re a PR agency for drug companies and other companies selling medical products. Part of what they do is publish various periodicals designed to promote their clients.

Now, a number of companies publish sponsored magazines, and usually such publications clearly disclose their sponsorship, or are otherwise easily recognizable as “throwaway” commercial journals. But this publication was designed to look more like a peer-reviewed scientific journal. The Scientist reports this court testimony from a medical journal editor:

An “average reader” (presumably a doctor) could easily mistake the publication for a “genuine” peer reviewed medical journal, [George Jelinek] said in his testimony. “Only close inspection of the journals, along with knowledge of medical journals and publishing conventions, enabled me to determine that the Journal was not, in fact, a peer reviewed medical journal, but instead a marketing publication for MSD[A].”

Indeed, one of the publication’s “honorary editors” admitted to the Scientist that it included marketing material, but that “[i]t also had papers that were excerpted from other peer-reviewed journals. I don’t think it’s fair to say it was totally a marketing journal.” But that was what Merck paid Elsevier for, and the excerpts from real Elsevier-acquired research articles helped the publication as a whole look like disinterested scholarship instead of advertising.

Elsevier did show some embarrassment from these revelations, particularly after widespread online outrage. A statement posted yesterday by an Elsevier spokesman admitted the journal did not have “the appropriate disclosures”, and added

I have affirmed our business practices as they relate to what defines a journal and the proper use of disclosure language with our employees to ensure this does not happen again.

That’s certainly a step up from a previous statement quoted in the Scientist article, which, after also admitting the disclosure problems in the “journal”, simply said “Elsevier’s current disclosure policies meet the rigor and requirements of the current publishing environment,” and made no promises about what they would do in the future.

But the new statement still leaves unanswered the question of why there are still 4 “peer reviewed journals” published under the imprint of a PR agency whose stated mission is to “support our client’s marketing objectives with strategic communications solutions in [areas that include] Medical Publishing.” And legally, Excerpta Medica still has the right to cherry-pick from any article signed over to Elsevier in any of their marketing publications. Or, as they announce to potential clients, “we can leverage the resources of the world’s largest medical and scientific publisher.” Even with what Elsevier considers “proper use of disclosure language”, some authors might not want their writing used in this way.

Am I being unfair to Elsevier here? They’re not the only academic publisher that asks its authors to sign over their copyrights. And some of the more liberal open publication licenses, which I’ve been known to recommend, are broad enough that they too give marketers rights to reuse one’s work in their promotions.

On the first of those points, I recommend in general that authors avoid signing over their rights entirely (as I’ve managed previously), no matter who the publisher is. But last I checked, most other academic publishers don’t also own a PR firm for commercial product marketing. (And if any do, they should disclose this possible use in their interactions with authors. I find no explicit disclosure of this in either Elsevier’s model agreement or on the current version of Elsevier’s author rights page.)

On the second point, if you grant an open publication license, you generally know what you’re getting into. And you can still defend against misuse of your work in ways that you can’t do if you just sign over your copyright to a publisher. Some open access licenses, for instance, include an attribution condition that requires any reuse of the article to credit and point to the original source, and derivation conditions that either prohibit changes or require changes to be disclosed. (And some licenses simply prohibit commercial use altogether except by permission.) Whatever license you choose, if a company does quote your work out of context in its marketing, and you’ve kept your own rights to reprint the article, you can publish a rebuttal as widely as you like, showing the omitted context that counters a company’s claims. These conditions and rights can provide potent deterrents against misuse of your articles.

Often the debates over scholarly author rights and open access focus on who gets to read and use scholarly articles, and what gets paid to whom. This episode highlights another important part of the debate: who gets the right to guard the integrity of one’s scholarship. In the light of recent revelations, authors might want to think carefully about whether to sign that right away, and to whom.

[Updates, 9 May 2009: Some spelling corrected, and a note added that disclosure is not the only potential concern of authors whose works are used for marketing purposes.]

Posted in copyright, crimes and misdemeanors, open access, publishing, serials | 1 Comment

Do we need a “reader’s rights registry”?

Posted on May 5, 2009 by John Mark Ockerbloom

This spring’s Digital Library Federation forum, the last under DLF as an independent organization, included a session for “lightning talks”, where speakers take 5 minutes or less to present an idea or make a demonstration. Here are the remarks I prepared for the one I gave, building on some recent discussion of the Google Books Settlement. I’ve added some links to provide a bit more context to these remarks.

Like many other folks, I’m excited about the Google Books settlement, and the possibility that millions of out of print books may become available for our patrons to search and read online. I think the services that Google proposes could greatly expand the scope and the usability of our book collections.

But I also appreciate the concern expressed over some of the details of the settlement: in particular, the concerns over the monopolies that it creates. Monopolies have a tendency to impose increasingly onerous terms on consumers. They may raise prices to levels that make it difficult for libraries to fulfill their independent missions. They may also pay insufficient attention to concerns like privacy or intellectual freedom that are important for many of our readers.

A lot of people have commented on Google’s monopoly here, but the settlement actually creates another monopoly, one that may be more problematic, and that’s the Book Rights Registry. The name suggests a passive store of information, but in fact it’s an active agent that has the power to negotiate and set terms of access on behalf of publishers and authors for millions of out of print books. This is the body—and the only body—that will set the default wholesale prices for copyrighted books, whether they’re provided by Google or by anyone else that might get similar rights in the future.

And the information the registry maintains might not be what’s most useful to libraries and the public. As agents for copyright holders, they’re likely to keep very good records about who claims rights to various books, and who you should pay if you want to use them. But they won’t have much incentive to determine, for instance, that a particular work is actually in the public domain. As we saw in yesterday’s first panel, many works published after 1922 actually are public domain, but it’s often very difficult for someone to find this out. And I’ve already heard some folks advise publishers to claim anything they might have rights to, if there’s any uncertainty about its status. So there needs to be some counterbalance to the claims made by commercial publishers and rightsholder organizations.

That’s why I wonder whether it might be a good idea to imagine a sort of “Reader’s Rights Registry”, that would serve as a counterbalance to the Book Rights Registry. The reader’s registry would track information and advocate on behalf of readers and libraries. For instance, it could compile facts related to public domain status or open content licenses, and share these facts widely with the community. It could press Google and other vendors to keep prices affordable, and to make it easy for readers to protect their privacy. It could track books withheld or withdrawn from full text access, for whatever reason, and work to provide access through alternative channels where appropriate. It could press Congress or the Justice Department for reasonable and nondiscriminatory blanket licensing of orphan works and out of print works.

I’m not saying we necessarily have to create a brand new organization to do these things. They could be done by an existing organization, or set of organizations, or a newly merged organization. What’s; important is that we find a way to articulate what our readers need from mass-digitized book services, gather appropriate information to support these needs, and organize the resources to make that a reality. We shouldn’t wait for someone else, whether Google or the courts, to do it for us.

Now, there’s already work underway along these lines. Many of you may be aware, for instance, of OCLC’s Copyright Evidence Registry. It collects facts relevant to determining whether a book is in copyright or in the public domain. At the moment, it’s just a pilot, and the pilot runs until the end of June, with an uncertain future afterwards. And unfortunately, its user interface is limited, and its complex and restrictive terms of use looks to me even more convoluted that what they’re proposing for WorldCat. And I worry that those constraints will effectively choke off the project before it can get sufficient participation. In general, I think the library community is severely handicapping itself by keeping its metadata locked up, but that’s a whole other talk.

There’s also more work to be done in digitizing copyright registrations and renewals. I’m working on some of that now, and would be glad to talk about it later on. Google also is doing its own digitization of the complete Catalog of Copyright Entries, and plans to release it publicly. I’m looking forward to that.

Hathi Trust is also doing important work. As we heard yesterday, they’re doing their own copyright research on their books, and I hope they’ll eventually open up their research registry for the library community to use and contribute to. The scope of activities one might imagine for a Reader’s Rights Registry is a bit broader than what Hathi Trust is currently doing, and it would require a broader base of support that what Hathi Trust has now. But they’ve made a very good start on many of the kinds of independent activities I think we’ll need.

I know that all our libraries are going through difficult times in this economy, and any sort of external investment or organizational overhead can be hard to justify these days. But these mass-digitized collections could seriously reshape the nature of research and reading in the years to come. So I think it’s worth our putting some careful thought into how we want to influence that reshaping. Whatever happens in the courtroom in the next few months, we should marshal the information, the will, and the organization to make sure our readers will be well-served.

Posted in censorship, copyright, metadata, online books, open access, reading, sharing | 3 Comments

David Reed: Some extracts from his life and letters

Posted on April 23, 2009 by John Mark Ockerbloom

Last summer I was looking for a particular book. I couldn’t find it in any library in my State. Went interlibrary loans and found one copy at the library of Congress. Only one copy in the whole country. One of the best stories I ever [heard] about this is one when one of my professors was working on a trash pile of papyrus sheets and came across one that said [it] was the works of Meander. He went through that pile of papyrus with a fine tooth comb. He didn’t find anything but that single piece. He said that it felt as though he was looking across the centuries and saying, “Somewhere out there are the works of Meander.” [Friends,] this is how things get lost forever.

— David Reed, 1997

Today, there are thousands of important books that will likely never share that fate as long as civilization lasts, because they were digitized and sent all over the world. Many of these books were first put online by Project Gutenberg. And many of the Project Gutenberg texts are online thanks to the work of David Reed.

I scanned and released Gibbon’s Decline and Fall of the Roman Empire and hardly a day goes by when I don’t get an email from someone thanking me for releasing it on the web. At one site I know that it has been downloaded 1800+ times in all six volumes.

— David Reed, 2001

In the mid-1990s, Project Gutenberg had an outlandish-sounding goal: to make 10,000 books freely available online by the start of the 21st century. They’d only managed to put a couple hundred online by then. Authors like Clifford Stoll were skeptical that they, or anyone else, would ever reach such a goal.

But Gutenberg was soon publishing more and more texts every month, at an ever-increasing pace. Lots of those texts had David Reed’s name on them. Working persistently with his own scanner, well before the era of well-funded mass digitization, he digitized and proofread long works that few other people at the time would have taken on: Gibbon’s Decline and Fall; Shakespeare’s First Folio; Josephus’ Antiquities of the Jews; Frazer’s Golden Bough; Tocqueville’s Democracy in America. He also scanned numerous works weighty and light from authors like Rudyard Kipling, Louisa May Alcott, Robert Frost, James Joyce, and the US government.

Some critics in academia complained that the books David and others put up for Gutenberg were not up to the standards of scholarly editions. David didn’t begrudge the work of scholars, but he wanted to put up more works, more quickly, to reach a broader audience. As he put it in 1999:

[I] think that [it’s] important to remember that we do all this work because we like to read and we like to share our discoveries with others…. I see no reason why the text specialists can’t have the specialist collections and the general people (like myself) have the general collections. There is room enough on the web for all of us. The real enemy are those who want to lock up all the books in the world. The real enemy are those who don’t read a single book.

David was fighting another enemy besides illiteracy, one closer to home. He had diabetes, and in the last few years of his life his health slowly worsened from complications of that disease. He didn’t mention it in this post (nor, as far as I can remember, in any of the posts he made to the Book People mailing list, from which these quotations are taken). But even while his health was failing, he continued to put books online, like this emergency childbirth manual that was posted this past October. He was working to fulfill a dream that he described back in his 1999 post:

I dream of the day when we have 50,000 and 100,000 etext libraries on the web. Where there are 100 new etexts being released a week or every couple of days. When I can’t keep up with reading every etext that pops up on the Online Book Page or that Project Gutenberg releases. . I appreciate all the work that you are all doing. I love reading the work that you are all doing.

David died on April 21, 2009, according to the email his son Chris sent to David’s contacts list. By then, Google Books and the Internet Archive’s book collection had made over 1 million books freely available online, the various Gutenberg projects had posted just over 30,000 books, and many smaller projects had posted numerous unique titles as well. He lived long enough to see his dream come true, thanks in part to his own pioneering work and dedication.

I have dedicated etexts in honor of my daughter, my sons, my wife, parents and in honor of my companies I work for, even in honor of myself.

— David Reed, 2001

Out there all over the Net, in millions of replicas, are the works of David Reed, transcribing many of the great authors that have also passed on. In some sense, all of those works are dedicated to him. Through them, I hope his name lives on for generations to come.

Posted in online books, people | 43 Comments

Recent copyright news and comment (an extended mix)

Posted on April 9, 2009 by John Mark Ockerbloom

I seem to have a certain degree of inertia over getting a blog post out, and there have been at least 4 interesting recent items related to copyright. Since I haven’t managed to post about each individually, I’ll get over the hump by putting them all into a single post. I hope most of my readers will find at least one of these items of interest.

1. This year marks the 100th anniversary of the passage of the Copyright Act of 1909, the first “modern” copyright law of the US. The announcement for an April 30 conference on the Act describes its significance:

The 1909 Act was the first to protect works upon publication with notice, without prior registration; the first to expressly recognize a right to prepare derivative works; and the first to expressly recognize the public domain. The 1909 Act remained in effect for seven decades, during which time copyright law was repeatedly called upon to deal with the disruptive effect of new technologies, such as motion pictures, sound recordings, radio and television, photocopy machines, and computers. As a result, the 1909 Act had a significant influence on the copyright law we have today.

Several aspects of the law are ones I wish we still had today, like terms of more reasonable length (the maximum under the 1909 act was 56 years from publication), and the earlier expiration of copyrights into the public domain if the owner did not care enough about it to take some basic steps to maintain it (namely, including a copyright notice, and eventually registering and renewing it). Unfortunately, as William Patry laments, the treaty structures we’re now embroiled in prevent us from returning to that regime.

If you’d like to see both the original 1909 act, and the evolution of copyright law since, David Hayes has a wonderful site where you can read the law that was in effect at various times from then until now.

2. I’ve seen more discussion online of the Google Books setttlement, and the monopoly rights it gives to Google for providing digitized copies of “unclaimed” out of print copyrighted works (or what James Grimmelmann calls “zombie works“). It’s worth a reminder that it isn’t just Google that has a potential monopoly here; it’s also the Book Rights Registry itself. Even if other digitizers get the rights to do what Google does, and set their own retail prices for access, the Books Rights Registry can decide on the wholesale prices and other terms, and these will obviously play a big role in determining retail prices that any provider will offer.

If the settlement agreement is upheld (as I hope it will be; I’d much rather see 1 comprehensive collection of digitized out of print books than 0), it could form the model for a future compulsory licensing scheme for such books. Congress has enacted these before, when it’s seen a sufficient need and interest, and it’s set maximum prices for such licenses. For instance, the current maximum license fee for recording many songs is 9.1 cents per copy, reflecting a steady but controlled rise over the last few decades. According to this account of recent negotiations, some publishers reportedly wanted a dramatic boost to 15 cents per copy, while some digital music retailers wanted the maximum cut in half, to 4.5 cents per copy. Congress’ Copyright Royalty Board has managed to find a middle ground that balances the interests of creators and users, while providing a way to avoid the inefficiencies of song-by-song rights negotiation. Congress could do something similar with copyrighted but out of print books, if their constituents urged them to. (They would have to be careful to stay within international treaty constraints, but if the licensing regime for Google falls within those constraints, then I would think that a similar regime for all set up by Congress should as well.)

3. It might eventually be possible to put many of these books online in any case, if a recent decision by a federal court in Colorado is upheld and is applied broadly enough. In 1996, many foreign works were taken out of the public domain and put back into copyright as the result of a law passed as a result of the GATT treaties. (I have a discussion on copyright renewals that goes into some of the details.) Last week, a federal judge struck down this law; in the words of plaintiff attorney Larry Lessig, it “violated the First Amendment to the extent it restored copyright against parties who had relied on works in the public domain.” (Such parties are known as “reliance parties”).

I’m happy to hear about the decision, but I’m not yet ready to fire up the scanners. For one thing, another federal circuit has already ruled the opposite way on the same issue, as William Patry noted in a 2005 blog post about a case involving Luck’s Music Library. It remains to be seen how higher courts, or courts in other jurisdictions, will deal with these contradictory rulings. Also, despite some claims I’ve seen online, the decision doesn’t state outright that removing works from the public domain is necessarily unconstitutional. Rather, it says that doing so requires a higher standard of constitutional scrutiny that is not met for the case in question, in particular because Congress restricted the rights of reliance parties more stringently than international treaties actually required.

It seems possible to me that, even if the new decision survives appeal, it might simply result in an expansion of reliance party rights, and not a general right to put books online whose copyrights had been restored. (On the other hand, the definition of “reliance party” in the law in question seems to me to include libraries and others that have simply “acquire[d].. a copy” of a restored work.) I’m not a lawyer, though, and the copyright restoration laws at issue are notoriously complicated. I’d be interested in hearing more commentary from lawyers about the details and implications of this decision.

4. Finally, the recent publication in the New York Review of Books of a leaked, damning ICRC report on torture at Guantanamo raises some interesting copyright and ethics questions. As David Bigwood suspects, the report is copyrighted (effective the moment it was written down) and was published without the permission of the Red Cross, which has a general policy of opposing publication of its confidential reports. Mind you, I don’t think libraries that simply receive a print copy unsolicited (e.g. as part of of an ongoing subscription to the NYRB) should have any legal or moral qualms about keeping, preserving, and giving their patrons access to it. But what about electronic versions, which typically involve new copies made every time someone new reads them?

There are a few approaches one could take to this question. A fair use defense is certainly worth consideration, for instance. The document clearly reveals many things of great public interest in the US; it’s being published online for noncommercial purposes (they’re distributing it as a free, ad-less PDF); and there’s no market for the work to be affected, since the Red Cross does not market these reports, or put them out in the public at all. On the other hand, the document is not just quoted from, but reproduced in its entirety, and traditionally there’s been less fair use slack given for unpublished works than for published works. But there have been past cases (such as those involving Diebold voting machine memos) where reproducing documents in full in the public interest has been upheld as fair use. I suspect that the Red Cross will most likely not take the trouble to sue over this recent publication, but if they did, I can’t be positive about how the case would turn out.

One might argue that whether or not fair use applies, publication is justified as civil disobedience of copyright law in the service of a higher law against torture. This approach poses some problems of its own, though, particularly under theories where those who engage in civil disobedience gladly accept the legal consequences of their actions. Congress as of late has been steadily increasing the penalties for copyright infringement, and even the statutory and attorney’s fees, independent of any damages, are now large enough to give many people pause.

There’s also another interesting way to resolve the copyright question: A member of Congress could read the report into the Congressional Record. By law and custom, the statements of the legislature are given immunity from most forms of legal liability, so a copy of the report in the CR, including in the online verion, should not be a legal violation of copyright, as far as I’m aware. (Indeed, The Online Books Page already links to one other book that was read in its entirety into the CR.) The online version there would then be readable to anyone with an Internet connection.

Reading the report into the record wouldn’t just clear up a copyright issue. It would also put all of Congress officially on notice about the violations of American and international law by the government. And just as we have obligations under copyright treaties to deal with copyrighted works in various ways, we also have obligations under human rights treaties to outlaw prisoner mistreatment, and investigate and prosecute those who conducted, oversaw, and covered up torture and other human rights violations, no matter how high their rank or office.

In other words, we Americans now have a test before us: Do we take the essential rights of life and integrity of living, breathing human beings at least as seriously as we take the rights of intellectual property? If you think we should, you might want to urge your representatives in Congress and other governmental officials to take appropriate action.

Posted in copyright, libraries | 2 Comments

How to find complete multi-volume works in Google Books

Posted on March 30, 2009 by John Mark Ockerbloom

While Google’s agreement on copyrighted books has been the subject of much discussion lately, they’ve also been continuing to add public domain titles at a brisk pace. For instance, they announced in February that they now had 1.5 million public domain volumes formatted for mobile devices. And last week, they noted that they had completed their scans of hundreds of thousands of volumes of 19th century public domain books from Oxford’s Bodleian library.

If you look at the three example book links in their Oxford post, you’ll notice that each of them goes to a volume of a multi-volume edition. Works from the nineteenth century and before were often originally published in multiple volumes, such as the “three-decker” format common for Victorian novels. When such books are reprinted today, they’re usually printed as a single volume, but to read all of many Google titles, you’ll have to range over multiple volumes.

Unfortunately, as various readers have noted, it can be quite difficult to find readable copies of all of the volumes in a multi-volume edition. For various reasons, they often don’t all come up when you do a search for a particular title. This can make readers think there are no complete digital editions of a work they’re seeking, even when there are.

In working with people who have helped me fill requests for public domain books, I’ve compiled a series of techniques for finding complete multi-volume sets in Google Books. I’d be happy to hear additional tips from readers.

First, do a search for full-view volumes of the work you’re looking for. One good way to do this is to go to Google’s advanced book search page, select the “full view only” option, and enter author and title words in the appropriate blanks.
If you get a hit, check the start and the end of the scan, to verify which volumes are actually present. Sometimes you’ll find more than one volume in the scan, either because multiple volumes were bound together, or because Google combined volumes in its scan.
Go to the “about this book” page for the scan, and look in the lower regions to see if there is an “Other editions” section. This often includes links to other volumes, not just other editions. If there’s a “See more” at the bottom of such a section, click on it to see more volumes or editions. (Sometimes Google will have multiple editions as well as multiple volumes for the same work. It’s best when possible to compile volumes from the same edition. You can do this by matching publishers and dates between volumes, though keep in mind that some multivolume editions came out over the course of multiple years. Editions from different publishers, or from different times, may have inconsistent content, and might not divide into volumes at the same points.)
If the book is from the University of Michigan (as reported either in the “about this book” page or in the scanned front pages) check the Mirlyn catalog for the book. Sometimes this will turn up volumes scanned by Google that have been put in the Hathi Trust repository, or in Google Book Search itself, but that for some reason don’t show up in an ordinary Google books search. Some other Hathi Trust libraries also have links to digitizations of their content; see this page for details.
If this didn’t turn up all the volumes you’re looking for, repeat the process above for the other volumes in your initial hit list. Sometimes those will have “Other editions” links to additional volumes that didn’t appear with the earlier hits.
If you manage to complete a set this way, consider sharing your success with other readers. If you fill in my book suggestion form with the volumes you find, I can list a neatly consolidated edition of all the volumes on The Online Books Page, and help other people avoid going through all the trouble you just did. (Give the book’s title, URL for the first volume, and other information in the appropriate blanks, and then add URLs for subsequent volumes in the “Anything else we should know?” section of the form.)
Even if you only partially succeeded, if it’s a work you’re particularly interested in you can use my suggestion form to let me know what you’ve been able to find. If I can’t easily find the other volumes myself, I can at least list what was found on my works-in-progress page. With luck, someone coming along later will find or digitize the remaining volumes, and I can list the set.

Similar techniques can be used for compiling runs of historic serials, which are also present in Google, and can be of great interest to readers.

If you find these suggestions useful, I hope you’ll help me compile sets of your favorite public domain works, so we can take advantage of all this wonderful old material that Google and others are digitizing.

Posted in online books | 3 Comments

Gloriana St. Clair: A brief appreciation

Posted on March 24, 2009 by John Mark Ockerbloom

The organizer of today’s Ada Lovelace Day, a day to celebrate women in technology, says that women need female role models they can emulate. I’d add that men can use female role models as well. There are at least two obvious reasons. First of all, we get a wider range of inspiration when our role models aren’t limited to half the population. But also, people can be rather clueless about groups of people that they don’t normally see much, and that cluelessness can hold people back needlessly.

I don’t recall being consciously sexist when I entered grad school in computer science, but I wasn’t the most clueful person either. When I noticed that there were only 4 women in our entering class of 36 (a ratio unfortunately not too far off the one I saw in undergraduate computer science), one of the first things I blurted out to one of those women was something like “gee, there’s going to be a lot of romantic competition for you four,” thinking about them more as potential dates than as fellow computer science colleagues.

I was fortunate, however, to have multiple female role models to learn from in my time as a graduate student. Mary Shaw inspired me and many others to gain mastery over all kinds of challenges, from software engineering to bicycle trekking, through systematic and rigorous information gathering and analysis. Jeannette Wing contributed some of the key technical foundations to my own dissertation work (in her work with Barbara Liskov on type substitutability), and as a member of my dissertation committee repeatedly challenged me to write more clearly and logically, helping ensure that my ideas were sound and understandable. And I don’t have the space here to enumerate, or express full thanks for, what I’ve learned from Mary Mark since I met her.

I also found another role model who I’d like to talk about today: Gloriana St. Clair, dean of libraries at Carnegie Mellon. Unlike the other women I’ve mentioned above, she has no “technology” degree, but she’s played very important roles in bringing together technology and librarianship, to the benefit of both.

She has long emphasized the importance of digital technology to the future of libraries, featuring it prominently in strategic plans and library organization, and cultivating people with the skills and knowledge to design and improve the digital library. (And not just her own staff; she encouraged me, while still in the computer science department, to get out to library conferences to find out more what people were doing and thinking, and even gave me a ride to a CNI forum in Washington, DC.)

She’s also helped educate technologists about the important roles that libraries and librarianship play in managing information. While at Carnegie Mellon, I got involved in a computer science-led project to build a massive digital book collection, where much of the early thinking seemed to assume that the problem was largely a matter of committing enough technology and funding. I was very happy to see Gloriana get involved and show how sound librarianship could make that project, as well as other digital library initiatives I’d dabbled in previously, much more effective, usable, and preservable than a purely engineering-oriented project would have been.

She’s also been unafraid to take a leap into a new area or initiative when called for. Not content to settle for an MLS degree as a librarian, she went on to get a PhD in literature, and an MBA that she’s used both to help manage libraries and teach others about library management. And when a commercial publisher bought the library science journal she edited and raised its prices, she organized a mass exodus of editors to a new, lower-cost journal founded under the auspices of SPARC.

My own career jump, from a computer science department at Carnegie Mellon in Pittsburgh to a library at the University of Pennsylvania in Philadelphia, was made easier in a number of ways by her help and example. Indeed, after I got more acclimated to library culture, I gained a better appreciation of how well she accomplished one of the great functions of librarians: to build bridges and spread knowledge among a variety of different disciplines. I also grew to appreciate the importance of building such bridges for librarianship itself. Librarians can be another set of folks that many faculty and professionals don’t see much of, and that they can be correspondingly clueless about. If libraries and their users are not to be held back needlessly, we need to build better bridges between each other.

I’m not alone in my appreciation for Gloriana. Just a few weeks ago, the Association of College and Research Libraries named her Academic/Research Librarian of the Year. To their award, I’d like to add my own personal and professional thanks. And thanks as well to the many other women in technology who have, knowingly or not, given me knowledge, inspiration, encouragement, and some helpful clues. I hope I can make a suitable contribution in turn.

Posted in awards, libraries, people | Tagged AdaLovelaceDay09 | Comments Off

The Google Books settlement: A symposium, and a call for library action

Posted on March 18, 2009 by John Mark Ockerbloom

Last Friday I went to a fascinating symposium at the Columbia Law School: “The Google Books Settlement: What Will it Mean for the Long Term?” The symposium included presentations by US copyright register Marybeth Peters, and antitrust expert Randal Picker [slides], followed by panels featuring speakers from the legal, publishing, and library world, as well as a few folks representing Google, authors, international publishing groups, and photographers. The audience filled most of a large room, and included listeners from all these communities. I prepared for the day by going through Walt Crawford’s lengthy summary of the settlement and its commentators, which I recommend to anyone needing to get up to speed on the issues.

Cornell’s Peter Hirtle has posted a useful summary of the proceedings, and I’ve also seen some interesting commentary on aspects of the meeting from Peter Brantley and Adam Hodgkin. (The session was also live-Twittered.)

What I heard from audience members, both during and between sessions, was interesting as well. Like Peter Hirtle, I didn’t hear many people saying directly that the settlement should not go forward (though I did hear that from a few people who were there but not on the stage). I did hear repeated concerns over how the deal came about, and how it made substantial changes to the contours of copyright through class action litigation rather than legislation; and I also heard substantial concerns over the monopolies given to Google and the Book Rights Registry, and how those monopolies might stifle innovation and competition, drive up prices for online access to books, and infringe on rights of readers. (Despite its name, the Book Rights Registry is not simply a passive store of information; it also actively negotiates terms and prices of online book uses on behalf of rightsholders, and manages the distribution of revenues from these uses.)

There was also some concern about whether those not involved in the negotiations, such as small publishers, unknowing owners of out-of-print works, foreign authors and publishers, and illustrators and photographers, were being properly respected. Though with these latter concerns, there seemed to be more of a desire for a voice and piece of the action than a desire to stop the settlement from happening.

My basic take on the settlement didn’t change much. I’m guardedly hopeful about it, and would much rather see the settlement go forward than fail, because of the new and enhanced access it will give to millions of books for readers throughout the US. That’s not to say it doesn’t pose some problems and dangers, many of the relevant ones having been well-expressed by James Grimmelmann back in November. (Grimmelmann made additional remarks in one of the panels.) The symposium did give me some new appreciation of some of the pitfalls to watch for, and some more ideas on how libraries might respond to them. It also gave me a stronger impression that trying to modify the settlement itself is not likely to be a very fruitful response; my lay understanding is that the judge will likely just rule up or down on it with little or no modifications, with any major changes in terms requiring starting over negotiations with all the parties involved.

Still, there were some important constituencies that weren’t really addressed in the formal sessions. One of them, noted late in the day by Harvard’s Robert Darnton, was individual readers. (If you haven’t read Darnton’s pessimistic take on the settlement, or the response from Michigan’s Paul Courant, who was also there, you should.) But another, I’d argue, was libraries as active agents in shaping the future of reading in the electronic age. For libraries, I kept hearing questions like: Should they endorse or protest the settlement? Should they buy or forgo the institutional subscriptions that will be offered? Should they accept or pass on any of four digitization partnership deals that the settlement allows? These are important questions, but they’re basically all reactive: choosing to go with or against the flow, without considering how we might shape or rechannel it.

How might libraries reshape this flow? Going into full detail would require a number of additional blog posts at least, but among other things, we can work individually and collectively to

Safeguard readers’ privacy (as we do now to some extent via proxy servers).
Ensure the preservation of digital and print books (through efforts like Hathi Trust and shared print collection management).
Collectively press for reasonable pricing and terms of use, respecting traditional user rights under copyright law, for institutional subscriptions.
Encourage Congress to pass legislation that would give other digitizers the rights to offer similar services to what Google will; enable the more widespread use of orphan works; and apply appropriate oversight to the monopolistic aspects of the Book Rights Registry. In her talk, Marybeth Peters said that no one from Congress had asked the Copyright Office for comment on the proposed settlement. While her office doesn’t appear interested in petitioning Congress, libraries can and should do so.
Integrate book digitization (and not just Google’s) with our library’s operations and services, so our users can make the most of what’s available digitally. This can include integrating digital copy information and links in our catalogs, and pressing for links from those copies back to our library services (not just indirectly via OCLC’s “find in a library” link, but also direct links through technologies like OpenURL).
Promote and support the widespread, open sharing of information about digital books and their rights. This information will make it easier to expand and improve what’s available online, and bring it to the widest possible audience. We can’t rely on Google and the Registry to do all the work for us. Google’s own digitizations and metadata will not be perfect or incomplete. The Book Rights Registry will not be freely sharing all of its information, and in any case will be more interested in paying rights claimants than in, say, enhancing access to the public domain. If we want to solve this problem, we’ll need to continue to address some more general issues with sharing metadata more broadly.

We don’t have to wait for the settlement to go forward, or break down, to start thinking and experimenting in these areas. I hope that a lot of libraries will be considering the issues raised in the Columbia symposium, and deciding how best to act (and not just react) to best serve our readers in a world where millions of books are accessible online.

Posted in libraries, metadata, online books, open access, reading, sharing | 2 Comments

Neil Gaiman wins Newbery medal; more Newbery honorees go online

Posted on January 27, 2009 by John Mark Ockerbloom

I just got back from a whirlwind trip to Denver for ALA Midwinter. While I was there, they announced the winner of this year’s Newbery medal: Neil Gaiman‘s Graveyard Book. I’ve been hoping to get around to this book– but if it’s anywhere near as well-written as Gaiman’s other juvenile titles (like Coraline), the Newbery committee chose well. (You can hear an interview with Neil, and an excerpt from the book, on NPR’s website.) Congratulations to Neil, and to the other authors who won Newbery honors and ALA’s other awards for children’s books this year.

When I blogged about last year’s Newbery awards, I noted that most of the 1922 medalists and honorees were online, and that about a dozen later Newbery honorees were also out of copyright and could go online. Since then, my partner-in-bookery Mary has found and digitized many of those later books for a special Celebration of Women Writers exhibit, “Newbery Honor Books and Medal Winners by Women, 1922-1964“. You can also find these and other online prize-winners from my Prize Winning Books Online page.

(As I mentioned in a previous post, I went out to ALA Midwinter to give a couple of talks on the future of library catalog interfaces and data. I’ll have a post with pointers to slides and notes for those talks shortly.)

Posted in awards, copyright, online books | 1 Comment

Repository services, Part 2: Supporting deposit and access

Posted on January 15, 2009 by John Mark Ockerbloom

A couple of days ago, I talked about how we provided multiple repository services, and why an institutional scholarship repository needs to provide more than just a place to store stuff. In this post, I’ll describe some of the useful basic deposit and access services for institutional scholarly repositories (IRs).

The enumeration of services in this series is based in part on discussions I’ve had with our scholarly communications librarian, Shawn Martin, but any wrong-headed or garbled statements you find here can be laid at my own feet. (Whereupon I can pick them up, smooth them out, and find the right head for them.)

Ingestion:

One of the major challenges of running an institutional repository is filling it up with content: finding it, making sure it can go in, and making sure it goes in properly, in a manageable format, with informative metadata. Among other things, this calls for:

Efficient, flexible, user-friendly deposit workflows. Most of your authors will not bother with anything that looks like it’s wasting their time. And you shouldn’t waste your staff’s time either, or drive them mad, with needlessly tedious deposit procedures they have to do over and over and over and over again.
Conversion to standard formats on ingestion. Word processing documents, and other formats tied to a particular software product, have a way of becoming opaque and unreadable a few years after the vendor has moved on to a new version, a new product, or that dot-com registry in the sky. Our institutional repository, for instance, converts text documents to PDF on ingestion, which both helps preserve them and ensures wide readability. (PDF is an openly specified format, readable by programs from many sources, available on virtually all kinds of computers.)
Journal workflows. Much of what our scholars publish is destined for scholarly journals, which in turn are typically reviewed and edited by those scholars. Letting scholars review, compile, and publish those journals directly in the repository can save their time, and encourage rapid, open electronic access. (And you don’t have to go back and try to get a copy for your repository when it’s already in the repository.) Our BePress IR software has journal workflows and publication built into it. Alternatively, specialized journal editing and publishing systems, such as Open Journal Systems, also serve as repositories for their journal content.
Support for automated submission protocols such as SWORD. Manual repository deposit can be tedious and error-prone, especially if there are multiple repositories that want your content (such as a funder-mandated repository, your own institution repository, and perhaps an independent subject repository.) Manual deposit also often wastes people’s time re-entering information that’s already available online. If you can work with an automated protocol that can automatically put content into a repository, though, things can get much better: you can support multiple simultanous deposits, ingestion procedures designed especially for your own environment that use the automated protocol for deposit, and automated bulk transfer of content from one repository to another. SWORD is an automated repository deposit protocol that is starting to be supported by various repositories. (BePress does not yet support it, but we’re hoping they will soon).

From a practical standpoint, if you want a significant stream of content coming into your repository, you’ll probably need to have a content wrangler as well: someone who makes sure that authors’ content is going into the repository as intended. (In practice, they often end up doing the deposit themselves.)

Discovery:

You want it to be easy and enjoyable for readers to explore your site and find content of interest to them. Here are a few important ways to enable discovery:

Search of full text and/or metadata, either over the repository as a whole, or over selected portions of the repository. Full text search can be simple and turn up lots of useful content that might not be discovered through metadata search alone. More precise, metadata-based searches can also be important for specialized needs. Full text indexing is not always available (in some cases, you might only have page images), but it should be supported where possible.
Customization of discovery for different communities and collections. Different communities may have different ways of organizing and finding things. Some communities may want to organize primarily by topic, or author, or publication type, or date. Some may have specialized metadata that should be available for general and targeted searching and browsing. If you can customize how different collections can be explored, you can make them more usable to their audiences.
Aggregator feeds using RSS or Atom, so people can keep track of new items of interest in their favorite feed readers. This needs to exist at multiple levels of granularity. Many repositories give RSS feeds of everything added to the repository, but most people will be more interested in following what’s new from a particular department or author, or in a particular subject.
Search engine friendliness. Judging from our logs, most of the downloads of our repository papers occur not via our own searching and browsing interfaces, but via Google and other search engines that have crawled the repository. So you need to make sure your repository is set up to make it easy and inviting for search engines to crawl. Don’t hide things behind Flash or Javascript unless you don’t want them easily found. Make sure your pages have informative titles, and the site doesn’t require excessive link-clicking to get to content. You also need to make sure that your site can handle the traffic produced by search-engine indexers, some of which can be quite enthusiastic about frequently crawling content.
Metadata export via protocols like OAI-PMH. This is useful in a number of ways: It allows your content to be indexed by content aggregators; it lets you maintain and analyze your own repository’s inventory; and, in combination with automated deposit protocols like SWORD (and content aggregation languages like OAI-ORE), it may eventually make it much simpler to replicate and redeposit content in multiple repositories.

Access:

Persistent URIs for items. Content is easier to find and cite when it doesn’t move away from its original location. You would think it would be well known that cool URLs don’t change, but I still find a surprisingly large number of documents put in content management systems where I know the only visible URIs will not survive the next upgrade of the system, let alone a migration to a new platform. If possible, the persistent URI should be the only URI the user sees. If not, the persistent URI should at least be highly visible, so that users link to it, and not the more transient URI that your repository software might use for its own purposes.
An adequate range of access control options for particular collections and items. I’m all in favor of open access to content, but sometimes this is not possible or appropriate. Some scholarship includes information that needs to be kept under wraps, or in limited release, temporarily or permanently. We want to still be able to manage this content in the repository when appropriate.
Embargo management is an important part of access control. In some cases, users may want to keep their content limited-access for a set time period, so that they can get a patent, obey a publishing contract, or prepare for a coordinated announcement. Currently, because of BePress’ limited embargo support, we sit on embargoed content and have to remember to put it into the repository, or manually turn on open access, when the embargo ends. It’s much easier if depositors can just say “keep this limited access until this data, and then open it up,” and the repository service handles matters from there.

That may seem like a lot to think about, but we’re not done yet. In the next part, I’ll talk about services for managing content in the IR, including promoting it, letting depositors know about its impact, and preserving it appropriately.

Posted in discovery, formats, repositories | Comments Off

Everybody's Libraries

What you’re asked to give away

David Reed: Some extracts from his life and letters

Recent copyright news and comment (an extended mix)

How to find complete multi-volume works in Google Books

Gloriana St. Clair: A brief appreciation

Neil Gaiman wins Newbery medal; more Newbery honorees go online

Repository services, Part 2: Supporting deposit and access

Pages

Recent Posts

Recent Comments

Archives

Access for all

Copyrights and wrongs

General library-related news and comment

Interesting folks

Metadata and friends

Shiny tech

Tales from the repository

Writing and publishing