In which I finally buy an ebook

In my last post, I discussed why I wanted to buy ebooks I could truly own, and my subsequent attempts to buy such a  copy of John Scalzi’s Redshirts from a readers’-rights-friendly retailer.  I initially had a hard time finding an ebook store that fulfilled three basic requirements:

  1. The store must sell a DRM-free copy of the book, in a convenient format.  That eliminated specialized ebook stores that didn’t carry the title at all.  Also, a number of major sites only had DRM-locked versions at first.
  2. The store must make the format and DRM-free status clear. Most mass-market ebooks are still locked down with DRM, and I don’t want to get stuck with that, either for this title or for other titles I might buy.  So the store had to make it clear what I was buying, either by a notation on the book’s catalog page, or by a general policy stating that books they offered were DRM-free.
  3. The store must not require me to agree to give up my rights as a reader under copyright law.  In particular, I would not consent to any terms of sale that significantly limited my rights of fair use or first sale.  Fair use allows me to make copies of copyrighted material under certain conditions, such as quoting and critiquing a small portion in my own work, or making a complete personal copy of a TV show I’ve received or a CD I’ve bought, for more convenient consumption.  First sale lets me decide how to dispose of a book once I’ve bought it, including giving a copy I’ve already lawfully acquired to someone else.  (First sale rights also let libraries lend out books without having to ask publishers first.)  Each of these rights has limits, and there are still disputes over how far these rights can be applied to digital content.  But I didn’t want to pre-emptively sign away rights that copyright law might give me.

I didn’t think it would be that hard to find a retailer to meet these requirements.  But here’s what I found when I went shopping:

Barnes and Noble: Since we owned a Nook, I first called up the store app on that device.  The ebook was simply marked as a “Nook Book”, with no clear differentiation between a DRM-free and a DRM-locked copy.  (The current catalog page for the book now mentions in the overview that it’s being sold without DRM, though not very prominently.)  I also recalled that to get access to the store in the first place, I had to click through a terms of service agreement.  Reviewing that on the web turned up a clause saying I couldn’t “copy, transfer, sublicense, assign, rent, lease, lend, resell or in any way transfer any rights to all or any portion of the Digital Content to any third party” except under certain explicit, very limited conditions.  In other words, I would have to give up first sale rights to anything I bought in the Nook store.  Rather than do that, I moved on to another retailer.

Amazon: There was no clear mention of DRM status on the book’s catalog page initially (even now, I don’t see it there until I click on “show more”).  Amazon uses its own Kindle (mobi) format for its books, so I’d need to convert it to a different format (possibly degrading the layout in the process) or get a Kindle reading program or device.  The Kindle License Agreement and Terms of Use limits how I’m allowed to read books they sell, disallows third party transfers except by explicit permission, and in case I missed the point, explicitly states “Digital Content is licensed, not sold”.  No sale here, then.

Google:  Going over to Google Books, I find this book available through Google Play.  The catalog page doesn’t tell me what format it’s in, or whether it has DRM; it instead just asks me to sign in to buy it.  Google then tells me I have to agree to their terms, which again include no third party transfers, before it will give me access to whatever formats it may let me download.  If I read the book online within Google Play itself, its  privacy policy allows it to look over my shoulder to a limited extent while I’m reading.  Google pledges to use this power only for good, but personally I’d prefer to download and keep my reading details to myself in the first place, thanks.

Sony Reader store: Information on format and DRM status is not clear for its books.  Based on Sony’s past history with DRM, there’s no way I’m giving them the benefit of the doubt with the formats they might use.

Independent bookstores: I also looked into whether I could buy an ebook through one of the independent bookstores I’ve liked shopping in.  Unfortunately, they don’t seem to offer much.  My local indie store doesn’t appear to sell ebooks at all, and Powell’s doesn’t seem to offer this title at present.  Independents in the IndieBound ebook program appear to just be referral agents for Google Books.

Diesel eBooks: The slogan “More freedom, more ebooks” seemed promising when I found this site.  Diesel offers both DRM-locked and DRM-free titles, and their catalog pages make it very clear which is which.   Unfortunately, they only offered a DRM-locked version of Redshirts for weeks after it was first released.  However, I recently went back to the site and found they’d switched to the DRM-free version.  Buying that ebook consisted of registering my name and email address, giving them my credit card information, and downloading an EPub file.  No click-through agreements were involved, and when I went over to look at the general terms of use for the site, they basically amounted to “don’t abuse the site, or infringe copyright”.  In short, I gave them money, and they gave me an ebook, and said “Enjoy!”, with no further fuss. That’s the kind of book shopping I like.

So there’s at least one reasonably comprehensive and reader-friendly ebookstore out there.  I’d be happy to hear about others as well.  And I look forward to buying and owning more books, in both print and electronic formats.

Repository services, Part 2: Supporting deposit and access

A couple of days ago, I talked about how we provided multiple repository services, and why an institutional scholarship repository needs to provide more than just a place to store stuff.  In this post, I’ll describe some of the useful basic deposit and access services for institutional scholarly repositories (IRs).

The enumeration of services in this series is based in part on discussions I’ve had with our scholarly communications librarian, Shawn Martin, but any wrong-headed or garbled statements you find here can be laid at my own feet.  (Whereupon I can pick them up, smooth them out, and find the right head for them.)


One of the major challenges of running an institutional repository is filling it up with content: finding it, making sure it can go in, and making sure it goes in properly, in a manageable format, with informative metadata.  Among other things, this calls for:

  • Efficient, flexible, user-friendly deposit workflows. Most of your authors will not bother with anything that looks like it’s wasting their time.  And you shouldn’t waste your staff’s time either, or drive them mad, with needlessly tedious deposit procedures they have to do over and over and over and over again.
  • Conversion to  standard formats on ingestion. Word processing documents, and other formats tied to a particular software product, have a way of becoming opaque and unreadable a few years after the vendor has moved on to a new version, a new product, or that dot-com registry in the sky.  Our institutional repository, for instance, converts text documents to PDF on ingestion, which both helps preserve them and ensures wide readability.  (PDF is an openly specified format, readable by programs from many sources, available on virtually all kinds of computers.)
  • Journal workflows. Much of what our scholars publish is destined for scholarly journals, which in turn are typically reviewed and edited by those scholars.  Letting scholars review, compile, and publish those journals directly in the repository can save their time, and encourage rapid, open electronic access.   (And you don’t have to go back and try to get a copy for your repository when it’s already in the repository.)  Our BePress IR software has journal workflows and publication built into it.  Alternatively, specialized journal editing and publishing systems, such as Open Journal Systems, also serve as repositories for their journal content.
  • Support for automated submission protocols such as SWORD. Manual repository deposit can be tedious and error-prone, especially if there are multiple repositories that want your content (such as a funder-mandated repository, your own institutional repository, and perhaps an independent subject repository).  Manual deposit also often wastes people’s time re-entering information that’s already available online.  If you can work with a protocol that puts content into a repository automatically, though, things can get much better: you can support multiple simultaneous deposits, ingestion procedures designed especially for your own environment that use the protocol for deposit, and automated bulk transfer of content from one repository to another.  SWORD is an automated repository deposit protocol that is starting to be supported by various repositories.  (BePress does not yet support it, but we’re hoping they will soon.)
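
To make the idea of simultaneous automated deposit a bit more concrete, here is a rough Python sketch that prepares one SWORD-style HTTP POST per target repository for the same package.  The collection URLs and function names are hypothetical, and the header names reflect my reading of the SWORD 2.0 profile (older SWORD versions use different headers), so check the specification and your repository’s documentation before relying on them.  The sketch only builds the requests; it does not send them or handle authentication.

```python
import urllib.request

# Packaging identifier for a plain zip of files, as I understand the SWORD 2.0 profile.
SIMPLE_ZIP = "http://purl.org/net/sword/package/SimpleZip"


def build_deposit_request(collection_url, package_bytes, filename):
    """Prepare (but do not send) a SWORD-style deposit of one zip package."""
    headers = {
        "Content-Type": "application/zip",
        "Content-Disposition": f"attachment; filename={filename}",
        "Packaging": SIMPLE_ZIP,
    }
    return urllib.request.Request(
        collection_url, data=package_bytes, headers=headers, method="POST"
    )


def build_deposit_requests(collection_urls, package_bytes, filename):
    """One request per repository, so the same package can go to several places."""
    return [
        build_deposit_request(url, package_bytes, filename)
        for url in collection_urls
    ]
```

A script would pass each prepared request to urllib.request.urlopen (with whatever credentials the repository requires), which is where the savings over retyping the same metadata into several web forms would come from.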

From a practical standpoint, if you want a significant stream of content coming into your repository, you’ll probably need to have a content wrangler as well: someone who makes sure that authors’ content is going into the repository as intended. (In practice, they often end up doing the deposit themselves.)


You want it to be easy and enjoyable for readers to explore your site and find content of interest to them.  Here are a few important ways to enable discovery:

  • Search of full text and/or metadata, either over the repository as a whole, or over selected portions of the repository.  Full text search can be simple and turn up lots of useful content that might not be discovered through metadata search alone.  More precise, metadata-based searches can also be important for specialized needs.   Full text indexing is not always available (in some cases, you might only have page images), but it should be supported where possible.
  • Customization of discovery for different communities and collections.  Different communities may have different ways of organizing and finding things.  Some communities may want to organize primarily by topic, or author, or publication type, or date.  Some may have specialized metadata that should be available for general and targeted searching and browsing.  If you can customize how different collections can be explored, you can make them more usable to their audiences.
  • Aggregator feeds using RSS or Atom, so people can keep track of new items of interest in their favorite feed readers.  This needs to exist at multiple levels of granularity.   Many repositories give RSS feeds of everything added to the repository, but most people will be more interested in following what’s new from a particular department or author, or in a particular subject.
  • Search engine friendliness. Judging from our logs, most of the downloads of our repository papers occur not via our own searching and browsing interfaces, but via Google and other search engines that have crawled the repository.  So you need to make sure your repository is set up to make it easy and inviting for search engines to crawl.  Don’t hide things behind Flash or Javascript unless you don’t want them easily found.  Make sure your pages have informative titles, and the site doesn’t require excessive link-clicking to get to content.  You also need to make sure that your site can handle the traffic produced by search-engine indexers, some of which can be quite enthusiastic about frequently crawling content.
  • Metadata export via protocols like OAI-PMH.  This is useful in a number of ways:  It allows your content to be indexed by content aggregators; it lets you maintain and analyze your own repository’s inventory; and, in combination with automated deposit protocols like SWORD (and content aggregation languages like OAI-ORE), it may eventually make it much simpler to replicate and redeposit content in multiple repositories.
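
As an illustration of what metadata export via OAI-PMH looks like in practice, here is a small Python sketch that builds a ListRecords request URL and pulls identifiers and titles out of a response.  The base URL, set name, and sample record are made up for the example; the verb, the query parameters, and the XML namespaces come from the OAI-PMH 2.0 specification.  A real harvester would also need to fetch the URL, follow resumption tokens, and handle errors, none of which is shown here.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces used in OAI-PMH responses carrying unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build a ListRecords request, optionally limited to a set and a start date."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    if from_date:
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


def titles_by_identifier(xml_text):
    """Map each record's OAI identifier to its first Dublin Core title."""
    root = ET.fromstring(xml_text.strip())
    result = {}
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        result[identifier] = record.findtext(".//dc:title", namespaces=NS)
    return result


# A hand-written sample response (hypothetical repository and record).
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.edu:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A sample paper</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""
```

The same two functions would serve both an outside aggregator indexing the repository and a local script taking inventory of what the repository holds.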


  • Persistent URIs for items. Content is easier to find and cite when it doesn’t move away from its original location.  You would think it would be well known that cool URLs don’t change, but I still find a surprisingly large number of documents put in content management systems where I know the only visible URIs will not survive the next upgrade of the system, let alone a migration to a new platform.  If possible, the persistent URI should be the only URI the user sees.  If not, the persistent URI should at least be highly visible, so that users link to it, and not the more transient URI that your repository software might use for its own purposes.
  • An adequate range of access control options for particular collections and items.  I’m all in favor of open access to content, but sometimes this is not possible or appropriate.  Some scholarship includes information that needs to be kept under wraps, or in limited release, temporarily or permanently.  We want to still be able to manage this content in the repository when appropriate.
  • Embargo management is an important part of access control.  In some cases, users may want to keep their content limited-access for a set time period, so that they can get a patent, obey a publishing contract, or prepare for a coordinated announcement.  Currently, because of BePress’ limited embargo support, we sit on embargoed content and have to remember to put it into the repository, or manually turn on open access, when the embargo ends.  It’s much easier if depositors can just say “keep this limited access until this date, and then open it up,” and the repository service handles matters from there.
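
The embargo rule described above is simple enough to sketch in a few lines of Python.  This is not how BePress or any other particular repository implements it; the Item class and its field names are invented for the illustration, and I’m assuming an item becomes open on the embargo date itself.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Item:
    title: str
    # Date on which the item should become openly accessible; None means no embargo.
    embargo_until: Optional[date] = None


def is_open(item, today):
    """An item is open if it has no embargo, or if the embargo date has arrived."""
    return item.embargo_until is None or today >= item.embargo_until


def open_items(items, today):
    """Titles a public visitor should be able to see on the given day."""
    return [item.title for item in items if is_open(item, today)]
```

Because the check is made against the current date each time, nobody has to remember to flip a switch when an embargo runs out.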

That may seem like a lot to think about, but we’re not done yet.  In the next part, I’ll talk about services for managing content in the IR, including promoting it, letting depositors know about its impact, and preserving it appropriately.

What are the marketers of EndNote afraid of?

If you write papers on a regular basis, you’ll find it worthwhile to keep track of sources you might cite. When I was in grad school, I manually edited a BibTeX file to keep track of the references for my dissertation and other papers. Nowadays there are easier to use, Web-aware tools that let you automatically import citations as you do your research, organize, edit, and annotate them, and then include appropriate ones in your paper’s bibliography. One of the first products of this type was EndNote, a Mac and Windows application marketed by Thomson Reuters. It’s still widely used, but it’s hardly alone in this field. Also popular among scholars is the web-based RefWorks, marketed by ProQuest. And a new free entry, the open-source browser-plugin-based Zotero from George Mason University, is gaining popularity.

I don’t currently use any of these tools, but have lately been thinking about adopting one. And just now one of them, Zotero, got an unusual bit of marketing that makes me think it’s worth a try: its makers have just been sued for $10 million by Thomson Reuters, marketers of EndNote.

The text of the complaint filed in Virginia court is interesting both for what it says and what it doesn’t say. Thomson isn’t claiming that Zotero violated their copyrights or stole their trade secrets; they’re claiming rather that GMU violated the license of the software. The violation? Reverse-engineering the proprietary file format used by EndNote for style files, and allowing Zotero users to import EndNote-formatted style files into Zotero and export them into an open format.

Style files specify how bibliographic references should be formatted for different publishers. They allow you to automatically format the same citation in different ways depending on, say, whether you’re writing for Urban Studies or the Journal of the Royal Society of Medicine. The citation formats are specified by the publishers, not the bibliographic software developers; the style files should simply be an encoding of the publisher’s guidelines in a machine-actionable format.

Thomson claims that the ability to read these publisher guidelines in their proprietary format is a grave threat to EndNote. As they put it in their complaint:

GMU is willfully and intentionally destroying Thomson’s customer base for the EndNote Software […] by allowing and encouraging users of Zotero to freely convert the EndNote Software’s proprietary .ens style files into open source Zotero .csl style files and further distributing such converted files to others.

(I should note that the facts of this allegation are in question. GMU has yet to make an official statement on the claims of the suit, but Peter Murray claims that the Zotero code simply reads and interprets .ens files (which can be created either by Thomson or by EndNote users), and does not export them into other formats.)

Does Thomson have a legal case? That may depend on the language of the license and its enforceability. (On the one hand, Virginia is one of two states that passed UCITA, a law that gives software vendors wide leeway in dictating and enforcing software license terms. On the other hand, the license as quoted prohibits “reverse engineering [the] Software”, and the only reverse engineering I’ve seen has been on the file formats the EndNote software produces, not the software itself.) Folks interested in commentary from legal experts might find these posts of interest.

Thomson’s actions suggest a serious weakness in its marketing case, at any rate. As a potential customer, I look for organizations that provide the best software and services for what I need to do, and that empower me to get my work done in the way I see fit. I’m willing to pay a fair price for this software and service, and to respect developers’ copyrights, but I in turn want value for money, and respect for the customer’s needs.

If EndNote provides service and value that’s superior to its competitors, that should be enough to retain and grow its customer base. It shouldn’t need to try to lock the data the software outputs in proprietary formats, to impose license terms on its customers to keep them opaque, or to sue its customers when they nonetheless figure out how to decode them. Whatever their internal motivation, Thomson’s actions appear from the outside to be driven more by fear of competition than recovery of ill-gotten gains.

For users of Zotero, the suit is at worst an inconvenience. Even if the ability to read .ens files is removed from Zotero, users can simply create and share their own style files. (There’s some sweat of the brow involved, but much less than $10 million worth.) Zotero’s style repository has already grown to include styles for over 1100 journals, according to the Zotero blog, and instructions are available for anyone who wants to create and contribute additional styles. And if George Mason is forced or intimidated into stopping development of Zotero, anyone else is welcome to pick up where George Mason left off, thanks to Zotero’s open source license. (But if you want to develop without risking this kind of suit from Thomson, you might want to first make sure you’re not an EndNote customer when you start your work.)

Thomson is hardly the only software company to make its customers deal with proprietary formats, constrictive software licenses, and threats of legal action for disobeying license terms. One of the attractions of using free (“as in freedom”) software is not having to work under these burdens. But companies that sell software and services don’t necessarily have to impose them either. Copyright and trademark laws already prohibit users from misappropriating software and commercial brands. Lots of products do quite well in the marketplace with open formats.

So if you’re considering buying software or services from someone, and they use their own proprietary formats, or say that using their product requires assent to a complicated, onerous license agreement, you might want to ask yourself: “What are they afraid of?” Perhaps you might want to ask them as well.