What are the marketers of EndNote afraid of?

If you write papers on a regular basis, you’ll find it worthwhile to keep track of sources you might cite. When I was in grad school, I manually edited a BibTeX file to keep track of the references for my dissertation and other papers. Nowadays there are easier-to-use, Web-aware tools that let you automatically import citations as you do your research, organize, edit, and annotate them, and then include appropriate ones in your paper’s bibliography. One of the first products of this type was EndNote, a Mac and Windows application marketed by Thomson Reuters. It’s still widely used, but it’s hardly alone in this field. Also popular among scholars is the web-based RefWorks, marketed by ProQuest. And a new free entry, the open-source browser-plugin-based Zotero from George Mason University, is gaining popularity.

I don’t currently use any of these tools, but have lately been thinking about adopting one. And just now one of them, Zotero, got an unusual bit of marketing that makes me think it’s worth a try: its makers have just been sued for $10 million by Thomson Reuters, marketers of EndNote.

The text of the complaint filed in Virginia court is interesting both for what it says and what it doesn’t say. Thomson isn’t claiming that Zotero violated their copyrights or stole their trade secrets; they’re claiming rather that GMU violated the license of the software. The violation? Reverse-engineering the proprietary file format used by EndNote for style files, and allowing Zotero users to import EndNote-formatted style files into Zotero and export them into an open format.

Style files specify how bibliographic references should be formatted for different publishers. They allow you to automatically format the same citation in different ways depending on, say, whether you’re writing for Urban Studies or the Journal of the Royal Society of Medicine. The citation formats are specified by the publishers, not the bibliographic software developers; the style files should simply be an encoding of the publisher’s guidelines in a machine-actionable format.
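
To make the idea of a style file concrete, here is a minimal sketch in Python. It is only an illustration: the record fields, the two style names, and the templates are all made up for this example, and they do not reproduce any real publisher’s guidelines or the actual .ens or .csl formats. The point is simply that a style can be plain data that a program applies to the same citation.

```python
# Hypothetical citation record; the field names are assumptions for illustration.
citation = {
    "authors": ["Jane Doe", "Richard Roe"],
    "year": 2008,
    "title": "An example article",
    "journal": "Journal of Examples",
    "volume": 12,
    "pages": "34-56",
}

# Two made-up styles, expressed as data (templates) rather than program logic.
# Neither corresponds to a real journal's citation format.
STYLES = {
    "author_year": "{authors} ({year}) {title}. {journal} {volume}: {pages}.",
    "numbered": "{authors}. {title}. {journal} {year};{volume}:{pages}.",
}


def format_citation(record, style_name):
    """Render one citation record using the named style template."""
    # Join the author list into a single string before filling the template.
    fields = dict(record, authors=" and ".join(record["authors"]))
    return STYLES[style_name].format(**fields)


for name in STYLES:
    print(format_citation(citation, name))
```

Real style formats handle far more (name ordering, abbreviations, punctuation rules, sorting), but the division of labor is the same: the publisher’s rules live in the style, and the software just applies them.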

Thomson claims that the ability to read these publisher guidelines in their proprietary format is a grave threat to EndNote. As they put it in their complaint:

GMU is willfully and intentionally destroying Thomson’s customer base for the EndNote Software […] by allowing and encouraging users of Zotero to freely convert the EndNote Software’s proprietary .ens style files into open source Zotero .csl style files and further distributing such converted files to others.

(I should note that the facts of this allegation are in question. GMU has yet to make an official statement on the claims of the suit, but Peter Murray claims that the Zotero code simply reads and interprets .ens files (which can be created either by Thomson or by EndNote users), and does not export them into other formats.)

Does Thomson have a legal case? That may depend on the language of the license and its enforceability. (On the one hand, Virginia is one of two states that passed UCITA, a law that gives software vendors wide leeway in dictating and enforcing software license terms. On the other hand, the license as quoted prohibits “reverse engineering [the] Software”, and the only reverse engineering I’ve seen has been on the file formats the EndNote software produces, not the software itself.) Folks interested in commentary from legal experts might find these posts of interest.

Thomson’s actions suggest a serious weakness in its marketing case, at any rate. As a potential customer, I look for organizations that provide the best software and services for what I need to do, and that empower me to get my work done in the way I see fit. I’m willing to pay a fair price for this software and service, and to respect developers’ copyrights, but I in turn want value for money, and respect for the customer’s needs.

If EndNote provides service and value that’s superior to its competitors, that should be enough to retain and grow its customer base. It shouldn’t need to try to lock the data the software outputs in proprietary formats, to impose license terms on its customers to keep them opaque, or to sue its customers when they nonetheless figure out how to decode them. Whatever their internal motivation, Thomson’s actions appear from the outside to be driven more by fear of competition than recovery of ill-gotten gains.

For users of Zotero, the suit is at worst an inconvenience. Even if the ability to read .ens files is removed from Zotero, users can simply create and share their own style files. (There’s some sweat of the brow involved, but much less than $10 million worth.) Zotero’s style repository has already grown to include styles for over 1100 journals, according to the Zotero blog, and instructions are available for anyone who wants to create and contribute additional styles. And if George Mason is forced or intimidated into stopping development of Zotero, anyone else is welcome to pick up where George Mason left off, thanks to Zotero’s open source license. (But if you want to develop without risking this kind of suit from Thomson, you might want to first make sure you’re not an EndNote customer when you start your work.)

Thomson is hardly the only software company to make its customers deal with proprietary formats, constrictive software licenses, and threats of legal action for disobeying license terms. One of the attractions of using free (“as in freedom”) software is not having to work under these burdens. But companies that sell software and services don’t necessarily have to impose them either. Copyright and trademark laws already prohibit users from misappropriating software and commercial brands. Lots of products do quite well in the marketplace with open formats.

So if you’re considering buying software or services from someone, and they use their own proprietary formats, or say that using their product requires assent to a complicated, onerous license agreement, you might want to ask yourself: “What are they afraid of?” Perhaps you might want to ask them as well.

Why Banned Books Week matters

It’s Banned Books Week again, and Amnesty International, the American Booksellers Foundation for Free Expression and the American Library Association are among the groups noting the occasion. I’ve also updated the links on my ongoing exhibit Banned Books Online in preparation for this week, a time when the exhibit gets an especially large volume of visits.

Banned Books Week is really about two different, but related, things. The first of these, the focus of sites like Amnesty’s and the “Books Suppressed or Censored by Legal Authorities” section of my exhibit, deals with attempts to restrict who is allowed to speak about what matters to them. And in a lot of the world, the right to speak out is severely and violently repressed. The other day I added to my online books collection a number of titles from Human Rights Watch, which has many books, press releases, and other publications about grave threats to freedom of the press and freedom to protest in places like Burma, Chile, China, Cuba, Pakistan, Turkey, Venezuela, various Middle Eastern and African countries, former Soviet republics, and many other places around the world.

Americans enjoy a country with a much freer press than the countries above (and indeed, a freer press than we had in my grandparents’ day). We’re not perfect; our legal system does sometimes suppress legitimate expression, for a time at least, in the name of security, copyright, or “the children”. (And sometimes the threat of criminal violence can suppress books when the law does not.) It is worth remembering the important books that can be published thanks to the free press, and not to take them for granted.

But the banned books lists you’ll find in many libraries and bookstores (or in dubious chain emails) don’t focus much on the political samizdat, security exposés, or portrayals of Mohammed that are the objects of forcible suppression today. Instead, they’re often full of classics and popular titles sold widely in bookstores and online, or dominated by books written for young readers, or assigned for school reading. Some of the titles in these lists have been the targets of publication suppression at some point, but many (like those in the Harry Potter series) have not.

So is it wrong to call these books banned? Are lists like these just “shameless propaganda”, as some conservatives charge, or a hapless attempt to market classic literature to teens, as satirized in an Onion piece?

Not if you take readers seriously. An unread book, after all, has as little impact as an unpublished book. The bans that dominate the ALA lists are the obverse of publication bans: they’re attempts to restrict who is allowed to hear about what matters to them. True, their reach may be smaller than the government bans that can keep a book out of an entire state or country. And it may often be easier to circumvent these kinds of bans. (Particularly if you have a driver’s license, a credit card, and easy Internet access, things that adults often take for granted but that many kids lack.) But censorship at the reader’s end can be just as injurious as censorship at the writer’s end.

Librarians and teachers necessarily select certain books, and not others, for their collections and classes, and decide where they will best work. And it’s right for patrons of the schools and libraries to have some say in these selections (even if the professionals should generally be allowed to do their jobs). So simply counting “challenges” to a book isn’t very informative. But there’s a world of difference between saying “Isn’t this more appropriate for the YA shelves than for the early readers section?” or “Would this title be a better fourth-grade book on this topic than the one currently being used?”, and insisting “None of our kids should be reading about this kind of thing!” when “this kind of thing” is already on the minds of those kids, or something that they should be thinking about. The “Unfit for Schools and Minors?” section of my Banned Books Online exhibit describes some of the more dubious attempts to keep books out of the hands of young readers.

My oldest child is only 8, but he’s already coming up with new and challenging questions on an almost-daily basis. By the time kids reach double-digit ages (which is the young end of the audience for most of the controversial books) they have lots of questions about life, death, sexuality, unfairness, hatred, violence, drugs, and religion. They deserve the chance to explore answers to these questions in their reading and in their conversations.

In the process, they may encounter some ideas they’re not ready to deal with fully. (But encounters with text are often naturally self-regulated. When I was a young precocious reader, I’d usually skim over difficult parts or lose interest in a book that had them. More than once I’ve been surprised going back to a book as an adult and seeing what I’d missed as a kid.) Kids will also certainly encounter lots of dubious ideas and counsels. But mainstream culture is full of these as well, and I hope that I and other parents will teach our kids how to evaluate those wisely, whether or not they come from sources we usually think of as “controversial”.

Banned Books Week is thus about twin freedoms: the freedom to write about what matters to you, and the freedom to read about what matters to you. In this week’s observance, I hope we grow to better appreciate these freedoms and the power of books and ideas.

Repositories: Benefits, costs, contingencies (with an example)

(This is the third post in a slow-cooking series on repositories.)

In my last repository post, I listed a variety of repository types that we maintain at our institution, each with different content, operation, and policies. At the end of the post, I wrote:

Once we have a clear understanding of why we would benefit from a particular repository, and what it would manage, we can consider various options for who would run it, where, and how. (And of course, what its costs would be, and how we can realistically expect those costs to be covered….)

Without a clear sense of benefits and costs, you won’t have a sensible repository strategy. And, as Dorothea Salo reminds us today, without a sensible strategy you’re likely to burn through a lot of money, labor, and goodwill with little to show for it at the end. You have to go in knowing what you want, and being realistic about what you’re willing to invest to produce it. (For instance, if you’re planning to build a repository of your community’s own scholarship, and hope to get lots of free help from your community just by doing some marketing, you really need to read Dorothea’s post for a reality check.)

Even when your initial plan is sound, you have to be prepared for change, and the unexpected. Technology changes quickly. Online tools, communities, and scholarly societies also change. Methods of scholarship also change, often more slowly, but sometimes in significant ways. Even if you’ve done your homework, you may eventually find that the repository that seemed just fine a few years ago doesn’t really meet your needs like it used to. Maybe the software hasn’t been updated as you’d like it, and there’s a better system available now. Maybe you’re storing different kinds of things, or you’ve found a new application that your scholars really want to use that’s not compatible with your existing setup. Maybe the formats you’re managing have gone out of date. Maybe it becomes more cost-effective to move to a big externally managed repository that your scholars are flocking to already, or away from one that they’re not finding useful. Maybe you even decide it no longer makes sense for you to maintain a particular repository.

You need to start thinking about strategies for change (and for exit) the moment you start planning a repository. Remember, repositories ultimately don’t exist for themselves, but for their content (and for the people using that content). And the kind of content that libraries often care about is likely to remain relevant much longer than any particular repository configuration. You want to ensure that the content remains useable for as long as your patrons care about it, even as it moves and migrates between systems (and possibly, between caretakers).

An example: Planning for data repository services

What does it mean, practically, to plan with benefits, costs, and contingencies in mind? Well, at Penn, we’re starting to consider repository services for data sets. We have a general idea of the benefits of archiving data sets, because we’ve heard from faculty in various departments who want to analyze data previously collected by research groups (their own or others’), who are having a hard time managing their own data, or who are required by their journals or support agencies to publish or maintain their data sets. Before we commit to providing a new data repository service, though, we need a better sense of these benefits. How broad and deep is the desire for data services among our faculty? Where is it most acute, in terms of disciplines and services? What would be gained from having our institution provide our own data repository services, rather than just having our scholars use someone else’s services, or fend for themselves? What are the benefits of introducing services specifically for data, rather than just, say, saving data sets alongside other files in existing repositories? If we’re considering a significant investment, we need more than just anecdotal answers to these questions. A survey of faculty in various disciplines can give us a better idea of how they could benefit from and support data repository services.

We also have to consider costs. What options do we have for creating, acquiring, or contracting with a data repository or repository service? What do they cost to install and run, both in monetary and staffing terms? What are the costs of acquiring content (again in money and labor, where the labor might come from librarians, scholars, or students)? How about costs of maintaining, accessing, and migrating the content? How will these costs be covered? What about costs associated specifically with this kind of content? Are there confidentiality, security, intellectual property, or liability concerns we have to consider? To help answer these questions, we should evaluate various data repository systems in existence and in development. The faculty survey mentioned above could also help us answer some of the questions about labor and support.

Contingencies, by their nature, tend not to be fully foreseeable. But there are a few obvious things we can ask about and plan for. Will our data still be readable for decades to come? Can we migrate it to new formats, and if so, what would be involved? Can we make sure we have good enough metadata and annotation to know how to read, use, and migrate the data in the future? Do we have clear identifiers for our content that will survive a move to a new platform (and leave a workable forwarding address, if necessary)? What happens to our content if our repository loses funding, our machine room is sucked into a mini-black-hole, or we simply decide it’s not worth the trouble of keeping the repository going? What do we do if we’re told to withdraw or change the data we’re maintaining, by the person who deposited it, by someone else using or mentioned in the data, or by the government? We won’t necessarily come up with definitive answers to all these questions, but brainstorming and thinking through possible and likely scenarios should help us know what to expect and reduce the chance of our getting caught unawares by a costly problem.

Is it worth it?

That’s a lot to do, you might be thinking, before you even get started. Can’t we just put this cool system up and see what happens? Well, you could, if you and your community will be satisfied with something that might be here today and gone tomorrow, and that doesn’t have any support or reliability guarantees. But if you have scholars to serve, and you’d like them to take the time and trouble to entrust their content to your repository, they’re probably going to want some reassurance that the repository will have staying power, and give them benefits worth their time. Otherwise, they have plenty of other, more important things to do.

Running a large, successful, long-lasting repository takes a lot of work over its lifespan. Better to do some planning work up front than get stuck with a lot of costly and unnecessary work later on.

Getting fair use right: Maximize what you give, minimize what you take

This week’s Harry Potter court decision in New York is well worth reading for anyone who’s interested in knowing whether something is fair use or an illegal copyright infringement. The case involved an unauthorized lexicon of the Harry Potter books that was to have been published as a book last fall. (The book was adapted from a free web site edited by the lexicon’s writer.) J. K. Rowling and Warner Brothers, who created the Harry Potter books and movies, sued. The PDF of the resulting court decision has been posted at Groklaw, which also has a text version with commentary. There’s also some interesting discussion on Teleread, including a long comment from someone involved in a similar case with a different outcome.

Neither side got all they wanted in the Potter case. The judge, Robert Patterson, ruled that the lexicon violated Rowling’s copyrights, put the kibosh on the book, and fined the publisher. But he imposed the minimum fine prescribed by statute, and made it clear that, contrary to Rowling’s claims, other people were welcome to publish lexicons and other nonfiction books that comment on Harry Potter or other works of fiction, without having to get the copyright holder’s permission. They just had to be more sparing in their reuse of the work than the author of this lexicon was.

A key question in the case had to do with the purpose of the book at issue. Was it “transformative”; that is, was it trying to do something essentially new and original, using the older work as base material? Or was it simply a rearrangement of the work, or derivative variation on a theme? Fan fiction, for instance, is usually considered derivative rather than transformative work (since it, like the original it’s based on, is typically a story meant for entertainment, based on the same characters, settings, and plot structure). As a derivative work, it gets minimal fair use protection. Likewise, in the same circuit that decided the Harry Potter case, an unauthorized Seinfeld trivia game was ruled not to be fair use, since it simply retold imagined events from the TV show in a new arrangement, without adding significant original content. (The Seinfeld case, known as Castle Rock vs. Carol Publishing, was repeatedly cited in the Harry Potter decision, as were a number of “unauthorized guidebook” cases.)

Patterson ruled that a lexicon was a transformative use of Rowling’s novels, since a set of stories was transmuted into a reference guide that included original commentary on the story elements. Unfortunately, there wasn’t that much original commentary in the lexicon, and the amount of material quoted from Rowling was a good deal more than what was needed for that commentary, the judge ruled. (Note that I have not read the lexicon myself; for the purposes of this post, I’m relying on Patterson’s findings of fact.) Moreover, the lexicon also borrowed heavily from two companion volumes by Rowling, Quidditch Through the Ages and Fantastic Beasts and Where to Find Them, that already were very similar in form and intent to the lexicon.

A pure lexicon could simply have had short definitions (say, 1 or 2 sentences of original prose) for each character or concept in Rowling’s books, and then simply cited places in Rowling’s works where the character or concept appears or is further described. Instead, all too often the author apparently wanted to mention everything significant Rowling had to say about things in the lexicon, and borrowed extensively from Rowling’s text, either literally quoted or closely paraphrased. (Paraphrasing doesn’t avoid the problem of copyright infringement, if you’re still copying the author’s imagery or other original expression.)

Extensive reuse of Rowling’s expression might still have been okay if the author needed to comment specifically on that expression. (For instance, a critic might quote Rowling’s use of imagery for magical spells to compare it to, say, Tolkien’s imagery for the same concept.) But too often, the copying of Rowling’s expression in the lexicon was not used to back up original commentary by the author, but was used instead of original comment. This happened often enough, Patterson decided, that he could not uphold a claim of fair use.

The lexicon’s publication as a book sold commercially, as opposed to its earlier form as a noncommercial website, was also a factor in the final ruling. But it wasn’t as decisive as one might imagine, and the judge devotes relatively little text in the decision to this factor. Even a commercial book on Harry Potter can be fair use; and a noncommercial website on Harry Potter (such as one that posts complete copies of Rowling’s books) can be infringing.

The take-away from this decision is that authors of commentaries and guides to other works of fiction can proceed in many cases without permission, provided that they’re making significant original contributions to readers’ understanding of the works they comment on, and that they reuse or quote only what is necessary to provide these contributions. In other words, if you’re writing one of these guides, the focus should be on what you’re giving to the reader, rather than on what you’re taking from the earlier writer.

Online opinion of Patterson’s decision has been mixed, with some applauding the final ruling and some arguing against it. I’m not a lawyer, and don’t presume to say whether he got the ruling exactly right. But I think his extensive discussion of the facts and precedents behind his decision provides a valuable guide for writers who want to maintain the proper and legal focus in their own fair use of others’ work.