Libraries: Be careful what your web sites “Like”

Imagine you’re working in a library, and someone with a suit and a buzz cut comes up to you, gestures towards a patron who’s leaving the building, and says “That guy you were just helping out; can you tell me what books he was looking at?”

Many librarians would react to this request with alarm.  The code of ethics adopted by the American Library Association states “We protect each library user’s right to privacy and confidentiality with respect to information sought or received and resources consulted, borrowed, acquired or transmitted.”  Librarians will typically refuse to give such information without a carefully-verified search warrant, and many are also campaigning against the particularly intrusive search demands authorized by the PATRIOT Act.

Yet it’s possible that the library in this scenario is routinely giving out that kind of information, without the knowledge or consent of librarians or patrons, via its web site.  These days, many sites, including those of libraries, invoke a variety of third-party services to construct their web pages.  For instance, some library sites use Google services to analyze site usage trends or to display book covers.  Those third-party services often know what web page has been visited when they’re invoked, either through an identifier in the HTML or JavaScript code used to invoke the service, or simply through the Referer information passed from the user’s web browser.

Patron privacy is particularly at risk when the third party also knows the identity of users visiting sensitive pages (like pages disclosing books they’re interested in).  The social networking sites that many library patrons use, for instance, can often track where their users go on the Web, even after they’ve left the social sites themselves.

For instance, if you go to the website of the Farmington Public Library (a library I used a lot when growing up in Connecticut), and search through their catalog, you may see Facebook “Like” buttons on the results.  On this page, for example, you may see that four people (possibly more by the time you read this) have told Facebook they Liked the book Indistinguishable from Magic.  Now, you can probably easily guess that if you click the Like button, and have a Facebook account, then Facebook will know that you liked the book too.  No big surprise there.

But what you can’t easily tell is that Facebook is informed you’ve looked at this book page, even if you don’t click on anything.  If you’re a Facebook user and haven’t logged out (and for a while recently, even if you had logged out), Facebook knows your identity.  And if Facebook knows who you are and what you’re looking at, it has the power to pass along this information. It might do it through a “frictionless sharing” app you decided to try.  Or it might quietly provide it to organizations to which it can sell your data, as permitted by its frequently changing data use policies.  (Which for a while even included tracking non-members.)

For some users, it might not be a big deal if it’s generally known what books they’re looking at online. But for others it definitely is a big deal, at least some of the time.  The problem with third-party inclusions like the Facebook “Like” button in catalogs is that library patrons may be denied the opportunity to give informed consent to sharing their browsing with others.  Libraries committed to protecting their patrons’ privacy as part of their freedom to read need to carefully consider what third-party services they invite to “tag along” when patrons browse their sites.

This isn’t just a Facebook issue.  Similar issues come up with other third-party services that also track individuals, as for instance Google does.  Libraries also have good reasons to partner with third party sites for various purposes.  For some of these purposes, like ebook provision, privacy concerns are fairly well understood and carefully considered by most libraries.  But librarians might not keep as close track of the development of their own web sites, where privacy leaks can spring up unnoticed.

So if any of your web sites (especially your online catalogs or other discovery and delivery services) use third party web services, consider carefully where and how they’re being invoked.  For each third party, you should ask what information they can get from users browsing your web site, what other information they have from other sources (like the “real names” and exact birthdates that sites like Facebook and Google+ demand), and what real guarantees, if any, they make about the privacy of the information.  If you can’t easily get satisfactory answers to these questions, then reconsider your use of these services.
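One way to start that review is to list the outside hosts a page asks the browser to contact as soon as it loads. What follows is only a rough sketch, using Python's standard library; the function name, the example URLs, and the choice of which tags to check are my own assumptions, not a complete audit tool. It ignores resources added later by scripts, treats any different hostname (even a sibling subdomain) as a third party, and counts every `link` tag whether or not the browser would actually fetch it.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ThirdPartyFinder(HTMLParser):
    """Collect hosts of external resources that a page asks the browser to load."""

    # Tags (and the attribute on each) that typically trigger a request on page load.
    # This list is an illustrative assumption, not an exhaustive one.
    LOADING_TAGS = {"script": "src", "img": "src", "iframe": "src", "link": "href"}

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.page_host = urlparse(page_url).hostname
        self.third_party_hosts = set()

    def handle_starttag(self, tag, attrs):
        wanted = self.LOADING_TAGS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative and protocol-relative URLs against the page URL.
                host = urlparse(urljoin(self.page_url, value)).hostname
                if host and host != self.page_host:
                    self.third_party_hosts.add(host)


def third_party_hosts(page_url, html):
    """Return a sorted list of outside hosts referenced by the page's HTML."""
    finder = ThirdPartyFinder(page_url)
    finder.feed(html)
    return sorted(finder.third_party_hosts)


if __name__ == "__main__":
    # Hypothetical catalog page; every URL here is made up for illustration.
    sample = (
        '<html><head><script src="https://widgets.example.net/like.js"></script>'
        '<link rel="stylesheet" href="/style.css"></head>'
        '<body><img src="//covers.example.org/b.jpg"></body></html>'
    )
    for host in third_party_hosts("https://catalog.example.edu/record/1", sample):
        print(host)
```

Each host such a script reports is a party that can learn which catalog page was viewed, and so a party to ask the questions above about.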

Getting bugs out of our systems

Very soon after we start learning to program, we start learning to deal with bugs.   Folks who have programmed for a while might forget that effective bug handling, like effective programming, is a skill that doesn’t come entirely naturally.

Many of us instinctively avoid criticism, ignore it, minimize it, or even argue against our critics.  But our programs will almost invariably include bugs, and to handle them, we have to go against the grain of our instincts.  If we’re smart, we make it as easy as possible to report bugs to us, so we minimize their impact.  We respect and listen carefully to what our clients tell us, to understand the problems they’re encountering with our product.  After we fix the bugs, we often review our code and our practices to avoid similar problems in the future.

It helps a lot if we can keep our egos out of the bug-fixing process.  I know that my work will sometimes have bugs, and that a bug report should not be taken as a personal attack.  Rather, I try to make it an opportunity to improve my products and my future work.

Bugs exist at various levels.  Bugs that cause crashes are often the easiest to deal with: it’s clear that something is going wrong, and it usually isn’t hard to figure out what to do about it.  But less obvious bugs can be worse.  One product our library uses, for example, implemented boolean searches incorrectly, omitting important results. This kind of bug can mislead lots of people who never notice the problem.   (And it can also take longer to address.  I had to send multiple emails and examples to the developers of this product before they admitted that their implementation was buggy.)

Bugs at the overall system level can be the worst.  The reservation system with interminable holds, the customer support service that never returns our calls, the open source effort that repels key constituencies it should be attracting: all of these are buggy systems, and they can drive people away just as surely as a crashing program.  As Michael Bolton puts it, “a bug is something that bugs somebody who matters.”  System-level bugs can be challenging to fix, but they can be the most essential to repair.

I hope none of these principles seems new or controversial.  But I’ve recently seen a few bug reports concerning the Ruby on Rails community that drew many responses ignoring these principles.  The reports concerned buggy systems, not buggy code.  In particular, they noted a professional developer conference that attracted very few women, and an accepted presentation at that conference that included blatantly unprofessional themes, themes that one could easily predict would put off many of the people who could benefit from the talk.  (They would be particularly problematic if you were one of the few women there, but I found them distinctly off-putting as well.)

The comments on those two posts include plenty of examples of denial, minimization, rationalization, and attacking the reporters of the bugs.  (Indeed, some read as if they were cribbed right from this checklist of cliched defenses of in-group privilege.)

Assuming that the respondents are active members of the Ruby community, the responses suggest that there are still serious social bugs in that community.  I recently came back from another open source-focused conference (one that had a significantly higher proportion of women, though still far from 50%), where there were some good things said about using Ruby on Rails for library application development.  I like open source projects with good technical bases, but if I’m going to rely on a technology, I want its developer community to be healthy.  Healthy communities generally provide more reliable, long lasting development support, and can be much easier and more pleasant to work with.

It can be positively uncomfortable for many of us to confront social problems, particularly ones in our own communities that we might be partly responsible for.  (And Ruby is not the only community that’s had this kind of problem.)  Perhaps if we get used to thinking of these problems as bugs, welcoming and paying close attention to reports, and getting our egos out of the way, we’ll find it easier to fix them.

Gender inequities are bugs in our systems.  Bugs happen.  But they can be fixed.  As my library considers involvement in various community source development projects, I want to find out more about what these communities are doing, going forward, to fix and prevent these sorts of bugs.

What you’re asked to give away

If you’ve published an article in an Elsevier journal, you might have missed an interesting aspect of the contract you signed with them to get published.  It goes something like this:

I grant Elsevier the exclusive right to select and reproduce any portions they choose from my research article to market drugs, medical devices, or any other commercial product, regardless of whether I approve of the product or the marketing.

What, you don’t remember agreeing to that?  Actually, the words above are mine.  But while it isn’t explicitly stated in author agreements, Elsevier authors usually grant that right implicitly. Elsevier’s typical author agreement requires you to sign over your entire copyright to them. Why ask for the whole copyright, instead of just, say, first serial rights, and whatever else suffices for them to include the article in their journal and article databases?  Elsevier explains:

Elsevier wants to ensure that it has the exclusive distribution rights for all media. Copyright transfer eliminates any ambiguity or uncertainty about Elsevier’s ability to distribute, sub-license and protect the article from unauthorized copying or alteration.

That “unauthorized” would be “unauthorized by them”.   Not “unauthorized by you”.  Once you sign, you’ve given up the right to authorize copying or alteration, or any other rights in the copyright, except for rights they offer back to you.  For instance, you can’t “sub-license” your article for anything Elsevier deems “commercial purposes”.  But they can, and do.

And sometimes those commercial purposes have had questionable ethics.  The Scientist reported about a week ago that “Merck published [a] fake journal” with Elsevier.  (Free registration may be required to read the article.)  As they report:

Merck paid an undisclosed sum to Elsevier to produce several volumes of a publication that had the look of a peer-reviewed medical journal, but contained only reprinted or summarized articles–most of which presented data favorable to Merck products–that appeared to act solely as marketing tools with no disclosure of company sponsorship.

The publication, Australasian Journal of Bone and Joint Medicine, was published by an Elsevier subsidiary called Excerpta Medica.  As that subsidiary explains on their web site, “We partner with our clients in the pharmaceutical and biotech communities to educate the global health care community and enable them to make well-informed decisions regarding treatment options.”  In other words, they’re a PR agency for drug companies and other companies selling medical products.  Part of what they do is publish various periodicals designed to promote their clients.

Now, a number of companies publish sponsored magazines, and usually such publications clearly disclose their sponsorship, or are otherwise easily recognizable as “throwaway” commercial journals.  But this publication was designed to look more like a peer-reviewed scientific journal.   The Scientist reports this court testimony from a medical journal editor:

An “average reader” (presumably a doctor) could easily mistake the publication for a “genuine” peer reviewed medical journal, [George Jelinek] said in his testimony. “Only close inspection of the journals, along with knowledge of medical journals and publishing conventions, enabled me to determine that the Journal was not, in fact, a peer reviewed medical journal, but instead a marketing publication for MSD[A].”

Indeed, one of the publication’s “honorary editors” admitted to the Scientist that it included marketing material, but that “[i]t also had papers that were excerpted from other peer-reviewed journals. I don’t think it’s fair to say it was totally a marketing journal.”  But that was what Merck paid Elsevier for, and the excerpts from real Elsevier-acquired research articles helped the publication as a whole look like disinterested scholarship instead of advertising.

Elsevier did show some embarrassment from these revelations, particularly after widespread online outrage.  A statement posted yesterday by an Elsevier spokesman admitted the journal did not have “the appropriate disclosures”, and added

I have affirmed our business practices as they relate to what defines a journal and the proper use of disclosure language with our employees to ensure this does not happen again.

That’s certainly a step up from a previous statement quoted in the Scientist article, which, after also admitting the disclosure problems in the “journal”, simply said “Elsevier’s current disclosure policies meet the rigor and requirements of the current publishing environment,” and made no promises about what they would do in the future.

But the new statement still leaves unanswered the question of why there are still four “peer reviewed journals” published under the imprint of a PR agency whose stated mission is to “support our client’s marketing objectives with strategic communications solutions in [areas that include] Medical Publishing.”  And legally, Excerpta Medica still has the right to cherry-pick from any article signed over to Elsevier in any of their marketing publications.  Or, as they announce to potential clients, “we can leverage the resources of the world’s largest medical and scientific publisher.”  Even with what Elsevier considers “proper use of disclosure language”, some authors might not want their writing used in this way.

Am I being unfair to Elsevier here?  They’re not the only academic publisher that asks its authors to sign over their copyrights.  And some of the more liberal open publication licenses, which I’ve been known to recommend, are broad enough that they too give marketers rights to reuse one’s work in their promotions.

On the first of those points, I recommend in general that authors avoid signing over their rights entirely (as I’ve managed previously), no matter who the publisher is.  But last I checked, most other academic publishers don’t also own a PR firm for commercial product marketing.  (And if any do, they should disclose this possible use in their interactions with authors. I find no explicit disclosure of this in either Elsevier’s model agreement or on the current version of Elsevier’s author rights page.)

On the second point, if you grant an open publication license, you generally know what you’re getting into.  And you can still defend against misuse of your work in ways that you can’t do if you just sign over your copyright to a publisher.   Some open access licenses, for instance, include an attribution condition that requires any reuse of the article to credit and point to the original source, and derivation conditions that either prohibit changes or require changes to be disclosed.  (And some licenses simply prohibit commercial use altogether except by permission.)  Whatever license you choose, if a company does quote your work out of context in its marketing, and you’ve kept your own rights to reprint the article, you can publish a rebuttal as widely as you like, showing the omitted context that counters a company’s claims.  These conditions and rights can provide potent deterrents against misuse of your articles.

Often the debates over scholarly author rights and open access focus on who gets to read and use scholarly articles, and what gets paid to whom.  This episode highlights another important part of the debate: who gets the right to guard the integrity of one’s scholarship.  In the light of recent revelations, authors might want to think carefully about whether to sign that right away, and to whom.

[Updates, 9 May 2009: Some spelling corrected, and a note added that disclosure is not the only potential concern of authors whose works are used for marketing purposes.]

What are the marketers of EndNote afraid of?

If you write papers on a regular basis, you’ll find it worthwhile to keep track of sources you might cite. When I was in grad school, I manually edited a BibTeX file to keep track of the references for my dissertation and other papers. Nowadays there are easier to use, Web-aware tools that let you automatically import citations as you do your research, organize, edit, and annotate them, and then include appropriate ones in your paper’s bibliography. One of the first products of this type was EndNote, a Mac and Windows application marketed by Thomson Reuters. It’s still widely used, but it’s hardly alone in this field. Also popular among scholars is the web-based RefWorks, marketed by ProQuest. And a new free entry, the open-source browser-plugin-based Zotero from George Mason University, is gaining popularity.

I don’t currently use any of these tools, but have lately been thinking about adopting one. And just now one of them, Zotero, got an unusual bit of marketing that makes me think it’s worth a try: its makers have just been sued for $10 million by Thomson Reuters, marketers of EndNote.

The text of the complaint filed in Virginia court is interesting both for what it says and what it doesn’t say. Thomson isn’t claiming that Zotero violated their copyrights or stole their trade secrets; they’re claiming rather that GMU violated the license of the software. The violation? Reverse-engineering the proprietary file format used by EndNote for style files, and allowing Zotero users to import EndNote-formatted style files into Zotero and export them into an open format.

Style files specify how bibliographic references should be formatted for different publishers. They allow you to automatically format the same citation in different ways depending on, say, whether you’re writing for Urban Studies or the Journal of the Royal Society of Medicine. The citation formats are specified by the publishers, not the bibliographic software developers; the style files should simply be an encoding of the publisher’s guidelines in a machine-actionable format.
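To make the idea concrete, here is a toy sketch in Python of what “the same citation, formatted two ways” means. The record, the field names, and both templates are made up for illustration; real style files (EndNote’s .ens or Zotero’s .csl) are far richer than a single template string, and this is not how either program actually implements them.

```python
# A hypothetical citation record; the field names are illustrative only.
citation = {
    "author": "Jane Doe",
    "year": 2008,
    "title": "An example article",
    "journal": "Journal of Examples",
    "volume": 12,
    "pages": "34-56",
}

# Two made-up "styles". Each stands in for one publisher's formatting guidelines,
# encoded as data that a program can apply mechanically.
STYLES = {
    "author-year": "{author} ({year}). {title}. {journal}, {volume}, {pages}.",
    "numeric": "{author}. {title}. {journal} {year};{volume}:{pages}.",
}


def format_citation(record, style_name):
    """Render one citation record according to the named style template."""
    return STYLES[style_name].format(**record)


if __name__ == "__main__":
    for name in STYLES:
        print(format_citation(citation, name))
```

The citation data stays the same; only the template changes. That is why the formatting rules belong to the publishers rather than to any one tool.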

Thomson claims that the ability to read these publisher guidelines in their proprietary format is a grave threat to EndNote. As they put it in their complaint:

GMU is willfully and intentionally destroying Thomson’s customer base for the EndNote Software […] by allowing and encouraging users of Zotero to freely convert the EndNote Software’s proprietary .ens style files into open source Zotero .csl style files and further distributing such converted files to others.

(I should note that the facts of this allegation are in question. GMU has yet to make an official statement on the claims of the suit, but Peter Murray claims that the Zotero code simply reads and interprets .ens files (which can be created either by Thomson or by EndNote users), and does not export them into other formats.)

Does Thomson have a legal case? That may depend on the language of the license and its enforceability. (On the one hand, Virginia is one of two states that passed UCITA, a law that gives software vendors wide leeway in dictating and enforcing software license terms. On the other hand, the license as quoted prohibits “reverse engineering [the] Software”, and the only reverse engineering I’ve seen has been on the file formats the EndNote software produces, not the software itself.) Folks interested in commentary from legal experts might find these posts of interest.

Thomson’s actions suggest a serious weakness in its marketing case, at any rate. As a potential customer, I look for organizations that provide the best software and services for what I need to do, and that empower me to get my work done in the way I see fit. I’m willing to pay a fair price for this software and service, and to respect developers’ copyrights, but I in turn want value for money, and respect for the customer’s needs.

If EndNote provides service and value that’s superior to its competitors, that should be enough to retain and grow its customer base. It shouldn’t need to try to lock the data the software outputs in proprietary formats, to impose license terms on its customers to keep them opaque, or to sue its customers when they nonetheless figure out how to decode them. Whatever their internal motivation, Thomson’s actions appear from the outside to be driven more by fear of competition than recovery of ill-gotten gains.

For users of Zotero, the suit is at worst an inconvenience. Even if the ability to read .ens files is removed from Zotero, users can simply create and share their own style files. (There’s some sweat of the brow involved, but much less than $10 million worth.) Zotero’s style repository has already grown to include styles for over 1100 journals, according to the Zotero blog, and instructions are available for anyone who wants to create and contribute additional styles. And if George Mason is forced or intimidated into stopping development of Zotero, anyone else is welcome to pick up where George Mason left off, thanks to Zotero’s open source license. (But if you want to develop without risking this kind of suit from Thomson, you might want to first make sure you’re not an EndNote customer when you start your work.)

Thomson is hardly the only software company to make its customers deal with proprietary formats, constrictive software licenses, and threats of legal action for disobeying license terms. One of the attractions of using free (“as in freedom”) software is not having to work under these burdens. But companies that sell software and services don’t necessarily have to impose them either. Copyright and trademark laws already prohibit users from misappropriating software and commercial brands. Lots of products do quite well in the marketplace with open formats.

So if you’re considering buying software or services from someone, and they use their own proprietary formats, or say that using their product requires assent to a complicated, onerous license agreement, you might want to ask yourself: “What are they afraid of?” Perhaps you might want to ask them as well.

Why Banned Books Week matters

It’s Banned Books Week again, and Amnesty International, the American Booksellers Foundation for Free Expression and the American Library Association are among the groups noting the occasion. I’ve also updated the links on my ongoing exhibit Banned Books Online in preparation for this week, a time when the exhibit gets an especially large volume of visits.

Banned Books Week is really about two different, but related, things. The first of these, the focus of sites like Amnesty’s and the “Books Suppressed or Censored by Legal Authorities” section of my exhibit, deals with attempts to restrict who is allowed to speak about what matters to them. And in a lot of the world, the right to speak out is severely and violently repressed. The other day I added to my online books collection a number of titles from Human Rights Watch, which has many books, press releases, and other publications about grave threats to freedom of the press and freedom to protest in places like Burma, Chile, China, Cuba, Pakistan, Turkey, Venezuela, various Middle Eastern and African countries, former Soviet republics, and many other places around the world.

Americans enjoy a country with a much freer press than the countries above (and indeed, a freer press than we had in my grandparents’ day). We’re not perfect; our legal system does sometimes suppress legitimate expression, for a time at least, in the name of security, copyright, or “the children”. (And sometimes the threat of criminal violence can suppress books when the law does not.) It is worth remembering the important books that can be published thanks to the free press, and not to take them for granted.

But the banned books lists you’ll find in many libraries and bookstores (or in dubious chain emails) don’t focus much on the political samizdat, security exposés, or portrayals of Mohammed that are the objects of forcible suppression today. Instead, they’re often full of classics and popular titles sold widely in bookstores and online, or dominated by books written for young readers, or assigned for school reading. Some of the titles in these lists have been the targets of publication suppression at some point, but many (like those in the Harry Potter series) have not.

So is it wrong to call these books banned? Are lists like these just “shameless propaganda”, as some conservatives charge, or a hapless attempt to market classic literature to teens, as satirized in an Onion piece?

Not if you take readers seriously. An unread book, after all, has as little impact as an unpublished book. The bans that dominate the ALA lists are the obverse of publication bans: they’re attempts to restrict who is allowed to hear about what matters to them. True, their reach may be smaller than the government bans that can keep a book out of an entire state or country. And it may often be easier to circumvent these kinds of bans. (Particularly if you have a driver’s license, a credit card, and easy Internet access, things that adults often take for granted but that many kids lack.) But censorship at the reader’s end can be just as injurious as censorship at the writer’s end.

Librarians and teachers necessarily select certain books, and not others, for their collections and classes, and decide where they will best work. And it’s right for patrons of the schools and libraries to have some say in these selections (even if the professionals should generally be allowed to do their jobs). So simply counting “challenges” to a book isn’t very informative. But there’s a world of difference between saying “Isn’t this more appropriate for the YA shelves than for the early readers section?” or “Would this title be a better fourth-grade book on this topic than the one currently being used?”, and insisting “None of our kids should be reading about this kind of thing!” when “this kind of thing” is already on the minds of those kids, or something that they should be thinking about. The “Unfit for Schools and Minors?” section of my Banned Books Online exhibit describes some of the more dubious attempts to keep books out of the hands of young readers.

My oldest child is only 8, but he’s already coming up with new and challenging questions on an almost-daily basis. By the time kids reach double-digit ages (which is the young end of the audience for most of the controversial books) they have lots of questions about life, death, sexuality, unfairness, hatred, violence, drugs, and religion. They deserve the chance to explore answers to these questions in their reading and in their conversations.

In the process, they may encounter some ideas they’re not ready to deal with fully. (But encounters with text are often naturally self-regulated. When I was a young precocious reader, I’d usually skim over difficult parts or lose interest in a book that had them. More than once I’ve been surprised going back to a book as an adult and seeing what I’d missed as a kid.) Kids will also certainly encounter lots of dubious ideas and counsels. But mainstream culture is full of these as well, and I hope that I and other parents will teach our kids how to evaluate those wisely, whether or not they come from sources we usually think of as “controversial”.

Banned Books Week is thus about twin freedoms: the freedom to write about what matters to you, and the freedom to read about what matters to you. In this week’s observance, I hope we grow to better appreciate these freedoms and the power of books and ideas.

Don’t shade your eyes

Back in 2006, Paul Collins wrote an article in Slate asking “Will Google Book Search uncover long-buried literary crimes?” Now that we have large corpuses of texts searchable online, he argued, it will become much easier to find words lifted from other writers than it once was. (Collins reported on a few such cases found in early GBS searches.) The Net may make it easier for people to misappropriate other authors’ words, but it also makes such misappropriation easier to detect.

Indeed, it’s starting to catch some well-known contemporary authors. Last week, some bloggers used GBS to finger a prolific, and still publishing, romance author who had repeatedly plagiarized the works of others. (The linked story is the first report of the news last Monday; its sidebar currently points to numerous followups, including more examples uncovered by the same blog and its readers.) Cassie Edwards may be one of the first well-known current authors to be caught out via online-library Googling, but she’s not likely to be the last. Here are some things mentioned in the ensuing discussions that may be worth remembering the next time this sort of thing comes to light:

Plagiarism is not the same as copyright infringement and is therefore not justifiable with a “fair use” excuse. Plagiarism and infringement both involve improper copying, but otherwise they have important differences. Copyright infringement is copying without proper authorization (either from the copyright holder, or from copyright law), and is a legal offense. Plagiarism is copying without proper attribution, and is an ethical offense. While they sometimes go together, it’s perfectly possible to plagiarize without violating copyright (such as by plagiarizing public domain sources), and to violate copyright without plagiarizing (such as by putting the Harry Potter books on your web site, with J. K. Rowling’s name left on them).

Standards of proper attribution may vary by genre, but they exist for all genres. Formal scholarly writing standards are especially strict about attribution, with detailed citations generally required for words or even ideas taken from someone else. Popular fiction, music, preaching, and the like, may not usually include footnotes, and reuse may be more common in those genres (especially for things like standard chord progressions), but generally speaking, if you “quote” extensively from someone else, you’re expected to credit them. This can be in your acknowledgements, your liner notes, or wherever, but if it’s more than just a brief or obvious allusion (like the title of this post, taken from a Tom Lehrer song about plagiarism) you need to credit it.

Plagiarism may be forgivable, but not excused or justified, at least for anyone who expects professional respect. Being caught early on may ironically be a blessing; had Edwards been caught out by her editor or a reader on one of the first of her 100+ books, perhaps she could have apologized, changed her ways, and gone on to earn lasting plaudits on her original writing in the books that followed. For that matter, had Edwards’ plagiarism been limited just to nonfiction “infodumps”, and kept out of her story narratives, as it appeared in the first examples to be uncovered, it might have been easier to let go of. Unfortunately, it turns out she also lifted descriptive passages from fiction, which cuts closer to her main line of work as a storyteller.

On the other hand, I find it easier to forgive someone like Martin Luther King, who was (posthumously) found to have plagiarized in many of his academic writings, including his doctoral dissertation. Since Rev. King is best known and honored as a great civil rights leader, it’s easier to forgive this flaw than it would be if he were mainly remembered as an academic or an author. But it’s still sad, as it is when one uncovers the flaws of any of our American heroes (as you will for any of them, if you look closely enough). And now I can’t hear someone call him “Doctor Martin Luther King” without mentally interpolating an asterisk after the title.

More than ever, now that people’s words are increasingly available for searching and scanning in perpetuity, it’s important to take responsibility for what goes out under your name, whether you’re a storyteller, a scholar, or a politician. If you own up to your mistakes and sins early on, and do what you can to fix them, there may be good hope of redemption. Wait, and the damage can worsen both for those you’ve wronged and for yourself. I’ll try to remember that in my own writing, and I hope my readers will help hold me to it.