From Wikipedia to our libraries

I’ve heard the lament in more than one library discussion over the years.  “People aren’t coming to our library like they should,” librarians have told me.  “We’ve got a rich collection, and we’ve expended lots of resources on an online presence, but lots of our patrons just go to Google and Wikipedia without checking to see what we have.”  The pattern of quick online information-finding using search engines and Wikipedia is well-known enough that it has its own acronym: GWR, for Google -> Wikipedia -> References.  (David White gives a good description of that pattern in the linked article.)

Some people I’ve talked to think we should break this pattern.  With the right search tool or marketing plan, some say, we can get patrons to start with us first, instead of Google or Wikipedia.  This idea seems to me both futile and beside the point.  Between them, Google and Wikipedia cover a vast array of online information, more than librarians could hope to replicate or index ourselves in that medium.  Also, if we truly have better resources available in our libraries than can be found on the open Web, it’s less important that our researchers start from our libraries’ websites than that they end up finding the knowledge resources our libraries make available to them.

Looked at the right way, Wikipedia can be a big help in making online readers aware of their library’s offerings.  One of the things we spend a lot of time on in libraries is organizing information into distinct, conceptual categories.  That’s what Wikipedia does too: so far, its English edition has over 4 million concepts identified, described, and often populated with reference links.  And Wikipedia has encouraged people to add links to relevant digital library collections on various topics, through initiatives like Wikipedia Loves Libraries and the Wikipedian in Residence programs.  But while these programs help bring some library resources online, and direct people to those selected resources, there’s still a lot of other relevant library material that users can’t get to via Wikipedia, but can via the libraries that are near them.

So how do we get people from Wikipedia articles to the related offerings of our local libraries?  Essentially we need three things: First, we need ways to embed links in Wikipedia to the libraries that readers use.  (We can’t reasonably add individual links from an article to each library out there, because there are too many of them– there has to be a way that each Wikipedia reader can get to their own favored libraries via the same links.)  Second, we need ways to derive appropriate library concepts and local searches from the subjects of Wikipedia articles, so the links go somewhere useful.  Finally, we need good summaries of the resources a reader’s library makes available on those concepts, so the links end up showing something useful.  With all of these in place, it should be possible for researchers to get from a Wikipedia article on a topic straight to a guide to their local library’s offerings on that topic in a single click.

I’ve developed some tools to enable these one-click Wikipedia -> library transitions.  For the first thing we need, I’ve created a set of Wikipedia templates for adding library links. The documentation for the Library resources box template, for instance, describes how to use it to create a sidebar box with links to resources about (or by) the topic of a Wikipedia article in a reader’s library, or in another library a reader might want to consult.  (There’s also an option for direct links to my Online Books Page, if there are relevant books online; it may be easier in some cases for readers to access those than to access their local library’s books.)

For the links to work, we need to know about the reader’s preferred library.  Users can register their preferred library (which will set a cookie in their browser recording that choice), or select it for each individual search.  We know how to link to several dozen libraries so far, and can add more libraries on request.  Worldcat.org, which includes holdings of thousands of libraries worldwide, is also an option.  Besides the “Library resources box” template, I’ve also provided templates for in-text links to library resources, if those work better in a given article.  Links to these templates can be found at the end of the “Library resources box” documentation.
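
To make the preferred-library step concrete, here is a minimal Python sketch (not FTL’s actual code) of how a forwarding service might decide where to send a reader: a library chosen for the current search wins, then a registered preference stored in a cookie, and otherwise the reader is shown a list of choices. The cookie name `preferred_library`, the library codes, and both helper functions are hypothetical.

```python
from http.cookies import SimpleCookie
from typing import Optional

# Hypothetical registry of libraries the service knows how to link to.
KNOWN_LIBRARIES = {
    "worldcat": "WorldCat.org",
    "examplepl": "Example Public Library",
}

COOKIE_NAME = "preferred_library"  # hypothetical cookie name


def choose_library(cookie_header: str, requested: Optional[str] = None) -> Optional[str]:
    """Return the library code to search, or None if the reader must pick one.

    A library chosen for this particular search takes precedence over the
    registered preference stored in the browser cookie.
    """
    if requested in KNOWN_LIBRARIES:
        return requested
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    morsel = cookie.get(COOKIE_NAME)
    if morsel is not None and morsel.value in KNOWN_LIBRARIES:
        return morsel.value
    return None  # no usable preference: show the list of choices


def remember_library(code: str) -> str:
    """Build a Set-Cookie header value recording the reader's registered choice."""
    cookie = SimpleCookie()
    cookie[COOKIE_NAME] = code
    cookie[COOKIE_NAME]["path"] = "/"
    cookie[COOKIE_NAME]["max-age"] = 60 * 60 * 24 * 365  # keep for about a year
    return cookie[COOKIE_NAME].OutputString()
```

For example, `choose_library("preferred_library=examplepl")` returns `"examplepl"`, while passing `requested="worldcat"` for the same reader sends that one search to WorldCat without changing the stored preference.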

For the second thing we need, I’ve created a library forwarding service (“Forward to Libraries”, or FTL– catchier name suggestions welcome) that transforms links from Wikipedia into searches for appropriate headings or keywords in local libraries.  This is the same service I describe in my “From my library to yours” blog post from last month, but it now supports links from Wikipedia as well as to Wikipedia.

Thanks to information included in the Library of Congress’ Authorities and Vocabularies datasets, OCLC’s VIAF data feeds, Wikipedia’s database downloads, and my own metadata compiled at The Online Books Page, FTL already knows how to link directly to over 240,000 distinct authority-controlled Library of Congress headings from the Wikipedia articles that correspond to them.  (Library of Congress headings are used in most sizable US libraries, and many English-language libraries outside the US also use similar headings.)

For other articles, FTL by default will try a general keyword search based on the Wikipedia article’s title, which will often turn up useful results at the destination library.  Alternatively, my templates allow Wikipedia editors to determine a specific Library of Congress heading to use in library links, if appropriate.  I’m hoping to incorporate suggested headings into FTL’s own knowledge base as I detect them showing up in Wikipedia articles.  I also plan to publish FTL’s data sets under open access terms, so that others can use and improve on them as well.
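
The forwarding logic described in the last two paragraphs can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not FTL’s implementation: the search-URL templates and the `example.org` catalog are hypothetical, and the two-entry heading table stands in for the much larger mapping compiled from Library of Congress, VIAF, and Wikipedia data. An editor-supplied heading takes precedence, then a known heading, and otherwise the article title becomes a keyword search.

```python
from typing import Dict, Optional
from urllib.parse import quote_plus, unquote

# Hypothetical search-URL templates for one destination library. A real
# service would need an entry for each supported catalog or discovery system.
SEARCH_TEMPLATES: Dict[str, Dict[str, str]] = {
    "examplepl": {
        "subject": "https://catalog.example.org/search?type=subject&q={}",
        "keyword": "https://catalog.example.org/search?type=keyword&q={}",
    },
}

# Illustrative excerpt of a Wikipedia title -> Library of Congress heading map.
KNOWN_HEADINGS: Dict[str, str] = {
    "Underground Railroad": "Underground Railroad",
    "Elizabeth Cady Stanton": "Stanton, Elizabeth Cady, 1815-1902",
}


def forward(article: str, library: str, editor_heading: Optional[str] = None) -> str:
    """Turn a link from a Wikipedia article into a search at the given library."""
    # Wikipedia article URLs percent-encode some characters and use underscores for spaces.
    title = unquote(article).replace("_", " ")
    templates = SEARCH_TEMPLATES[library]
    # Prefer a heading chosen by a Wikipedia editor, then one already known.
    heading = editor_heading or KNOWN_HEADINGS.get(title)
    if heading:
        return templates["subject"].format(quote_plus(heading))
    # No known heading: fall back to a keyword search on the article title.
    return templates["keyword"].format(quote_plus(title))
```

For instance, `forward("Underground_Railroad", "examplepl")` yields a subject search for “Underground Railroad”, while an article missing from this small table, such as `Flannery_O%27Connor`, falls back to a keyword search for “Flannery O’Connor”. Keeping the search forms as per-library templates is what lets the same Wikipedia link lead to a regular catalog at one library and a discovery system at another.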

The third part of this solution– displaying relevant resources at the destination library– can be implemented differently at each library.  For most of the libraries in FTL’s current knowledge base, links go to searches in the library’s regular online catalog.  But with some libraries, I’ve linked to another discovery system, if it seems to be the main search promoted at that library, and it seems to produce useful results.  The Online Books Page’s subject map displays also have features that I think will be useful to Wikipedia subject researchers arriving at my site, such as showing related subjects and books filed under those subjects.  I hope in future posts to talk more about other useful guideposts and contextual information we could be providing to readers arriving from Wikipedia.

But if you’ve read this far, you probably want to see how this all works in practice.  So I’ve added some example library resources boxes in a few Wikipedia articles that seemed particularly relevant this month, including those for Women’s history, Elizabeth Cady Stanton, and Flannery O’Connor.  Look down in the “External links” or “Further reading” sections of those articles for the boxes, and view the page source of the articles to see how those boxes are constructed.

As with most things related to Wikipedia, this service is experimental, and subject to change (and, hopefully, improvement) over time.  I’d love to hear thoughts and suggestions from users and maintainers of Wikipedia and libraries.  And if you find creating these sorts of links from Wikipedia useful, and need help getting started, I’d be happy to help you bring them to your favorite Wikipedia topics and local libraries, as time permits.

From my library to yours

Even with well over one and a half million books and serials, the collection I maintain at The Online Books Page is far from comprehensive.  The gaps in coverage are not hard to notice at sites like mine, because most material published under copyright– which can be as much as 90 years old at this point– is not made freely available online.  But all libraries, no matter how large or well-provisioned, have their gaps.  No one can collect everything, and a persistent reader or researcher will eventually find that their questions and interests go beyond the bounds of any particular collection.

However, there are lots of libraries out there, as well as lots of online information and literature that hasn’t been collected into an institutional library.  A good library, of whatever size, serves its users well by collecting the most useful materials it can get for their needs, and helping them get whatever else they need in other places.  Jeff Jarvis expressed this basic idea well a few years ago when discussing news organizations: “Cover what you do best.  Link to the rest.”

Many libraries already do this, in certain ways.  The inter-library loan system helps library users who know they want a particular title their own library doesn’t have.  Many libraries also maintain links to websites on various topics from their own library website or catalog.  But these links, often maintained separately by each library, can only cover so much ground, as librarians have limited time to collect and maintain links.  Even consortially maintained collections of links struggle to go beyond fairly general focuses or particular niches, and to stay current.

Libraries can do more, though.  People coming to a library often have a particular topic in mind that they want to learn or read more about.  They’re often looking for something they can pick up quickly, and for free.  Knowing what that topic is, we should be able to point them towards useful literature they can quickly and freely obtain, whether or not it’s a title they already had in mind, and whether or not it’s in our own collection or something we link to directly.  That’s the purpose of some new links now available on The Online Books Page.

For example, say you’re a high school student looking for books on the Underground Railroad.  If you browse to this subject on the Online Books Page, you’ll find a number of free online books I list on this topic, and related topics.  As before, you can explore those related topics, if you’re interested (maybe checking out fugitive slave biographies, for instance); or you can try digging deeper for books specifically on the Underground Railroad via the extended shelves.

But most of what you’ll find on my site will be 19th century and early 20th century materials.  Your local library is likely to have books you can freely read as well, reflecting more up-to-date historical research, as well as books that might be more accessible to a high school student.  There might also be useful research materials online that you can look at for free.

That’s why there’s a new “See also…” note just under the big “Underground Railroad” heading.  If you click on the words “your library” in that note, you’ll be referred to your regular library, if we know about it, to see what they have on the Underground Railroad.  (If you haven’t already told us which local library you want to use regularly, we give you a list of choices.  It’s a pretty small list to start with, but I’m taking requests for more libraries to add.  Or you can opt for OCLC’s Worldcat.org– they cover lots of libraries throughout North America and beyond.)  Even after you register a preferred library, you’re not stuck with only using that one.  You can click on the “elsewhere” link in the note to try a different library or service from the one you usually check– like maybe the university library that’s near your public library (or vice versa).

You might also want to find online research resources that aren’t books.  For some of those, try clicking on the Wikipedia link provided for this subject.  While the quality and reliability of Wikipedia articles themselves can vary, most mature Wikipedia entries include a rich set of useful links to more information.  (I’ve discussed previously how useful Wikipedia is as a concept-oriented catalog.)  The references and external links on Wikipedia’s Underground Railroad article, for instance, cover a wide range of informational websites, contemporary and current books, and digital library collections.

Similarly, if you’re looking at a list of online books by a particular author (like, say, W. E. B. Du Bois), you’ll find a link at the bottom of the page to find more books by the author in libraries, as well as links to online books or Wikipedia articles about the author near the top.  There are also links to find library copies of a particular book on its detailed catalog page; see, for instance, the links at the bottom of our catalog entry for The Souls of Black Folk.  This can be useful for people who want a print copy, or a different edition from the ones we list.

So far, I’ve added links from The Online Books Page to Wikipedia for more than 17,000 subjects, and links to library catalogs for millions of subjects, authors, and titles.  (My thanks to OCLC, the Library of Congress, and Wikipedia for providing bulk access to the data that makes it possible to do much of this automatically.)  I’ll be developing this service further, and doing more things with this data, in ways that I hope to describe here shortly.  But I hope this first step is a useful demonstration of ways that different kinds of libraries and catalogs– online and local, academic and public, institutional and informal– can support each other through user-directed, context-sensitive, concept-level links between collections.

Public Domain Day 2013: or, There and Back Again

The first day of the new year is Public Domain Day, when many countries celebrate a year’s worth of copyrights expiring, and the associated works become freely available for anyone to share and adapt.  As the Public Domain Day page at Duke’s Center for the Public Domain notes, the United States once again does not have much to celebrate.  Except for unpublished works by authors who died in 1942, no copyrights expire in the US today.  Under current law, Americans still have to wait 6 more years before any more copyrights of published works will expire.  (Subsisting copyrights from 1923 are scheduled to finally enter the public domain at the start of 2019.)

The start of 2013 is more significant in Europe, where the Open Knowledge Foundation has a more upbeat Public Domain Day site featuring authors who died in 1942, and whose published works enter the public domain today in most of the European Union. But that isn’t actually breaking new ground in most of Europe, because 2013 is also the 20th anniversary of the 1993 European Union Copyright Duration Directive, which required European countries to retroactively extend their copyright terms from the Berne Convention’s “life of the author plus 50 years” to “life of the author plus 70 years”, and put 20 years’ worth of public domain works back into copyright in those countries.

For countries that used the Berne Convention’s term and implemented the directive right away, today marks the day that the public domain finally returns to its maximum extent of 20 years ago.  Only next year will Europe start seeing truly new public domain works.  (And since many European countries took a couple of years or more to implement the directive– the UK implemented it at the start of 1996, for instance– it may still be a few years yet before their public domain is back again to what it once was.)

At least the last US copyright extension, in 1998, only froze the public domain, without rolling it back.  If the US had not passed that extension, we would be seeing works published in 1937, such as the first edition of J.R.R. Tolkien’s The Hobbit, now entering the public domain.  (If the US hadn’t made any post-publication extensions, we’d also have the more familiar revision of The Hobbit, in which Gollum does not voluntarily give Bilbo the Ring, in the public domain now as well, along with all three volumes of The Lord of the Rings.)   Folks in Canada and other “life+50 years” countries, now celebrating the public domain status of works by authors who died in 1962, may be able to freely share and adapt Tolkien’s works in another 11 years.  Folks in Europe and the US who’d like to see a variety of visual adaptations, though, will have to content themselves with the estate-licensed Peter Jackson and Rankin/Bass adaptations for a while to come.
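
The dates in this post all follow from simple arithmetic: copyright terms run to the end of a calendar year, so a work enters the public domain on January 1 of the year after its term ends. The Python sketch below is a deliberate simplification for illustration; it ignores renewal requirements, wartime extensions, unpublished works, and each country’s transition rules. It takes as given the terms mentioned above: “life plus 50” and “life plus 70” years, the current US term of 95 years for works published in 1923, and the 75-year term that applied before the 1998 extension.

```python
def pd_year_life_plus(death_year: int, years_after_death: int) -> int:
    """First public domain year under a 'life of the author plus N years' term.

    Terms run through the end of the calendar year, so the work becomes
    free on January 1 of the following year.
    """
    return death_year + years_after_death + 1


def pd_year_fixed(publication_year: int, term_years: int) -> int:
    """First public domain year under a fixed term counted from publication."""
    return publication_year + term_years + 1
```

For example, `pd_year_life_plus(1962, 50)` gives 2013, matching this year’s Canadian celebrations, and `pd_year_life_plus(1973, 50)` gives 2024 for Tolkien, who died in 1973: 11 years from now.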

But there are still things Americans can do to make today meaningful.  For the last few years, I’ve been releasing copyrights I control into the public domain after 14 years (the original term of copyright set by the country’s founders, with an option to renew for another 14).  So today, I dedicate all such copyrights for works I published in 1998 to the public domain.  This includes my computer science doctoral dissertation, Mediating Among Diverse Data Formats.  If I believed a recent fearmongering statement from certain British journal editors, I should be worried about plagiarism resulting from this dedication, which doesn’t even have the legal attribution requirement of the CC-BY license they decry.  But as I’ve explained in a previous post on plagiarism, plagiarism is fundamentally an ethical rather than a legal matter, and scholars can no more get away with plagiarizing public domain material than they can with copyrighted material.   Both are and should be a career-killer in academia.

I’ll also continue to feature “new” public domain works from around the world on The Online Books Page.  Starting today, for instance, I’ll be listing works featured in The Public Domain Review, a wonderful ongoing showcase of public domain works inaugurated by the Open Knowledge Foundation on Public Domain Day 2011.  I’ll also be continuing to add listings from Project Gutenberg Canada and other sites in “life+50 years” countries, as well as other titles suggested by my readers.

Finally, I’ll be keeping a close eye on Congress’s actions on copyright.  In this past year, the Supreme Court ruled that Congress could take works out of the public domain, meaning that the public domain in the US is now under threat of shrinking, and not just freezing.  And the power of the copyright lobby was evident this year when a Republican Study Committee memo recommending copyright reform (including shorter terms) was yanked within 24 hours of its posting, and its author then fired.  On the other hand, 2012 also saw one of the largest online protests in history stop a copyright lobby-backed Internet censorship bill in its tracks.  If the public shows that it cares as much about the public domain as about bills like SOPA, we could have a growing public domain back again before long, instead of works going back again into copyright.

Persistence

The week between Christmas and New Year’s is mostly time off for me– I’ve added no new listings to The Online Books Page this past week, for instance– but even on vacation, as long as I have a working Internet connection I still tend to fix bad links as I hear about them from readers’ reports.  I try to draw from a variety of free online book sources, instead of just a few big ones; that’s worthwhile to me because it increases the diversity of titles and editions on the site.  But the tradeoff is that many of these sites disappear, reorganize, or otherwise have links go bad over time.  I’m grateful to my readers for reporting bad links to me, and I can often fix other bad links to the same site when I fix the one reported to me.

The links and sites that persist, and those that don’t, often aren’t the ones you might expect.  Who’d have thought, for instance, that a shoestring-budget project that didn’t even maintain its own website until fairly recently would have the longest-lived (and still one of the largest) electronic book collections in common use, outlasting many better-funded or more systematically planned projects (as well as its own doggedly persistent original champion)?  Although the links to Project Gutenberg’s ebooks have changed over the years, the persistence of their etext numbers, and the proliferation of Gutenberg sites and mirrors, have made it relatively easy for me to keep links working for their more than 40,000 ebooks.

Some library-sponsored sites use persistent link redirection technologies, such as PURLs, to keep their links working.  But technology alone isn’t sufficient for persistence.  I recently had to update all of my links going to a PURL-based library consortium site.  I’m sure the people who worked at the organization hosting the site would have kept the links working if they could, but the organization itself was defunded by the state, and its functions were taken over by a new agency that didn’t preserve the links.

Fortunately, the failure had a couple of graceful aspects that eased recovery.  First of all, the old links didn’t stop working altogether, but redirected to the front page of a digital repository in which people could search for the titles they were looking for.  Second, the libraries in the consortium still maintained their own websites, and the old links included a serial number unique to each text (similar to Gutenberg’s etext numbers) that was also used by member libraries.  I found that in most cases I could automatically rewrite my links, using that serial number, so that they would point to a copy at a contributing library’s website.  This made it easier for me to rewrite my links, even though they go to new sites, than it’s often been for me to update links to sites that persist but reorganize.  (For instance, I’ve seen sites change to new content management systems that used completely different URLs from their old design, and I’ve then had to manually relocate and verify each link one at a time.)
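
The serial-number rewrite described above can be sketched as a small Python function. The URL patterns are hypothetical stand-ins, since the consortium’s and member libraries’ actual addresses aren’t given here; the sketch only illustrates that when an old link carries an identifier the new host also uses, the rewrite can be automated, and anything that doesn’t match gets flagged for manual checking.

```python
import re
from typing import Optional

# Hypothetical URL patterns: the old persistent links end in a serial number
# that a member library also uses in its own address for the same text.
OLD_LINK = re.compile(r"^https?://purl\.example\.org/consortium/(\d+)$")
NEW_LINK = "https://library.example.edu/texts/{serial}"


def rewrite_link(url: str) -> Optional[str]:
    """Return a replacement URL, or None if the link needs manual attention."""
    match = OLD_LINK.match(url.strip())
    if match is None:
        return None  # not an old consortium link: check by hand
    return NEW_LINK.format(serial=match.group(1))
```

In practice one would also fetch each rewritten URL to confirm that it actually resolves before replacing the old link.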

Sometimes I have to replace links that still “work”, technically.  I used to have thousands of links to a Canadian consortium that provided free access to scanned public domain books and pamphlets from that country’s history.  Not long ago, I discovered that while my links still work, the site had gone to a subscription model where readers have to pay for access beyond the first dozen or so pages of each text. Given the precarious state of Canadian library funding, I’m sure the people running the site were simply doing what they thought necessary to ensure the persistence of the sponsoring organization (which continues to provide new electronic texts and services).  Personally, however, I was more concerned about the persistence of free access to the digitized texts I’d pointed to.  Fortunately, a number of the consortium’s member libraries had also uploaded copies of their scans to the Internet Archive, using the same serial numbers used on the Canadian consortium’s website.  As a result, I was able to quickly update most of my links to point to the Internet Archive’s copies.  I intend to track down working alternative links to the 200 or so remaining texts, or post requests seeking other copies of these texts, when time permits.  (I’ve also sent along a donation to the Internet Archive, in part to thank them for continuing to provide access to texts like these.)
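
A batch version of the Internet Archive update might look like the following Python sketch. The lookup from consortium serial numbers to Internet Archive item identifiers is a hypothetical input (for example, something compiled from the uploaded items’ metadata); the only real convention assumed is that an Internet Archive item can be reached at `https://archive.org/details/` followed by its identifier. Texts without a known copy are returned separately, as a to-do list like the 200 or so texts mentioned above.

```python
from typing import Dict, List, Tuple

ARCHIVE_DETAILS = "https://archive.org/details/{}"


def relink_to_archive(
    serials: List[str], archive_ids: Dict[str, str]
) -> Tuple[Dict[str, str], List[str]]:
    """Split texts into those with an Internet Archive copy and those without.

    serials: serial numbers of texts currently linked to the old site.
    archive_ids: hypothetical lookup from serial number to Internet Archive
    item identifier.
    """
    updated: Dict[str, str] = {}
    remaining: List[str] = []
    for serial in serials:
        identifier = archive_ids.get(serial)
        if identifier:
            updated[serial] = ARCHIVE_DETAILS.format(identifier)
        else:
            remaining.append(serial)  # still needs another free copy
    return updated, remaining
```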

It’s been said in digital library literature that persistence of identifiers is more a matter of policy than technology.  Based on the experiences I’ve related above, the practical persistence of links is even more a matter of will than of policy: the will (and ability) to keep maintaining access through changing conditions; the willingness to consider alternatives to specific organizational structures or policies if the original ones turn out not to be tenable; the willingness to pick things up again, or let others pick them up, after a failure.

It’s also clear from my experience that practically speaking, failure is not the main enemy of persistence.   More of a threat is not recovering from failure, or being so worried about failure that one doesn’t even begin to sustain the thing or the purpose that should persist.  To riff off a famous G. K. Chesterton quote, if it’s worth doing something, it’s worth being willing and ready to fail at doing it.  And then, to be willing to pick up again where you left off, or to make it easy for someone else to pick it up, and try something new.

That’s persistence.  That’s what’s ultimately gotten the dissertation rewritten, the estates settled, the blog picked up again, the books put and kept online for the world to read, and many other things I’ve found worthwhile, despite difficulties, anxieties, and setbacks.  I value that persistence, and I hope you value it as well, for the things you find worthwhile. I look forward to seeing where it takes us in the year to come.

In which I finally buy an ebook

In my last post, I discussed why I wanted to buy ebooks I could truly own, and my subsequent attempts to buy such a copy of John Scalzi’s Redshirts from a readers’-rights-friendly retailer.  I initially had a hard time finding an ebook store that fulfilled three basic requirements:

  1. The store must sell a DRM-free copy of the book, in a convenient format.  That eliminated specialized ebook stores that didn’t carry the title at all.  Also, a number of major sites only had DRM-locked versions at first.
  2. The store must make the format and DRM-free status clear. Most mass-market ebooks are still locked down with DRM, and I don’t want to get stuck with that, either for this title or for other titles I might buy.  So the store had to make it clear what I was buying, either by a notation on the book’s catalog page, or by a general policy stating that books they offered were DRM-free.
  3. The store must not require me to agree to give up my rights as a reader under copyright law.  In particular, I would not consent to any terms of sale that significantly limited my rights of fair use or first sale.  Fair use allows me to make copies of copyrighted material under certain conditions, such as quoting and critiquing a small portion in my own work, or making a complete personal copy of a TV show I’ve received or CD I’ve bought, for more convenient consumption.  First sale lets me decide how to dispose of a book once I’ve bought it, including giving my copy of something I’ve already lawfully acquired to someone else.  (First sale rights also let libraries lend out books without having to ask publishers first.)  Each of these rights has limits, and there are still disputes over how far these rights can be applied to digital content.  But I didn’t want to pre-emptively sign away rights that copyright law might give me.

I didn’t think it would be that hard to find a retailer to meet these requirements.  But here’s what I found when I went shopping:

Barnes and Noble: Since we owned a Nook, I first called up the store app on that device.  The ebook was simply marked as a “Nook Book”, with no clear differentiation between a DRM-free and a DRM-locked copy.  (The current catalog page for the book now mentions in the overview that it’s being sold without DRM, though not very prominently.)  I also recalled that to get access to the store in the first place, I had to click through a terms of service agreement.  Reviewing that on the web turned up a clause saying I couldn’t “copy, transfer, sublicense, assign, rent, lease, lend, resell or in any way transfer any rights to all or any portion of the Digital Content to any third party” except under certain explicit, very limited conditions.  In other words, I would have to give up first sale rights to anything I bought in the Nook store.  Rather than do that, I moved on to another retailer.

Amazon: There was no clear mention of DRM status on the book’s catalog page initially (even now, I don’t see it there until I click on “show more”).  Amazon uses its own Kindle (mobi) format for its books, so I’d need to convert it to a different format (possibly degrading the layout in the process) or get a Kindle reading program or device. The Kindle License Agreement and Terms of Use limits how I’m allowed to read books they sell, disallows third party transfers except by explicit permission, and in case I missed the point, explicitly states “Digital Content is licensed, not sold”.  No sale here, then.

Google:  Going over to Google Books, I find this book available through Google Play.  The catalog page doesn’t tell me what format it’s in, or whether it has DRM; it instead just asks me to sign in to buy it.  Google then tells me I have to agree to their terms, which again forbid third party transfers, before it will give me access to whatever formats it may let me download.  If I read the book online within Google Play itself, its privacy policy allows it to look over my shoulder to a limited extent while I’m reading.  Google pledges to use this power only for good, but personally I’d prefer to download and keep my reading details to myself in the first place, thanks.

Sony Reader store: Information on format and DRM status is not clear for its books.  Based on Sony’s past history with DRM, there’s no way I’m giving them the benefit of the doubt with the formats they might use.

Independent bookstores: I also looked into whether I could buy an ebook through one of the independent bookstores I’ve liked shopping in.  Unfortunately, they don’t seem to offer much.  My local indie store doesn’t appear to sell ebooks at all, and Powell’s doesn’t seem to offer this title at present.  Independents in the IndieBound ebook program appear to just be referral agents for Google Books.

Diesel eBooks: The slogan “More freedom, more ebooks” seemed promising when I found this site.  Diesel offers both DRM-locked and DRM-free titles, and their catalog pages make it very clear which is which.   Unfortunately, they only offered a DRM-locked version of Redshirts for weeks after it was first released.  However, I recently went back to the site and found they’d switched to the DRM-free version.  Buying that ebook consisted of registering my name and email address, giving them my credit card information, and downloading an EPub file.  No click-through agreements were involved, and when I went over to look at the general terms of use for the site, they basically amounted to “don’t abuse the site, or infringe copyright”.  In short, I gave them money, and they gave me an ebook, and said “Enjoy!”, with no further fuss. That’s the kind of book shopping I like.

So there’s at least one reasonably comprehensive and reader-friendly ebookstore out there.  I’d be happy to hear about others as well.  And I look forward to buying and owning more books, in both print and electronic formats.

In which I try to buy an ebook

Not long ago I went to the bookstore and bought some books.

This is how: I found some books I liked on the shelves, brought them to the front counter, and handed the clerk some money.  The clerk put my books and a receipt into a bag, and ended the transaction by handing me the bag and saying “Enjoy!”

So that’s what I did.  As I left the store, I thought about which book I’d start on the train home, and about all the other things I could do with my new books.  I could read them to myself, read them out loud to my family, lend them out to friends, cite or briefly quote them in my own work, trade them in at the used bookshop, donate them to the local library, bequeath them to my heirs, cut things out from them and post them on my wall, make origami art out of the pages, or lots of other things that neither I nor the bookseller had yet imagined.  As long as I didn’t violate copyright or other laws, neither the bookstore, the publisher, nor anyone else had any further say in how I enjoyed the books.  They were mine.

I also own a number of books on my computers, but not ones I’ve bought, at least not as ebooks.  (They’re all in the public domain, or came bundled with a print edition I bought, or are free authorized digital editions. I don’t do bootlegs.)  But I wanted to buy electronic books as well– books I liked that weren’t being offered for free or in bundles; books where I could support the authors and publishers through my purchase.

Unfortunately, there weren’t many ebooks of interest that I could buy– at least not if “buying” means “owning”.  Oh, I could call up a store app on my Nook, or go to Amazon online, where they offered me book files in return for some money and my consent to a take-it-or-leave-it agreement.  But a file I paid for wouldn’t be a book I owned; it would be a file that I licensed under a non-negotiable contract, and I could only do with the file what the vendor, the publisher, and other parties to the agreement decreed I could do.  The file itself would be encrypted with “Digital Rights Management” (DRM), which would only allow display by approved programs that carefully controlled whether and how I could read the book.  And if those programs stopped working, or decided to revoke my right to read the book, or if I wanted to use the books on some other system, or in some other way they didn’t anticipate and approve of, tough luck for me.  (Technically, I could break the encryption, but I would be breaking the law if I did.)  And I shouldn’t even think about trying to pass along the book to someone else– unless I was lucky enough to find a title eligible for some very limited lending experiments certain publishers and vendors were trying out.  I have books in my home that my grandparents read 100 years ago, but I had little hope my grandchildren would be able to read ebooks like these, at least not legally.

A few places offered DRM-free books for sale, but they tended either to offer titles I preferred to read in print (like the computer books published by O’Reilly), or they didn’t offer many titles of interest to me.  I wasn’t going to get into the habit of buying ebooks unless there was a critical mass of titles worth aggregating into a personal library.

So I was thrilled when Tor, a major science fiction publisher and an imprint of one of the Big Six publishing companies, announced that all of their books would soon be sold DRM-free.  They weren’t the first SF imprint to take this route– Baen, for instance, has been offering DRM-free titles for years– but Tor had enough authors I liked that I could see myself buying ebooks from them fairly regularly.  Tor’s first DRM-free release would be John Scalzi’s Redshirts, a book I’d already been hoping to buy, and which I now decided to buy as an ebook.  That would let me try out the new format, and also thank Tor and Scalzi for taking the initiative to let readers just own their books.  (And if Tor’s initiative does well, other imprints might follow.)

I originally planned to buy the ebook on its release date.  But even when an author and a publisher are ready to go, it can take a while to get the retailers on board.  On the day Redshirts came out, many ebook stores delivered DRM-locked files instead of the DRM-free edition readers expected.  (Thankfully, Tor offered free exchanges almost right away.)  More worrisome to me, though, was that many of the major ebook retailer sites wouldn’t complete a transaction unless I first indicated consent to a “take it or leave it” agreement that appeared to sign away important rights readers normally have to books they buy.  Unlike the print books I bought in the bookstore, my enjoyment of the ebooks I got from these sellers would be restricted by their contractual demands, above and beyond the standard constraints of copyright law.  DRM or no DRM, the ebooks would not truly be my own if I agreed to those demands.

Eventually, though, I found a retailer that offered what I wanted without any unacceptable strings attached, and I’m now a happy Redshirts ebook owner and reader.  I’ll describe my experience buying the book from that retailer, and not buying the book from some better-known retailers, in my next post.

Building on a full complement of copyright records

Thanks to recent efforts of the US Copyright Office, we now have a complete digitization of summary copyright registration and renewal records back to the late 19th century.  As Mike Burke and others at the Copyright Office have been reporting on their blog, Copyright Matters: Digitization and Public Access, the Copyright Office has now digitized nearly every volume of the Catalog of Copyright Entries, and of its predecessor publication, the Catalogue of Title Entries of Books and Other Articles, going back to the start of that serial in 1891.  Combined with the current online Copyright Catalog database, and some independent scans that fill in gaps in the Copyright Office set, records for every copyright registration and renewal still in force in the US can now be found online, free of charge.

This is a great benefit for people wanting to make better use of copyrighted works and the public domain.  With the information now online, we can quickly verify copyright and public domain status for lots of works, and also get useful leads on current owners of copyrights, in ways that were not possible when the only copies of the Catalog were in closed reserve at certain federal depository libraries.  Various people in the Copyright Office have been hoping for a while to get approval and funding for this digitization, and I’m very thankful for their persistence in seeing the work through.

Not all the work is done, though.  Although the Catalog is now online, its records are not as easy to search, navigate through, and interpret as they could be.  There’s no one-stop search box, for instance, that will reliably bring you to any copyright record with your query terms, regardless of date or type of record.  And the Copyright Office also has more information about its copyright registrations– some of it on catalog cards, and more of it on original registration certificates like the one I found when researching the status of my mother’s book— that could be useful to people researching copyright status and looking for rightsholders.

For now, the Copyright Office is scanning the cards that are used to look up volumes of registration certificates, and that are also the basis of the printed volumes of the Catalog of Copyright Entries.  From my (limited) experience with these cards, they don’t seem to add much information to what’s in the printed Catalog, but it’s easier to automatically create a searchable, structured database of copyright records from the cards, with their fairly regular typefaces and formats, than it would be to create one from the Catalog scans.  According to their latest blog post, the Copyright Office is now creating digital images of the relevant cards, and hopes to be done by the end of Fiscal Year 2014, or a little over 26 months from now.  They’re also hoping to work with various partners– including “crowdsourcing” partnerships– to reliably convert the information on the cards into machine-readable form.

There are also lots of ways to make the existing online records more useful.   On my own copyright records site, for instance, I’ve now made a comprehensive index to all the Catalog volumes, and created a table to make it easier to look up records in digitized Catalog volumes, based on the year and type of copyright registration.  I’m still working on further refinements, and would be very happy to hear suggestions.  (I’ve also been unable to find one 12-month stretch of records for copyrights from 1895 and 1896.  Fortunately, all the copyrights from those years have long since expired, but I’d still be grateful to anyone who can help me fill this last gap.)

At the same time, I’ve been using the comprehensive record set to help me research and publicize copyright status for listings on The Online Books Page.  For instance, if I’m listing public domain issues of a journal, magazine, or other serial, I’ll also check whether additional issues might be in the public domain if their copyrights were not renewed.  Then I’ll place a note about this on my cover page for the serial, if applicable.

As for the Copyright Office, I’m hoping that they can soon start digitizing their volumes of registration certificates, which contain a lot of useful additional information about copyrights and copyright holders, and which no one else has.  Digitizing all of them wouldn’t be cheap– there are a lot of pages potentially to digitize, usually two for each registration.  But perhaps they could start digitizing incrementally, either systematically in priority order (e.g., starting with the most recent volumes), or on demand (e.g., digitizing a volume when someone wants to obtain a copy of one of its certificates).

These are only a few of the things that could be done with the records now online, by people anywhere with the suitable motivation.  I’d love to hear what others are doing or thinking of doing.