Changing the subject(s)

When I implemented subject maps for browsing the Online Books Page by subject a while back, I had a big problem to face: I didn’t actually have subject terms for the books. How could I implement subject browsing without subjects?

By bootstrapping. I did have call numbers for the books, which arrange books by discipline. (That’s what lets you see similar books grouped together in a library’s nonfiction shelves.) It’s possible to infer a subject from a call number, though you’ll sometimes get a more general subject than the book’s really about, or miss secondary subjects. The Library of Congress authority records for subjects include call number ranges that apply to many of their authorized terms. My library had some of these authority records in its catalog. They’re often a few years behind the state of LC’s official list, but they’re still recent enough to be useful.

So I downloaded those records and then wrote a program that, given a call number, would try to find the subject with the smallest range that included that call number. Doing that helps you get specific subjects instead of general ones; if your call number is HD1306, you want to match the range HD1301-HD1306 (for “Land, Nationalization of”) rather than the wider range HD101-HD1395 (for the more general subject “Land use”). After filtering out some bad data in a few authority records, and suppressing some terms to break ties, I ran the program, and instantly got subject terms for tens of thousands of books. Some of them were pretty generic (thousands of books were simply labeled “English literature”, for instance), but many were quite specific, and I’d say that over 90% of the time they were useful descriptions of the book. The maps I built largely based on these assigned subjects worked pretty well from day one.
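To make the “smallest enclosing range” idea concrete, here is a minimal sketch in Python. It is not the program I actually ran: the names (SubjectRange, parse_call_number, narrowest_subject) are made up for illustration, the two sample ranges are stand-ins rather than real authority records, and it assumes every range stays within a single class-letter prefix and ignores anything after the class number (such as Cutter numbers).

```python
import re
from typing import Iterable, NamedTuple, Optional, Tuple


class SubjectRange(NamedTuple):
    """One subject term with the call number range it covers (hypothetical shape)."""
    subject: str
    start: str
    end: str


# Class letters followed by a class number, e.g. "HD1306" or "PR4034.5".
_CALL_RE = re.compile(r"^([A-Z]{1,3})(\d+(?:\.\d+)?)")


def parse_call_number(call_number: str) -> Tuple[str, float]:
    """Split a call number into (class letters, class number); the rest is ignored."""
    match = _CALL_RE.match(call_number.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized call number: {call_number!r}")
    return match.group(1), float(match.group(2))


def narrowest_subject(call_number: str,
                      ranges: Iterable[SubjectRange]) -> Optional[str]:
    """Return the subject whose range is the smallest one containing call_number."""
    letters, number = parse_call_number(call_number)
    best = None
    best_width = None
    for candidate in ranges:
        start_letters, start_num = parse_call_number(candidate.start)
        end_letters, end_num = parse_call_number(candidate.end)
        # Simplifying assumption: a range never crosses class-letter prefixes.
        if start_letters != letters or end_letters != letters:
            continue
        if not (start_num <= number <= end_num):
            continue
        width = end_num - start_num
        # Ties keep the first range seen; a real run needs explicit tie-breaking.
        if best_width is None or width < best_width:
            best, best_width = candidate, width
    return best.subject if best is not None else None


# Illustrative sample data only.
SAMPLE_RANGES = [
    SubjectRange("Land use", "HD101", "HD1395"),
    SubjectRange("Land, Nationalization of", "HD1301", "HD1306"),
]

print(narrowest_subject("HD1306", SAMPLE_RANGES))
print(narrowest_subject("HD500", SAMPLE_RANGES))
```

A linear scan like this is fine for a few thousand ranges; the filtering of bad records and the suppression of tie-prone terms that I mention above would happen before the ranges ever reach this function.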

I didn’t stop there, though. Little by little I went back and found more precise and appropriate subjects for books in various parts of my collection. When I did this, I could also assign multiple subjects, instead of having to make do with one. (I’m not trained as a cataloger, but I know the basics of how LCSH subject assignment works, and can look over terms assigned by the Library of Congress or other libraries and choose the ones that seem to make the most sense for a given title.) I also kept track of which books had the automated subject assignments, and which had human-overseen cataloging.

As of today, I’ve assigned these more precise and comprehensive subjects to all the non-literature books in my collection that had call numbers. A lot of the fiction, and some of the more obscure nonfiction without a call number, still lacks subject cataloging. (As far as I can tell from Worldcat, many fiction books have never been subject-cataloged by anyone.) Some of these books will eventually get subject terms as well. But by now, the only automatic subject assignments left are for a few (mostly large) generic literature categories, which are mostly getting in the way of discovery of the other books. So today I’m turning those automated categorizations off.

Now that I’ve completed this phase of subject browsing enhancement, I’m excited to think about what might come next. I know from my usage logs that lots of people are browsing by subject (and that, based on the bad link reports I get, they’re finding books that had largely been overlooked before). Now that I have consistent high-quality subject metadata for nonfiction, I can think of various ways to improve subject-based discovery, both for this collection and for others. I can work on ways to keep the subject map up to date with the latest changes in subject vocabularies. I can implement techniques for establishing more relevant connections between subjects. I can investigate ways to integrate data from less consistent sources (such as most large library catalogs) into subject maps and compensate for (or even automatically correct) their inconsistencies.

For now, though, I’ll stop for a moment, take a breath, and come up to blog, before diving back into this and other projects.

We the mediators

Back in early 2006, Peter Brantley (now the director of the Digital Library Federation) got a lot of interesting folks in libraries and publishing together in one room to talk about issues related to reading in the digital age. While libraries and publishers have different focuses and priorities, we both serve as mediators between authors and audience, and both kinds of mediators are seeing dramatic upheavals and innovations in the ways we carry out our missions.

So the meeting touched off an interesting series of discussions. I’m having a hard time finding the “official” presentation pages from the original meeting, but here’s a short summary from me and a more detailed list of talk summaries from Tim O’Reilly. After the meeting, discussions continued on a mailing list of participants that over time added a number of other folks in publishing and libraries.

A number of the folks involved, mostly on the publishing side of things, have now started a group blog to take many of these conversations public. The blog is called Publishing Frontier, with the tagline “a raucous public discussion of the publishing revolution”. Its starting contributors include folks who’ve worked at trade publishers, scientific imprints, commercial research labs, and grassroots book digitizing.

The blog promises to be an interesting forum and chronicle of the digital revolutions in communication, largely from publishing perspectives (much as I hope this blog to be another such forum and chronicle, largely from librarianship perspectives). I encourage readers here to check it out.

Book People postscript

This past Friday I closed down the Book People mailing list, a forum for people making and reading free online books that Mary and I started in 1997. Much of the activity of folks on the list would be early examples of the sort of citizen librarianship that I referred to in the first post to this blog. I announced the list’s closing about three weeks ago, giving my reasons in a later post.

In the last three weeks of the list’s activity, various listmembers wound up conversations, planned or announced various new forums, and said their goodbyes. You can read all this, and the rest of the list’s history, in the archives, which are remaining online. The most direct successor to the list is Book Futures, a Yahoo Groups mailing list maintained by Kent Larsen, and there were some other lists announced as well.

I closed the list with my own retrospection and thanks. But I continued to get some more listmember reflections even after my last post (and for all I know some more may have come in after the list’s email address was decommissioned). Here’s one of them, a message I got from Michael Stutz (posted here with his permission):


When you started Book People back in 1997, I began a list for the discussion of what has now become known as “open content,” in an attempt to prove a concept I’d been working on in obscurity for years.

My list, Linart, shut down years ago, and that goes so far back that a whole lifetime is packed in the interim. But I do know firsthand what it’s like to administer and moderate a list like this and I know that to do it justice takes more time and work than most people would believe. I’ve never known a list with closer and more careful moderation than Book People. Absolutely every time a BP post came into my inbox, I thought of this and how keeping a good list running takes a massive amount of work.

It’s always sad to see an end, but looking back I do think that Book People had a good run and, like Linart, it reached the end of its course—a decade ago, the idea of publishing an online or electronic edition of a book was a novelty, there weren’t so many of them and they weren’t always easy to find. Not so anymore—at the very instant your announcement came into my inbox, I was downloading several gigabytes of rare old books, dozens of volumes among hundreds that I’d found through a full-text keyword search.

Just the same, Linart was a great idea because at the time no one was publishing copylefted work online—and even more importantly, _no one thought it was possible._ My main interests were books and art, but I wanted to see every kind of copyrighted work digitized online with “copyleft” licensing. And it might seem crazy now, but the reactions from open source and free software figures to my dream went from complete disinterest to overt hostility: “Copyleft is for software! You can’t do that with books, music, art”—replies like that were typical. Few people in the world were copylefting non-software works, but Linart is best left in the 20th century and the world as it was before Wikipedia and Creative Commons. In fact, after seeing the results of several years of online “open content” and having tested it extensively firsthand, I’m now critical of the method—I know its weaknesses and errors and have come to see that it isn’t the right solution for the age.

But what remains important today is the greater question of online publishing in general—and, of course, the future of the book. As a reader I’m nearly exclusively online for newly-published material, and as a writer that’s also where I want to find my audience, but how to do it and how it will all work out, how new writing and new books will be published and read and sold, remains entirely unclear—I’m still looking for the answer, and so I think the new Book Futures list is very aptly named and hope it takes off on its quest from this place we’ve come to after over a decade’s worth of Book People.

If anyone else from the list would like to add any postscripts or other comments here, feel free to add a comment to this post.

What’s this all about, Part 2: Everybody’s Libraries

In my previous post, I discussed “citizen librarianship” and the rise of online library services that go beyond the established library organizations and practices. And I claimed that the most promising future of libraries involved understanding and building up “everybody’s libraries”, as a collective group and as a concept.

The collective group is easy enough to understand. It’s just the sum of all the library content and services usable by the global community. The bigger this is, the more we can benefit.

But what do I mean by “everybody’s libraries” as a concept? I mean a group of characteristics that I think will describe and build up the best libraries of the future. “Everybody’s libraries”, as I see it, includes

  • Libraries everybody can use. We’ve been sharing information with the online world at large almost since the day we set up computer networks. (The work of Project Gutenberg, for instance, started over 35 years ago.) Openly accessible information can be used by anyone it reaches, enlightening the world, making it easier to build on old work to create new knowledge, and enabling new kinds of production and commerce. Open-access libraries become even more usable when they make their information easy to find and repurpose, and when they accommodate varying languages, abilities, and education levels. For various reasons, not everything can be used by everyone all the time, but many of the barriers to access today can and should be removed.
  • Libraries everybody can put their work in. Libraries need to accommodate whatever information is important to their communities, from whatever source, and in whatever form, whether that be books, serials, images, multimedia, ephemera, or any of the forms of electronic information introduced in the Internet age. Many libraries are rightly selective about what they acquire, but we shouldn’t limit what they are able to select to benefit their users.
  • Libraries everybody can build. This includes the “citizen’s” libraries people build themselves and the established libraries that people contribute to. I started a kind of library 14 years ago as a computer science graduate student. It serves the Internet as a whole, and I continue to grow it. I also now work for another library that serves a smaller, university-based community with a broader range of collections and services (including some that are enhanced by our users’ contributions). The work I do with one library often enhances the work I do with the other. Many other people are now also building their own libraries, with the help of various tools for collecting, describing, organizing, preserving, and providing access to the information their communities need.
  • Libraries everybody can share. This is a crucial characteristic, distinct from but dependent on the characteristics above. In the past, if my library bought a new book or introduced a new service, it improved the lot of my library’s constituents, but did little or nothing for anyone else’s library. That no longer has to be true. My library, if it’s willing and able, can now share its content, its metadata, and even much of its services and technical infrastructure with any number of other libraries. The costs of turning local resources into shared resources can be very small; the benefits to the users of all these libraries can be very large. In this kind of environment, the improvements that I make in my library can also be turned into improvements in your library, and in someone else’s library– ultimately, in everybody’s libraries.

Most of these characteristics assume lots of libraries, large and small, independently managed but sharing whatever collections, services, knowledge, and other resources they see fit. People sometimes imagine that one day everyone will just use one big “universal library”, containing all knowledge, and run by some overarching organization, government, or corporation. I don’t think that’s going to happen, and I hope it doesn’t. There are too many ways that people want to collect and use information for various purposes. The library landscape of the future should support the construction, cooperation, and use of many kinds of libraries– physical, virtual, and hybrid– serving many kinds of communities and needs.

Everybody’s libraries, then, include libraries for everybody, by everybody, shared with everybody, and about everything. No one library is all things to all people, but collectively, they can be much greater than any single library can be. And if we understand and support everybody’s libraries (as I hope to encourage with this blog), we can make each of our own libraries better serve their users.

What’s this all about, Part 1: The Rise of Citizen Librarians

I got the idea for this blog from Dan Gillmor, a journalist who over the past few years has been documenting and encouraging the “citizen journalism” movement online. He gave an inspiring presentation at the Digital Library Federation’s forum earlier this month on some of the work being done by amateurs and professionals to gather, analyze, and spread news around the world using the powerful, easy-to-use tools provided by blogs and other online communication technology.

The technology is essential for enabling this kind of activity broadly, but the true value comes from what many people have been inspired to do with the technology. “Citizen journalism” isn’t a pretentious synonym for “blogging”. Rather, it describes ways in which ordinary people provide news, analysis and commentary to the community at large, using relevant journalistic principles, but outside established, professional media channels. The blog is one common medium that now makes this work easier, but it’s not the only medium for the work.

Collectively, citizen journalists cover many beats that the traditional media do not and cannot, due to limited time, resources, and interest. Through the Internet they can reach any other online reader who finds their work of interest. They aren’t limited to readers who live in a particular area or who pay for a subscription to their service. They do not replace professional journalists (though they may be threatening to some of them). Rather, savvy journalists and news organizations find ways to improve their own work by building on the work of their non-professional colleagues.

It occurred to me that a lot of what I’ve observed and encouraged online for the past fifteen years could, along similar lines, be characterized as “citizen librarianship”. The term’s not new; it’s been used, for instance, in discussions on rebuilding New Orleans. The practice it describes goes back considerably further than that: Lots of people, inside and outside of established library organizations, have been collecting, describing, organizing, making accessible, helping people find and use, and preserving information of all kinds. They’re serving constituencies that are potentially much larger than that of any purely physical library. It’s becoming increasingly easy for people to do this work online, with the various digital tools that are available or in development. And collectively, these citizen librarians have the potential to provide much more in the way of both collections and services than professional librarians can on their own.

I’m not claiming that everyone who uploads their pictures to Flickr or tags some web sites is a librarian, any more than everyone who blogs is a journalist. But the more that people adopt principled methods for collecting, describing, and doing all the other things I list above with information, as a service for their communities, the more they’re acting as librarians. And the more the things they build function as libraries.

It’s not just individuals, of course, that are creating these new libraries and library-like services. Big corporations are doing it too (sometimes to much publicity). And many non-profit organizations, including many established libraries, are putting up new libraries and library services online. Many of these new sites, though, are still powered largely by the particular individuals who thought them up, or by lots of independent individuals that collectively build them up.

All this activity has seriously disrupted libraries, and disrupted the way that many library constituents perceive them. In some circles, there’s been notable pessimism and concern that libraries may now be obsolete. The new libraries and services may be threatening to some librarians and libraries that don’t adapt, just as bloggers may threaten the livelihood of some news purveyors that don’t adapt. But I believe that savvy library professionals can and will find ways to improve the services they offer by building on the work of their non-professional colleagues. Going the other way, I believe that the “citizen” and other non-professional librarians can increase the usefulness of their collections and services through adopting principles and practices that librarians have developed over the years. And I think there’s a lot more that people in both camps can do to share their work and expertise.

Or, to put it another way, the future of libraries, if they are to best serve their communities, must include understanding and building up “everybody’s libraries”. And here I mean “everybody’s libraries” both as a collective group, and as a concept. I’ll explain what I mean by the concept, which I’ve chosen to title this blog, in Part 2.