Repositories: What they are, and what we use them for

(Note: This is the second of an ongoing series of posts on repositories. The first post is here.)

The JISC Repositories Support Project defines a digital repository as “a mechanism for managing and storing digital content.” I find this a useful definition, both for what it says and what it doesn’t say. It notes that repositories, as such, focus on content and its management. It doesn’t say anything about the kind of digital content managed by the repository, or about the use this content is put to.

A repository’s focus is related to, but distinct from, the focus of a library or an application. Repositories focus on particular information content. Applications (like Zotero, FeedReader, or Google Docs) focus on particular information tasks, like tracking citations, getting news, or authoring documents. Libraries focus on the information needs of particular communities (which might be towns, schools, peer researchers, or Internet users with particular interests). Applications and libraries may use repositories to support their tasks or communities, and some may be primarily built around one specific repository (as most libraries in the pre-computer age were built around what was in their physical stacks). But they are not identical to their repositories, and it’s often useful to distinguish the functions of a library and the functions of the repositories that it uses.

At the same time, though, you can’t plan the development of a library without thinking about its repositories. Repositories really are essential infrastructure for libraries, but not simply as a place to “capture and preserve the intellectual output of university communities” (as a 2002 SPARC white paper put it), or, more pessimistically, as “a place where you dump stuff and then nothing happens to it” (as a 2005 JISC workshop annex put it). The Penn Libraries today rely on hundreds of digital repositories, mostly run by various publishers. We also manage a few important ones ourselves. Here are a few that we manage, or are considering managing:

  • A repository providing open access to the scholarly output of our researchers (what is often thought of as the traditional “institutional repository”). For this repository, we manage the content, and contract with an outside company to manage the servers and develop the software. While many faculty cooperate in populating this repository, and some faculty deposit their own work themselves, librarians do much of the work to populate it.
  • A repository preserving content from some of our electronic subscription resources. This repository is normally only seen by library staff, but it’s an important part of our preservation strategy, and will be exposed selectively when subscription resources it preserves are no longer available from the publisher. We run this repository on a local server, using open source software developed elsewhere, and its content is selected by us and ingested and preserved largely automatically, in cooperation with other users of the same repository software. (We also subscribe to another preservation initiative, involving a centralized preservation repository system that we don’t manage.)
  • The repository used to store content in our main courseware management system. The server is managed by us, using proprietary software, and is populated by instructors from all over the university. It is largely torn down and built anew every semester (sometimes carrying over material from previous semesters’ incarnations). While this isn’t a permanent repository, it has very strong and definite persistence requirements that we have to take pains to support. And if some of our users just think of this as a place to do their teaching, and the “repository” aspects just come along for the ride, that’s a feature, not a bug.
  • Repositories for various digital image collections and digitized special collections. Historically these collections have been a mishmash of systems developed ad hoc, involving filesystems, metadata in a database, custom-built websites, backup procedures, and sometimes little else. We’re currently developing, locally, a digital library architecture that will unify discovery and usage of many of these collections, and we hope to similarly unify repository management for many of these collections as well. Traditionally, the content is selected by bibliographers and the repositories and collection sites created by techies; we hope that the new architecture will let the bibliographers do more repository management and site design, and let the techies do less site-by-site management and more unified service management.
  • We have also tested repositories for managing numeric data, which are increasingly important shared research resources in many fields. We do not currently have a repository in production for this, but the repositories developed by projects like this one have important features for data-centric research that are not supported to the same extent by “traditional” repository systems.

As you can see from these examples, libraries like ours have all kinds of different uses for repositories, and various ways we can develop and manage them. We’re not starting repositories because they’re what all the cool Research I libraries are doing this year. We’re managing them because they help us provide what we see as important services to our communities. We recognize that different repositories have different uses, and that it often makes more sense to integrate multiple repositories into a single library than to build One Repository to Rule Them All. Once we have a clear understanding of why we would benefit from a particular repository, and what it would manage, we can consider various options for who would run it, where, and how. (And of course, what its costs would be, and how we can realistically expect those costs to be covered. But that’s a topic for another post.)

Now it’s official

As I hoped, the good news was announced by Peter Brantley of the Digital Library Federation while I was away in Canada: the recommendations of the ILS-Discovery interface task group, which we’ve been talking about and drafting over the last many months online and off, have now been officially released. You can find the official release on the DLF website. We’ll be putting some supplementary information on there shortly as well; for now, you can still find background and supplementary material on our wiki.

I’d like to thank the members of the task group for all their work in putting the recommendation together; the Digital Library Federation for sponsoring this work; our steering group (Dale Flecker, Robert Wolven, Marty Kurth, Terry Ryan, and especially Peter Brantley) for all sorts of help and support in making this initiative viable; the Penn Libraries for supporting my chairing the task group this past year (as well as hosting one of the early meetings); the vendors that signed the Berkeley Accord for meeting with us and agreeing to support the basic discovery interface functions described in our recommendation; and the many library folks, developers, and vendors that gave us suggestions and publicity.

We’ve intended the recommendations to be a first step in an ongoing process of supporting interoperability between the online data and services of libraries and a wide range of discovery applications. The recommendations we produced give fairly detailed proposals for a basic level of interoperability, and more open-ended proposals for higher levels. But you can only spend so long on proposals before it’s time to shift the emphasis to implementing them. With the official version now out, I hope we can start implementing these functions in earnest. (And once we’ve accumulated some experience with implementations, I hope that folks will revisit and refine the recommendations to further help things along.)

Locally, we already have one demonstration implementation, and we hope to now work on getting the basic functions implemented for our actual ILS. And I hope that many others will be working on or using implementations soon. The DLF is now planning a developer’s workshop for folks interested in implementing the ILS-DI recommendations, which hopefully will convene later this summer. There should also be online forums of various kinds to support folks who are interested in implementing the recommendations or using them in their applications. Exactly how these forums will develop over time remains to be seen; but for now, the ILS-DI Google Group is one good place to look for news and discussion of activities related to the ILS-DI recommendation.
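To give a concrete sense of what implementing one of the basic discovery functions might look like, here is a minimal sketch of a client calling an ILS-DI-style availability function over HTTP and parsing the response. The endpoint URL, parameter names, and XML element names below are illustrative placeholders of my own devising, not taken verbatim from the recommendation or any vendor’s implementation.

```python
# Hypothetical sketch of a client for an ILS-DI-style "GetAvailability"
# request. All URLs, parameter names, and XML elements here are
# assumptions for illustration, not the recommendation's actual schema.
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

def availability_url(base_url, record_ids):
    """Build a request URL asking for the availability of record ids."""
    params = {"verb": "GetAvailability", "id": ",".join(record_ids)}
    return base_url + "?" + urlencode(params)

# A sample response of the general shape such a service might return.
SAMPLE_RESPONSE = """<?xml version="1.0"?>
<availability>
  <item id="12345">
    <status>available</status>
    <location>Main Library stacks</location>
  </item>
  <item id="67890">
    <status>checked-out</status>
    <location>Annex</location>
  </item>
</availability>"""

def parse_availability(xml_text):
    """Return a dict mapping record id -> (status, location)."""
    root = ET.fromstring(xml_text)
    return {item.get("id"): (item.findtext("status"),
                             item.findtext("location"))
            for item in root.findall("item")}

# Usage: a discovery application would fetch availability_url(...) and
# feed the response body to parse_availability().
statuses = parse_availability(SAMPLE_RESPONSE)
```

Even a toy client like this illustrates the point of the basic level: a discovery application only needs to know how to form a simple request and read a simple structured response, without knowing anything about the ILS behind it.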

I’m thankful myself for having the opportunity to work with so many good people on this project, and look forward to getting to work on implementations, and to continuing the conversations we’ve started about making the most of library resources and services.

A break, and coming attractions

I’m about to head off to the wilds (okay, the farms) of Saskatchewan to relax with family on a much-welcomed break. I’ve got to the point in packing where we’re trying to figure out which books to bring. (Which involves some careful selection to narrow it down to the number of books we can bring on the ever-more-limited-space airlines without excess baggage problems.)

I leave the ILS-Discovery Interface work in good hands, and there should be good news shortly (hopefully, quite shortly, and well before I return) for folks who are interested in this initiative. I’ll have more to say on what comes next after I get back. Also after I come back, a couple of weeks from now, I’ll be picking up on the repositories series I started last month, with a review of the what-why-who-and-where of the various kinds of repositories that libraries may find of use.

Online book fans may also be interested in following a debate going on now about ebook publishing, business models, and piracy. Author David Pogue had a Times blog post a couple of weeks ago giving his reasons for not issuing electronic editions of his titles, which drew a long set of reader comments. Now Adam Engst has posted an interesting and detailed rebuttal, where he describes his own sales successes with his ebooks (piracy notwithstanding).

You might also enjoy “Reading sets you free”, an article posted about a month ago by K. G. Schneider (whom I had the pleasure of meeting in person recently at a NISO discovery forum). I was reminded of it again just now as I was trying to think of what books the kids might bring. As in the picture accompanying her article, both of them are very much read-under-the-covers kids at this point, as were both their parents. We’re all looking forward to spending a lot of time conversing with each other and with our books these next couple of weeks.

100 years of the first sale doctrine

On June 1, 1908, 100 years ago today, the US Supreme Court decided Bobbs-Merrill v. Straus, a case that established what would become known as the “first sale doctrine”. This doctrine, now codified as part of the US Copyright Act, says that in general the owners of books or other copyrighted works have the right to dispose of them as they see fit (such as by reselling them, giving them away, or lending them out). The copyright holder can still control the right to make copies, give public performances, or prepare derivative works. But once a reader has bought a book, they can pass it along as they see fit. (Or keep it, or fold it into little origami shapes for their own amusement. They own it, after all.)

This right exists even in the presence of notices to the buyer that claim to conditionally license the work, rather than sell it. Indeed, those kinds of licenses, familiar now to most computer users, were also at issue in the Bobbs-Merrill case. (For historical background, including some examples of old-time “end user license agreements”, see a post of mine from a few months ago, “The right to read, circa 1906.”)

Despite attempts by many software, music, and ebook publishers to extend control over their products to their buyers, the first sale doctrine is still salient today. Just last month, for example, a federal judge cited the first sale doctrine to uphold the right of an eBay merchant to resell used software. An article in Ars Technica has a link to the decision, and an excellent explanation of the case and the importance of the principles it upholds. Ultimately, as the article points out, the first sale doctrine is what “makes libraries and used book stores possible” without needing the permission of publishers to exist or carry out their missions.

The free access to literature that libraries provide, and the freedom to provide access to literature that the first sale doctrine provides, promote the literacy and education of all our citizens. So this is an anniversary well worth remembering for its contribution to society. Happy First Sale Day!