The library where I work has decided to think long and hard about its digital repository strategy. Your library may be doing this too, or may have recently done so and is now working on carrying out that strategy. If it’s not, it probably should be.
Libraries have for a long time hosted repositories of content in paper form; indeed, such repositories account for a large portion of both the budget and the floor space of many libraries. But many of them have been slow to take on responsibility for digital repositories, or have only done so in a very limited way, compared to their physical repository investments.
But while established libraries have often hesitated in taking up digital repositories, the rest of the world has not. As folks in research libraries have known for a while, a lot of the money we now spend on content pays for electronic resources held in publisher repositories. In typical arrangements, libraries no longer own this content (as they owned the print content the electronic versions supplant) but lease it. And even if a library has a “perpetual access” contract that lets it download publisher content after ending a subscription, for practical purposes many libraries are not ready to host it or make it available as readily and seamlessly as their patrons have grown to expect.
However, even if publisher repositories, or scholar-run discipline repositories like the social scientists’ SSRN, aren’t directly run by traditional libraries, those libraries are among their primary customers. Therefore, the folks who run those repositories have incentives to provide the kinds of services that those libraries need to carry out their missions (at least, if the libraries know to ask for them).
Increasingly, though, people are using new kinds of repositories that have little or no connection to traditional libraries. Some of these repositories are on their users’ own computers: their digital music collections and photo libraries, managed by programs like iTunes, iPhoto, and Picasa. Some of these repositories are on Internet sites like YouTube, Flickr, Google Docs and Google Base, and the various Wikimedia sites. We often don’t think of all of these as “repositories”, but that’s how people are using them: to manage and provide access to information in a stable way, potentially over a long period of time.
I’m not using “repository” here to mean just “glorified filesystem or website”. The everyday repositories I mention above typically put substantial effort into managing metadata, supporting discovery, providing for access control (and often backup and version control), and supporting long-term access and use of the content. They tend to do all these things much more quietly and unobtrusively than the repositories typically designed for and marketed to libraries, but that’s a feature, not a bug. We who work in research libraries need to consider these “repositories for everybody” very carefully. A lot of the digital content that libraries will want to include in our own collections will come out of those repositories. And those repositories can potentially teach us a lot about how to design and run our own.
That’s one big reason why I want to discuss my library’s strategic thinking about repositories in open forums like this one. True, the Penn Libraries don’t have exactly the same uses and needs for repositories as other people and groups. But I think there are a lot of repository issues where we and many others share common interests, or have common questions we all need to answer. Over a series of posts, I hope to discuss repository purposes, infrastructure, technologies, ingest, workflow, labor allocation, lifecycles, legal concerns, integration, policy, and community, all of which are relevant to our repository plans. The strategies and issues most salient for Penn may or may not be the same as yours. But if repositories matter to you, I hope that discussing our issues in a broader context will give you useful things to think about for your own situation. And I hope that we will learn from you as well.
Lots of other people have already written thoughtfully on repositories. I hope to reuse and build on their ideas wherever I can. A good introduction to many of the issues can be found at JISC’s Repository Support Project, a website to help institutions planning repositories, starting from “What is a repository, anyway?” and working from there. (It’s not a given, by the way, that libraries should always run their own repositories for their digital content, but more on that later.)
Repository planners should be familiar with both the theory and practice of repositories. You don’t have to know all the details of the OAIS reference model, for instance, but it’s helpful to know the general principles it sets out, both for issues to think about in running a repository over the long term, and for a conceptual vocabulary for understanding and interacting with other repository initiatives. Likewise it helps to at least be conversant with standard metadata schemas, protocols, recommended procedures, and the like. But you also very much need to know how repositories are working, or not working, in practice. The JISC site I mentioned earlier has an interesting case studies section, where folks who have run repositories describe their experiences, and how they may have differed from expectations. Some repository managers also run blogs where they talk about their day-to-day experiences with repositories, good and bad. Les Carr’s RepositoryMan and Dorothea Salo’s Caveat Lector are two blogs that I consider must-reads, both for keeping track of new developments repository maintainers can use and for learning about practical problems that repository planners can’t afford to ignore.
Future installments in this series will be posted under the “repositories” category. In the meantime, if you’re interested in these issues, I recommend you check out the resources above. And I’d be very interested in hearing about particular issues that should be discussed here.