Repositories: Benefits, costs, contingencies (with an example)

(This is the third post in a slow-cooking series on repositories.)

In my last repository post, I listed a variety of repository types that we maintain at our institution, each with different content, operation, and policies. At the end of the post, I wrote:

Once we have a clear understanding of why we would benefit from a particular repository, and what it would manage, we can consider various options for who would run it, where, and how. (And of course, what its costs would be, and how we can realistically expect those costs to be covered….)

Without a clear sense of benefits and costs, you won’t have a sensible repository strategy. And, as Dorothea Salo reminds us today, without a sensible strategy you’re likely to burn through a lot of money, labor, and goodwill with little to show for it at the end. You have to go in knowing what you want, and being realistic about what you’re willing to invest to produce it. (For instance, if you’re planning to build a repository of your community’s own scholarship, and hope to get lots of free help from your community just by doing some marketing, you really need to read Dorothea’s post for a reality check.)

Even when your initial plan is sound, you have to be prepared for change, and the unexpected. Technology changes quickly. Online tools, communities, and scholarly societies also change. Methods of scholarship change too, often more slowly, but sometimes in significant ways. Even if you’ve done your homework, you may eventually find that the repository that seemed just fine a few years ago doesn’t really meet your needs like it used to. Maybe the software hasn’t been updated as you’d like, and there’s a better system available now. Maybe you’re storing different kinds of things, or you’ve found a new application that your scholars really want to use that’s not compatible with your existing setup. Maybe the formats you’re managing have gone out of date. Maybe it becomes more cost effective to move to a big externally managed repository that your scholars are flocking to already, or away from one that they’re not finding useful. Maybe you even decide it no longer makes sense for you to maintain a particular repository.

You need to start thinking about strategies for change (and for exit) the moment you start planning a repository. Remember, repositories ultimately don’t exist for themselves, but for their content (and for the people using that content). And the kind of content that libraries often care about is likely to remain relevant much longer than any particular repository configuration. You want to ensure that the content remains usable for as long as your patrons care about it, even as it moves and migrates between systems (and possibly, between caretakers).

An example: Planning for data repository services

What does it mean, practically, to plan with benefits, costs, and contingencies in mind? Well, at Penn, we’re starting to consider repository services for data sets. We have a general idea of the benefits of archiving data sets, because we’ve heard from faculty in various departments who want to analyze data previously collected by research groups (their own or others’), who are having a hard time managing their own data, or who are required by their journals or support agencies to publish or maintain their data sets. Before we commit to providing a new data repository service, though, we need a better sense of these benefits. How broad and deep is the desire for data services among our faculty? Where is it most acute, in terms of disciplines and services? What would be gained from having our institution provide our own data repository services, rather than just having our scholars use someone else’s services, or fend for themselves? What are the benefits of introducing services specifically for data, rather than just, say, saving data sets alongside other files in existing repositories? If we’re considering a significant investment, we need more than just anecdotal answers to these questions. A survey of faculty in various disciplines can give us a better idea of how they could benefit from and support data repository services.

We also have to consider costs. What options do we have for creating, acquiring, or contracting with a data repository or repository service? What do they cost to install and run, both in monetary and staffing terms? What are the costs of acquiring content (again in money and labor, where the labor might come from librarians, scholars, or students)? How about costs of maintaining, accessing, and migrating the content? How will these costs be covered? What about costs associated specifically with this kind of content? Are there confidentiality, security, intellectual property, or liability concerns we have to consider? To help answer these questions, we should evaluate various data repository systems in existence and in development. The faculty survey mentioned above could also help us answer some of the questions about labor and support.

Contingencies, by their nature, tend not to be fully foreseeable. But there are a few obvious things we can ask about and plan for. Will our data still be readable for decades to come? Can we migrate it to new formats, and if so, what would be involved? Can we make sure we have good enough metadata and annotation to know how to read, use, and migrate the data in the future? Do we have clear identifiers for our content that will survive a move to a new platform (and leave a workable forwarding address, if necessary)? What happens to our content if our repository loses funding, our machine room is sucked into a mini-black-hole, or we simply decide it’s not worth the trouble of keeping the repository going? What do we do if we’re told to withdraw or change the data we’re maintaining, by the person who deposited it, by someone else using or mentioned in the data, or by the government? We won’t necessarily come up with definitive answers to all these questions, but brainstorming and thinking through possible and likely scenarios should help us know what to expect and reduce the chance of our getting caught unawares by a costly problem.
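To make the identifier question a little more concrete, here is a minimal sketch of one way stable identifiers, forwarding addresses, and withdrawals could fit together. Everything in it is an assumption for illustration: the IdentifierRegistry class, the "dataset:0001" style of identifier, and the example.org URLs are all hypothetical, and a real service would more likely build on an established identifier scheme than on a dictionary in memory.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class IdentifierRegistry:
    """Hypothetical registry mapping stable identifiers to current locations."""

    # identifier -> URL where the content lives right now
    locations: Dict[str, str] = field(default_factory=dict)
    # identifier -> reason the content was withdrawn
    tombstones: Dict[str, str] = field(default_factory=dict)

    def register(self, identifier: str, location: str) -> None:
        """Record where a newly deposited item lives."""
        self.locations[identifier] = location

    def migrate(self, old_prefix: str, new_prefix: str) -> int:
        """Repoint every item stored under old_prefix to new_prefix.

        The identifiers themselves never change, so citations that use
        them keep working after the move. Returns the number of items moved.
        """
        moved = 0
        for identifier, location in list(self.locations.items()):
            if location.startswith(old_prefix):
                self.locations[identifier] = new_prefix + location[len(old_prefix):]
                moved += 1
        return moved

    def withdraw(self, identifier: str, reason: str) -> None:
        """Remove an item but leave a note explaining what happened to it."""
        self.locations.pop(identifier, None)
        self.tombstones[identifier] = reason

    def resolve(self, identifier: str) -> str:
        """Return the current location, or a withdrawal notice."""
        if identifier in self.tombstones:
            return "withdrawn: " + self.tombstones[identifier]
        if identifier in self.locations:
            return self.locations[identifier]
        raise KeyError(identifier)


# Example: deposit two data sets, move platforms, then withdraw one.
registry = IdentifierRegistry()
registry.register("dataset:0001", "https://old-repo.example.org/files/0001.csv")
registry.register("dataset:0002", "https://old-repo.example.org/files/0002.csv")
registry.migrate("https://old-repo.example.org/", "https://new-repo.example.org/")
registry.withdraw("dataset:0002", "removed at depositor's request")
print(registry.resolve("dataset:0001"))
print(registry.resolve("dataset:0002"))
```

The point of the sketch is the separation: scholars cite the identifier, not the location, so a platform move only touches the registry, and a withdrawal leaves an explanation behind instead of a dead link.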

Is it worth it?

That’s a lot to do, you might be thinking, before you even get started. Can’t we just put this cool system up and see what happens? Well, you could, if you and your community will be satisfied with something that might be here today and gone tomorrow, and that doesn’t have any support or reliability guarantees. But if you have scholars to serve, and you’d like them to take the time and trouble to entrust their content to your repository, they’re probably going to want some reassurance that the repository will have staying power, and give them benefits worth their time. Otherwise, they have plenty of other, more important things to do.

Running a large, successful, long-lasting repository takes a lot of work over its lifespan. Better to do some planning work up front than get stuck with a lot of costly and unnecessary work later on.

About John Mark Ockerbloom

I'm a digital library strategist at the University of Pennsylvania, in Philadelphia.