Everybody's Libraries

October 13, 2008

What repositories do: The OAIS model

Filed under: preservation,repositories — John Mark Ockerbloom @ 11:23 pm

(Another post in an ongoing series on repositories.)

In my previous post, I mentioned the OAIS reference model as an influential framework for thinking about and planning repositories intended for long-term preservation. If you’re familiar with some of the literature or marketing for digital repositories, you may well have seen OAIS mentioned, or seen a particular system marketed as “OAIS compliant”. You may have also noticed remarks that it’s not always clear in practice what OAIS compliance means. The JISC Standards Catalogue notes “The [OAIS] documentation is quite long and complex and this may prove to be a barrier to smaller repositories or archives.” A common impression I’ve heard of OAIS is that it’s a nice idea that one should really try to pay more attention to, but complex enough that one will have to wait for some less busy time to think about it. Perhaps, one might think, if we just pick a repository system whose marketing says it’s OAIS compliant, we can be spared thinking about it ourselves.

I think we can do better than that, even in smaller projects. The basics of the OAIS model can be understood without having to be conversant with all 148 pages of the reference document. Those basics can help you think about what you need to be doing if you’re planning on preserving information for the long term (as most libraries do). The basics of OAIS also make it clear that following the model isn’t just a matter of installing the right product, but of having the right processes. The model makes it very explicit that repository curators need to work with the people who produce and use the information in the repository, and make sure that the repository acquires all the information necessary for its primary audience to use and understand this information far into the future.

To help folks get oriented, here’s a quick introduction to OAIS. It won’t tell you everything about the model, but it should let you see why it’s useful, how you can use it, and what else you might need to consider in your repository planning.

What OAIS is and isn’t

First, let’s start with some basics: OAIS is a reference model for an Open Archival Information System (whose initials make up the name OAIS). It is now an ISO standard (ISO 14721), but is also freely available. It was developed by the Consultative Committee for Space Data Systems (CCSDS), an international body of space agencies that includes NASA. Its members have had to deal with large volumes of data and other records generated by decades of space missions and observations, so they’ve had to think hard about how to manage and preserve them. To develop OAIS, they had open discussions with lots of other people and groups (like the National Archives) who were also interested in long-term preservation. OAIS is called “Open” because of the open process that went into creating it. It does not require that archives be open access or have an open architecture, and it has no direct relation to the similarly-acronymed Open Archives Initiative (OAI). (Though all of these things are also useful to know about in their own right.) An “archival information system” or “archive” can simply be thought of as a repository that’s responsible for long-term preservation of the information it manages.

Unlike many standards, OAIS specifies no particular implementation, API, data format, or protocol. Instead, it’s an abstract model that provides four basic things:

  • A vocabulary for talking about common operations, services, and information structures of a repository. (This alone can provide very useful common ground for different people who use and produce repositories to talk to each other.) A glossary of this vocabulary can be found in section 1 of the reference model.
  • A simple data model for the information that a repository takes in (or “ingests”, to use the OAIS vocabulary), manages internally, and provides to others. This information is assumed to be in distinct, discrete packages known as Submission Information Packages (SIPs) for ingestion, Archival Information Packages (AIPs) for internal management, and Dissemination Information Packages (DIPs) for providing the information to consumers (or to other repositories). These packages include not just raw content, but also metadata and other information necessary for interpreting, preserving, and packaging this content. They have different names because the information they contain can take different forms as it goes into, through, and out of the archive. They are described in more detail in sections 2 and 4 of the reference model.
  • A set of required responsibilities of the archive. In brief, the archive (or its curators) must negotiate with producers of information to get appropriate content and contextual information, work with a designated community of consumers to make sure they can independently understand this information, and follow well-defined and well-documented procedures for obtaining, preserving, authenticating, and providing this information. Section 3 of the model goes into more detail about these responsibilities, and section 5 discusses some of the basic methodologies involved in preservation.
  • A set of recommended functions for carrying out the archive’s required responsibilities. These are broken up into six functional modules: ingest, data management, archival storage, access, administration, and preservation planning. The model describes about half a dozen functions in each module (ingest, for example, includes things like “receive submission”, “quality assurance”, and “generate AIP”), along with data flows and dependencies that might exist between the functions. Some of these functions are automated, some (like “monitor technology”) are carried out by humans, and some may involve a combination of human oversight and automated assistance. The functions are described in more detail in section 4 of the model (with issues of multi-archive interoperability discussed in section 6).
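
To make the package vocabulary a little more concrete, here is a minimal Python sketch of information moving from a SIP to an AIP to a DIP. OAIS itself specifies no implementation, so everything here is an assumption made for illustration: the class fields, the SHA-256 fixity value, the rule that a submission needs a title, and the in-memory `storage` dictionary are all hypothetical. Only the function names `quality_assurance` and `generate_aip` echo the ingest functions named in the model.

```python
from dataclasses import dataclass, field
from hashlib import sha256


@dataclass
class SIP:
    """Submission Information Package: what a producer hands to the archive."""
    producer: str
    content: bytes
    metadata: dict = field(default_factory=dict)


@dataclass
class AIP:
    """Archival Information Package: what the archive manages internally."""
    identifier: str
    content: bytes
    metadata: dict
    fixity: str  # checksum recorded at ingest (an assumed preservation detail)


@dataclass
class DIP:
    """Dissemination Information Package: what a consumer receives."""
    identifier: str
    content: bytes
    metadata: dict


def quality_assurance(sip: SIP) -> None:
    # Hypothetical acceptance rule: non-empty content and a title are required.
    if not sip.content:
        raise ValueError("submission has no content")
    if "title" not in sip.metadata:
        raise ValueError("submission is missing a title")


def generate_aip(sip: SIP) -> AIP:
    # Record a checksum and carry the producer forward as contextual metadata.
    digest = sha256(sip.content).hexdigest()
    metadata = dict(sip.metadata, producer=sip.producer)
    return AIP(identifier=digest[:12], content=sip.content,
               metadata=metadata, fixity=digest)


def ingest(sip: SIP, storage: dict) -> AIP:
    # "Receive submission", check it, then store the resulting AIP.
    quality_assurance(sip)
    aip = generate_aip(sip)
    storage[aip.identifier] = aip
    return aip


def disseminate(identifier: str, storage: dict) -> DIP:
    # Verify the stored content is unchanged before handing it to a consumer.
    aip = storage[identifier]
    if sha256(aip.content).hexdigest() != aip.fixity:
        raise RuntimeError("fixity check failed")
    return DIP(identifier=aip.identifier, content=aip.content,
               metadata=dict(aip.metadata))
```

Calling `ingest(SIP("some lab", b"observations", {"title": "Run 1"}), storage)` and then `disseminate(...)` with the returned identifier walks one item through the whole path. The three classes are kept separate, even though they look alike here, because the model stresses that the information can take different forms as it goes into, through, and out of the archive.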

OAIS conformance and usage

It is important to note that OAIS compliance simply requires fulfilling the required responsibilities, and supporting the basic OAIS data model of information packages. A repository is not required to implement all the functions recommended in the OAIS model, or replicate the detailed internal data flows, to be OAIS compliant. But it can be very useful to look through the functions in any case, both to make sure that your repository is doing everything it needs to do, and to see how the big problem of reliable data preservation can be broken down into smaller, more manageable operations and workflows.

You may also find the functions a useful reference point for detailed descriptions of the exact formats and protocols your repository uses for ingesting and storing information, providing content to users, and migrating it to other repositories. Although the OAIS model does not itself provide specific formats or protocols to use, it makes it clear that a repository provider needs to specify these so it can receive information from producers and make it clearly understandable to consumers.

The OAIS model has been used to help construct more detailed criteria for trusted repositories, as well as checklists for repository audit and certification. In most cases, repositories will operate perfectly well without satisfying every last criterion or checklist item. At the Partnerships in Innovation symposium I attended last week, Don Sawyer, one of the main people behind OAIS, remarked that the archives where he worked satisfied about 80% of the trusted repository checklist items. But he still found it useful to go through the whole list to verify that certain functions were not relevant or required for their repository needs, as well as to spot aspects of the repositories (like disaster recovery or provenance tracking) that might need more attention. Similarly, you can go through the recommended OAIS functions and data-model breakdowns to evaluate what’s important to have in your repository, what can be safely omitted, and what might need more careful attention or documentation.
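
The kind of review described above can be as simple as a tally. The following Python sketch is only an illustration: the three status categories, the `review` function, and the statuses assigned to each item are made up for the example and do not come from OAIS or from any actual audit checklist. The item names are functions and concerns mentioned in this post.

```python
from collections import Counter
from enum import Enum


class Status(Enum):
    SATISFIED = "satisfied"
    NOT_APPLICABLE = "not applicable"
    NEEDS_ATTENTION = "needs attention"


def review(checklist: dict) -> dict:
    """Summarize a mapping of checklist item name -> Status."""
    counts = Counter(checklist.values())
    total = len(checklist)
    return {
        # Share of all items that are currently satisfied.
        "satisfied_fraction": counts[Status.SATISFIED] / total if total else 0.0,
        # Items judged irrelevant for this repository (a decision worth documenting).
        "not_applicable": sorted(
            name for name, status in checklist.items()
            if status is Status.NOT_APPLICABLE
        ),
        # Items that are relevant but not yet adequately covered.
        "needs_attention": sorted(
            name for name, status in checklist.items()
            if status is Status.NEEDS_ATTENTION
        ),
    }


# Hypothetical statuses, chosen only to show the shape of the result.
example = {
    "receive submission": Status.SATISFIED,
    "quality assurance": Status.SATISFIED,
    "generate AIP": Status.SATISFIED,
    "monitor technology": Status.NOT_APPLICABLE,
    "disaster recovery": Status.NEEDS_ATTENTION,
    "provenance tracking": Status.NEEDS_ATTENTION,
}
summary = review(example)
```

The point of keeping “not applicable” distinct from “needs attention” is the one Sawyer made: a checklist item you have deliberately ruled out is a different thing from one you have simply not gotten to yet.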

What else you need to think about

Although the OAIS model includes examples of various kinds of repositories that might use it, it’s at its heart a fairly generic, domain-independent model, largely concerned with preservation needs. It doesn’t say a whole lot about how a repository needs to interact with specific communities to fulfill its purposes. For instance, in the talk I gave last week, I stressed the importance of designing the architecture of repositories to support rich discovery mechanisms. As Ken Thibodeau noted in later conversation, the access model of OAIS is more primitive than the architectures I described. OAIS is not incompatible with those architectures, but designing the right kinds of discovery architectures requires going beyond the criteria of OAIS itself.

You’ll also need to think carefully about the needs of the communities you’re collecting from and serving. The OAIS model notes this requirement, but doesn’t pursue it in depth. I can understand why it doesn’t, since those needs are highly dependent on the domain you’re working in. A repository intended to preserve static, published text documents for possible use in legal deposition will need to interact with its community very differently from, say, a repository intended to manage, capture, and ultimately preserve works in progress used in ongoing research and teaching. They both have preservation requirements that OAIS may well address effectively, but designing effective repositories for these disparate needs may require going well beyond OAIS, doing detailed requirements analyses, and assessing benefits and costs of various options.

I’ll talk more about requirements for particular kinds of repositories in later posts. But I hope I’ve made it clear how the OAIS model can be useful for thinking about and planning what a repository needs to do to manage and preserve its content. If it sounds promising, you can download the full OAIS model as a PDF. A revised document that will clarify some of the terminology and recommendations, but will not substantially change the model, is expected to be released in early 2009.

2 Comments

  1. Nice post, thanks John. A couple of comments may be worth making. First, as you mentioned, OAIS is undergoing review at the moment; a process that does appear to be taking an inordinately long time, and unfortunately does not appear to be marked by the same openness. I’ve written about this at http://digitalcuration.blogspot.com/2008/09/oais-revision-moving-forward.html. The good news is that we now do have some responses to our comments, and the original group that commented will get a chance to have a look at those responses in the next month or so. The revisers think they will have a new draft for consultation early next year. It’s important, I’d ask people to look out for it, and get involved in improving it.

    Second is much more sobering. A couple of times recently I’ve asked groups involved in repositories their views on the applicability of OAIS, eg in a discussion in JISC using the Ideascale system, see http://jiscrepository.ideascale.com/akira/dtd/2276-784. The participants are intelligent people with a good knowledge of repositories, and they voted 16 to 3 AGAINST the idea that “The repository should be a full OAIS preservation system”. I’m not sure what this means! I did get them to vote 13 to 1 in favour of the idea that “Repository should aspire to make contents accessible and usable over the medium term”, see http://jiscrepository.ideascale.com/akira/dtd/2643-784. But it seems pretty clear that repository managers quite distrust the OAIS model. Interesting, no?

    Comment by Chris Rusbridge — October 14, 2008 @ 3:32 am

  2. John, You are on the right track in your efforts to make OAIS more approachable for repositories. I’m not sure that your description is simple enough, even now. I’ve presented on digital preservation to repository managers in the UK as part of the Repositories Support Project. On the first occasion I put up an untitled slide on OAIS and asked who knew what it was. Not one hand went up. Further presentations tell me that for non-specialists you can’t make digital preservation simple enough. I can’t fault the level of interest of my audiences and their willingness to participate in practical work, but I always end up asking myself if I have given them enough to go away with and act on for their repositories, and the answer, however hard I try to focus on the basics, is probably not enough.

    The other problem, as you say, is there are various kinds of repositories. Repositories are not well defined yet are still evolving, towards more service-oriented architectures, ‘cloud’ services, etc. This evolution is in its early stages. Although it is possible to apply OAIS to these architectures in principle, how effective it will be in practice at enabling repositories to plan and manage preservation requirements given the prospect of multiple, interacting and overlapping services and providers is harder to gauge.

    I believe it will continue to be the case for the time being that IT development will be driven by new technology and applications rather than the needs of preservation, and as a result preservation approaches will necessarily have to be reactive. In the case of repositories, we will have to fit OAIS analyses to emerging architectures, when we can see more clearly what these are, rather than expect it to happen the other way round.

    Comment by Steve Hitchcock — October 29, 2008 @ 10:06 am

