Views of possible future architectures of cataloging

PALINET convened a symposium today here in Philadelphia on the future of cataloging. There was a full turnout, with over 150 library professionals attending. It appeared that the organizers had to scramble a bit to distribute lunch to the large crowd. I waited for a few minutes in a line that hardly seemed to move at all, and then some logjam cleared, enabling us all to get our food in short order. (I did notice that by the time I picked up my own box lunch, no one was checking the tickets that specified what food we were entitled to take.) Would that all our cataloging projects could resolve their workflow and backlog issues so quickly.

The opening keynote was by Karen Calhoun, now at OCLC, whose controversial 2006 report for the Library of Congress touched off a fierce debate among librarians over what kinds of changes should take place in library catalogs. Her address at this symposium was less controversial, and dealt with transitions in the work of the folks that catalog and manage collections. My Penn colleague Beth Picknally Camden took part in the followup panel, remarking on the “perpetual beta” viewpoint that we’re encouraging in our library as we shift to new responsibilities and strategies. Also on the panel were Diane Hillmann (at Cornell until recently), and Christine Schwartz, whose blog, Cataloging Futures, is well worth following if you’re interested in future directions of library catalogs. (Besides the ongoing posts, its “key resources” column gives a useful overview of many of the current debates on cataloging.)

The symposium also provided an opportunity to learn more about the Library of Congress’s 2007 recommendation on the future of bibliographic control (in a presentation by Nancy Fallgren), as well as FRBR and RDA, two bibliographic standards proposed to become the new basis for bibliographic description (and featured in a presentation by John Attig). I would have loved to go to both talks, preferably one right after another– if nothing else, the contrasting points of view would have been interesting. (The LC report recommended that work on RDA be suspended, in part due to concerns about the practicality of FRBR.) Alas, they were at the same time, so I attended Attig’s talk, which covered material less familiar to me than the contents of the LC report.

I also had to miss Christine di Bella’s talk on special collections cataloging to give my own talk. I’m not firmly settled into any established camp in the cataloging debate, but I’ve noticed that architectural issues– information architecture, systems architecture, and social architecture– underlie many of the ongoing cataloging debates, and aren’t always explicitly considered or fleshed out. So I tried to address some of them in my talk, using projects I’m involved with such as subject maps, ILS discovery interfaces, and PennTags, as examples of designs that aim for a more robust catalog architecture. The slides I used, which include pointers to more information about all these projects, are now posted on my Selected Works website. PALINET also intends to put the audio and slides of all of us who spoke on their website (though I’m not quite sure where they will end up, or whether they will all be accessible to the general public).

I was happy to see several people raise the importance of freely sharing cataloging data, something that’s all too often hindered by existing contracts, and which severely impairs the community’s ability to improve the catalog collectively. Diane Hillmann was particularly eloquent on this issue, urging people to consider open source-like business models that support themselves by providing the best services, not by hoarding data. My talk also touched on “open data” issues. (And Karen Coyle recently blogged on an example of the kind of damage we’re inflicting on ourselves by not agreeing to share.) I did hear some encouraging hints suggesting that some aggregators might be moving towards more open sharing of commonly managed catalog records, as well as easier ways for the cataloging community to refine and improve on these records. We’ll see what happens.

This was my first PALINET symposium, and the first conference I’ve been to that focused specifically on cataloging issues. I’m very glad I went, and I thank PALINET for inviting me to speak (and running a smooth and enjoyable conference, lunch lines notwithstanding). If you’re interested in these issues, I hope you’ll find my talk slides of interest, and hope we’ll see more materials from the speakers online as well before long.

An implementation of the DLF’s Basic Discovery Interfaces recommendation

The DLF’s ILS-Discovery interface recommendation work, which I’ve been leading, continues. We’re now in the process of producing the official recommendation, which I hope will be out soon. (Especially since I fully intend it to be out there before I head off to the great white North in early June.) And the May Library Gang podcast features a conversation with me and various other folks in libraries and the commercial world about the ILS-DI work and its implications.

You don’t have to wait until the official release, though, to start experimenting with the interfaces. I’ve now implemented the Level 1 recommendations for The Online Books Page, so folks can see what an implementation can look like to an application. (And you’re also free to just use the interfaces if you find the data and services useful, though I reserve the right to limit access to them if our server gets overloaded.) I’ve also put up a page with more information on the interfaces and how to use them.

I’m hoping we’ll see ILS-DI interfaces for standard ILSs as well before long (whether they’re provided by ILS vendors or library developers working on top of vendor interfaces). We have some interest in having the interfaces on top of our Voyager catalog, though that would take a while longer to implement. The Online Books Page implementation, though, shows how the interfaces aren’t just for ILSs, but can also use data and services from other online digital collections.

If the recommended interfaces become sufficiently widely and uniformly supported, a discovery application could draw on a wide range of sources, both in a local library and beyond it, and let its users discover resources from any or all of them in a largely seamless fashion. Which I think is a great way to help readers take full advantage of the library resources we all make available for them.
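To make that idea a little more concrete, here is a minimal Python sketch of a discovery application that queries several sources through one shared interface and merges the results. It is only an illustration under stated assumptions: the `Record` shape, the `DiscoverySource` interface, the `search` method, and the in-memory sample data are all hypothetical, and none of them are taken from the ILS-DI recommendation or from The Online Books Page implementation.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Protocol


@dataclass(frozen=True)
class Record:
    """A minimal bibliographic record (hypothetical shape, for illustration)."""
    identifier: str
    title: str
    source: str


class DiscoverySource(Protocol):
    """Hypothetical uniform interface each source is assumed to support."""
    name: str

    def search(self, query: str) -> Iterable[Record]: ...


class InMemorySource:
    """Stand-in for a real catalog or digital collection."""

    def __init__(self, name: str, titles: Dict[str, str]):
        self.name = name
        self._titles = titles

    def search(self, query: str) -> List[Record]:
        # Case-insensitive substring match on the title.
        q = query.lower()
        return [
            Record(identifier, title, self.name)
            for identifier, title in self._titles.items()
            if q in title.lower()
        ]


def discover(query: str, sources: Iterable[DiscoverySource]) -> List[Record]:
    """Query every source through the same interface and merge the results."""
    results: List[Record] = []
    for source in sources:
        results.extend(source.search(query))
    # Sort by title, then source, so the merged list reads as one result set.
    return sorted(results, key=lambda r: (r.title.lower(), r.source))


# Made-up sample data standing in for a local ILS and an outside collection.
local_catalog = InMemorySource("local-catalog", {"b1": "Moby Dick", "b2": "Walden"})
online_books = InMemorySource("online-books", {"o1": "Moby Dick; or, The Whale"})

hits = discover("moby", [local_catalog, online_books])
for hit in hits:
    print(f"{hit.title} [{hit.source}]")
```

The point of the sketch is that `discover` never needs to know which kind of system sits behind each source; anything that supports the shared interface can be added to the list.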

In the meantime, I hope you find this example implementation useful. I’ll be happy to hear and answer questions and comments about it, and about the ILS-DI work in general.

Everybody’s repositories (first of a series)

The library where I work has decided to think long and hard about its digital repository strategy. Your library may be doing this too, or may have recently done so and now be working on carrying out that strategy. If it’s not, it probably should be.

Libraries have for a long time hosted repositories of content in paper form; indeed, such repositories account for a large portion of both the budget and the floor space of many libraries. But many of them have been slow to take on responsibility for digital repositories, or have only done so in a very limited way, compared to their physical repository investments.

But while established libraries have often hesitated in taking up digital repositories, the rest of the world has not. As folks in research libraries have known for a while, a lot of the money we now spend on content pays for electronic resources held in publisher repositories. In typical arrangements, libraries no longer own this content (as they owned the print content the electronic versions supplant) but lease it. And even if a library has a “perpetual access” contract that lets it download publisher content after ending a subscription, for practical purposes many libraries are not ready to host it or make it available as readily and seamlessly as their patrons have grown to expect.

However, even if publisher repositories, or scholar-run discipline repositories like the social scientists’ SSRN, aren’t directly run by traditional libraries, those libraries are among their primary customers. Therefore, the folks who run those repositories have incentives to provide the kinds of services that those libraries need to carry out their missions (at least, if the libraries know to ask for them).

Increasingly, though, people are using new kinds of repositories that have little or no connection to traditional libraries. Some of these repositories are on their users’ own computers– their digital music collection and photo library, managed by programs like iTunes, iPhoto, and Picasa. Some of these repositories are on Internet sites like YouTube, Flickr, Google Docs and Google Base, and the various Wikimedia sites. We often don’t think of all of these as “repositories”, but that’s how people are using them: to manage and provide access to information in a stable way, potentially over a long period of time.

I’m not using “repository” here to mean just “glorified filesystem or website”. The everyday repositories I mention above typically put substantial effort into managing metadata, supporting discovery, providing for access control (and often backup and version control), and supporting long-term access and use of the content. They tend to do all these things much more quietly and unobtrusively than the repositories typically designed for and marketed to libraries, but that’s a feature, not a bug. We who work in research libraries need to consider these “repositories for everybody” very carefully. A lot of the digital content that libraries will want to include in our own collections will come out of those repositories. And those repositories can potentially teach us a lot about how to design and run our own.

That’s one big reason why I want to discuss my library’s strategic thinking about repositories in open forums like this one. True, the Penn Libraries don’t have exactly the same uses and needs for repositories as other people and groups. But I think there are a lot of repository issues where we and many others share common interests, or have common questions we all need to answer. Over a series of posts, I hope to discuss repository purposes, infrastructure, technologies, ingest, workflow, labor allocation, lifecycles, legal concerns, integration, policy, and community, all of which are relevant to our repository plans. The strategies and issues most salient for Penn may or may not be the same as yours. But if repositories matter to you, I hope that discussing our issues in a broader context will give you useful things to think about for your own situation. And I hope that we will learn from you as well.

Lots of other people have already written thoughtfully on repositories. I hope to reuse and build on their ideas wherever I can. A good introduction to many of the issues can be found at JISC’s Repository Support Project, a website to help institutions planning repositories, starting from “What is a repository, anyway?” and working from there. (It’s not a given, by the way, that libraries should always run their own repositories for their digital content– but more on that later.)

Repository planners should be familiar with both the theory and practice of repositories. You don’t have to know all the details of the OAIS reference model, for instance, but it’s helpful to know the general principles it sets out, both for issues to think about in running a repository over a long term, and for a conceptual vocabulary for understanding and interacting with other repository initiatives. Likewise it helps to at least be conversant with standard metadata schemas, protocols, recommended procedures, and the like. But you also very much need to know how repositories are working, or not working, in practice. The JISC site I mentioned earlier has an interesting case studies section, where folks who have run repositories describe their experiences, and how they may have differed from expectations. Some repository managers also run blogs where they talk about their day-to-day experiences with repositories, good and bad. Les Carr’s RepositoryMan and Dorothea Salo’s Caveat Lector are two blogs that I find must-reads, for keeping track of new developments repository maintainers can use and practical problems that repository planners can’t afford to ignore.

Future installments in this series will be posted under the “repositories” category. In the meantime, if you’re interested in these issues, I recommend you check out the resources above. And I’d be very interested in hearing about particular issues that should be discussed here.