Repository services, Part 1: Galleries vs. self-storage units

Back near the start of my occasional series on repositories, I noted that we had not just one but a number of repositories, each serving different purposes.

In tight budgetary times, this approach might seem questionable.  Right now, we’re putting up a new repository structure (in addition to our existing ones) to keep our various digitized special collections and make them available for discovery and use.  We hope this will make our digital special collections more uniformly manageable, and less costly to maintain.

At the same time, we’re continuing to maintain an institutional repository of our scholars’ work on a completely different platform, one for which we pay an annual subscription fee.  I’ve heard more than one person ask “Well, once our new repository is up, can’t we just move the existing institutional repository content into it, and drop our subscription?”

To which I generally answer: “We might do that at some point, but right now it’s worth maintaining the subscription past the opening date of our new repository.”  The basic reason is that the two repositories not only have different purposes, but also, at least in their current uses, support very different kinds of interactions, with different kinds of audiences.

The interactions we need initially for the repository we’re building for our special collections are essentially internal ones.  Special collections librarians create (or at least digitize) a thematic set of items, give them detailed cataloging, and deposit them en masse into the collection.  The items are then exposed via machine interfaces to our discovery applications, which then let users find and interact with the contents in ways that our librarians think will best show them off.

The repository itself, then, can work much like a self-storage unit.  Every now and then we move in a bunch of stuff, and then later we bring it out into a nicer setting when people want to look at it.  Access, discovery, and delivery are built on top of the repository, in separate applications that emphasize things like faceted browsing, image panning and zooming, and rare book page display and page turning.

Our institutional repository interacts with our community quite differently.  Here, the content is created by various scholars who are largely outside the library, and who may deposit items bit by bit whenever they get around to it (or when library staff can find the time to bring in their content).  They want to see their work widely read, cited, and appreciated.  They don’t want to spend more time than they have to putting stuff in (they’ve got work to do), and they want their work quickly and easily accessible.  And they’d like to know when their work is being viewed.  In short, they need a gallery, not just a self-storage unit.  They want something that lets them show off and distribute their work in elegant ways.

Our institutional repository applications, bundled with the repository, thus emphasize things like full text search and search-engine openness, instant downloads of content, and notification of colleagues uploading and downloading papers.

We could in theory build similar applications ourselves, and layer them on top of the same “self-storage” repository structure we use for special collections.  (Museums likewise often have their exhibit galleries literally on top of the bulk of their collections, which are kept in basements or other compact storage areas.)  But it would take us a while to build the applications we need, so for now we see it as a better use of our resources to rely on the applications bundled with our institutional repository service.

(An alternative, of course, would be to see if an existing open source application would serve our needs.  I hope to talk more about open source repository software in a future post, but we haven’t to date decided to run our institutional repository that way.)

I hope I’ve at least made it clear that for a viable institutional repository, you need quite a bit more than just “a place to put stuff”: you need a suite of services that support its purposes.  In Part 2, I’ll enumerate some of the specific services that we need or find useful in our institutional scholarship repository.

Posted in repositories | 1 Comment

Public Domain Day 2009: Freeing the libraries

In many countries, January 1 isn’t just the start of a new year: it’s the time when a new year’s worth of works are welcomed into the public domain.  As I noted in last year’s Public Domain Day post, countries that use the copyright terms specified by the Berne Convention bring works into the public domain on the first January 1 that’s more than 50 years after the death of their authors.  So today, most works by authors who died in 1958 join the public domain in those countries.  This page at authorandbookinfo.com lists many such authors, and their books.  Some of the more notable names include James Branch Cabell, Rachel Crothers, Dorothy Canfield Fisher, C. M. Kornbluth, Mary Roberts Rinehart, Robert W. Service, and Ralph Vaughan Williams.

Many countries, however, have extended their copyright terms in recent years.  Most European Union countries, for instance, took 20 years’ worth of works out of the public domain in the 1990s when the EU mandated that copyright terms be extended to run for the life of the author plus 70 years.  This year, they come a little bit closer to recovering their lost public domain, welcoming back works by authors who died in 1938, including people like Karel Čapek, Zona Gale, Georges Méliès, Constantin Stanislavsky, Osip Mandelstam, Owen Wister, and Thomas Wolfe.

In some other countries, very little is entering the public domain today.  Here in the US, we’re midway through a freeze on most copyright expirations, resulting from a term extension enacted in 1998.  We now have 10 years to go until copyrights on published works start expiring again due to age.  (By 1998, all works copyrighted prior to 1923 had entered the public domain.  Remaining copyrights from 1923 are scheduled to expire at the start of 2019.)  Some special interests would like to make copyright terms even longer (even “forever less one day”, as Congresswoman Mary Bono requested on behalf of the movie industry).  Those of us who value the public domain will need to ensure that it is not further eroded, and that copyrights are allowed to expire on schedule.  This is in keeping with the intent of the country’s founders, who specified in the Constitution that copyrights were meant to last only for “limited times”.

But even though few works are entering the public domain in the US today, many more works are now freely and easily available to the public than a year ago.  Much of this is thanks to initiatives like Google Books and the Open Content Alliance, which are digitizing books and other works that libraries have acquired and preserved.  Many of the digitized works are in the public domain, and these projects have been making them freely readable and downloadable when they can confirm their public domain status.  And now that Google has negotiated a settlement with book publisher and author groups, they plan to be more proactive about identifying and releasing public domain works, including works published after 1922 that are out of copyright (but are not as easily identified as public domain as older books are).

These works have been part of the public domain for years, but when they were simply sitting on the shelves of a few research libraries, they weren’t doing the public much good.  Once they’re digitized, though, and their digitizations and descriptions are shared online, they can be much more easily found, read, adapted, and reused by anyone online.  By opening up the treasure trove of public domain expression that libraries have preserved, we magnify its value.  When libraries share their intellectual endowment, they better fulfill their mission to bring art and knowledge to readers, and make it easy for readers to learn, build on, and be enriched by this knowledge.

I wish I could say that libraries always acted with this understanding.   Unfortunately, all too often libraries and affiliated organizations have been resistant or slow to share the information they compile and control.  The effective value of what libraries offer has been significantly diminished as a result.

Sometimes libraries simply have not moved as quickly as they could.  The Copyright Office has long provided online access to copyright records, but only from 1978 onward.  I started digitizing older copyright records over 10 years ago, and a few libraries have started doing so as well, but many older records have not yet been publicly digitized, though they’re available in printed form in many government depository libraries.  These records can make it much easier to verify the public domain status of many works, and then make those works available to the public.

Sometimes libraries and affiliated organizations put up their own restrictions on sharing information they already have in digital form.  I had a series of posts in November, for instance, criticizing OCLC’s newly revised restrictions on sharing and reusing catalog records that libraries have contributed to WorldCat, the largest shared cataloging resource for libraries.  The data in WorldCat can be the basis for many useful and innovative applications to direct readers towards useful information resources, and information about those resources.  And in December, an extremely useful downloadable semantic web representation of Library of Congress subject headings, the basis for information discovery applications like this one, was ordered taken down by LC administrators.

In the new year, I hope to encourage libraries to be more open in sharing their knowledge resources (and to support partners that also enable such openness).  My gifts to the public domain this year are in that spirit.

The first one, dedicated immediately to the public domain, is the start of a simple, free decimal classification system, intended to be reasonably compatible with certain existing library standards, but freely available and usable by anyone for any purpose.  (I created this after someone requested such a system for their institutional repository, and I found out that the current Dewey Decimal system is subject to usage restrictions based on copyright and trademark.)  While this is more of a proof of concept than something I expect libraries to adopt in great numbers, I hope it inspires further open sharing of library metadata and standards.

Also, as I did last year, I’m dedicating another year’s worth of copyrights that I can control, this time from 1994, to the public domain, so that they follow the initial 14-year copyright term originally prescribed by this country’s founders.  These copyrights include the first versions of Banned Books Online, and the first database-driven versions of The Online Books Page.  Versions of these resources from 1994 and earlier are now given to the public domain.

I hope readers find value in these, and all the other public domain and freely licensed works they can enjoy and use online. Happy Public Domain Day!

Update: See also the Public Domain Day posts at Creative Commons, and the Center for Internet and Society.

Posted in copyright, discovery, open access | 3 Comments

Revised ILS-Discovery interface recommendation released

I’ve just sent the following announcement out to the ILS-Discovery Interface Google Group:

The Digital Library Federation’s ILS-DI task group has officially released revision 1.1 of their recommendation for standard interfaces for integrating the data and services of the Integrated Library System (ILS) with new applications supporting user discovery.

Our initial official release (“revision 1.0”) was made in June, and included a recommendation of a basic level of interoperability (the Basic Discovery Interfaces, or “Level 1” interoperability) that was agreed to by many ILS vendors in the “Berkeley Accord”.

In August, the DLF convened an implementors’ meeting in Berkeley that was attended by a number of developers and vendors of ILS and discovery software.  In the meeting, we agreed to make certain changes to clarify the requirements of the basic level of compliance, and to make them more useful for discovery applications.  A revised draft that included these changes was made available for comment at the end of October.  We now release the final version.

We hope that this revision will be useful for people implementing ILS’s, ILS interaction layers, and discovery applications, and enable easier interoperation between ILS’s (existing and planned) and innovative discovery applications of all kinds.  We look forward to seeing implementations of these recommendations (some of which are already in progress), and further progress towards interoperability and improved discovery of the knowledge resources of libraries.

I’d like to echo the thanks I gave on the release of our “revision 1.0” back in the summer, and again thank everyone who helped write, comment on, and support this recommendation.

And now, I think I’ve got some implementation work to do…

Posted in architecture, discovery, libraries | Comments Off on Revised ILS-Discovery interface recommendation released

DLF ILS Discovery Interfaces: Revised recommendation draft open for comments

Today we released a draft of “revision 1.1” of the ILS Discovery Interfaces recommendation. As I discussed in my previous post, this revision is intended to clarify the implementation of the Basic Discovery Interfaces recommended for integrated library systems (ILS’s), and make them more useful for discovery applications.

On the DLF ILS Discovery Interfaces web site, you’ll find the revision draft and the accompanying schema, along with the initial official recommendation (or “revision 1.0”). My last post included a summary of the major changes from version 1.0.

We’d like to give folks a chance to comment on the changes before we make them official. We’ll take comments until November 18, shortly after the end of the DLF Fall Forum, so folks wanting to go to our birds of a feather session on implementing the recommendations can talk with us there and still have some time to send in written comments. (Or, you can send them in ahead of time so we can think on them at the forum.) Comments may be emailed to me, and I will pass them along to the rest of the task group. There’s also still the open Google Group for discussions.

I’m hoping we’ll start to see Basic Discovery Interfaces implementations, clients, and test suites soon based on the new recommendations and schema. They’re not that different from version 1.0, but should be more useful. I’m working on revising my example implementation now, and hope to see more implementations in the not too distant future. And I look forward to hearing interested people’s thoughts and comments as well.

Posted in architecture, discovery, libraries | Comments Off on DLF ILS Discovery Interfaces: Revised recommendation draft open for comments

Update on ILS-Discovery Interface work

It’s been a while since I posted about the official release of the Digital Library Federation’s ILS Discovery interface recommendation. Marshall Breeding recently posted a useful update on the further development of the interfaces at Library Technology Guides. As the chair of the ILS-DI task group, which is now charged with some followup work described in Marshall’s article, I’d like to add some further updates.

As Marshall mentions, the DLF convened a meeting in August inviting potential developers of the ILS-Discovery interfaces to discuss implementations of recommendations of the DLF’s ILS-Discovery Interface task group. In the course of the discussion, a few changes were suggested and generally agreed upon by the participants. Updating the recommendation was not the main purpose of the meeting, but as we discussed things, it became clear that some clarifications and small updates to the recommendation would be helpful for producing more consistent and useful implementations of the Basic Discovery Interfaces, the interoperability “Level 1” that was agreed to in the Berkeley Accord.

The ILS-DI task group is therefore preparing a slight revision, to be known as “version 1.1” of the recommendation. A draft of this revision will be released for comment shortly, and will include the following changes, summarized here to give developers some idea of what to expect:

  • For the HarvestBibliographicRecords and HarvestExpandedRecords functions, it will be clarified that the function should return the records that are available for discovery.  (That is, suppressed records and others that might be in the ILS but aren’t intended for discovery will not be shown, except possibly as deleted records as described below.)
  • Support for the OAI-PMH binding for these functions will be noted as required. (That is, it must be supported for full ILS-BDI compliance; other bindings can be supported too.) It will also be noted that Dublin Core is a minimum requirement for returned records (as it is for OAI-PMH in general), and that if MARC records exist in the ILS (or are produced by it), MARC XML should also be available.
  • We also will require some level of support for deleted records (which includes records no longer available for discovery), to make it feasible for discovery apps to keep in sync with the ILS’s records via incremental harvesting. We’ll note that ILSs should document how long they keep deleted-record information.
  • For GetAvailability, the simple availability schema defined in the document will be noted as required. (That is, it should be returned for full ILS-BDI compliance; other schemas can be supported too if asked for and supported.) There was some talk at the August meeting about completely dropping the alternative NCIP and ILS-Holdings schemas as replies to GetAvailability, because of their complexity. The draft at this point doesn’t go that far, but it will specify the simple availability schema as the default, and the required, schema to support in the ILS-BDI profile.
  • That simple availability schema will also be augmented slightly to include an optional location element, distinct from the availability-message element. Location was the one specific data field that many implementors said was essential to include that wasn’t in the original schema.
  • We will also add a request parameter to GetAvailability for specifying whether bib-level or item-level availability is desired when a bibliographic identifier is given.  (Formerly the server had the option of choosing the level in that case; there was a strong sentiment in discussions that the client should be able to specify this.)
  • We expect to leave GoToBibliographicRequestPage alone.
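
To sketch what the required OAI-PMH binding and deleted-record support mean for a harvesting client, here’s a minimal Python fragment that separates live records from deleted ones in a ListRecords response.  The element names follow the OAI-PMH 2.0 specification, but the sample response (and its identifiers) are invented for illustration.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"

def split_records(listrecords_xml):
    """Separate live records from deleted ones in an OAI-PMH
    ListRecords response, as an incremental harvester must do
    to keep its discovery index in sync with the ILS."""
    root = ET.fromstring(listrecords_xml)
    live, deleted = [], []
    for record in root.iter(OAI + "record"):
        header = record.find(OAI + "header")
        ident = header.findtext(OAI + "identifier")
        if header.get("status") == "deleted":
            deleted.append(ident)   # remove from the discovery index
        else:
            live.append(ident)      # (re)index this record
    return live, deleted

# A made-up two-record response: one live, one deleted.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.edu:1</identifier>
        <datestamp>2008-10-30</datestamp>
      </header>
      <metadata/>
    </record>
    <record>
      <header status="deleted">
        <identifier>oai:example.edu:2</identifier>
        <datestamp>2008-10-31</datestamp>
      </header>
    </record>
  </ListRecords>
</OAI-PMH>"""

live, deleted = split_records(SAMPLE)
```

A real harvester would fetch such responses over HTTP, using the protocol’s from parameter for incremental updates, and drop the deleted identifiers from its index on each pass.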

The new draft will be released shortly, and will be open to public comment for at least a couple of weeks before we make a last edit for an official release.  Feedback is welcome and encouraged, and public discussion can take place in the ILS-DI Google Group, among other places.

The new draft will be accompanied by a revised XML schema. The current schema, reflecting the original or “version 1.0” official recommendation, can be found here. For the location of the new one (which is not yet posted), substitute “1.1” for “1.0” in the schema URL. (We intend to keep the old schema up for a good while after the new one is posted, for compatibility with implementations based on the original recommendation.)
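
As a rough illustration of the kind of reply the simple availability schema describes, here’s a Python sketch that builds a toy response with the new optional location element kept distinct from the human-readable availability message.  The element names and identifiers here are invented for illustration only, not copied from the actual schema; consult the posted schema for the real ones.

```python
import xml.etree.ElementTree as ET

def build_availability(item_id, status, message, location=None):
    """Build a toy availability reply in the spirit of the simple
    availability schema; element names are invented."""
    root = ET.Element("availability", {"id": item_id})
    ET.SubElement(root, "status").text = status
    ET.SubElement(root, "message").text = message
    if location is not None:
        # Location is now its own element, distinct from the
        # free-text availability message.
        ET.SubElement(root, "location").text = location
    return root

reply = build_availability(
    "item:12345", "available",
    "On shelf; ask at the desk if not found",
    location="Main Library, 3rd floor stacks")
xml_text = ET.tostring(reply, encoding="unicode")
```

Keeping location as a separate element, rather than burying it in the message text, is what lets a discovery application display or facet on it directly.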

I will also be leading a Birds of a Feather session at the upcoming Digital Library Federation fall forum in Providence next month. This will be an opportunity for developers of interfaces implementing the DLF’s ILS-Discovery interface recommendations to present their work to others, ask and answer questions about the recommendations and their implementations, and discuss further development initiatives and coordination. If you’d like us to set aside some time to show or discuss a particular initiative or project you’re working on, let me know.

Watch this space and the ILS-DI Google Group for further developments. And if you can come to the session at DLF in November, I hope we’ll have an interesting and enlightening discussion there as well.

(Update, Oct. 30: The draft of the revision is now out for comment.)

Posted in architecture, discovery, libraries | 3 Comments

What repositories do: The OAIS model

(Another post in an ongoing series on repositories.)

In my previous post, I mentioned the OAIS reference model as an influential framework for thinking about and planning repositories intended for long-term preservation. If you’re familiar with some of the literature or marketing for digital repositories, you may well have seen OAIS mentioned, or seen a particular system marketed as “OAIS compliant”. You may have also noticed remarks that it’s not always clear in practice what OAIS compliance means. The JISC Standards Catalogue notes “The [OAIS] documentation is quite long and complex and this may prove to be a barrier to smaller repositories or archives.” A common impression I’ve heard of OAIS is that it’s a nice idea that one should really try to pay more attention to, but complex enough that one will have to wait for some less busy time to think about it. Perhaps, one might think, if we just pick a repository system whose marketing says it’s OAIS compliant, we can be spared thinking about it ourselves.

I think we can do better than that, even in smaller projects. The basics of the OAIS model can be understood without having to be conversant with all 148 pages of the reference document. Those basics can help you think about what you need to be doing if you’re planning on preserving information for a long term (as most libraries do). The basics of OAIS also make it clear that following the model isn’t just a matter of installing the right product, but of having the right processes. It’s made very explicit that repository curators need to work with the people who produce and use the information in the repository, and make sure that the repository acquires all the information necessary for its primary audience to use and understand this information far into the future.

To help folks get oriented, here’s a quick introduction to OAIS. It won’t tell you everything about the model, but it should let you see why it’s useful, how you can use it, and what else you might need to consider in your repository planning.

What OAIS is and isn’t

First, let’s start with some basics: OAIS is a reference model for Open Archival Information Systems (whose initials make up the OAIS), that’s now an ISO standard, but is also freely available.  It was developed by the Consultative Committee for Space Data Systems (CCSDS), an international group coordinated by NASA, whose member space agencies have had to deal with large volumes of data and other records generated by decades of space missions and observations, and so have had to think hard about how to manage and preserve them.  To develop OAIS, they held open discussions with lots of other people and groups (like the National Archives) who were also interested in long-term preservation.  OAIS is called “Open” because of the open process that went into creating it.  It does not require that archives be open access or have an open architecture, and it has no direct relation to the similarly-acronymed Open Archives Initiative (OAI).  (Though all of these things are also useful to know about in their own right.)  An “archival information system” or “archive” can simply be thought of as a repository that’s responsible for long-term preservation of the information it manages.

Unlike many standards, OAIS specifies no particular implementation, API, data format, or protocol. Instead, it’s an abstract model that provides four basic things:

  • A vocabulary for talking about common operations, services, and information structures of a repository. (This alone can provide very useful common ground for different people who use and produce repositories to talk to each other.) A glossary of this vocabulary can be found in section 1 of the reference model.
  • A simple data model for the information that a repository takes in (or “ingests”, to use the OAIS vocabulary), manages internally, and provides to others. This information is assumed to be in distinct, discrete packages known as Submission Information Packages (SIPs) for ingestion, Archival Information Packages (AIPs) for internal management, and Dissemination Information Packages (DIPs) for providing the information to consumers (or to other repositories). These packages include not just raw content, but also metadata and other information necessary for interpreting, preserving, and packaging this content. They have different names because the information they contain can take different forms as it goes into, through, and out of the archive. They are described in more detail in sections 2 and 4 of the reference model.
  • A set of required responsibilities of the archive. In brief, the archive (or its curators) must negotiate with producers of information to get appropriate content and contextual information, work with a designated community of consumers to make sure they can independently understand this information, and follow well-defined and well-documented procedures for obtaining, preserving, authenticating, and providing this information. Section 3 of the model goes into more detail about these responsibilities, and section 5 discusses some of the basic methodologies involved in preservation.
  • A set of recommended functions for carrying out the archive’s required responsibilities.  These are broken up into 6 functional modules: ingest, data management, archival storage, access, administration, and preservation planning.  The model describes about half a dozen functions in each module (ingest, for example, includes things like “receive submission”, “quality assurance”, and “generate AIP”) and data flows and dependencies that might exist between the functions.  Some of these functions are automated, some (like “monitor technology”) are carried out by humans, and some may involve a combination of human oversight and automated assistance.  The functions are described in more detail in section 4 of the model (with issues of multi-archive interoperability discussed in section 6).
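
As a very loose sketch of the package flow (OAIS itself prescribes no data structures or code, and all the field names below are invented), an archive’s ingest and dissemination steps might be modeled like this:

```python
from dataclasses import dataclass, field

@dataclass
class InformationPackage:
    """A toy OAIS-style package: content plus the metadata needed
    to interpret and preserve it.  Field names are invented."""
    content: bytes
    descriptive_metadata: dict
    preservation_metadata: dict = field(default_factory=dict)

def ingest(sip: InformationPackage) -> InformationPackage:
    """Turn a SIP into an AIP: quality-check the submission and
    add preservation information the archive will manage."""
    assert sip.content, "empty submission fails quality assurance"
    return InformationPackage(
        content=sip.content,
        descriptive_metadata=dict(sip.descriptive_metadata),
        preservation_metadata={"fixity": hash(sip.content),
                               "provenance": ["ingested from producer"]})

def disseminate(aip: InformationPackage) -> InformationPackage:
    """Turn an AIP into a DIP: package content and descriptive
    metadata for a consumer, leaving internal records behind."""
    return InformationPackage(content=aip.content,
                              descriptive_metadata=aip.descriptive_metadata)

sip = InformationPackage(b"page scans...", {"title": "Example item"})
aip = ingest(sip)
dip = disseminate(aip)
```

The point of the three package types is exactly this: the same underlying content can carry different accompanying information as it enters, sits in, and leaves the archive.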

OAIS conformance and usage

It is important to note that OAIS compliance simply requires fulfilling the required responsibilities, and supporting the basic OAIS data model of information packages. A repository is not required to implement all the functions recommended in the OAIS model, or replicate the detailed internal data flows, to be OAIS compliant. But it can be very useful to look through the functions in any case, both to make sure that your repository is doing everything it needs to do, and to see how the big problem of reliable data preservation can be broken down into smaller, more manageable operations and workflows.

You may also find the functions a useful reference point for detailed descriptions of the exact formats and protocols your repository uses for ingesting and storing information, providing content to users, and migrating it to other repositories. Although the OAIS model does not itself provide specific formats or protocols to use, it makes it clear that a repository provider needs to specify these so it can receive information from producers and make it clearly understandable to consumers.

The OAIS model has been used to help construct more detailed criteria for trusted repositories, as well as checklists for repository audit and certification. In most cases, repositories will operate perfectly well without satisfying every last criterion or checklist item. At the Partnerships in Innovation symposium I attended last week, Don Sawyer, one of the main people behind OAIS, remarked that the archives where he worked satisfied about 80% of the trusted repository checklist items. But he still found it useful to go through the whole list to verify that certain functions were not relevant or required for their repository needs, as well as to spot aspects of the repositories (like disaster recovery or provenance tracking) that might need more attention. Similarly, you can go through the recommended OAIS functions and data-model breakdowns to evaluate what’s important to have in your repository, what can be safely omitted, and what might need more careful attention or documentation.

What else you need to think about

Although the OAIS model includes examples of various kinds of repositories that might use it, it’s at its heart a fairly generic, domain-independent model, largely concerned with preservation needs. It doesn’t say a whole lot about how a repository needs to interact with specific communities to fulfill its purposes. For instance, in the talk I gave last week, I stressed the importance of designing the architecture of repositories to support rich discovery mechanisms. As Ken Thibodeau noted in later conversation, the access model of OAIS is more primitive than the architectures I described. OAIS is not incompatible with those architectures, but designing the right kinds of discovery architectures requires going beyond the criteria of OAIS itself.

You’ll also need to think carefully about the needs of the communities you’re collecting from and serving. The OAIS model notes this requirement, but doesn’t pursue it in depth. I can understand why it doesn’t, since those needs are highly dependent on the domain you’re working in. A repository intended to preserve static, published text documents for possible use in legal deposition will need to interact with its community very differently from, say, a repository intended to manage, capture, and ultimately preserve works in progress used in ongoing research and teaching. They both have preservation requirements that OAIS may well address effectively, but designing effective repositories for these disparate needs may require going well beyond OAIS, doing detailed requirements analyses, and assessing benefits and costs of various options.

I’ll talk more about requirements for particular kinds of repositories in later posts. But I hope I’ve made it clear how the OAIS model can be useful for general thinking and planning what a repository needs to do to manage and preserve its content. If it sounds promising, you can download the full OAIS model as a PDF. A revised document that will clarify some of the terminology and recommendations, but will not substantially change the model, is expected to be released in early 2009.

Posted in preservation, repositories | 2 Comments