Among the hats I wear at my place of work is that of LOCKSS cache administrator. LOCKSS is a useful distributed preservation system built around the principle “Lots of copies keep stuff safe” (whose initials give the system its name). The idea is that, with the cooperation of publishers, a group of libraries each harvest copies of selected online content and keep backups on their own LOCKSS caches, which are hooked up to local library proxy services. Then, if the material ever becomes inaccessible from the publisher, users are automatically routed to the local copies. Each LOCKSS cache also periodically checks with other LOCKSS caches to ensure that its copies are still in good shape, and to repair or replace copies that have been lost or damaged. (Various security features protect against leaks of restricted content, or unauthorized revisions of content.)
LOCKSS is open source software that runs on commodity hardware. It was originally envisioned to run virtually automatically. As Chris Dobson described the ideal in a 2003 Searcher article, “Take a computer a generation past its prime…. Hook it up to the Internet and put it in a closet. Stick in the LOCKSS CD-ROM and boot it up. Close the closet door.” And then presumably walk away and forget about it.
Of course, it’s not that simple in practice, particularly if your library is proactive about its preservation strategy. The thing about preservation at scale is there’s always something that needs attention. It might be something technical, or content-related, or planning-related, but preserving a growing collection requires ongoing thought. And if you want to think as clearly and sensibly as you can, you’ll want to collaborate.
Right now, for instance, I’m trying to get my cache to harvest the full run of a journal that’s just been made available for LOCKSS harvesting, where we hope to provide post-cancellation access through LOCKSS. Someone at Stanford just gave me a useful tip on how to give this journal priority over the other volumes I’ve got queued up for harvest. Unfortunately, I can’t try it out until I get my cache back up: it failed to reboot cleanly following a power failure. While I wait for instructions on how best to remedy this, I wonder whether switching to a new Linux-based version of LOCKSS might make such operating system-level problems easier to deal with. But it would be useful to hear from folks who are running that version to see what their experience has been.
Meanwhile, we’re wondering how best to approach new publishers who have content that our bibliographers would like to preserve via LOCKSS. Our special collections folks wonder whether we should preserve some of our own home-grown content via a private LOCKSS network. I’m also doing some ongoing monitoring and testing of our LOCKSS cache’s behavior (some of which I’ve reported on earlier), and would be interested in knowing if others are seeing some of the same kinds of things that I see on the cache I administer.
In short, there are a lot of things to think about when LOCKSS plays a significant role in a preservation plan. And a lot of the issues I’ve mentioned above are ones that others may be thinking about as well. So let’s talk about them. As the LOCKSS group has said, “A vibrant, active, and engaged user community is key to the success of Open-Source efforts like LOCKSS.”
One thing you need for such an engaged community is a forum where its members can talk to each other. As it turns out, the LOCKSS group at Stanford tell me they created a LOCKSS Forum mailing list a while back, but I haven’t yet seen it publicized. Its information page is at https://mailman.stanford.edu/mailman/listinfo/lockss-forum . (Currently, archived email messages are not visible on the open web, though this may change in the future.) If you’re interested in talking with others about how you use or might use LOCKSS to preserve access to digital content, I invite you to sign up and help get the conversation going.