Everybody's Libraries

April 29, 2013

December 30, 2012

Persistence

Filed under: failure,online books,people,preservation — John Mark Ockerbloom @ 6:02 pm

The week between Christmas and New Year’s is mostly time off for me– I’ve added no new listings to The Online Books Page this past week, for instance– but even on vacation, as long as I have a working Internet connection I still tend to fix bad links as I hear about them from readers’ reports.  I try to draw from a variety of free online book sources, instead of just a few big ones; that’s worthwhile to me because it increases the diversity of titles and editions on the site.  But the tradeoff is that many of these sites disappear, reorganize, or otherwise have links go bad over time.  I’m grateful to my readers for reporting bad links to me, and I can often fix other bad links to the same site when I fix the one reported to me.

The links and sites that persist, and those that don’t, often aren’t the ones you might expect.  Who’d have thought, for instance, that a shoestring-budget project that didn’t even maintain its own website until fairly recently would have the longest-lived (and still one of the largest) electronic book collections in common use, outlasting many better-funded or more systematically planned projects (as well as its own doggedly persistent original champion)?  Although the links to Project Gutenberg’s ebooks have changed over the years, the persistence of their etext numbers, and the proliferation of Gutenberg sites and mirrors, have made it relatively easy for me to keep links working for their more than 40,000 ebooks.

Some library-sponsored sites use persistent link redirection technologies, such as PURLs, to keep their links working.  But technology alone isn’t sufficient for persistence.  I recently had to update all of my links going to a PURL-based library consortium site.  I’m sure the people who worked at the organization hosting the site would have kept the links working if they could, but the organization itself was defunded by the state, and its functions were taken over by a new agency that didn’t preserve the links.

Fortunately, the failure had a couple of graceful aspects that eased recovery.  First of all, the old links didn’t stop working altogether, but redirected to the front page of a digital repository in which people could search for the titles they were looking for.  Second, the libraries in the consortium still maintained their own websites, and the old links included a serial number unique to each text (similar to Gutenberg’s etext numbers) that was also used by member libraries.  I found that in most cases I could automatically rewrite my links, using that serial number, so that they would point to a copy at a contributing library’s website.  This made it easier for me to rewrite my links, even though they go to new sites, than it’s often been for me to update links to sites that persist but reorganize.   (For instance, I’ve seen sites change to new content management systems that used completely different URLs from their old design, and then had to manually relocate and verify each link one at a time.)
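A serial-number rewrite of this kind can be sketched in a few lines of Python.  This is only an illustration, not the script actually used for the site: the two URL patterns below (`purl.example.org` and `library.example.edu`) are hypothetical placeholders, since the post does not name the real hosts or path layouts.  Links that don’t match the old pattern are set aside for manual review rather than guessed at.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical URL shapes: the post does not give the real hosts or paths.
OLD_LINK = re.compile(r"^https?://purl\.example\.org/texts/(?P<serial>\d+)/?$")
NEW_TEMPLATE = "https://library.example.edu/ebooks/{serial}"


def rewrite_link(url: str) -> Optional[str]:
    """Return the rewritten link, or None if the URL does not fit the old pattern."""
    match = OLD_LINK.match(url)
    if match is None:
        return None
    # The serial number is the only part carried over to the new site.
    return NEW_TEMPLATE.format(serial=match.group("serial"))


def rewrite_all(urls: Iterable[str]) -> Tuple[Dict[str, str], List[str]]:
    """Split links into (old -> new rewrites, links needing manual review)."""
    rewritten: Dict[str, str] = {}
    manual: List[str] = []
    for url in urls:
        new_url = rewrite_link(url)
        if new_url is None:
            manual.append(url)
        else:
            rewritten[url] = new_url
    return rewritten, manual
```

In practice each rewritten link would still need to be checked against the live site before replacing the old one, since a shared serial number does not guarantee that every member library actually hosts a copy.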

Sometimes I have to replace links that still “work”, technically.  I used to have thousands of links to a Canadian consortium that provided free access to scanned public domain books and pamphlets from that country’s history.  Not long ago, I discovered that while my links still work, the site had gone to a subscription model where readers have to pay for access beyond the first dozen or so pages of each text. Given the precarious state of Canadian library funding, I’m sure the people running the site were simply doing what they thought necessary to ensure the persistence of the sponsoring organization (which continues to provide new electronic texts and services).  Personally, however, I was more concerned about the persistence of free access to the digitized texts I’d pointed to.  Fortunately, a number of the consortium’s member libraries had also uploaded copies of their scans to the Internet Archive, using the same serial numbers used on the Canadian consortium’s website.  As a result, I was able to quickly update most of my links to point to the Internet Archive’s copies.  I intend to track down working alternative links to the 200 or so remaining texts, or post requests seeking other copies of these texts, when time permits.  (I’ve also sent along a donation to the Internet Archive, in part to thank them for continuing to provide access to texts like these.)

It’s been said in digital library literature that persistence of identifiers is more a matter of policy than technology.  Based on the experiences I’ve related above, the practical persistence of links is even more a matter of will than of policy: the will (and ability) to keep maintaining access through changing conditions; the willingness to consider alternatives to specific organizational structures or policies if the original ones turn out not to be tenable; the willingness to pick things up again, or let others pick them up, after a failure.

It’s also clear from my experience that practically speaking, failure is not the main enemy of persistence.   More of a threat is not recovering from failure, or being so worried about failure that one doesn’t even begin to sustain the thing or the purpose that should persist.  To riff off a famous G. K. Chesterton quote, if it’s worth doing something, it’s worth being willing and ready to fail at doing it.  And then, to be willing to pick up again where you left off, or to make it easy for someone else to pick it up, and try something new.

That’s persistence.  That’s what’s ultimately gotten the dissertation rewritten, the estates settled, the blog picked up again, the books put and kept online for the world to read, and many other things I’ve found worthwhile, despite difficulties, anxieties, and setbacks.  I value that persistence, and I hope you value it as well, for the things you find worthwhile. I look forward to seeing where it takes us in the year to come.

October 7, 2011

My mother’s orphan

Filed under: copyright,findingada,online books,open access,people,preservation,sharing,teaching — John Mark Ockerbloom @ 5:06 pm

Before my mother was pregnant with me, she was working on a book.

The book had begun its gestation at least a year before. She had been teaching math in Massachusetts, and was involved with the Madison Project, one of the initiatives that arose from the “new math” movement of the 1960s.  What excited her, and what I caught from her not long after I was born, was the sense of discovery and play that was encouraged in the Madison teaching style.  The primary focus wasn’t so much on imparting and drilling facts and rules, or on mundane applications, but on finding patterns, solving puzzles, and figuring out the secrets of numbers and geometry and the other mathematical constructs that underlie our world. Some project participants planned a series of books that would help bring out this sense of discovery and exploration in math classes.

Two small children in the house may have delayed my mother’s ambitions, but we didn’t stop her.  When I was in kindergarten, the piles of papers in my parents’ bedroom went away, and my mother proudly showed me her new book.  The book, Discoveries in Essential Mathematics, was co-written with Ramon Steinen, and published by Charles E. Merrill. Though the textbook was written for middle schoolers, I remember reading through the book after my mother showed it to me, solving the simpler problems, and smiling when I saw my name or my sister’s in an example.

She got small royalty checks for a few years, but the book was out of print by the late 1970s, never reaching a second edition.  We kept some copies in our basement, but I didn’t know of any library that held it.  When I visited the Library of Congress as a middle schooler, wrongly convinced that they had every book ever published, I remember my disappointment when I couldn’t find Mom’s book in their card catalog.

My mother eventually retired from teaching, and the enthusiasm and talent I’d gotten from Mom for math shifted into computing, and then into digital libraries.  And when my kids reached school age, I decided to try putting her book online.  In an era of large classes, detailed state standards, and high-stakes standardized tests, it might not be a viable standard textbook any more, but I think it’s still great for curious kids who show an interest in math.

Mom thought that was a great idea.  But she didn’t know if she could grant permission on her own.  Although long out of print, the book’s copyright had automatically renewed in 2000 under US copyright law, and she wasn’t sure if she had to get the consent of her publisher or co-author before she could give me the go-ahead. She didn’t know how to reach her co-author, and her old imprint was long gone.  Even its acquirer had itself been acquired by a large conglomerate some time ago.  So I let the idea drop, thinking I’d come back to it later when I had a little time to research the copyright.

But not long after, she started a long slide into dementia, and was soon in no position to give permission to anyone.  If her book had been practically an “orphan work” before, due to uncertainty over rights, it was even more so now.  There was no trouble locating the author, but there was no way of getting valid permission from someone definitely known to hold the rights.

Mom died this past winter, four years after my Dad had reluctantly moved her into the nursing home for good, and four weeks after he’d made his usual daily visit, gone back home, and had a fatal heart attack.  After we paid the last of the bills, and threw out the contents of the basement (where a burst pipe ruined all the books, papers, and other things they kept down there), what remained of what they had would now go to me and my siblings.

I still had a copy at home of the teacher’s edition of Mom’s book that she had once given to Grandma.  And between my mother’s funeral and the burst pipe, I’d taken a student edition out of their basement for my kids to read.  But any faint hope of finding publishing contracts or rights assignment documents was obliterated after the pipe burst.  The basic questions were: had Mom signed her rights to the book away, as many academic authors do? If so, had she gotten them back at some point?  Or had she never had the rights in the first place, as sometimes happens with textbook authors under “work for hire” contracts?

The copyright page of the book, and the record in the 1972 Catalog of Copyright Entries, show the publisher as the copyright claimant, so I couldn’t assume she had the rights.   But I also doubted whether I could get a clear answer, or reasonable licensing terms, from the company that had eventually acquired the assets of Mom’s original publisher.

I eventually found what I needed to know on a trip to Washington, DC.  While attending a meeting on digital format registries, I realized that I was in the same building as the Copyright Office.   So after the meeting, I got a reader’s card, went upstairs, and consulted the librarians there.  We confirmed that, under the automatic renewal laws of the time, the copyright to Mom’s book would have reverted in 2000 to whoever had been declared the “author” of the book in the original registration record.   Moreover, in the absence of any contrary arrangement, any co-owner of a copyright can authorize publication, as long as they split any proceeds with the other copyright owners.

Since I was planning just to put the book online for free, the only question remaining was: who was listed as the author on the original registration: the publisher who claimed the copyright, or my mother and Dr. Steinen?  It’s not clear from the Catalog of Copyright Entries, but the original registration certificate would state it.  And the one copy known to exist of that certificate was in the archives of the Copyright Office where I was sitting.

Twenty minutes later, I had the certificate in front of me.  The name on the “claimant” line was indeed the publisher’s, but the names on the “author” line were Steinen and Ockerbloom.  My mother’s orphan was mine to claim.

There are a lot more books out there like hers.  Since I added records for Hathi Trust’s public domain books to The Online Books Page, I’ve gotten requests to curate hundreds of out of print, largely forgotten books that are still meaningful to readers online.  Many of the people who opt to leave contact information live in places where books tend to be hard to get or pay for. Many others, judging from their names, seem to be related to the authors of the books they suggest. These readers have found the books after Hathi, or Google, or the Internet Archive, has resurfaced them online, and the readers want these books to live on.  If there were an easy, inexpensive, uncontroversially legal way to also bring back books that are still in copyright, but no longer commercially exploited, I’m sure I could fulfill a lot of requests for those books too.

For now, though, I’ll bring back the one orphan book I’ve been given. And I thank my mother for writing it, and the other women and men who have poured so much of their energy and teaching into their books, and the librarians of all kinds who help ensure those books stay accessible to readers who value them.  I’ll try my best to keep your legacies alive.

September 27, 2011

Libraries: Be careful what your web sites “Like”

Filed under: crimes and misdemeanors,data,libraries,people,privacy — John Mark Ockerbloom @ 6:15 pm

Imagine you’re working in a library, and someone with a suit and a buzz cut comes up to you, gestures towards a patron who’s leaving the building, and says “That guy you were just helping out; can you tell me what books he was looking at?”

Many librarians would react to this request with alarm.  The code of ethics adopted by the American Library Association states “We protect each library user’s right to privacy and confidentiality with respect to information sought or received and resources consulted, borrowed, acquired or transmitted.”  Librarians will typically refuse to give such information without a carefully-verified search warrant, and many are also campaigning against the particularly intrusive search demands authorized by the PATRIOT Act.

Yet it’s possible that the library in this scenario is routinely giving out that kind of information, without the knowledge or consent of librarians or patrons, via its web site.  These days, many sites, including those of libraries, invoke a variety of third-party services to construct their web pages.  For instance, some library sites use Google services to analyze site usage trends or to display book covers.  Those third party services often know what web page has been visited when they’re invoked, either through an identifier in the HTML or Javascript code used to invoke the service, or simply through the Referer information passed from the user’s web browser.

Patron privacy is particularly at risk when the third party also knows the identity of users visiting sensitive pages (like pages disclosing books they’re interested in).  The social networking sites that many library patrons use, for instance, can often track where their users go on the Web, even after they’ve left the social sites themselves.

For instance, if you go to the website of the Farmington Public Library (a library I used a lot when growing up in Connecticut), and search through their catalog, you may see Facebook “Like” buttons on the results.  On this page, for example, you may see that four people (possibly more by the time you read this) have told Facebook they Liked the book Indistinguishable from Magic.  Now, you can probably easily guess that if you click the Like button, and have a Facebook account, then Facebook will know that you liked the book too.  No big surprise there.

But what you can’t easily tell is that  Facebook is informed you’ve looked at this book page, even if you don’t click on anything.  If you’re a Facebook user and haven’t logged out– and for a while recently, even if you have logged out– Facebook knows your identity.  And if Facebook knows who you are and what you’re looking at, it has the power to pass along this information. It might do it through a “frictionless sharing” app you decided to try.  Or it might quietly provide it to organizations that it can sell your data to as permitted in its frequently changing data use policies.  (Which for a while even included tracking non-members.)

For some users, it might not be a big deal if it’s generally known what books they’re looking at online. But for others it definitely is a big deal, at least some of the time.  The problem with third-party inclusions like the Facebook “Like” button in catalogs is that library patrons may be denied the opportunity to give informed consent to sharing their browsing with others.  Libraries committed to protecting their patrons’ privacy as part of their freedom to read need to carefully consider what third party services they invite to “tag along” when patrons browse their sites.

This isn’t just a Facebook issue.  Similar issues come up with other third-party services that also track individuals, as for instance Google does.  Libraries also have good reasons to partner with third party sites for various purposes.  For some of these purposes, like ebook provision, privacy concerns are fairly well understood and carefully considered by most libraries.  But librarians might not keep as close track of the development of their own web sites, where privacy leaks can spring up unnoticed.

So if any of your web sites (especially your online catalogs or other discovery and delivery services) use third party web services, consider carefully where and how they’re being invoked.  For each third party, you should ask what information they can get from users browsing your web site, what other information they have from other sources (like the “real names” and exact birthdates that sites like Facebook and Google+ demand), and what real guarantees, if any, they make about the privacy of the information.  If you can’t easily get satisfactory answers to these questions, then reconsider your use of these services.
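One rough way to start such a review is to list the outside hosts a catalog page asks the browser to contact automatically.  The sketch below uses only Python’s standard library, and rests on several assumptions: the function and class names are made up for illustration, it looks only at the static HTML of a single page, and it counts every `<link href>` even though some (such as canonical links) are never fetched.  It will also miss resources that scripts load after the page opens, so treat its output as a starting list of questions, not a complete audit.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Tags that usually make the browser fetch a resource without any click,
# mapped to the attribute holding the resource URL.  (Ordinary <a> links
# are left out, since they are only followed when the reader clicks.)
AUTO_LOADED = {"script": "src", "img": "src", "iframe": "src",
               "embed": "src", "link": "href"}


class ThirdPartyFinder(HTMLParser):
    """Collects hosts, other than the page's own, referenced by auto-loaded tags."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.page_host = urlparse(page_url).hostname
        self.third_party_hosts = set()

    def handle_starttag(self, tag, attrs):
        attr_name = AUTO_LOADED.get(tag)
        if attr_name is None:
            return
        value = dict(attrs).get(attr_name)
        if not value:
            return
        # Resolve relative and protocol-relative URLs against the page URL.
        host = urlparse(urljoin(self.page_url, value)).hostname
        if host and host != self.page_host:
            self.third_party_hosts.add(host)


def third_party_hosts(page_url, html):
    """Return a sorted list of outside hosts the page's static HTML pulls from."""
    finder = ThirdPartyFinder(page_url)
    finder.feed(html)
    return sorted(finder.third_party_hosts)
```

Each host in the result receives a request when the page loads, typically along with Referer information identifying the page being viewed, so each is a candidate for the questions above.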

June 15, 2011

A digital public library we still need, and could build now

Filed under: citizen librarians,copyright,libraries,people,sharing — John Mark Ockerbloom @ 12:39 pm

It’s been more than half a year since the Digital Public Library of America project was formally launched, and I’m still trying to figure out what the project organizers really want it to be.  The idea of “a digital library in service of the American public” is a good one, and many existing digital libraries already play that role in a variety of ways.  As I said when I christened this blog, I’m all for creating a multitude of libraries to serve a diversity of audiences and information needs.

At a certain point after an enthusiastic band of performers says “Let’s put on a show!”, though, someone has to decide what their show’s going to be about, and start focusing effort there.  So far, the DPLA seems to be taking an opportunistic approach.  Instead of promulgating a particular blueprint for what they’ll do, they’re asking the community for suggestions, in a “beta sprint” that ends today.   Whether this results in a clear distinctive direction for the project, or a mishmash of ideas from other digitization, aggregation, preservation, and public service initiatives, remains to be seen.

Just about every digital project I’ve seen is opportunistic to some extent.   In particular, most of the big ones are opportunistic when it comes to collection development.  We go after the books, documents, and other knowledge resources that are close to hand in our physical collections, or that we find people putting on the open web, or that our users suggest, or volunteer to provide on their own.

There are a number of good reasons for this sort of opportunism.  It lets us reuse work that we don’t have to redo ourselves.  It can inform us of audience interests and needs (at least as far as the interests of the producers we find align with the interests of the consumers we serve).  And it’s cheap, and that’s nothing to sneer at when budgets are tight.

But the public libraries that my family prefers to use don’t, on the whole, have opportunistically built collections.  Rather, they have collections shaped primarily by the needs of their patrons, and not primarily by the types of materials they can easily acquire.   The “opportunistic” community and school library collections I’ve seen tend to be the underfunded ones, where books in which we have yet to land on the Moon, the Soviet Union is still around, or Alaska is not yet a state may be more visible than books that reflect current knowledge or world events.  The better libraries may still have older titles in their research stacks, but they lead with books that have current relevance to their community, and they go out of their way to acquire reliable, readable resources for whatever information needs their users have.  In other words, their collections and services are driven by  demand, not supply.

In the digital realm, we have yet to see a library that freely provides such a digital collection at large scale for American public library users.   Which is not to say we don’t have large digital book collections– the one I maintain, for instance, has over a million freely readable titles, and Google Books and lots of other smaller digital projects have millions more.  But they function more as research or special-purpose collections than as collections for general public reference, education, or enjoyment.

The big reason for this, of course, is copyright.  In the US, anyone can freely digitize books and other resources published before 1923, but providing anything published after that requires copyright research and, usually, licensing, that tends to be both complex and expensive.  So the tendency of a lot of digital library projects is to focus on the older, obviously free material, and have little current material.  But a generally useful digital public library needs to be different.

And it can be, with the right motivation, strategy, and support.  The key insight is that while a strong digital public library needs to have high-quality, current knowledge resources, it doesn’t need to have all such resources, or even the most popular or commercially successful ones.  It just needs to acquire and maintain a few high-quality resources for each of the significant needs and aptitudes of its audience. Mind you, that’s still a lot of ground to cover, especially when you consider all the ages, education levels, languages, physical and mental abilities, vocational needs, interests, and demographic backgrounds that even a midsized town’s public library serves.  But it’s still a substantially smaller problem, and involves a smaller cost, than the enticing but elusive idea of providing instant free online access to everything for everyone.

There are various ways public digital libraries could acquire suitable materials proactively.  The America.gov books collection provides one interesting example.  The US State Department wanted to create a library of easy-to-read books on civics and American culture and history for an international audience.  Some of these books were created in-house by government staff.  Others were commissioned from outside authors.  Still others were adapted from previously published works, for which the State Department acquired rights.

A public digital library could similarly create, commission, solicit, or acquire rights to books that meet unfilled information needs of its patrons.  Ideally it would aim to acquire rights not just to distribute a work as-is, but also to adapt and remix into new works, as many Creative Commons licenses allow.  This can potentially greatly increase the impact of any given work.  For instance, a compellingly written,  beautifully illustrated book on dinosaurs might be originally written for 9-12 year old English speakers, and be noticeably obsolete due to new discoveries after 5 or 10 years.  But if a library’s community has reuse and adaptation rights, library members can translate, adapt, and update the book, so it becomes useful to a larger audience over a longer period of time.

This sort of collection building can potentially be expensive; indeed, it’s sobering that America.gov has now ceased being updated, due to budget cuts.  But there’s a lot that can be produced relatively inexpensively.  Khan Academy, for example, contains thousands of short, simple educational videos, exercises, and assessments created largely by one person, with the eventual goal of systematically covering the entire standard K-12 curriculum.  While I think a good educational library will require the involvement of many more people, the Khan example shows how much one person can get accomplished with a small budget, and projects like Wikipedia show that there’s plenty of cognitive surplus to go around, that a public library effort might usefully tap into.

Moreover, the markets for rights to previously authored content can potentially be made much more efficient than they are now.  Most books, for instance, go out of print relatively quickly, with little or no commercial exploitation thereafter.  And as others have noted, just trying to get permission to use  a work digitally, even apart from any royalties, can be very expensive and time-consuming.  But new initiatives like Gluejar aim to make it easier to match up people who would be happy to share their book rights with people who want to reuse them. Authors can collect a small fee (which could easily be higher than the residual royalties on an out-of-print book); readers get to share and adapt books that are useful to them.   And that can potentially be much cheaper than acquiring the rights to a new work, or creating one from scratch.

As I’ve described above, then, a digital public library could proactively build an accessible collection of high-quality, up to date online books and other knowledge resources, by finding, soliciting, acquiring, creating, and adapting works in response to the information needs of its users.  It would build up its collection proactively and systematically, while still being opportunistic enough to spot and pursue fruitful new collection possibilities.  Such a digital library could be a very useful supplement to local public libraries, would be open any time anywhere online, and could provide more resources and accessibility options than a local public library could provide on its own.  It would require a lot of people working together to make it work, including bibliographers, public service liaisons, authors, technical developers, and volunteers, both inside and outside existing libraries.  And it would require ongoing support, like other public libraries do, though a library that successfully serves a wide audience could also potentially tap into a wide base of funds and in-kind contributions.

Whether or not the DPLA plans to do it, I think a large-scale digital free public library with a proactively-built, high-quality, broad-audience general collection is something that a civilized society can and should build.  I’d be interested in hearing if others feel the same, or have suggestions, critiques, or alternatives to offer.

March 24, 2010

Erin McKean and inclusive entrepreneurship

Filed under: people — John Mark Ockerbloom @ 11:33 pm

Today is Ada Lovelace Day, a day celebrating the achievements of women in science and technology.  There are all kinds of ways to be a scientist or a technologist, and just in the fields of computing and information technology I can think of a number of first-class inventors, investigators, developers, teachers, and integrators who are women.

When it comes to entrepreneurs, though, the list of women who come to my mind gets much shorter.  And that’s unfortunate, because in computing and information technology, as in many other technical fields, entrepreneurs play a major role in bringing the fruits of technology to the world at large.   If you mention Steve and Steve, Bill and Paul, or Larry and Sergey, lots of people will know who you’re talking about, and also know the stories of the companies they founded and their world-changing products.  Women tech-company founders, though, have not been so noticeable.  Jessica Livingston, one of the founders of the tech-startup catalyst Y Combinator, reports that nearly all of their applicants have Y chromosomes; only about 7 percent have been women. And in mid-2008, the San Jose Mercury News reported that there were no female CEOs in the top 150 Silicon Valley companies.

Despite these discouraging statistics, you can find women pioneering new technological businesses.  One whom I find particularly notable (and not just because the woman technologist closest to me works for her) is Erin McKean, co-founder and CEO of Wordnik, a company that’s reinventing the dictionary for the Internet age. If you watch this 15-minute video of a 2007 TED talk, or just read the About and FAQ pages at Wordnik and play around with the site some, you’ll pick up the general ideas.   There are a few aspects of her venture that I think are worth special note:

Rethinking the familiar. Dictionaries have been around in print for centuries, and even computerized dictionaries have existed for decades.  But as Erin makes clear in the TED video, once you have the Internet, and lots of data and computing power, you can discard many of the limitations of prior conceptions of the dictionaries, and do lots of interesting new things.  You can include all the words in a language, not just the ones that pass some sort of notability test.  You can show examples from a huge array of  formal, informal, and ephemeral sources.  You can do statistical analysis to track word usage over different times, places, and genres.  In short, you can make a reference that meets a known need in new and useful ways.

Risk-taking. It’s a truism that startups are risky ventures.   You spend years of your life working long hours, often with low pay and few benefits at the start, for a project that may make you rich, but statistically is more likely to come to nothing.   And some believe that online dictionaries, in particular, may be obsolete, and vulnerable to the same sorts of disruptive markets that blew the encyclopedia business to bits.  But thus far, Erin’s risk-taking seems to be paying off; the site was named one of PC Magazine’s Top 100 products of 2009 (one of only a few websites to be included in their list), and it continues to attract funding even in the midst of a severe recession.

Inclusion. One of the key distinguishing features of Wordnik, alluded to above, is its inclusiveness.  By collecting all the words and examples that it can, from a wide variety of sources, it stands out from other dictionaries.  And along with the usual definitions, pronunciations, and example sentences for words, Wordnik also brings in many unconventional sources (Flickr photo streams, Scrabble scoring references, and user tags and comments, to name a few) to enhance understanding and enjoyment of words.  It remains to be seen which of these sources will prove most useful or popular in the long run, but openness to new information and ideas is often a crucial part of inventing and improving new technologies.   It also helps ventures evaluate and adapt their technologies and their business models, so that they can thrive rather than perish under changing conditions.

Inviting collaboration. If inclusiveness is important to you, it’s not enough just to wait for things to come to you; you need to go out and invite people to work with you.  The Wordnik site does this to a certain extent by design, encouraging people to contribute notes, tags, lists of favorite words, and other information.  (Our 7-year-old daughter was an ardent contributor of Pokemon character names and pronunciations during the site’s alpha test.)  More recently, Wordnik has partnered with the Internet Archive and a variety of publishers and publishing sites to develop Smartwords, a forthcoming open standard for querying and embedding word information on demand as people read online, or communicate via social software.  Erin and her collaborators hope that this will extend the reach of Wordnik’s services into many more contexts than just “going to look up a word in the dictionary”.

While the details above are specific to Wordnik, the four basic qualities they embody (rethinking the familiar, risk-taking, inclusion, and inviting collaboration) have general applicability, and are especially worth consideration on Ada Lovelace Day.  The low numbers quoted earlier for women tech founders and CEOs suggest to me that women may not always be seriously considered (by themselves or others) for those roles.  Thinking in terms of the qualities I’ve mentioned makes it easier to envision women in those positions.  These are also qualities that can help both men and women take a more entrepreneurial approach to the technologies (and the libraries) they develop, so that they can have a lasting, positive effect on the world.

March 23, 2010

Lots of conversation keeps stuff sustainable

Filed under: libraries,people,preservation,sharing — John Mark Ockerbloom @ 10:12 pm

Among the hats I wear at my place of work is that of LOCKSS cache administrator. LOCKSS is a useful distributed preservation system built around the principle “Lots of copies keep stuff safe” (whose initials give the system its name).  The idea is that, with the cooperation of publishers, a bunch of libraries each harvest copies of selected online content, and keep backups on our own LOCKSS caches, which are hooked up to local library proxy services.  Then, if the material ever becomes inaccessible from the publisher, our users will automatically be routed to our local copies.  Each LOCKSS cache also periodically checks with other LOCKSS caches to ensure that our copies are still in good shape, and to repair or replace copies that have been lost or damaged.  (Various security features protect against leaks of restricted content, or unauthorized revisions of content.)
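
As a rough illustration of that check-and-repair idea, here is a toy Python sketch.  It is not the actual LOCKSS polling protocol, which is considerably more elaborate; the function names, the cache names, and the simple majority vote are all assumptions I've made for illustration.

```python
import hashlib
from collections import Counter


def digest(content: bytes) -> str:
    """Return a SHA-256 fingerprint of one cached copy."""
    return hashlib.sha256(content).hexdigest()


def audit_and_repair(caches: dict[str, bytes]) -> list[str]:
    """Compare copies across caches and repair any that disagree with the majority.

    `caches` maps a hypothetical cache name to the bytes it holds for one item.
    Returns the names of the caches whose copies were replaced.
    """
    votes = Counter(digest(content) for content in caches.values())
    winning_digest, count = votes.most_common(1)[0]
    if count <= len(caches) // 2:
        # No strict majority: this toy version refuses to guess which copy is good.
        raise ValueError("no majority agreement among caches")

    good_copy = next(c for c in caches.values() if digest(c) == winning_digest)
    repaired = []
    for name, content in list(caches.items()):
        if digest(content) != winning_digest:
            caches[name] = good_copy  # replace the damaged copy with an agreed one
            repaired.append(name)
    return repaired
```

With three caches where one copy has been damaged, the damaged one is outvoted and overwritten; with only two caches that disagree, the sketch raises an error rather than picking a side, which is one reason having lots of copies matters.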

LOCKSS is open source software that runs on commodity hardware.  It was originally envisioned to run virtually automatically.  As Chris Dobson described the ideal in a 2003 Searcher article, “Take a computer a generation past its prime…. Hook it up to the Internet and put it in a closet. Stick in the LOCKSS CD-ROM and boot it up. Close the closet door.”  And then presumably walk away and forget about it.

Of course, it’s not that simple in practice, particularly if your library is proactive about its preservation strategy.  The thing about preservation at scale is there’s always something that needs attention.  It might be something technical, or content-related, or planning-related, but preserving a growing collection requires ongoing thought.  And if you want to think as clearly and sensibly as you can, you’ll want to collaborate.

Right now, for instance, I’m trying to get my cache to harvest the full run of a journal that’s just been made available for LOCKSS harvesting, where we hope to provide post-cancellation access through LOCKSS.  Someone at Stanford just gave me a useful tip on how to give this journal priority over the other volumes I’ve got queued up for harvest.  Unfortunately, I can’t try it out until I get my cache back up; it failed to reboot cleanly after a power failure.  While I wait for instructions on how best to remedy this, I wonder whether switching to a new Linux-based version of LOCKSS might make such operating system-level problems easier to deal with.  But it would be useful to hear from folks who are running that version to see what their experience has been.

Meanwhile, we’re wondering how best to approach new publishers who have content that our bibliographers would like to preserve via LOCKSS. Our special collections folks wonder whether we should preserve some of our own home-grown content via a private LOCKSS network.  I’m also doing some ongoing monitoring and testing of our LOCKSS cache’s behavior (some of which I’ve reported on earlier), and would be interested in knowing if others are seeing some of the same kinds of things that I see on the cache I administer.

In short, there are a lot of things to think about when LOCKSS plays a significant role in a preservation plan.  And a lot of the issues I’ve mentioned above are ones that others may be thinking about as well.  So let’s talk about them.  As the LOCKSS group has said, “A vibrant, active, and engaged user community is key to the success of Open-Source efforts like LOCKSS.”

One thing you need for such an engaged community is a forum for them to talk to each other.  As it turns out, the LOCKSS group at Stanford tell me they created a LOCKSS Forum mailing list a while back, but I haven’t yet seen it publicized.   Its information page is at https://mailman.stanford.edu/mailman/listinfo/lockss-forum .  (Currently, archived email messages are not visible on the open web, though this may change in the future.)  If you’re interested in talking with others about how you use or might use LOCKSS to preserve access to digital content, I invite you to sign up and help get the conversation going.

October 5, 2009

Remember this

Filed under: people,preservation,sharing — John Mark Ockerbloom @ 12:40 am

I am eating a sandwich at the end of Pier 14 in San Francisco.  The sun has set behind the downtown skyscrapers, and the colors in the sky are slowly fading to grey.  I’m not the only diner out here.  Pelicans soar close off the pier, about 100 feet above the water, and one by one dive straight down with a loud splash, resurfacing in a moment, ruffling their feathers and jerking their beaks to get down the fish they’ve caught.  Other splashes in the water come from seals surfacing for air.  As an orange-tinted full moon comes up over the East Bay hills and under the span of the Bay Bridge, I see a pair of seals surface side by side, with their mouths meeting as they float at the water’s surface for a few seconds.  I am delighted to see all this, so different from what I usually see at home, and at the same time I wish I could be back there with the people I love instead of alone here.

I don’t have a camera right now, or anything to draw with, so I can only record this scene in words and in memory.   When there was still sun shining low on Yerba Buena Island and the coastline to the east, there were several people out here with tripods and light umbrellas, photographing human couples standing against the pier railings, in each other’s arms.  Judging from the clothing and the poses, I suspect these shots are for wedding or engagement albums.  And I can understand the motivation.   When Mary and I were married, 14 years ago this month, we too had pictures taken of us against a striking background, in our case the bright orange and yellow trees of a Pennsylvania fall.  I see one of those pictures every time I return home. Remember this, the picture says, and it brings back memories of the vows we made to each other that day.  The words we said, and the way we looked when we said them, were not recorded in fixed form, but, God willing, will stay in our hearts as long as we live.

There are more memories recorded out on the pier.  Plaques along the rails quote lines of poetry by Lawrence Ferlinghetti and Thomas Lovell Beddoes about the bay I’m looking out on.  Ceramic tile art depicts boats that have plied its waters, from the early days of European exploration to the present.  A display on the sidewalk in front relates the history of the pier, the ferries that ran (and still run, in smaller numbers) from the terminal nearby, the freeway that was built and then removed again from the water’s edge, and some of the people who played a part in all of these developments.  Remember this, they say, and I bring bits back with me to record in words.

It’s a basic need that we have, as intelligent, reflective, and social creatures, to remember the things we’ve experienced, seen, and learned about.  We make records of these things in various forms, to help us remember, and to prompt others to remember as well.  They help us go beyond and above what’s immediately in front of us, telling us things we need to know, people we can relate to, pasts that were different, futures that can be better.

Technology can make it easier for us to record these things– and sometimes easier to lose them.  We took many pictures of our kids on digital cameras as they grew up, and kept hundreds of them on my laptop, which let me easily recall them and show them to friends and family when I traveled.  Then one day I was robbed of my laptop, without my having backed up my photo collection, and most of those pictures were lost.  I’ve also seen many other personal and family memoirs get posted on the Web, stay for a few years, and then vanish with the demise of the web site they were on.  I kept paper tapes of early BASIC programs I wrote in middle school for years after I last had access to any device that could read them.  They’re gone now; I presume they were thrown out when my parents cleaned house sometime after I left home.

I know better now how to keep what remains.  Apple’s Time Machine makes it easy for me to incrementally back up my laptop every time I come home from work and plug a cheap external drive into my USB port.  The pictures of my kids that survived the laptop theft were mostly the ones that I had shared with others (either by copying them onto prints, or by putting them up on the Web). And the older family pictures that are most meaningful to us are ones where we know what the pictures represent, either because we are in them, or because others have told us, in person or in writing, who is in the pictures and the context in which they were taken.

I am here in San Francisco for iPRES 2009, a conference promoting the preservation of digital content.  There are a lot of smart, dedicated people scheduled to speak, and I hope to learn about new technologies and methods to help us preserve the content we want our libraries and their users to remember.

While some of these techniques may be complex, many of them are essentially elaborations on basic principles I’ve touched on in what I’ve related above: Help people record what’s important to them.  Make it easy for them to preserve these records in their everyday activity.  Encourage them to copy and share what they record, and allow others to build on them.  Make what they record easy to interpret, through informative description and straightforward formats.  And finally, try to understand and appreciate the connection between the record and the people for whom the record is important.

Which is why I sit now with my laptop in my hotel room, looking out on a bay that is now as dark as the night sky overhead, and trying to connect my experiences with the preservation challenges and proposals to come. Remember this, I mean to say.  It’s important.

May 27, 2009

Getting bugs out of our systems

Filed under: crimes and misdemeanors,people — John Mark Ockerbloom @ 10:21 am

Very soon after we start learning to program, we start learning to deal with bugs.   Folks who have programmed for a while might forget that effective bug handling, like effective programming, is a skill that doesn’t come entirely naturally.

Many of us instinctively avoid criticism, ignore it, minimize it, or even argue against our critics.  But our programs will almost invariably include bugs, and to handle them, we have to go against the grain of our instincts.  If we’re smart, we make it as easy as possible to report bugs to us, so we minimize their impact.  We respect and listen carefully to what our clients tell us, to understand the problems they’re encountering with our product.  After we fix the bugs, we often  review our code and our practices to avoid similar problems in the future.

It helps a lot if we can keep our egos out of the bug-fixing process.  I know that my work will sometimes have bugs, and that a bug report should not be taken as a personal attack.  Rather, I try to make it an opportunity to improve my products and my future work.

Bugs exist at various levels.  Bugs that cause crashes are often the easiest to deal with: it’s clear that something is going wrong, and it usually isn’t hard to figure out what to do about it.  But less obvious bugs can be worse.  One product our library uses, for example, implemented boolean searches incorrectly, omitting important results. This kind of bug can mislead lots of people who never notice the problem.   (And it can also take longer to address.  I had to send multiple emails and examples to the developers of this product before they admitted that their implementation was buggy.)

Bugs at the overall system level can be the worst.  The reservation system with interminable holds, the customer support service that never returns our calls, the open source effort that repels key constituencies it should be attracting: all of these are buggy systems, and they can drive people away just as surely as a crashing program.  As Michael Bolton puts it, “a bug is something that bugs somebody who matters.”  System-level bugs can be challenging to fix, but they can be the most essential to repair.

I hope none of these principles seems new or controversial.  But I’ve recently seen a few bug reports concerning the Ruby on Rails community that drew many responses ignoring the problems they raised.  The reports concerned buggy systems, not buggy code.  In particular, they noted a professional developer conference that attracted very few women, and an accepted presentation at that conference that included blatantly unprofessional themes, themes that one could easily predict would put off many of the people who could benefit from the talk.  (They would be particularly problematic if you were one of the few women there, but I found them distinctly off-putting as well.)

The comments on those two posts include plenty of examples of denial, minimization, rationalization, and attacking the reporters of the bugs.  (Indeed, some read as if they were cribbed right from this checklist of cliched defenses of in-group privilege.)

Assuming that the respondents are active members of the Ruby community, the responses suggest that there are still serious social bugs in that community.  I recently came back from another open source-focused conference (one that had a significantly higher proportion of women, though still far from 50%), where there were some good things said about using Ruby on Rails for library application development.  I like open source projects with good technical bases, but if I’m going to rely on a technology, I want its developer community to be healthy.  Healthy communities generally provide more reliable, long-lasting development support, and can be much easier and more pleasant to work with.

It can be positively uncomfortable for many of us to confront social problems, particularly ones in our own communities that we might be partly responsible for.  (And Ruby is not the only community that’s had this kind of problem.)  Perhaps if we get used to thinking of these problems as bugs, welcoming and paying close attention to reports, and getting our egos out of the way, we’ll find it easier to fix them.

Gender inequities are bugs in our systems.  Bugs happen.  But they can be fixed.  As my library considers involvement in various community source development projects, I want to find out more about what these communities are doing, going forward, to fix and prevent these sorts of bugs.

April 23, 2009

David Reed: Some extracts from his life and letters

Filed under: online books,people — John Mark Ockerbloom @ 11:36 pm

Last summer I was looking for a particular book. I couldn’t find it in any library in my State. Went interlibrary loans and found one copy at the library of Congress. Only one copy in the whole country. One of the best stories I ever [heard] about this is one when one of my professors was working on a trash pile of papyrus sheets and came across one that said [it] was the works of Meander. He went through that pile of papyrus with a fine tooth comb. He didn’t find anything but that single piece. He said that it felt as though he was looking across the centuries and saying, “Somewhere out there are the works of Meander.” [Friends,] this is how things get lost forever.

David Reed, 1997

Today, there are thousands of important books that will likely never share that fate as long as civilization lasts, because they were digitized and sent all over the world.  Many of these books were first put online by Project Gutenberg.  And many of the Project Gutenberg texts are online thanks to the work of David Reed.

I scanned and released Gibbon’s Decline and Fall of the Roman Empire and hardly a day goes by when I don’t get an email from someone thanking me for releasing it on the web. At one site I know that it has been downloaded 1800+ times in all six volumes.

David Reed, 2001

In the mid-1990s, Project Gutenberg had an outlandish-sounding goal: to make 10,000 books freely available online by the start of the 21st century.  They’d only managed to put a couple hundred online by then.  Authors like Clifford Stoll were skeptical that they, or anyone else, would ever reach such a goal.

But Gutenberg was soon publishing more and more texts every month, at an ever-increasing pace.   Lots of those texts had David Reed’s name on them.  Working persistently with his own scanner, well before the era of well-funded mass digitization, he digitized and proofread long works that few other people at the time would have taken on: Gibbon’s Decline and Fall; Shakespeare’s First Folio;  Josephus’ Antiquities of the Jews; Frazer’s Golden Bough; Tocqueville’s Democracy in America.  He also scanned numerous works weighty and light from authors like Rudyard Kipling, Louisa May Alcott, Robert Frost, James Joyce, and the US government.

Some critics in academia complained that the books David and others put up for Gutenberg were not up to the standards of scholarly editions.  David didn’t begrudge the work of scholars, but he wanted to put up more works, more quickly, to reach a broader audience.  As he put it in 1999:

[I] think that [it’s] important to remember that we do all this work because we like to read and we like to share our discoveries with others…. I see no reason why the text specialists can’t have the specialist collections and the general people (like myself) have the general collections. There is room enough on the web for all of us. The real enemy are those who want to lock up all the books in the world. The real enemy are those who don’t read a single book.

David was fighting another enemy besides illiteracy, one closer to home. He had diabetes, and in the last few years of his life his health slowly worsened from complications of that disease. He didn’t mention it in this post (nor, as far as I can remember, in any of the posts he made to the Book People mailing list, from which these quotations are taken). But even while his health was failing, he continued to put books online, like this emergency childbirth manual that was posted this past October.  He was working to fulfill a dream that he described back in his 1999 post:

I dream of the day when we have 50,000 and 100,000 etext libraries on the web. Where there are 100 new etexts being released a week or every couple of days. When I can’t keep up with reading every etext that pops up on the Online Book Page or that Project Gutenberg releases. . I appreciate all the work that you are all doing. I love reading the work that you are all doing.

David died on April 21, 2009, according to the email his son Chris sent to David’s contacts list.  By then, Google Books and the Internet Archive’s book collection had made over 1 million books freely available online, the various Gutenberg projects had posted just over 30,000 books, and many smaller projects had posted numerous unique titles as well.  He lived long enough to see his dream come true, thanks in part to his own pioneering work and dedication.

I have dedicated etexts in honor of my daughter, my sons, my wife, parents and in honor of my companies I work for, even in honor of myself.

David Reed, 2001

Out there all over the Net, in millions of replicas, are the works of David Reed, transcribing many of the great authors that have also passed on.  In some sense, all of those works are dedicated to him.  Through them, I hope his name lives on for generations to come.
