Promoting access to the best literature of the past

Last week saw widespread observance of Open Access Week 2009.  The week primarily focused on opening access to current research and scholarship (though there’s also been a growing community working on opening access to teaching and learning content).  You can find lots of open access resources at the Open Access Directory.

Current scholarship is not spontaneously generated from the brain or lab of the writer.  Useful scholarship must understand and interpret past work, to be effective in the present.  In many fields, and not just the classical humanities, the relevant past work may stretch back hundreds or even thousands of years.  Current scholarship and study will be more effective if its source material is also made openly accessible, and if proper attention is drawn to the most useful sources.  And now is an especially opportune time for scholars of all sorts, professional and amateur, to get involved in the process.

This may seem a strange thing to say at a time when the digitization of old books and other historic materials is increasingly dominated by large-scale projects like those of Google and the Internet Archive.  With mass digitizers putting millions of public domain book and journal volumes online, and with a near-term possibility of millions more copyrighted volumes going online as well, how much of a role is left for individual scholars and readers?

A very important role, as it turns out.  Mass digitization projects can quickly produce large-scale aggregations of past content, but as many have pointed out, aggregation is not the same as curation, and as aggregations grow larger, being able to find the right items in a growing collection becomes increasingly important.  That’s what curation helps us do, and the large-scale digitizers are not doing a very effective job of it themselves.  Google’s PageRank algorithm may take advantage of implicit curation of web pages (through the choices of authors’ page links), but Google and other aggregators have had a much harder time drawing attention to the most useful books, scholarly articles, or other works created without built-in hyperlinks.

Sometimes this is because they haven’t digitized them, even as they’ve digitized inferior substitutes.  Over three years after Paul Duguid lamented the republication of a bowdlerized translation of Knut Hamsun’s Pan by Project Gutenberg, that version remains the only freely available edition of this book there, or at Google Books, or anywhere else online that I’ve found.  Even though an unexpurgated version of this translation was published before the bowdlerized version, no digitizer that I know of has gotten around to finding and digitizing it; and countless readers may have used the existing online copies without even knowing that they’ve been censored.  Extra bibliographic and copyright research may be necessary to determine whether a better resource is available for digitization, as it is in this case.

Sometimes the content is digitized, but can’t be found easily.  Geoff Nunberg’s post on Google Books’ “metadata train wreck” shows plenty of examples of how difficult it can be to find and properly identify a particular edition in Google Books, much less figure out which edition is the best one to use.  I’ve commented in the past about the challenges of finding multi-volume works in that corpus.  And Peter Jacso has pointed out Google’s problems indexing current scholarship.  If you can’t find the paper or book you need for your research, your work will be no better than it would be if the source had never existed.

This is where scholars can potentially play a useful role.  We don’t individually digitize books by the thousands, but we do individually find, cite, and recommend useful sources, down to the particular edition, as we find them and use them in our own writings and teaching.  These citations and recommendations now often go online, in various locations.  It would be very useful to have these recommendations made more visible, and tied to freely available online copies of the sources cited, whenever legally possible. Sometimes, we also create or digitize our own editions of past works, with useful annotations, for our classes or our own work.  It would be very useful to have these made visible and persistent as well, whenever appropriate.

I hope that large resource aggregations will make it easier for scholars and others to curate the collections to make them more useful to their readers.  In the meantime, we can start with resources we have.  For example, on The Online Books Page, my catalog entry for Hamsun’s Pan notes its limitations.  My public requests page includes information on a better edition that could be digitized, by someone who has access to the edition and has some time to spare.  And my suggestion form is ready to accept links to better editions of this book, or to other online books that merit special attention.  Indeed, most of the books that I now add to my catalog derive from submissions made by various readers on this form, and I invite scholars to suggest the freely accessible books and serials that they find most useful for my catalog.

As the Little Professor notes in a recent post, the sort of bibliographic work I’ve described can be time-consuming but vitally important for making effective use of old sources, and that work has often not been done by anyone for many books outside the usual classical canons.  Yet it’s the sort of thing that scholars do, bit by bit, as part of their everyday work.  The aggregate effect of their curation and digitization, appropriately harnessed in open-access form, could greatly improve our ability to build upon the work of the past.

Remember this

I am eating a sandwich at the end of Pier 14 in San Francisco.  The sun has set behind the downtown skyscrapers, and the colors in the sky are slowly fading to grey.  I’m not the only diner out here.  Pelicans soar close off the pier, about 100 feet above the water, and one by one dive straight down with a loud splash, resurfacing in a moment, ruffling their feathers and jerking their beaks to get down the fish they’ve caught.  Other splashes in the water come from seals surfacing for air.  As an orange-tinted full moon comes up over the East Bay hills and under the span of the Bay Bridge, I see a pair of seals surface side by side, with their mouths meeting as they float at the water’s surface for a few seconds.  I am delighted to see all this, so different from what I usually see at home, and at the same time I wish I could be back there with the people I love instead of alone here.

I don’t have a camera right now, or anything to draw with, so I can only record this scene in words and in memory.   When there was still sun shining low on Yerba Buena Island and the coastline to the east, there were several people out here with tripods and light umbrellas, photographing human couples standing against the pier railings, in each other’s arms.  Judging from the clothing and the poses, I suspect these shots are for wedding or engagement albums.  And I can understand the motivation.   When Mary and I were married, 14 years ago this month, we too had pictures taken of us against a striking background, in our case the bright orange and yellow trees of a Pennsylvania fall.  I see one of those pictures every time I return home. Remember this, the picture says, and it brings back memories of the vows we made to each other that day.  The words we said, and the way we looked when we said them, were not recorded in fixed form, but, God willing, will stay in our hearts as long as we live.

There are more memories recorded out on the pier.  Plaques along the rails quote lines of poetry by Lawrence Ferlinghetti and Thomas Lovell Beddoes about the bay I’m looking out on.  Ceramic tile art depicts boats that have plied its waters, from the early days of European exploration to the present.  A display on the sidewalk in front relates the history of the pier, the ferries that ran (and still run, in smaller numbers) from the terminal nearby, the freeway that was built and then removed again from the water’s edge, and some of the people who played a part in all of these developments.  Remember this, they say, and I bring bits back with me to record in words.

It’s a basic need that we have, as intelligent, reflective, and social creatures, to remember the things we’ve experienced, seen, and learned about.  We make records of these things in various forms, to help us remember, and to prompt others to remember as well.  They help us go beyond and above what’s immediately in front of us, telling us things we need to know, people we can relate to, pasts that were different, futures that can be better.

Technology can make it easier for us to record these things, and sometimes easier to lose them.  We took many pictures of our kids on digital cameras as they grew up, and kept hundreds of them on my laptop, which let me easily recall them and show them to friends and family when I traveled.  Then one day I was robbed of my laptop, without my having backed up my photo collection, and most of those pictures were lost.  I’ve also seen many other personal and family memoirs posted on the Web stay for a few years and then vanish with the demise of the web site they were on.  I kept paper tapes of early BASIC programs I wrote in middle school for years after I last had access to any device that could read them.  They’re gone now; I presume they were thrown out when my parents cleaned house sometime after I left home.

I know better now how to keep what remains.  Apple’s Time Machine makes it easy for me to incrementally back up my laptop every time I come home from work and plug a cheap external drive into my USB port.  The pictures of my kids that survived the laptop theft were mostly the ones that I had shared with others (either by copying them onto prints, or by putting them up on the Web). And the older family pictures that are most meaningful to us are ones where we know what the pictures represent, either because we are in them, or because others have told us, in person or in writing, who is in the pictures and the context in which they were taken.

I am here in San Francisco for Ipres 2009, a conference promoting the preservation of digital content.  There are a lot of smart, dedicated people scheduled to speak, and I hope to learn about new technologies and methods to help us preserve the content we want our libraries and their users to remember.

While some of these techniques may be complex, many of them are essentially elaborations on basic principles I’ve touched on in what I’ve related above: Help people record what’s important to them.  Make it easy for them to preserve these records in their everyday activity.  Encourage them to copy and share what they record, and allow others to build on them.  Make what they record easy to interpret, through informative description and straightforward formats.  And finally, try to understand and appreciate the connection between the record and the people for whom the record is important.

Which is why I sit now with my laptop in my hotel room, looking out on a bay that is now as dark as the night sky overhead, and trying to connect my experiences with the preservation challenges and proposals to come. Remember this, I mean to say.  It’s important.