Last week saw widespread observance of Open Access Week 2009.  The week primarily focused on opening access to current research and scholarship (though there’s also been a growing community working on opening access to teaching and learning content).  You can find lots of open access resources at the Open Access Directory.

Current scholarship is not spontaneously generated from the brain or lab of the writer.  To be effective in the present, useful scholarship must understand and interpret past work.  In many fields, and not just the classical humanities, the relevant past work may stretch back hundreds or even thousands of years.  Current scholarship and study will be more effective if its source material is also made openly accessible, and if proper attention is drawn to the most useful sources.  And now is an especially opportune time for scholars of all sorts, professional and amateur, to get involved in the process.

This may seem a strange thing to say at a time when the digitization of old books and other historic materials is increasingly dominated by large-scale projects like those of Google and the Internet Archive.  With mass digitizers putting millions of public domain book and journal volumes online, and with a near-term possibility of millions more copyrighted volumes going online as well, how much of a role is left for individual scholars and readers?

A very important role, as it turns out.  Mass digitization projects can quickly produce large-scale aggregations of past content, but as many have pointed out, aggregation is not the same as curation, and as aggregations grow larger, being able to find the right items in a growing collection becomes increasingly important.  That’s what curation helps us do, and the large-scale digitizers are not doing a very effective job of it themselves.  Google’s PageRank algorithm may take advantage of implicit curation of web pages (through the choices of authors’ page links), but Google and other aggregators have had a much harder time drawing attention to the most useful books, scholarly articles, or other works created without built-in hyperlinks.

Sometimes this is because they haven’t digitized them, even as they’ve digitized inferior substitutes.  Over three years after Paul Duguid lamented the republication of a bowdlerized translation of Knut Hamsun’s Pan by Project Gutenberg, that version remains the only freely available edition of the book there, at Google Books, or anywhere else online that I’ve found.  Even though an unexpurgated version of this translation was published before the bowdlerized version, no digitizer that I know of has gotten around to finding and digitizing it; and countless readers may have used the existing online copies without even knowing that they’ve been censored.  Extra bibliographic and copyright research may be necessary to determine whether a better resource is available for digitization, as it is in this case.

Sometimes the content is digitized, but can’t be found easily.  Geoff Nunberg’s post on Google Books’ “metadata train wreck” shows plenty of examples of how difficult it can be to find and properly identify a particular edition in Google Books, much less figure out which edition is the best one to use.  I’ve commented in the past about the challenges of finding multi-volume works in that corpus.  And Peter Jacso has pointed out Google’s problems indexing current scholarship.  If you can’t find the paper or book you need for your research, your work will be no better than it would be if the source had never existed.

This is where scholars can potentially play a useful role.  We don’t individually digitize books by the thousands, but we do individually find, cite, and recommend useful sources, down to the particular edition, as we find them and use them in our own writings and teaching.  These citations and recommendations now often go online, in various locations.  It would be very useful to have these recommendations made more visible, and tied to freely available online copies of the sources cited, whenever legally possible. Sometimes, we also create or digitize our own editions of past works, with useful annotations, for our classes or our own work.  It would be very useful to have these made visible and persistent as well, whenever appropriate.

I hope that large resource aggregations will make it easier for scholars and others to curate the collections to make them more useful to their readers.  In the meantime, we can start with resources we have.  For example, on The Online Books Page, my catalog entry for Hamsun’s Pan notes its limitations.  My public requests page includes information on a better edition that could be digitized, by someone who has access to the edition and has some time to spare.  And my suggestion form is ready to accept links to better editions of this book, or to other online books that merit special attention.  Indeed, most of the books that I now add to my catalog derive from submissions made by various readers on this form, and I invite scholars to suggest the freely accessible books and serials that they find most useful for my catalog.

As the Little Professor notes in a recent post, the sort of bibliographic work I’ve described can be time-consuming but vitally important for making effective use of old sources, and that work has often not been done by anyone for many books outside the usual classical canons.  Yet it’s the sort of thing that scholars do, bit by bit, as part of their everyday work.  The aggregate effect of their curation and digitization, appropriately harnessed in open-access form, could greatly improve our ability to build upon the work of the past.

Author: John Mark Ockerbloom

I'm a digital library architect and planner at the University of Pennsylvania.

2 thoughts on “Promoting access to the best literature of the past”

  1. John, this looks like an excellent suggestion as far as it goes. You note (by subtle implication) the shortcomings of breaking the online citation-sharing services (like Connotea, CiteULike and Zotero) into “social islands”. The same problem, as far as I can see, might well apply in even worse form to your idea of sharing information on good editions through local catalogue activities. It’s not naturally social and it’s even more fragmented. Librarians might do something through some derivative of Worldcat, but perhaps not scholars. Maybe it needs some spark of genius we haven’t seen yet?

  2. Chris, thanks for your comment. I think there’s always going to be a multiplicity of places where people look for and share stuff. What I would like to see is that good curation and recommendation services be included in all sites that are used to look for resources, especially those sites that get a lot of use and/or a wide audience.

    Exactly how to make this best work will require some more sparks of genius, as you imply. But I think a good start is for existing communities to accommodate this, and for those communities to share their information freely and easily. They should also import such information when it makes sense to do so. We need to do more “local” cataloguing that pulls from and contributes to a wide commons, and less local cataloguing that stays local.

    I have been thinking about this in terms of my own catalogue, as well. I already export my data in a couple of XML formats, which anyone is free to take and build on under a CC license, and I may broaden these offerings in the future. And I’ve been considering ways to import more outside information once I have the infrastructure better set up for handling it.

    In the meantime, if you are a scholar, reader, or librarian already working regularly with a particular community’s catalogue or corpus, your contributions can help the other members of the community who also use it; and if your community opens up to work with others, those contributions can potentially help a much larger audience as well.

