Last week saw widespread observance of Open Access Week 2009. The week primarily focused on opening access to current research and scholarship (though there’s also been a growing community working on opening access to teaching and learning content). You can find lots of open access resources at the Open Access Directory.
Current scholarship is not spontaneously generated from the brain or lab of the writer. To be effective in the present, useful scholarship must understand and interpret past work. In many fields, and not just the classical humanities, the relevant past work may stretch back hundreds or even thousands of years. Current scholarship and study will be more effective if its source material is also made openly accessible, and if proper attention is drawn to the most useful sources. And now is an especially opportune time for scholars of all sorts, professional and amateur, to get involved in the process.
This may seem a strange thing to say at a time when the digitization of old books and other historic materials is increasingly dominated by large-scale projects like those of Google and the Internet Archive. With mass digitizers putting millions of public domain book and journal volumes online, and with a near-term possibility of millions more copyrighted volumes going online as well, how much of a role is left for individual scholars and readers?
A very important role, as it turns out. Mass digitization projects can quickly produce large-scale aggregations of past content, but as many have pointed out, aggregation is not the same as curation, and as aggregations grow larger, being able to find the right items in a growing collection becomes increasingly important. That’s what curation helps us do, and the large-scale digitizers are not doing a very effective job of it themselves. Google’s PageRank algorithm may take advantage of implicit curation of web pages (through the choices of authors’ page links), but Google and other aggregators have had a much harder time drawing attention to the most useful books, scholarly articles, or other works created without built-in hyperlinks.
Sometimes this is because they haven’t digitized them, even as they’ve digitized inferior substitutes. Over three years after Paul Duguid lamented the republication of a bowdlerized translation of Knut Hamsun’s Pan by Project Gutenberg, that version remains the only freely available edition of this book there, at Google Books, or anywhere else online that I’ve found. Even though an unexpurgated version of this translation was published before the bowdlerized version, no digitizer that I know of has gotten around to finding and digitizing it; and countless readers may have used the existing online copies without even knowing that they’ve been censored. Extra bibliographic and copyright research may be necessary to determine whether a better resource is available for digitization, as it is in this case.
Sometimes the content is digitized, but can’t be found easily. Geoff Nunberg’s post on Google Books’ “metadata train wreck” shows plenty of examples of how difficult it can be to find and properly identify a particular edition in Google Books, much less figure out which edition is the best one to use. I’ve commented in the past about the challenges of finding multi-volume works in that corpus. And Peter Jacso has pointed out Google’s problems indexing current scholarship. If you can’t find the paper or book you need for your research, your work will be no better than it would be if the source had never existed.
This is where scholars can potentially play a useful role. We don’t individually digitize books by the thousands, but we do individually find, cite, and recommend useful sources, down to the particular edition, as we find them and use them in our own writings and teaching. These citations and recommendations now often go online, in various locations. It would be very useful to have these recommendations made more visible, and tied to freely available online copies of the sources cited, whenever legally possible. Sometimes, we also create or digitize our own editions of past works, with useful annotations, for our classes or our own work. It would be very useful to have these made visible and persistent as well, whenever appropriate.
I hope that large resource aggregations will make it easier for scholars and others to curate the collections to make them more useful to their readers. In the meantime, we can start with resources we have. For example, on The Online Books Page, my catalog entry for Hamsun’s Pan notes its limitations. My public requests page includes information on a better edition that could be digitized, by someone who has access to the edition and has some time to spare. And my suggestion form is ready to accept links to better editions of this book, or to other online books that merit special attention. Indeed, most of the books that I now add to my catalog derive from submissions made by various readers on this form, and I invite scholars to suggest the freely accessible books and serials that they find most useful for my catalog.
As the Little Professor notes in a recent post, the sort of bibliographic work I’ve described can be time-consuming but vitally important for making effective use of old sources, and that work has often not been done by anyone for many books outside the usual classical canons. Yet it’s the sort of thing that scholars do, bit by bit, as part of their everyday work. The aggregate effect of their curation and digitization, appropriately harnessed in open-access form, could greatly improve our ability to build upon the work of the past.