Everybody's Libraries

June 11, 2010

Journal liberation: A primer

Filed under: copyright,libraries,open access,publishing,sharing — John Mark Ockerbloom @ 10:07 am

As Dorothea Salo recently noted, the problem of limited access to high-priced scholarly journals may be reaching a crisis point.  Researchers that are not at a university, or are at a not-so-wealthy one, have long been frustrated by journals that are too expensive for them to read (except via slow and cumbersome inter-library loan, or distant library visits).  Now, major universities are feeling the pain as well, as bad economic news has forced budget cuts in many research libraries, even as further price increases are expected for scholarly journals.  This has forced many libraries to consider dropping even the most prestigious journals, when their prices have risen too high to afford.

Recently, for instance, the University of California, which has been subject to significant budget cuts and furloughs, sent out a letter in protest of Nature Publishing Group’s proposal to raise their subscription fees by 400%.  The letter raised the possibility of cancelling all university subscriptions to NPG, and having scholars boycott the publisher.

Given that Nature is one of the most prestigious academic journals now publishing, one that has both groundbreaking current articles and a rich history of older articles, these are strong words.  But dropping subscriptions to journals like Nature might not be as much of a hardship for readers as it once might have been.  Increasingly, it’s possible to liberate the research content of academic journals, both new and old, for the world.  And, as I’ll explain below, now may be an especially opportune time to do that.

Liberating new content

While some of the content of journals like Nature is produced by the journal’s editorial staff or other writers for hire, the research papers are typically written by outside researchers, employed by universities and other research institutions.  These researchers hold the original copyright to their articles, and even if they sign an agreement with a journal to hand over rights to them (as they commonly do), they retain whatever rights they don’t sign over.  For many journals, including the ones published by Nature Publishing Group, researchers retain the right to post the submitted version of their paper (known as a “preprint”) in local repositories.  (According to the Romeo database, they can also eventually post the “postprint”– the final draft resulting after peer review, but before actual publication in the journal– under certain conditions.)  These drafts aren’t necessarily identical to the version of record published in the journal itself, but they usually contain the same essential information.

So if you, as a reader, find a reference to a Nature paper that you can’t access, you can search to see if the authors have placed a free copy in an open access repository. If they haven’t, you can contact one of them to encourage them to do so.  To find out more about providing open access to research papers, see this guide.

If a journal’s normal policies don’t allow authors to share their work freely in an open access repository, authors may still be able to retain their rights with a contract addendum or negotiation.  When that hasn’t worked, some academics have decided to publish in, or review for, other journals, as the California letter suggests.  (When pushed too far, some professors have even resigned en masse from editorial boards to start new journals that are friendlier to authors and readers.)

If nothing else, scholarly and copyright conventions generally respect the right of authors to send individual copies of their papers to colleagues that request them.  Some repository software includes features that make such copies extremely easy to request and send out.  So even if you can’t find a free copy of a paper online already, you can often get one if you ask an author for it.

Liberating historic content

Many journals, including Nature, are important not only for their current papers, but for the historic record of past research contained in their back issues.  Those issues may be difficult to get a hold of, especially as many libraries drop print subscriptions, deaccession old journal volumes, or place them in remote storage.  And electronic access to old content, when it’s available at all, can be surprisingly expensive.  For instance, if I want to read this 3-paragraph letter to the editor from 1872 on Nature’s web site, and I’m not signed in at a subscribing institution, the publisher asks me to pay them $32 to read it in full.

Fortunately, sufficiently old journals are in the public domain, and digitization projects are increasingly making them available for free.  Nearly all volumes of Nature published before 1923 can now be read freely online, thanks to scans made available to the public by the University of Wisconsin, Google, and Hathi Trust.  I can therefore read the letters from that 1872 issue, on this page, without having to pay $32.

Mass digitization projects typically stop providing public access to content published after 1922, because copyright renewals after that year might still be in force.  However, most scholarly journals– including, as it turns out, Nature — did not file copyright renewals.  Because of this, Nature issues are actually in the public domain in the US all the way through 1963 (after which copyright renewal became automatic).  By researching copyrights for journals, we can potentially liberate lots of scholarly content that would otherwise be inaccessible to many. You can read more about journal non-renewal in this presentation, and research copyright renewals via this site.

Those knowledgeable about copyright renewal requirements may worry that the renewal requirement doesn’t apply to Nature, since it originates in the UK, and renewal requirements currently only apply to material that was published in the US before, or around the same time as, it was published abroad.  However, offering to distribute copies in the US counts as US publication for the purposes of copyright law.  Nature did just that when they offered foreign subscriptions to journal issues and sent them to the US; and as one can see from the stamp of receipt on this page, American universities were receiving copies within 30 days of the issue date, which is soon enough to retain the US renewal requirement.  Using similar evidence, one can establish US renewal requirements for many other journals originating in other countries.
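
The decision rules described in this section can be sketched as a small function.  This is only a rough illustration of the rules as stated in this post (as of 2010), not a complete legal test: the `Work` record, its field names, and the wording of the returned labels are all hypothetical, and real determinations depend on details (such as evidence of timely US publication) that a few fields cannot capture.

```python
from dataclasses import dataclass


@dataclass
class Work:
    """Hypothetical record for one journal issue or article."""
    year: int              # year of first publication
    renewed: bool          # was a copyright renewal found in the records?
    timely_us_pub: bool    # published in the US first, or soon after foreign publication


def us_copyright_status(work: Work) -> str:
    """Apply the simplified rules described in the post (as of 2010)."""
    if work.year < 1923:
        # Copyrights from before 1923 have expired in the US.
        return "public domain: copyright expired"
    if work.year <= 1963:
        if not work.timely_us_pub:
            # The renewal requirement may not apply to some foreign works.
            return "unclear: renewal requirement may not apply"
        if work.renewed:
            return "possibly still under copyright: renewed"
        return "public domain: not renewed"
    # From 1964 onward, renewal became automatic.
    return "presumed under copyright"


for work in (Work(1872, False, True), Work(1950, False, True), Work(1964, False, True)):
    print(work.year, "->", us_copyright_status(work))
```

Keeping the "unclear" outcome separate is deliberate: the sketch should flag cases that need more research instead of guessing.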

Minding the gap

This still leaves a potential gap between the end of the public domain period and the present.  That gap is only going to grow wider over time, as copyright extensions continue to freeze the growth of the public domain in the US.

But the gap is not yet insurmountable, particularly for journals that are public domain into the 1960s.  If a paper published in 1964 included an author who was a graduate student or a young researcher, that author may well be still alive (and maybe even be still working) today, 46 years later.  It’s not too late to try to track authors down (or their immediate heirs), and encourage and help them to liberate their old work.

Moreover, even if those authors signed away all their rights to journal publishers long ago, or don’t remember if they still have any rights over their own work, they (or their heirs) may have an opportunity to reclaim their rights.  For some journal contributions between 1964 and 1977, copyright may have reverted to authors (or their heirs) at the time of copyright renewal, 28 years after initial publication.  In other cases, authors or heirs can reclaim rights assigned to others, using a termination of transfer.  Once authors regain their rights over their articles, they are free to do whatever they like with them, including making them freely available.

The rules for reversion of author’s rights are rather arcane, and I won’t attempt to explain them all here.  Terminations of transfer, though, involve various time windows when authors have the chance to give notice of termination, and reclaim their rights.  Some of the relevant windows are open right now.   In particular, if I’ve done the math correctly, 2010 marks the first year one can give notice to terminate the transfer of a paper copyrighted in 1964, the earliest year in which most journal papers are still under US copyright.  (The actual termination of a 1964 copyright’s transfer won’t take effect for another 10 years, though.)  There’s another window open now for copyright transfers from 1978 to 1985; some of those terminations can take effect as early as 2013.  In the future, additional years will become available for author recovery of copyrights assigned to someone else.  To find out more about taking back rights you, or researchers you know, may have signed away decades ago, see this tool from Creative Commons.
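
For readers who want to check the arithmetic, here is a year-level sketch of the termination windows.  The specific numbers are assumptions that the post does not spell out: a 5-year termination period starting 56 years after a pre-1978 copyright was secured, or 35 years after a grant made in 1978 or later, with notice given between 2 and 10 years before the chosen effective date (as these provisions of 17 U.S.C. 304(c) and 203 are commonly summarized).  The function names are hypothetical, and the real rules work on exact dates and have further conditions.

```python
# Year-level sketch only; all offsets below are labeled assumptions.
TERMINATION_PERIOD = 5            # assumed length of the termination period, in years
NOTICE_MIN, NOTICE_MAX = 2, 10    # assumed bounds on notice before the effective date


def termination_window(start_year, offset):
    """Return the years in which termination can take effect and notice can be given."""
    first = start_year + offset
    last = first + TERMINATION_PERIOD
    return {
        "effective_from": first,
        "effective_until": last,
        "notice_from": first - NOTICE_MAX,
        "notice_until": last - NOTICE_MIN,
    }


def pre_1978_copyright(copyright_year):
    """Window for a transfer of a copyright secured before 1978 (assumed 56-year offset)."""
    return termination_window(copyright_year, 56)


def post_1977_grant(grant_year):
    """Window for a grant made in 1978 or later (assumed 35-year offset)."""
    return termination_window(grant_year, 35)


def notice_window_open(window, year):
    """True if notice could be given in the stated year."""
    return window["notice_from"] <= year <= window["notice_until"]


w1964 = pre_1978_copyright(1964)
open_grants_2010 = [y for y in range(1978, 1991)
                    if notice_window_open(post_1977_grant(y), 2010)]
print(w1964)
print(open_grants_2010)
```

Under these assumptions, notice for a 1964 copyright first becomes possible in 2010 with termination taking effect in 2020, and the grant years with an open notice window in 2010 run from 1978 through 1985, which agrees with the figures above.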

Recognizing opportunity

To sum up, we have opportunities now to liberate scholarly research over the full course of scholarly history, if we act quickly and decisively.  New research can be made freely available through open access repositories and journals.  Older research can be made freely available by establishing its public domain status, and making digitizations freely available.  And much of the research in the not-so-distant past, still subject to copyright, can be made freely available by looking back through publication lists, tracking down researchers and rights information, and where appropriate reclaiming rights previously assigned to journals.

Journal publishing plays an important role in the certification, dissemination, and preservation of scholarly information.  The research content of journals, however, is ultimately the product of scholars themselves, for the benefit of scholars and other knowledge seekers everywhere.   However the current dispute is ultimately resolved between Nature Publishing Group and the University of California, we would do well to remember the opportunities we have to liberate journal content for all.

May 6, 2010

Making discovery smarter with open data

Filed under: architecture,discovery,online books,open access,sharing,subjects — John Mark Ockerbloom @ 9:06 am

I’ve just made a significant data enhancement to subject browsing on The Online Books Page.  It improves the concept-oriented browsing of my catalog of online books via subject maps, where users explore a subject along multiple dimensions from a starting point of interest.

Say you’d like to read some books about logic, for instance.  You’d rather not have to go find and troll all the appropriate shelf sections within math, philosophy, psychology, computing, and wherever else logic books might be found in a physical library.  And you’d rather not have to think of all the different keywords used to identify different logic-related topics in a typical online catalog. In my subject map for logic, you can see lots of suggestions of books filed both under “Logic” itself, and under related concepts.  You can go straight to a book that looks interesting, select a related subject and explore that further, or select the “i” icon next to a particular book to find more books like it.

As I’ve noted previously, the relationships and explanations that enable this sort of exploration depend on a lot of data, which has to come from somewhere.  In previous versions of my catalog, most of it came from a somewhat incomplete and not-fully-up-to-date set of authority records in our local catalog at Penn.  But the Library of Congress (LC) has recently made authoritative subject cataloging data freely available on a new website.  There, you can query it through standard interfaces, or simply download it all for analysis.

I recently downloaded their full data set (38 MB of zipped RDF), processed it, and used it to build new subject maps for The Online Books Page.   The resulting maps are substantially richer than what I had before.  My collection is fairly small by the standards of mass digitization– just shy of 40,000 items– but still, the new data, after processing, yielded over 20,000 new subject relationships, and over 600 new notes and explanations, for the subjects represented in the collection.

That’s particularly impressive when you consider that, in some ways, the RDF data is cruder than what I used before.  The RDF schemas that LC uses omit many of the details and structural cues that are in the MARC subject authority records at the Library of Congress (and at Penn).  And LC’s RDF file is also missing many subjects that I use in my catalog; in particular, at present it omits many records for geographic, personal, and organizational names.

Even so, I lost few relationships that were in my prior maps, and I gained many more.  There were two reasons for this:  First of all, LC’s file includes a lot of data records (many times more than my previous data source), and they’re more recent as well.  Second, a variety of automated inference rules– lexical, structural, geographic, and bibliographic– let me create additional links between concepts with little or no explicit authority data.  So even though LC’s RDF file includes no record for Ontario, for instance, its subject map in my collection still covers a lot of ground.

A few important things make these subject maps possible, and will help them get better in the future:

  • A large, shared, open knowledge base: The Library of Congress Subject Headings have been built up by dedicated librarians at many institutions over more than a century.  As a shared, evolving resource, the data set supports unified searching and browsing over numerous collections, including mine.  The work of keeping it up to date, and in sync with the terms that patrons use to search, can potentially be spread out among many participants.  As an open resource, the data set can be put to a variety of uses that both increase the value of our libraries and encourage the further development of the knowledge base.
  • Making the most of automation: LC’s website and standards make it easy for me to download and process their data automatically. Once I’ve loaded their data, and my own records, I then invoke a set of automated rules to infer additional subject relationships.  None of the rules is especially complex; but put together, they do a lot to enhance the subject maps. Since the underlying data is open, anyone else is also free to develop new rules or analyses (or adapt mine, once I release them).  If a community of analyzers develops, we can learn from each other as we go.  And perhaps some of the relationships we infer through automation can be incorporated directly into later revisions of LC’s own subject data.
  • Judicious use of special-purpose data: It is sometimes useful to add to or change data obtained from external sources.  For example, I maintain a small supplementary data file on major geographic areas.  A single data record saying that Ontario is a region within Canada, and is abbreviated “Ont.”, generates much of my subject map for Ontario.  Soon, I should also be able to re-incorporate local subject records, as well as arbitrary additional overlays, to fill in conceptual gaps in LC’s file.  Since local customizations can take a lot of effort to maintain, however, it’s best to try to incorporate local data into shared knowledge bases when feasible.  That way, others can benefit from, and add on to, your own work.
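
To make the idea of inference from a single geographic record more concrete, here is a minimal sketch.  It is not the code or the rule set behind The Online Books Page: the record layout, the `infer_links` function, the sample headings, and the three rules are all hypothetical, and they assume headings that follow the common conventions of a parenthesized qualifier (“Toronto (Ont.)”) and a “--” separated geographic subdivision (“Education -- Ontario”).

```python
def infer_links(region, headings):
    """Infer subject relationships for a region from one supplementary record.

    Hypothetical rules, for illustration only:
      * the containing area is a broader term;
      * headings whose first part ends in "(Abbrev.)" are narrower terms;
      * headings subdivided by the region's name are related terms.
    """
    links = {"broader": [region["within"]], "narrower": [], "related": []}
    qualifier = "(" + region["abbrev"] + ")"
    for heading in headings:
        parts = [part.strip() for part in heading.split("--")]
        if parts[0].endswith(qualifier):
            links["narrower"].append(heading)
        elif region["name"] in parts[1:]:
            links["related"].append(heading)
    return links


ontario = {"name": "Ontario", "within": "Canada", "abbrev": "Ont."}
catalog_headings = [
    "Toronto (Ont.)",
    "Ottawa (Ont.) -- History",
    "Education -- Ontario",
    "Logic",
]
ontario_links = infer_links(ontario, catalog_headings)
print(ontario_links)
```

Even rules this simple show how one small record can connect a place to many headings that already exist in a catalog, without any explicit authority record for the place itself.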

Recently, there’s been a fair bit of debate about whether to treat cataloging data as an open public good, or to keep it more restricted.  The Library of Congress’ catalog data has been publicly accessible online for years, though until recently you could only get a little at a time via manual searches, or pay a large sum to get a one-time data dump.  By creating APIs, using standard semantic XML formats, and providing free, unrestricted data downloads for their subject authority data, LC has made their data much easier for others to use in a variety of ways. It’s improved my online book catalog significantly, and can also improve many other catalogs and discovery applications.  Those of us who use this data, in turn, have incentives to work to improve and sustain it.

Making the LC Subject Headings ontology open data makes it both more useful and more viable as libraries evolve.  I thank the folks at the Library of Congress for their openness with their data, and I hope to do my part in improving and contributing to their work as well.

April 7, 2010

Copyright information is busting out all over

Filed under: copyright,sharing — John Mark Ockerbloom @ 3:43 pm

Like the crocuses and daffodils now coming up all over our front garden, new copyright registration information has been popping up all over the net lately.  As I’ve described in various previous posts, this information can be extremely useful for folks who want to revive, disseminate, or reuse works from the past.

Here’s a summary of some of the recent highlights:

Copyright renewals for maps and commercial prints are now all online, and join what is now a complete set of renewals of active copyrights for still images.  The scanning was done here at the Penn Libraries by me and by the Schoenberg Center for Electronic Text and Image, from microfilms and volumes loaned by the Free Library of Philadelphia.  I thank all the folks who helped out with this project.

With the addition of this latest set of records, you can now find copyright renewals online for nearly anything you’d find in a book, if they’re recent enough to still be in force.  The only active copyright renewals of any sort not yet online at this point to my knowledge are renewals for most music prior to 1978, and a few small sets of pre-1978 renewals for film (about 2 years’ worth in all).

Original copyright registrations are also going online at a rapid rate.   The biggest publicly accessible set of original registrations from 1923 onward (the date of the oldest copyrights still in force) is at Hathi Trust, and consists of digitized volumes that have been scanned by Google for Hathi member libraries.  I’ve included them in a list of registration volumes organized by year and type of work on my Catalog of Copyright Entries Page, which has now been reorganized to combine all the original and renewal registrations known to be available online.  I’ve also added direct page links to renewal and other important sections of the volumes, so that researchers looking for those can go to them directly.  In many cases, the renewal sections can be downloaded for offline use.  I’ve also brought out statistics from the volumes, to help give readers a sense of the rate of registrations and renewals.

Google is making enhanced versions of book copyright registration volumes available online. Specifically, they’ve digitized the full set of original and renewal registrations for books from 1922-1977, in a set of scans that are of generally higher quality than the ones at Hathi Trust.  You can search the full text of the entire set at once, or search or browse individual volumes.

These scans were done specially for copyright research purposes, and seem to involve more careful scanning than the normal mass-book-digitization procedures Google used for the Hathi Trust volumes.  They aren’t entirely free of problems– I identify a few trouble spots in my listings– and they also don’t include registrations for other types of work, which has apparently confused some folks who have contacted me.  But they’re quite high quality overall, and could be a very good basis for structured data records of these copyright registrations.  Google has previously made such records available for book copyright renewals; I hope we’ll see a release of records based on these new scans before long as well.

Also in the pipeline: Based on conversations I’ve had with others interested in copyright issues, we may well see a complete set of copyright registrations and renewals online (at least in the form of page images from the Catalog of Copyright Entries) by the end of this year.  And a number of projects are working on making this digitized information more useful for practical copyright clearance.  Today, for instance, I heard about the Durationer project, being presented at the Copyright@300 conference at Berkeley later this week.  The project is developing a tool to help people determine the copyright status of specific works in specific jurisdictions, based on copyright registrations and other relevant information.

Some possible future directions: As I described in more detail in a 2007 paper, a thorough determination of a work’s copyright status depends not just on registration information, but on various other kinds of information, much of which can be found in a work’s bibliographic records.  Copyright registration data can also be used to build new bibliographic data structures.  Therefore, the interests of copyright clearance and the interests of access to bibliographic data tend to converge.  I elaborate on this idea in a guest blog post for the Open Knowledge Foundation, who I’ve started to work with in these areas.  (For folks following the debate over OCLC’s WorldCat, this convergence is also worth keeping in mind when reading the just-released WorldCat Rights and Responsibilities draft, which I hope to comment on in the not-too-distant future.)

I hope you find this new copyright information useful.  And I’m very interested in hearing what you’re doing with it, or would like to do with it.

March 23, 2010

Lots of conversation keeps stuff sustainable

Filed under: libraries,people,preservation,sharing — John Mark Ockerbloom @ 10:12 pm

Among the hats I wear at my place of work is that of LOCKSS cache administrator. LOCKSS is a useful distributed preservation system built around the principle “Lots of copies keep stuff safe” (whose initials give the system its name).  The idea is that, with the cooperation of publishers, a bunch of libraries each harvest copies of selected online content, and keep backups on our own LOCKSS caches, which are hooked up to local library proxy services.  Then, if the material ever becomes inaccessible from the publisher, our users will automatically be routed to our local copies.  Each LOCKSS cache also periodically checks with other LOCKSS caches to ensure that our copies are still in good shape, and to repair or replace copies that have been lost or damaged.  (Various security features protect against leaks of restricted content, or unauthorized revisions of content.)
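
The idea of caches checking one another and repairing damaged copies can be illustrated with a toy example.  This is not the actual LOCKSS polling protocol, which is considerably more elaborate; it is a minimal sketch that assumes every cache's copy can be compared directly, treats the majority digest as correct, and uses a hypothetical `audit_and_repair` function and made-up cache names.

```python
import hashlib
from collections import Counter


def digest(content: bytes) -> str:
    """Fingerprint a copy so that copies can be compared cheaply."""
    return hashlib.sha256(content).hexdigest()


def audit_and_repair(copies: dict[str, bytes]) -> list[str]:
    """Replace copies that disagree with the majority; return the repaired cache names."""
    votes = Counter(digest(content) for content in copies.values())
    winner, count = votes.most_common(1)[0]
    if count <= len(copies) // 2:
        # Without a clear majority, automatic repair would be a guess.
        raise ValueError("no majority agreement; cannot repair automatically")
    good_copy = next(c for c in copies.values() if digest(c) == winner)
    repaired = []
    for name, content in copies.items():
        if digest(content) != winner:
            copies[name] = good_copy
            repaired.append(name)
    return repaired


caches = {
    "cache-a": b"article text",
    "cache-b": b"article text",
    "cache-c": b"artXcle text",   # simulated damage
}
repaired_caches = audit_and_repair(caches)
print(repaired_caches)
```

The sketch also shows why lots of copies matter: with only two copies that disagree, there is no majority to say which one is damaged.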

LOCKSS is open source software that runs on commodity hardware.  It was originally envisioned to run virtually automatically.  As Chris Dobson described the ideal in a 2003 Searcher article, “Take a computer a generation past its prime…. Hook it up to the Internet and put it in a closet. Stick in the LOCKSS CD-ROM and boot it up. Close the closet door.”  And then presumably walk away and forget about it.

Of course, it’s not that simple in practice, particularly if your library is proactive about its preservation strategy.  The thing about preservation at scale is there’s always something that needs attention.  It might be something technical, or content-related, or planning-related, but preserving a growing collection requires ongoing thought.  And if you want to think as clearly and sensibly as you can, you’ll want to collaborate.

Right now, for instance, I’m trying to get my cache to harvest the full run of a journal that’s just been made available for LOCKSS harvesting, where we hope to provide post-cancellation access through LOCKSS.  Someone at Stanford just gave me a useful tip on how to give this journal priority over the other volumes I’ve got queued up for harvest.  Unfortunately, I can’t try it out until I get my cache back up after it failed to reboot cleanly after a power failure. While I wait to hear back with instructions about how best to remedy this, I wonder whether switching to a new Linux-based version of LOCKSS might make such operating system-level problems easier to deal with.  But it would be useful to hear from folks who are running that version to see what their experience has been.

Meanwhile, we’re wondering how best to approach new publishers who have content that our bibliographers would like to preserve via LOCKSS. Our special collections folks wonder whether we should preserve some of our own home-grown content via a private LOCKSS network.  I’m also doing some ongoing monitoring and testing of our LOCKSS cache’s behavior (some of which I’ve reported on earlier), and would be interested in knowing if others are seeing some of the same kinds of things that I see on the cache I administer.

In short, there are a lot of things to think about, when LOCKSS plays a significant role in a preservation plan.  And a lot of the issues I’ve mentioned above are ones that others may be thinking about as well.  So let’s talk about them.  As the LOCKSS group has said, “A vibrant, active, and engaged user community is key to the success of Open-Source efforts like LOCKSS.”

One thing you need for such an engaged community is a forum for them to talk to each other.  As it turns out, the LOCKSS group at Stanford tell me they created a LOCKSS Forum mailing list a while back, but I haven’t yet seen it publicized.   Its information page is at https://mailman.stanford.edu/mailman/listinfo/lockss-forum .  (Currently, archived email messages are not visible on the open web, though this may change in the future.)  If you’re interested in talking with others about how you use or might use LOCKSS to preserve access to digital content, I invite you to sign up and help get the conversation going.

February 8, 2010

Shedding light on images in the public domain

Filed under: copyright,sharing — John Mark Ockerbloom @ 3:05 pm

For years, I’ve regularly gotten requests from authors and publishers for licenses to reproduce images in books listed on The Online Books Page, or included in the local collection of A Celebration of Women Writers.  Sometimes these requests relate to copyrighted books that I list but don’t control rights for; in those cases, I do my best to refer the request to the book’s copyright holder.  But often, they’re for images in our own collections, from books published over 100 years ago.  In those cases, I respond that the image is in the public domain (and our digitization, which adds no originality, is also in the public domain), so no license is necessary or appropriate.

Usually that response receives a thankful reply, sometimes with signs of surprise that an image can be reused without permission.  But sometimes I’ll get back a more alarmed reply.  “My publisher says I need a license for every image in my book, or I can’t use it,” it might say, followed by a plea for help in tracking down some long-defunct 19th century publisher.

I wish I could say this was an atypical anecdote.   But, if you look around the Web, you’ll find that there are huge numbers of historic images– paintings, photographs, figures, and the like– that are behind access barriers, or closed off altogether from online access, when they don’t have to be.  Artstor has over a million images of thousands of years of art that you can’t look at unless you’re at an institution that has a subscription.  The fine arts image catalog at my own library has over 100,000 digital images, none of which can be seen online by the public outside of Penn, except in thumbnails.  Neither Artstor nor Penn wants to keep art away from the public; both are nonprofit educational institutions. But clearing images for free public access on a large scale has to date been impractical for these institutions.

Restrictions on images also create holes in other works.  For instance, under the proposed Google Books settlement, images in books that might be under copyright would be blanked out unless the rightsholder to the book also asserted they held the rights in the images.  These sorts of omissions can cut the heart out of many works.  In a recent New Republic article, “For the Love of Culture”, Lawrence Lessig described how a critical table was omitted in an otherwise free article about his daughter’s possible illness, due to rights-clearance issues.  “I could not believe that we were this far down the path to insanity already,” he wrote of the incident.

Part of the insanity is that many of these images from our cultural heritage are actually in the public domain.  Many people are aware that copyrights prior to 1923 have expired in the US.  But so have many copyrights from later in the 20th century.  Pre-1964 copyrights generally had to be renewed 28 years after the start of their term, or they would expire.   (Exceptions and further details are described here.)  But most copyrights were never renewed; and that’s especially true for images.

In 1923, there were copyright registrations for 3,059 works of art, 1,149 scientific and technical drawings, 7,533 photographs, and 11,289 prints and pictorial illustrations, making a total of 23,030 copyright registrations for these classes of image.  In 1951, 28 years later, there were 198 copyright renewals for all of these image classes combined.  This represents a renewal rate of less than 1%.
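
The arithmetic behind that renewal rate can be checked in a few lines.  All of the figures come from the paragraph above; the only simplification is the one the paragraph itself makes, treating the 1951 renewals as corresponding to the 1923 registrations.

```python
# Registration counts for 1923, by class of image, as given in the text.
registrations_1923 = {
    "works of art": 3059,
    "scientific and technical drawings": 1149,
    "photographs": 7533,
    "prints and pictorial illustrations": 11289,
}
renewals_1951 = 198  # renewals for all of these classes combined, 28 years later

total_registrations = sum(registrations_1923.values())
renewal_rate = renewals_1951 / total_registrations

print(f"{total_registrations:,} registrations, {renewal_rate:.2%} renewed")
```

This gives 23,030 registrations and a renewal rate of about 0.86%.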

We have just completed posting scans that make all active copyright renewals for artwork viewable online.  In fact, once we finish scanning one last batch of renewals for maps and for commercial prints (meaning images created for product packaging and promotion) all active copyright renewals for any type of still image will be viewable online.   In later years, the number of image copyright renewals grows slightly, but not by much. But the number of images published in those years grows substantially.

Images without a copyright registration of their own might still be under copyright if they were first published as part of a copyrighted book, newspaper, magazine, or other larger work.  Fortunately, we have complete online renewal records for those kinds of works too.  It becomes much easier to establish the public domain status of a newspaper photograph, for instance, if you know (as I previously revealed) that no newspaper outside New York renewed copyright for any issue published before the end of World War II.

Having copyright renewals online for artwork is an important step towards freeing the public domain in images.  But there’s more needed to make copyright clearance practical at a large scale.  Putting scanned renewal records into a searchable database (perhaps combined with fair use image thumbnails) will make it easier to find any copyright renewals that might exist for a particular image.  (A similar database for book renewals already exists, and there are more book renewals than image renewals.)  Making original copyright registrations available as well (as we now have for artwork through 1949, and soon will have for later years) lets us determine when the copyright for an image began, and whether it was renewed in time to prevent it from expiring.

Furthermore, establishing the history and provenance of images will let us determine when unregistered artwork enters the public domain.  Registered or not, the copyright to an image created before 1964 began no later than its first US publication, and the copyright for many such images therefore ended after 28 years due to a lack of renewal.  And the mostly-frozen American public domain still includes more work each year that was never published before 2003.  On Public Domain Day last month, all such work by artists who died in 1939 entered the public domain in the US.  (I won’t get into the rather baroque rules for establishing “publication” of an artwork now, but you can determine it if the history of the image is documented.)

So we have a rich treasure trove of images in the public domain that’s been largely buried under presumptions and uncertainties about copyright.  By finding and sharing information about their copyrights, we can protect and enjoy these images in the commons of the public domain, where they can be viewed freely, included in new works, and reused in any way we can imagine.  If you find this prospect intriguing, I hope you’ll help bring these images to light.

January 28, 2010

Every book its libraries: or, Taking care in withdrawal

Filed under: preservation,sharing — John Mark Ockerbloom @ 1:42 pm

The question of when to withdraw materials from libraries has gotten heightened attention lately.  Everyday readers may not always realize it, but most libraries get rid of books and other materials on a regular basis.  Libraries typically have limited space, but keep acquiring new materials to serve their audience’s needs.   As they acquire new materials, they typically make room by getting rid of materials that no longer serve their audience as well; this is variously known as “withdrawing”, “deaccessioning”, or “weeding”.

Some libraries weed more aggressively than others.  School and public libraries tend to turn over their collections more quickly than academic research libraries.  There’s not much value a middle-schooler can get out of an outdated science book, for instance, compared to a current one.  And a public library user looking for a book on how to use their new Windows 7 computer shouldn’t have to wade through stacks clogged with TRS-80 programming guides and the like.  You can find amusing anecdotes about books that have outlived their usefulness in these kinds of collections in the blog Awful Library Books, one of the blogs on LISNews’ 10 Librarian Blogs to Read in 2010.

Academic libraries typically don’t weed as aggressively.  The larger research libraries aim to have a broad selection of thought on subjects from various points in history, as well as whatever happens to be of current interest.   A book on science that no longer reflects current scientific understanding may still be useful for researchers who want to look at the history of science, or at how science interacted with culture at the time.  Even the peripheral details can be of interest; for instance, the photographs in an obsolete computer guide can tell us what the computers looked like, and how they were expected to be used.  The most interesting aspect of many old periodicals nowadays is often the advertisements, rather than the editorial content.

Especially when they’re digitized, large corpuses can also be of major interest even when the individual items might not be particularly noteworthy.  They can help you track the use and evolution of language, for instance, or quash unwarranted patents.  I’ve talked before about the great potential of Google Books and similarly comprehensive corpuses.

Even so, research libraries still get rid of materials, or move them to offsite warehouses, when space is short.  As more users access materials online instead of in print, we often ship out print volumes that have online surrogates.  Recently Ithaka published a report called What To Withdraw that gives guidelines for withdrawing materials that are online in sustainable archives (such as Ithaka’s own JSTOR), and that have a few physical copies in print archives somewhere.  Doing this responsibly may help many research libraries grow their collections, or repurpose their spaces, in useful ways.  Selling particularly valuable items to more appropriate libraries can also help fund additional library acquisition and activity.

Carefully considered, then, withdrawal can greatly benefit libraries and their users.  But libraries need to think not only about their own collection’s purposes, but about the systemic risks of individual library collection decisions.  For instance, many of the “Awful Library Books” justifiably withdrawn from public libraries might still be of historical research interest to someone.  Even if academic research libraries would keep them, many of the books intended for popular or specialized non-academic audiences were not collected by academic libraries in the first place.  If all the public libraries with these books simply throw them out, and no copy gets transferred to a library or archive with a longer-term interest, the materials may disappear forever.

Online access, as an alternative to retaining print copies, may not be as reliable as one expects.  Recently, the archives of many popular magazines that were available through various subscription databases became part of an exclusive deal from one database vendor.  This is likely to raise the costs of access for many libraries, both because they may have to subscribe to a new database to keep providing these magazines, and because the price of the new exclusive bundle is likely to increase.  But even if vendors keep prices reasonable, libraries’ own situations may change.  Here in Pennsylvania, funding to libraries has been cut severely enough that many now have to cancel subscriptions to heavily-used databases. The linked story has a heartbreaking quote from one of the public librarians who’s had to drop their formerly free Power Library subscription: “I got rid of [our old magazines] because everything was in the database.”

How can we insure against these sorts of cultural loss, even as we withdraw items?  A key principle is replication.  In the words of one well-known digital preservation program, “Lots of Copies Keep Stuff Safe”.  When we consider withdrawing something, we stop to think whether some other library or institution might find it of value. If we’re considering dropping print originals for digital surrogates, we check to see if other institutions we trust are keeping the originals safe, or would be willing to do so.  We also make digital copies of print materials that may be at risk, and we try to spread around these copies as widely as practicality and copyright law allow.  And we develop and support efficient inter-library transfer networks so that we can quickly move locally deaccessioned materials to where they’re needed or valued.

Many librarians have a philosophy of public service that draws on Ranganathan’s famous set of Five Laws of Library Science, which includes principles like “every reader his book” and “every book its reader”.   As we try to preserve our broad cultural heritage in the midst of withdrawal, loss, and replication, a related principle, “Every book its libraries”, is a useful one to keep in mind.

[Edited slightly 4:12pm Jan 28, in response to a comment below: deleted struck-through text, and added italicized text]

October 5, 2009

Remember this

Filed under: people,preservation,sharing — John Mark Ockerbloom @ 12:40 am

I am eating a sandwich at the end of Pier 14 in San Francisco.  The sun has set behind the downtown skyscrapers, and the colors in the sky are slowly fading to grey.  I’m not the only diner out here.  Pelicans soar close off the pier, about 100 feet above the water, and one by one dive straight down with a loud splash, resurfacing in a moment, ruffling their feathers and jerking their beaks to get down the fish they’ve caught.  Other splashes in the water come from seals surfacing for air.  As an orange-tinted full moon comes up over the East Bay hills and under the span of the Bay Bridge, I see a pair of seals surface side by side, with their mouths meeting as they float at the water’s surface for a few seconds.  I am delighted to see all this, so different from what I usually see at home, and at the same time I wish I could be back there with the people I love instead of alone here.

I don’t have a camera right now, or anything to draw with, so I can only record this scene in words and in memory.   When there was still sun shining low on Yerba Buena Island and the coastline to the east, there were several people out here with tripods and light umbrellas, photographing human couples standing against the pier railings, in each other’s arms.  Judging from the clothing and the poses, I suspect these shots are for wedding or engagement albums.  And I can understand the motivation.   When Mary and I were married, 14 years ago this month, we too had pictures taken of us against a striking background, in our case the bright orange and yellow trees of a Pennsylvania fall.  I see one of those pictures every time I return home. Remember this, the picture says, and it brings back memories of the vows we made to each other that day.  The words we said, and the way we looked when we said them, were not recorded in fixed form, but, God willing, will stay in our hearts as long as we live.

There are more memories recorded out on the pier.  Plaques along the rails quote lines of poetry by Lawrence Ferlinghetti and Thomas Lovell Beddoes about the bay I’m looking out on.  Ceramic tile art depicts boats that have plied its waters, from the early days of European exploration to the present.  A display on the sidewalk in front relates the history of the pier, the ferries that ran (and still run, in smaller numbers) from the terminal nearby, the freeway that was built and then removed again from the water’s edge, and some of the people who played a part in all of these developments.  Remember this, they say, and I bring bits back with me to record in words.

It’s a basic need that we have, as intelligent, reflective, and social creatures, to remember the things we’ve experienced, seen, and learned about.  We make records of these things in various forms, to help us remember, and to prompt others to remember as well.  They help us go beyond and above what’s immediately in front of us, telling us things we need to know, people we can relate to, pasts that were different, futures that can be better.

Technology can make it easier for us to record these things– and sometimes easier to lose them.  We took many pictures of our kids on digital cameras as they grew up, and kept hundreds of them on my laptop, which let me easily recall them and show them to friends and family when I traveled.  Then one day I was robbed of my laptop, without my having backed up my photo collection, and most of those pictures were lost.   I’ve also seen many other personal and family memoirs posted on the Web stay for a few years, and then vanish with the demise of the web site they were on.    I kept paper tapes of early BASIC programs I wrote in middle school for years after I last had access to any device that could read them.  They’re gone now; I presume they were thrown out when my parents cleaned house sometime after I left home.

I know better now how to keep what remains.  Apple’s Time Machine makes it easy for me to incrementally back up my laptop every time I come home from work and plug a cheap external drive into my USB port.  The pictures of my kids that survived the laptop theft were mostly the ones that I had shared with others (either by copying them onto prints, or by putting them up on the Web). And the older family pictures that are most meaningful to us are ones where we know what the pictures represent, either because we are in them, or because others have told us, in person or in writing, who is in the pictures and the context in which they were taken.

I am here in San Francisco for Ipres 2009, a conference promoting the preservation of digital content.  There are a lot of smart, dedicated people scheduled to speak, and I hope to learn about new technologies and methods to help us preserve the content we want our libraries and their users to remember.

While some of these techniques may be complex, many of them are essentially elaborations on basic principles I’ve touched on in what I’ve related above: Help people record what’s important to them.  Make it easy for them to preserve these records in their everyday activity.  Encourage them to copy and share what they record, and allow others to build on them.  Make what they record easy to interpret, through informative description and straightforward formats.  And finally, try to understand and appreciate the connection between the record and the people for whom the record is important.

Which is why I sit now with my laptop in my hotel room, looking out on a bay that is now as dark as the night sky overhead, and trying to connect my experiences with the preservation challenges and proposals to come. Remember this, I mean to say.  It’s important.
