Everybody's Libraries

June 16, 2014

October 4, 2013

April 29, 2013

March 22, 2013

Updates on library linking, Wikipedia, and what you can do

Filed under: discovery,libraries,sharing — John Mark Ockerbloom @ 4:55 pm

I’m gratified by the positive response I’ve been getting to the Forward to Libraries service I first introduced last month. It really took off when I announced the templates for linking to libraries from Wikipedia a couple of weeks ago. They’ve been written up in places like Boing Boing and in Wikipedia’s own Signpost newsletter. The service now includes more than 150 libraries throughout the English-speaking world. Wikipedia editors are also adding the link templates to articles: besides the handful I added myself, other editors had added more than 450 at this writing. And I’ve heard from numerous librarians who now want to start editing Wikipedia themselves, both to add library links and to otherwise improve articles. (Here’s how to become a Wikipedia editor.)

So far, I’ve largely provided this service on my own, with support from the University of Pennsylvania Libraries.   But I’d like to make the service more useful, and could use some help.  If you’re interested, here are some things you might want to know:

Some libraries are easier to link than others. If you’re using one of many standard library catalogs or discovery systems, and you haven’t made substantial modifications to it, it’s easy for me to add your system. I basically just record what software you’re using and where on the Web the service runs, run some test searches to verify your system, and you’re good to go. If you’re using a more customized, obscure, or home-grown system, I might still be able to add links to it, but it may take me more effort to figure out how to make useful search links into the system. Any information you can provide would be helpful. There are also certain off-the-shelf systems that I have problems with. Many Polaris systems, for example, will give a “session timed out” message the first time you try to follow a search link into the system. (Back up and try the link again, and everything will be fine for some time afterwards.) Some other systems don’t seem to support deep search links in any consistent way that I’ve been able to determine. These include not only some very old session-based systems, but also EBSCO’s fairly new EDS discovery platform.
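
As a rough illustration of the “record the software and where it runs” step, here is a minimal Python sketch. The software names, URL patterns, and library entries are hypothetical placeholders, not the real search syntax of any catalog product and not the actual FTL data; the point is only that once a library’s system type and base address are known, a search link can be generated from a per-system template.

```python
from urllib.parse import quote_plus

# Hypothetical search-URL patterns, keyed by catalog software.
# Real products each have their own (often undocumented) deep-link syntax.
SEARCH_PATTERNS = {
    "vendor-x": {
        "subject": "{base}/search?type=subject&q={term}",
        "keyword": "{base}/search?type=keyword&q={term}",
    },
    "vendor-y": {
        "subject": "{base}/cgi-bin/find?index=SU&term={term}",
        "keyword": "{base}/cgi-bin/find?index=KW&term={term}",
    },
}

# Hypothetical registry: what software each library runs, and where.
LIBRARIES = {
    "example-public": {"software": "vendor-x", "base": "https://catalog.example.org"},
    "example-university": {"software": "vendor-y", "base": "https://library.example.edu/"},
}


def search_link(library_id: str, kind: str, term: str) -> str:
    """Build a deep search link of the given kind ("subject" or "keyword")."""
    lib = LIBRARIES[library_id]
    pattern = SEARCH_PATTERNS[lib["software"]][kind]
    # Strip any trailing slash from the base and URL-encode the search term.
    return pattern.format(base=lib["base"].rstrip("/"), term=quote_plus(term))
```

Adding a library on a supported system then means adding one registry entry; only unfamiliar or customized systems need a new pattern, which is where the extra effort described above comes in.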

I’ve worked out how to link into these systems by reading documentation I’ve found on the public Internet, along with some reverse-engineering of public websites. If you know of better ways to link to some of these systems that I haven’t yet figured out myself, and this information can be made public, let me know.

For now, I’m declining to list libraries that don’t have many English-language subject or Library of Congress name headings, because the results of English searches in those libraries will be misleadingly incomplete.  But I’m considering ways to include translated searches, where the data to support this is available, for a wider range of countries.  (VIAF already provides much relevant data for names.)

The most popular new Wikipedia library resource template is also controversial, and might be modified or deleted. I provide a number of different templates for linking from Wikipedia to libraries, including the inline text templates “Library resources about” and “Library resources by”, and the all-in-one sidebar template “Library resources box”. By far the most used of these templates has been the Library resources box. It’s easy to spot in an article, it organizes links clearly, and it’s easy for editors to recognize as a template that they can add to articles they find of interest. But some Wikipedians, including at least one Wikipedia admin, have objected to the template. They cite style guidelines that say external link templates should not use boxes or other graphical elements, but only appear as inline text. I’ve defended the boxes, noted how other library-related external links commonly appear in boxes, and proposed ways to address various Wikipedian concerns. But it’s ultimately up to the Wikipedia community to determine whether or how library links will appear in Wikipedia articles. To find out more about the issues, see the Library resources box talk page. And if you’re a Wikipedia editor or user, feel free to weigh in on that page or other relevant forums.

I’m exploring ways to make it easier for readers to get to our libraries.  For one, I’m starting to record IP ranges for some institutions, so that local network users can follow “resources in your library” links straight to the institution’s library, without having to first register a preference.  (Users can still register a different preference if they want.)  IP-based routing is an experimental service, initially being provided to a limited number of institutions, and I may modify or withdraw it in the future.  If you’d like me to consider it for your institution, you can submit a request, with the relevant IP ranges (preferably in CIDR format) in the “anything we should know?” field.  Note that the IP ranges you submit will be published as part of the library data I’m sharing for this project.
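
Here is a minimal sketch of the routing order described above, under stated assumptions: the table of submitted CIDR ranges is hypothetical (the addresses are reserved documentation ranges, not any real institution’s), an explicitly registered preference always wins, and otherwise the reader’s IP address is checked against each institution’s ranges with Python’s standard `ipaddress` module. If nothing matches, the caller would fall back to asking the reader to choose a library.

```python
import ipaddress
from typing import Dict, List, Optional

# Hypothetical submitted ranges, in CIDR format, keyed by library ID.
IP_RANGES: Dict[str, List[str]] = {
    "example-university": ["192.0.2.0/24", "198.51.100.0/25"],
    "example-public": ["203.0.113.0/24"],
}

# Parse once at startup; ip_network() raises ValueError on a malformed range.
NETWORKS = {
    lib: [ipaddress.ip_network(cidr) for cidr in cidrs]
    for lib, cidrs in IP_RANGES.items()
}


def choose_library(client_ip: str, registered_preference: Optional[str]) -> Optional[str]:
    """Return the library to forward to, or None if the reader must choose."""
    # A preference the reader registered explicitly overrides IP routing.
    if registered_preference:
        return registered_preference
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        return None
    for lib, nets in NETWORKS.items():
        # Membership tests are False across IPv4/IPv6, so mixed tables are safe.
        if any(addr in net for net in nets):
            return lib
    return None
```

Because `198.51.100.0/25` covers only addresses .0 through .127, a reader at 198.51.100.200 would get no automatic routing and would be shown the list of choices instead.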

I’m starting to share my work on GitHub. There is now a GitHub repository with selected data and code for the FTL project. In it, you’ll find the data I use to link to the libraries enrolled in the service, and you’ll also see the code for the main CGI script used to forward readers to those libraries. You can’t yet run the service out of the box yourself with the code and data provided so far, but I hope that what’s there will help people understand how the service works, and possibly implement similar services themselves if they’re so inclined. The data’s released under CC0, so you can reuse it however you like; and the code is open-source licensed under the Educational Community License 2.0. I hope to add more data and code over time, and I’m happy to hear suggestions for enhancements and improvements.

I’m hoping that as more people get involved, the service will improve, library resources will become more reachable online, and Wikipedia will become a more useful resource as well.  If you’d like to get involved yourself, I’d love to hear what you’re up to, and what suggestions you might have.

March 4, 2013

From Wikipedia to our libraries

Filed under: citizen librarians,discovery,libraries,online books,subjects — John Mark Ockerbloom @ 4:44 pm

I’ve heard the lament in more than one library discussion over the years.  “People aren’t coming to our library like they should,” librarians have told me.  “We’ve got a rich collection, and we’ve expended lots of resources on an online presence, but lots of our patrons just go to Google and Wikipedia without checking to see what we have.”  The pattern of quick online information-finding using search engines and Wikipedia is well-known enough that it has its own acronym: GWR, for Google -> Wikipedia -> References.  (David White gives a good description of that pattern in the linked article.)

Some people I’ve talked to think we should break this pattern.  With the right search tool or marketing plan, some say, we can get patrons to start with us first, instead of Google or Wikipedia.  This idea seems to me both futile and beside the point.  Between them, Google and Wikipedia cover a vast array of online information, more than librarians could hope to replicate or index ourselves in that medium.  Also, if we truly have better resources available in our libraries than can be found on the open Web, it’s less important that our researchers start from our libraries’ websites than that they end up finding the knowledge resources our libraries make available to them.

Looked at the right way, Wikipedia can be a big help in making online readers aware of their library’s offerings. One of the things we spend a lot of time on in libraries is organizing information into distinct, conceptual categories. That’s what Wikipedia does too: so far, its English edition has over 4 million concepts identified, described, and often populated with reference links. And Wikipedia has encouraged people to add links to relevant digital library collections on various topics, through initiatives like Wikipedia Loves Libraries and Wikipedian in Residence programs. But while these initiatives help bring some library resources online, and direct people to those selected resources, there’s still a lot of other relevant library material that users can’t reach via Wikipedia, but can reach via the libraries near them.

So how do we get people from Wikipedia articles to the related offerings of our local libraries?  Essentially we need three things: First, we need ways to embed links in Wikipedia to the libraries that readers use.  (We can’t reasonably add individual links from an article to each library out there, because there are too many of them– there has to be a way that each Wikipedia reader can get to their own favored libraries via the same links.)  Second, we need ways to derive appropriate library concepts and local searches from the subjects of Wikipedia articles, so the links go somewhere useful.  Finally, we need good summaries of the resources a reader’s library makes available on those concepts, so the links end up showing something useful.  With all of these in place, it should be possible for researchers to get from a Wikipedia article on a topic straight to a guide to their local library’s offerings on that topic in a single click.

I’ve developed some tools to enable these one-click Wikipedia -> library transitions. For the first thing we need, I’ve created a set of Wikipedia templates for adding library links. The documentation for the Library resources box template, for instance, describes how to use it to create a sidebar box with links to resources about (or by) the topic of a Wikipedia article in a reader’s library, or in another library a reader might want to consult. (There’s also an option for direct links to my Online Books Page, if there are relevant books online; it may be easier in some cases for readers to access those than to access their local library’s books.)

For the links to work, we need to know about the reader’s preferred library. Users can register their preferred library (which will set a cookie in their browser recording that choice), or select it for each individual search. We know how to link to several dozen libraries so far, and can add more libraries on request. Worldcat.org, which includes holdings of thousands of libraries worldwide, is also an option. Besides the “Library resources box” template, I’ve also provided templates for in-text links to library resources, if those work better in a given article. Links to these templates can be found at the end of the “Library resources box” documentation.
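
Recording the preference in a browser cookie might look like the following sketch, which uses Python’s standard `http.cookies` module. The cookie name `ftl_library` and the one-year lifetime are assumptions for illustration, not necessarily what FTL itself uses.

```python
from http.cookies import SimpleCookie
from typing import Optional

COOKIE_NAME = "ftl_library"  # hypothetical cookie name


def preference_header(library_id: str, max_age_days: int = 365) -> str:
    """Build the Set-Cookie header that records a reader's preferred library."""
    cookie = SimpleCookie()
    cookie[COOKIE_NAME] = library_id
    cookie[COOKIE_NAME]["path"] = "/"
    cookie[COOKIE_NAME]["max-age"] = max_age_days * 24 * 60 * 60
    return cookie.output(header="Set-Cookie:")


def read_preference(cookie_header: Optional[str]) -> Optional[str]:
    """Extract the preferred library from a request's Cookie header, if set."""
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    morsel = cookie.get(COOKIE_NAME)
    return morsel.value if morsel else None
```

Since the choice lives only in the reader’s browser, a reader who picks a library for a single search (rather than registering one) leaves no cookie behind.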

For the second thing we need, I’ve created a library forwarding service (“Forward to Libraries”, or FTL; catchier name suggestions welcome) that transforms links from Wikipedia into searches for appropriate headings or keywords in local libraries. This is the same service I describe in my “From my library to yours” blog post from last month, but it now supports links from Wikipedia as well as to Wikipedia.

Thanks to information included in the Library of Congress’ Authorities and Vocabularies datasets, OCLC’s VIAF data feeds, Wikipedia’s database downloads, and my own metadata compiled at The Online Books Page, FTL already knows how to link directly to over 240,000 distinct authority-controlled headings known to the Library of Congress from their corresponding Wikipedia articles.   (Library of Congress headings are used in most sizable US libraries, and many English-language libraries outside the US also use similar headings.)

For other articles, FTL by default will try a general keyword search based on the Wikipedia article’s title, which will often turn up useful results at the destination library.  Alternatively, my templates allow Wikipedia editors to determine a specific Library of Congress heading to use in library links, if appropriate.  I’m hoping to incorporate suggested headings into FTL’s own knowledge base as I detect them showing up in Wikipedia articles.  I also plan to publish FTL’s data sets under open access terms, so that others can use and improve on them as well.
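
The choice described in the last two paragraphs can be sketched as a small lookup: use an editor-supplied heading if the template provides one, else a known Library of Congress heading for the article, else a keyword search on the article title. This is a sketch under assumptions. The two-entry `HEADINGS` table is illustrative only (FTL’s real mapping is built from the LC, VIAF, Wikipedia, and Online Books Page data described above), giving editor-supplied headings top precedence is an assumption, and stripping Wikipedia’s parenthetical disambiguators such as “(planet)” from keyword searches is an added assumption, not something described in this post.

```python
import re
from typing import Optional, Tuple

# Illustrative excerpt of a Wikipedia-title -> LC-heading table (hypothetical data).
HEADINGS = {
    "Women's history": "Women -- History",
    "Elizabeth Cady Stanton": "Stanton, Elizabeth Cady, 1815-1902",
}

# A trailing "(...)" disambiguator in a Wikipedia title, e.g. "Mercury (planet)".
_DISAMBIGUATOR = re.compile(r"\s*\([^)]*\)$")


def choose_search(article_title: str, editor_heading: Optional[str] = None) -> Tuple[str, str]:
    """Return (search kind, search term) for a link from a Wikipedia article."""
    if editor_heading:  # assumed: a heading chosen by an editor wins
        return ("subject", editor_heading)
    heading = HEADINGS.get(article_title)
    if heading:  # known authority-controlled heading
        return ("subject", heading)
    # Fallback: keyword search on the title, minus any disambiguator.
    return ("keyword", _DISAMBIGUATOR.sub("", article_title))
```

The result can then be handed to a per-library link builder such as the one sketched earlier for catalog URL patterns.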

The third part of this solution, displaying relevant resources at the destination library, can be implemented differently at each library. For most of the libraries in FTL’s current knowledge base, links go to searches in the library’s regular online catalog. But with some libraries, I’ve linked to another discovery system, if it seems to be the main search promoted at that library and produces useful results. The Online Books Page’s subject map displays also have features that I think will be useful to Wikipedia subject researchers arriving at my site, such as showing related subjects and the books filed under them. I hope in future posts to talk more about other useful guideposts and contextual information we could be providing to readers arriving from Wikipedia.

But if you’ve read this far, you probably want to see how this all works in practice.  So I’ve added some example library resources boxes in a few Wikipedia articles that seemed particularly relevant this month, including those for Women’s history, Elizabeth Cady Stanton, and Flannery O’Connor.  Look down in the “External links” or “Further reading” sections of those articles for the boxes, and view the page source of the articles to see how those boxes are constructed.

As with most things related to Wikipedia, this service is experimental, and subject to change (and, hopefully, improvement) over time. I’d love to hear thoughts and suggestions from users and maintainers of Wikipedia and libraries. And if you find creating these sorts of links from Wikipedia useful, and need help getting started, I’d be happy to help you bring them to your favorite Wikipedia topics and local libraries, as time permits.

February 11, 2013

From my library to yours

Filed under: libraries,online books — John Mark Ockerbloom @ 8:32 am

Even with well over one and a half million books and serials, the collection I maintain at The Online Books Page is far from comprehensive. The gaps in coverage are not hard to notice at sites like mine, because most material still under copyright, which can be as much as 90 years old at this point, is not made freely available online. But all libraries, no matter how large or well-provisioned, have their gaps. No one can collect everything, and a persistent reader or researcher will eventually find that their questions and interests go beyond the bounds of any particular collection.

However, there are lots of libraries out there, as well as lots of online information and literature that hasn’t been collected into an institutional library.  A good library, of whatever size, serves its users well by collecting the most useful materials it can get for their needs, and helping them get whatever else they need in other places.  Jeff Jarvis expressed this basic idea well a few years ago when discussing news organizations: “Cover what you do best.  Link to the rest.”

Many libraries already do this, in certain ways. The inter-library loan system helps library users who know they want a particular title their own library doesn’t have. Many libraries also maintain links to websites on various topics from their own library website or catalog. But these links, often maintained separately by each library, can only cover so much ground, as librarians have limited time to collect and maintain links. Even consortially maintained link collections struggle to go beyond fairly general or narrowly niche coverage, and to stay current.

Libraries can do more, though.  People coming to a library often have a particular topic in mind that they want to learn or read more about.  They’re often looking for something they can pick up quickly, and for free.  Knowing what that topic is, we should be able to point them towards useful literature they can quickly and freely obtain, whether or not it’s a title they already had in mind, and whether or not it’s in our own collection or something we link to directly.  That’s the purpose of some new links now available on The Online Books Page.

For example, say you’re a high school student looking for books on the Underground Railroad.  If you browse to this subject on the Online Books Page, you’ll find a number of free online books I list on this topic, and related topics.  As before, you can explore those related topics, if you’re interested (maybe checking out fugitive slave biographies, for instance); or you can try digging deeper for books specifically on the Underground Railroad via the extended shelves.

But most of what you’ll find on my site will be 19th century and early 20th century materials.  Your local library is likely to have books you can freely read as well, reflecting more up-to-date historical research, as well as books that might be more accessible to a high school student.  There might also be useful research materials online that you can look at for free.

That’s why there’s a new “See also…” note just under the big “Underground Railroad” heading. If you click on the words “your library” in that note, you’ll be referred to your regular library, if we know about it, to see what they have on the Underground Railroad. (If you haven’t already told us which local library you want to use regularly, we give you a list of choices. It’s a pretty small list to start with, but I’m taking requests for more libraries to add. Or you can opt for OCLC’s Worldcat.org, which covers lots of libraries throughout North America and beyond.) Even after you register a preferred library, you’re not stuck with only using that one. You can click on the “elsewhere” link in the note to try a different library or service from the one you usually check, like maybe the university library that’s near your public library (or vice versa).

You might also want to find online research resources that aren’t books.  For some of those, try clicking on the Wikipedia link provided for this subject.  While the quality and reliability of Wikipedia articles themselves can vary, most mature Wikipedia entries include a rich set of useful links to more information.  (I’ve discussed previously how useful Wikipedia is as a concept-oriented catalog.)  The references and external links on Wikipedia’s Underground Railroad article, for instance, cover a wide range of informational websites, contemporary and current books, and digital library collections.

Similarly, if you’re looking at a list of online books by a particular author (like, say, W. E. B. Du Bois), you’ll find a link at the bottom of the page to find more books by the author in libraries, as well as links to online books or Wikipedia articles about the author near the top. There are also links to find library copies of a particular book on its detailed catalog page; see, for instance, the links at the bottom of our catalog entry for The Souls of Black Folk. This can be useful for people who want a print copy, or a different edition from the ones we list.

So far, I’ve added links from The Online Books Page to Wikipedia for more than 17,000 subjects, and links to library catalogs for millions of subjects, authors, and titles.  (My thanks to OCLC, the Library of Congress, and Wikipedia for providing bulk access to the data that makes it possible to do much of this automatically.)  I’ll be developing this service further, and doing more things with this data, in ways that I hope to describe here shortly.  But I hope this first step is a useful demonstration of ways that different kinds of libraries and catalogs– online and local, academic and public, institutional and informal– can support each other through user-directed, context-sensitive, concept-level links between collections.

January 1, 2012

Public Domain Day 2012: Five things we can do in the US

Filed under: copyright,libraries,online books,open access — John Mark Ockerbloom @ 10:24 am

It’s New Year’s Day again, and in much of the world, this means another year’s worth of works enter the public domain. That’s a cause for celebration, as Europe and many other countries that have “life+70 years” copyright terms welcome works by James Joyce, Virginia Woolf, Jelly Roll Morton, and Elizabeth von Arnim into the public domain. The Communia Project’s Public Domain Day website focuses on works by these and many other authors that are entering (in many cases, re-entering) the public domain in “life+70 years” countries. Meanwhile, folks in Canada, New Zealand, and other countries that have held the line at the “life+50 years” terms of the Berne Convention can now freely enjoy the works of people like James Thurber, Ernest Hemingway, and H.D.

There’s not so much excitement about Public Domain Day in the US, where no published works are scheduled to enter the public domain for another 7 years, due to a 20-year copyright extension enacted in 1998.  But Americans don’t have to simply sigh and contemplate what might have been if our copyright terms hadn’t been extended.  The new year still provides a number of important opportunities for Americans to improve access to the public domain.

1. Find and free newly public domain unpublished works

Some works are going into the public domain in the US today: works never published prior to 2003 (or copyrighted under US law prior to 1978) by authors who died in 1941– the same authors whose published works go into the public domain in Europe today.
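
The date arithmetic behind each year’s Public Domain Day is simple: copyright terms run to the end of a calendar year, so a work under a “life+N years” term enters the public domain on January 1 of the year after the author’s death year plus N. A minimal sketch in Python (it ignores country-specific exceptions such as wartime extensions):

```python
def public_domain_year(death_year: int, years_after_death: int) -> int:
    """First calendar year a work is free under a 'life + N years' term.

    The term runs through December 31 of death_year + N, so the work
    enters the public domain on January 1 of the following year.
    """
    return death_year + years_after_death + 1


def death_year_entering(pd_year: int, years_after_death: int) -> int:
    """Death year of authors whose works enter the public domain in pd_year."""
    return pd_year - years_after_death - 1
```

With N = 70 this gives 2012 for authors who died in 1941, such as Joyce and Woolf, whose unpublished works in the US follow the same rule; with N = 50 it gives 2012 for authors who died in 1961, such as Hemingway and Thurber.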

But who would care about such obscure works? one might ask. Well, if you’re at all interested in understanding the dense, allusion-laden fiction of Joyce, or the psychology of Woolf, or the jurisprudential thinking of Louis Brandeis, or the inner lives of any of the rest of the “class of 1941”, having the right to freely access, publish, and build on their unpublished works can be crucial.

Up until now, for instance, scholars studying James Joyce have often been frustrated by sharp restrictions and legal threats made by the administrator of Joyce’s literary estate.  In 2008, Rebecca Ganz characterized the administrator thus: “[His] primary purpose is to quell any scholarship that he finds distasteful or an invasion of his family’s privacy. He has a history of harassing authors and artists until they buckle under the strain of trying to obtain legal rights to quote from the late author’s writings.”  Scholars wishing to invoke Joyce’s unpublished works in their work have either had to undertake multi-year legal battles, or cut back on the lines of inquiry they might otherwise pursue.

American libraries and archives have many illuminating papers by authors who died in 1941– even non-US authors like Joyce and Woolf.  US digitizers, librarians, and archivists can open up and publicize these works.   In some cases, we’re uniquely positioned to do so, since their unpublished works may still be under copyright in some other countries.

2. Increase worldwide availability of public domain works

Many of the millions of digitized books on the Internet are hosted in the US, in large-scale repositories like Google Books, HathiTrust, and the Internet Archive. Many of these services give limited access to non-US readers or materials. Google and HathiTrust, for instance, by default give non-US readers full access only to books published more than 140 years ago, to avoid falling afoul of “life+70 years” copyright terms abroad. JSTOR likewise limits access to non-US journal volumes published in 1870 or later.

With another year’s worth of copyrights expiring in “life+70 years” countries, it should be safe for these US-based services to also open up worldwide access to another year’s worth of works, further freeing up the public domain.  HathiTrust is also willing to manually review copyrights on specific books to open up access.  If you come across any books in HathiTrust solely by authors who died in 1941 (or before) that are currently labeled only as “public domain in the United States”, you can request that they review it for opening up access worldwide.  Just use the “Feedback” button at the bottom of the book’s HathiTrust page, or the suggestion form on my Online Books Page; and make sure you ask specifically for non-US access.

3. Restore access to obscure copyrighted works from 1936 (and earlier)

After libraries and archives expressed concerns about the fate of obscure works under longer copyright terms, Congress included a special exemption in their 1998 copyright term extension. The exemption, codified as section 108(h) of the copyright law, states that “during the last 20 years of any term of copyright of a published work, a library or archives, including a nonprofit educational institution that functions as such, may reproduce, distribute, display, or perform in facsimile or digital form a copy or phonorecord of such work, or portions thereof, for purposes of preservation, scholarship, or research”, under certain conditions. In particular, if the institution finds, after a reasonable investigation, that such a work is not “subject to normal commercial exploitation” (such as by being in print) and cannot “be obtained at a reasonable price”, and no rightsholder has filed a claim otherwise, the work qualifies for this special exemption. As of this year’s Public Domain Day, qualifying publications from 1936 join what is now 14 years of works in this category.
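
The “14 years of works” figure can be checked with a short calculation. This sketch assumes the 95-year term that applies to US works published from 1923 through 1977 (with renewals where required), and treats everything published before 1923 as already public domain. It only identifies which publication years fall inside the last 20 years of term; it does not check the other 108(h) conditions (no normal commercial exploitation, no copy at a reasonable price, no rightsholder notice), which need a per-work investigation.

```python
def section_108h_publication_years(current_year: int, term: int = 95,
                                   window: int = 20,
                                   first_protected: int = 1923) -> range:
    """Publication years whose copyrights are in their last `window` years.

    A work published in year p is protected through the end of p + term,
    so its last `window` years run from p + term - window + 1 to p + term.
    """
    # Latest publication year already inside the final window.
    latest = current_year - (term - window + 1)
    # Earliest publication year still under copyright this year.
    earliest = max(first_protected, current_year - term)
    return range(earliest, latest + 1)


def first_free_year(pub_year: int, term: int = 95) -> int:
    """Year a work published in pub_year enters the US public domain."""
    return pub_year + term + 1
```

For 2012 this yields publication years 1923 through 1936, fourteen years in all, and it also reproduces the 2019 date given below for the expiry of the remaining 1923 copyrights.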

So far, I have found very little digitized content online where this exemption is explicitly invoked.  (There are advantages to explicitly doing so, both because it helps clarify the right to use the material, and helps prevent inadvertent unauthorized propagation of the works, such as the commercial reprints of digitized books that are now common on many large bookselling sites.)  Yet many of the works in HathiTrust’s (currently suspended) orphan works initiative, and in the Internet Archive’s lending library, and more besides, could well qualify for this treatment– and unlike orphan works, where legislation has yet to be passed, the exemption for these materials is already explicitly authorized by statute.

Providing online access for these works is not without controversy. A 2002 article by lawyer Mary Minow details some of the possibilities and risks. While she concludes that libraries can put such works on the Web, the recent Authors Guild complaint in its lawsuit against HathiTrust includes some push-back against this idea. But as the public domain in the US recedes further into history, and digital library projects increasingly look for ways to make our cultural heritage available online, American libraries would do well to proactively establish and exercise these rights for older works now languishing in obscurity.

4. Strengthen and sustain coalitions for reasonable copyright limits

The curtailment of the public domain is just one aspect of the overreach of copyright law in the US and elsewhere.  Right now, Congress is considering two bills, the Stop Online Piracy Act (SOPA) and the PROTECT IP Act (PIPA), whose enforcement provisions threaten to disrupt the core structures of the Internet and enable far-reaching censorship, in the name of stopping piracy.  Supporters of these bills hoped to have them passed by Christmas, but opposition from both “left” and “right” sides of the political spectrum has slowed the process down, caused some companies to withdraw support, and led to the proposal of less harmful alternatives for fighting piracy.

It’s still quite possible that SOPA and PIPA will pass, though.   Public Domain Day provides an opportunity for Americans to reflect on some of the good reasons for limiting the power and scope of copyright enforcement, and to redouble efforts to keep those limits reasonable.  Moreover, a coalition that can stop SOPA and PIPA can also work to prevent further extensions of copyright terms.  This can ensure that Americans will have more to celebrate in Public Domain Days to come– especially starting in 2019, when the remaining 1923 copyrights should finally expire in the US.

5. Give copyrights of your own to the public domain

Of course, those wishing to maximize public access and use of their works don’t have to wait for their copyrights to expire on their own.  They can dedicate them to the public domain any time they want.  Public Domain Day is a particularly auspicious time to make such gifts, no matter what country you’re in.  And with tools like the CC0 declaration, it’s easier than ever to do so.

A few years ago, I started an annual personal tradition of reviewing copyrights to works I’d created more than 14 years ago (the original initial term of copyright enacted by the founders of the US, and also approximately the ideal copyright term given in a recent economic analysis) and dedicating works to the public domain that I didn’t feel needed further copyright.  Accordingly, today I dedicate all the work of my creation that I published in 1997, and for which I still control rights, to the public domain.  For me, this consists primarily of websites like The Online Books Page as of that year, and other online writings.  But others have dedicated more high-profile material to the public domain after the same term.   And I’d be very happy to hear from others who are making similar dedications today (whether or not it’s after 14 years).

So, happy Public Domain Day to everyone in the US and elsewhere!  We all have things to celebrate, and things we can do, in the name of the public domain.

September 27, 2011

Libraries: Be careful what your web sites “Like”

Filed under: crimes and misdemeanors,data,libraries,people,privacy — John Mark Ockerbloom @ 6:15 pm

Imagine you’re working in a library, and someone with a suit and a buzz cut comes up to you, gestures towards a patron who’s leaving the building, and says “That guy you were just helping out; can you tell me what books he was looking at?”

Many librarians would react to this request with alarm.  The code of ethics adopted by the American Library Association states “We protect each library user’s right to privacy and confidentiality with respect to information sought or received and resources consulted, borrowed, acquired or transmitted.”  Librarians will typically refuse to give such information without a carefully-verified search warrant, and many are also campaigning against the particularly intrusive search demands authorized by the PATRIOT Act.

Yet it’s possible that the library in this scenario is routinely giving out that kind of information, without the knowledge or consent of librarians or patrons, via its web site. These days, many sites, including those of libraries, invoke a variety of third-party services to construct their web pages. For instance, some library sites use Google services to analyze site usage trends or to display book covers. Those third-party services often know what web page has been visited when they’re invoked, either through an identifier in the HTML or JavaScript code used to invoke the service, or simply through the Referer header (the HTTP standard’s own spelling) passed from the user’s web browser.
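
To make the Referer point concrete: the header carries the full URL of the page the browser was on, so a third party receiving it can read a catalog search straight out of the query string. This sketch uses a hypothetical catalog URL; real catalogs encode their searches in their own ways.

```python
from urllib.parse import parse_qs, urlsplit


def what_referer_reveals(referer: str) -> dict:
    """Split a Referer URL into the site, page, and decoded query parameters."""
    parts = urlsplit(referer)
    return {
        "site": parts.netloc,
        "page": parts.path,
        # parse_qs decodes "+" and %-escapes, yielding readable search terms.
        "query": parse_qs(parts.query),
    }


# Hypothetical catalog page on which a third-party widget was embedded.
info = what_referer_reveals(
    "https://catalog.example.org/search?q=Indistinguishable+from+Magic&index=title"
)
```

This covers only the Referer channel; identifiers embedded in the invoking HTML or JavaScript, and the third party’s own login cookies, can reveal the same information independently.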

Patron privacy is particularly at risk when the third party also knows the identity of users visiting sensitive pages (like pages disclosing books they’re interested in).  The social networking sites that many library patrons use, for instance, can often track where their users go on the Web, even after they’ve left the social sites themselves.

For instance, if you go to the website of the Farmington Public Library (a library I used a lot when growing up in Connecticut), and search through their catalog, you may see Facebook “Like” buttons on the results.  On this page, for example, you may see that four people (possibly more by the time you read this) have told Facebook they Liked the book Indistinguishable from Magic.  Now, you can probably easily guess that if you click the Like button, and have a Facebook account, then Facebook will know that you liked the book too.  No big surprise there.

But what you can’t easily tell is that Facebook is informed that you’ve looked at this book page, even if you don’t click on anything. If you’re a Facebook user and haven’t logged out (and, for a while recently, even if you had logged out), Facebook knows your identity. And if Facebook knows who you are and what you’re looking at, it has the power to pass along this information. It might do so through a “frictionless sharing” app you decided to try. Or it might quietly provide it to organizations it can sell your data to, as permitted by its frequently changing data use policies. (For a while, those policies even covered tracking non-members.)

For some users, it might not be a big deal if it’s generally known what books they’re looking at online. But for others it definitely is a big deal, at least some of the time. The problem with third-party inclusions like the Facebook “Like” button in catalogs is that library patrons may be denied the opportunity to give informed consent to sharing their browsing with others. Libraries committed to protecting their patrons’ privacy as part of their freedom to read need to carefully consider which third-party services they invite to “tag along” when patrons browse their sites.

This isn’t just a Facebook issue. Similar issues come up with other third-party services that also track individuals, as Google does, for instance. Libraries also have good reasons to partner with third-party sites for various purposes. For some of these purposes, like ebook provision, privacy concerns are fairly well understood and carefully considered by most libraries. But librarians might not keep as close track of the development of their own web sites, where privacy leaks can spring up unnoticed.

So if any of your web sites (especially your online catalogs or other discovery and delivery services) use third-party web services, consider carefully where and how they’re being invoked. For each third party, you should ask what information they can get from users browsing your web site, what other information they have from other sources (such as the “real names” and exact birthdates that sites like Facebook and Google+ demand), and what real guarantees, if any, they make about the privacy of the information. If you can’t easily get satisfactory answers to these questions, then reconsider your use of these services.
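One rough way to start answering the first question is simply to list which outside hosts a catalog page pulls resources from. The sketch below, using only Python’s standard library, scans a page’s HTML for `script`, `img`, and `iframe` tags whose addresses point somewhere other than the library’s own host. It is a first pass under stated assumptions, not a full audit: it treats any different host name (even a subdomain) as outside, and it won’t see resources that scripts inject after the page loads or that stylesheets pull in, so watching the network activity in a browser’s developer tools on live pages is still worthwhile. The sample page and all host names are made up.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Tags whose address attribute makes the browser fetch something automatically.
# (Deliberately incomplete: CSS, <link>, and script-injected content are ignored.)
FETCHING_TAGS = {"script": "src", "img": "src", "iframe": "src"}


class ThirdPartyFinder(HTMLParser):
    """Collect hosts, other than the page's own, that a page loads resources from."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.own_host = urlparse(page_url).hostname
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        wanted = FETCHING_TAGS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative addresses like "/js/catalog.js" against the page.
                host = urlparse(urljoin(self.page_url, value)).hostname
                if host and host != self.own_host:
                    self.hosts.add(host)


def third_party_hosts(page_url, html):
    """Return the sorted list of outside hosts referenced by the page's HTML."""
    finder = ThirdPartyFinder(page_url)
    finder.feed(html)
    finder.close()
    return sorted(finder.hosts)


# Hypothetical catalog page embedding a social widget, cover images, and a button.
sample_page = """
<html><head>
  <script src="/js/catalog.js"></script>
  <script src="https://social.example.com/sdk.js"></script>
</head><body>
  <img src="https://covers.example.net/isbn/0000000000.jpg" alt="cover">
  <iframe src="https://widgets.example.com/like?page=record-12345"></iframe>
</body></html>
"""

print(third_party_hosts("https://catalog.example.org/record/12345", sample_page))
# ['covers.example.net', 'social.example.com', 'widgets.example.com']
```

Each host that turns up is a party to ask the questions above about.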

June 15, 2011

A digital public library we still need, and could build now

Filed under: citizen librarians,copyright,libraries,people,sharing — John Mark Ockerbloom @ 12:39 pm

It’s been more than half a year since the Digital Public Library of America project was formally launched, and I’m still trying to figure out what the project organizers really want it to be.  The idea of “a digital library in service of the American public” is a good one, and many existing digital libraries already play that role in a variety of ways.  As I said when I christened this blog, I’m all for creating a multitude of libraries to serve a diversity of audiences and information needs.

At a certain point after an enthusiastic band of performers says “Let’s put on a show!”, though, someone has to decide what their show’s going to be about, and start focusing effort there. So far, the DPLA seems to be taking an opportunistic approach. Instead of promulgating a particular blueprint for what they’ll do, they’re asking the community for suggestions, in a “beta sprint” that ends today. Whether this results in a clear, distinctive direction for the project, or a mishmash of ideas from other digitization, aggregation, preservation, and public service initiatives, remains to be seen.

Just about every digital project I’ve seen is opportunistic to some extent.   In particular, most of the big ones are opportunistic when it comes to collection development.  We go after the books, documents, and other knowledge resources that are close to hand in our physical collections, or that we find people putting on the open web, or that our users suggest, or volunteer to provide on their own.

There are a number of good reasons for this sort of opportunism.  It lets us reuse work that we don’t have to redo ourselves.  It can inform us of audience interests and needs (at least as far as the interests of the producers we find align with the interests of the consumers we serve).  And it’s cheap, and that’s nothing to sneer at when budgets are tight.

But the public libraries that my family prefers to use don’t, on the whole, have opportunistically built collections.  Rather, they have collections shaped primarily by the needs of their patrons, and not primarily by the types of materials they can easily acquire.   The “opportunistic” community and school library collections I’ve seen tend to be the underfunded ones, where books in which we have yet to land on the Moon, the Soviet Union is still around, or Alaska is not yet a state may be more visible than books that reflect current knowledge or world events.  The better libraries may still have older titles in their research stacks, but they lead with books that have current relevance to their community, and they go out of their way to acquire reliable, readable resources for whatever information needs their users have.  In other words, their collections and services are driven by  demand, not supply.

In the digital realm, we have yet to see a library that freely provides such a digital collection at large scale for American public library users.   Which is not to say we don’t have large digital book collections– the one I maintain, for instance, has over a million freely readable titles, and Google Books and lots of other smaller digital projects have millions more.  But they function more as research or special-purpose collections than as collections for general public reference, education, or enjoyment.

The big reason for this, of course, is copyright. In the US, anyone can freely digitize books and other resources published before 1923, but providing anything published after that requires copyright research and, usually, licensing, both of which tend to be complex and expensive. So a lot of digital library projects tend to focus on the older, obviously free material, and have little current material. But a generally useful digital public library needs to be different.

And it can be, with the right motivation, strategy, and support.  The key insight is that while a strong digital public library needs to have high-quality, current knowledge resources, it doesn’t need to have all such resources, or even the most popular or commercially successful ones.  It just needs to acquire and maintain a few high-quality resources for each of the significant needs and aptitudes of its audience. Mind you, that’s still a lot of ground to cover, especially when you consider all the ages, education levels, languages, physical and mental abilities, vocational needs, interests, and demographic backgrounds that even a midsized town’s public library serves.  But it’s still a substantially smaller problem, and involves a smaller cost, than the enticing but elusive idea of providing instant free online access to everything for everyone.

There are various ways public digital libraries could acquire suitable materials proactively. The America.gov books collection provides one interesting example. The US State Department wanted to create a library of easy-to-read books on civics and American culture and history for an international audience. Some of these books were created in-house by government staff. Others were commissioned from outside authors. Still others were adapted from previously published works, for which the State Department acquired rights.

A public digital library could similarly create, commission, solicit, or acquire rights to books that meet unfilled information needs of its patrons. Ideally it would aim to acquire rights not just to distribute a work as-is, but also to adapt and remix it into new works, as many Creative Commons licenses allow. This can potentially greatly increase the impact of any given work. For instance, a compellingly written, beautifully illustrated book on dinosaurs might originally be written for 9- to 12-year-old English speakers, and become noticeably obsolete due to new discoveries after 5 or 10 years. But if a library’s community has reuse and adaptation rights, library members can translate, adapt, and update the book, so it becomes useful to a larger audience over a longer period of time.

This sort of collection building can potentially be expensive; indeed, it’s sobering that America.gov has now ceased being updated, due to budget cuts. But there’s a lot that can be produced relatively inexpensively. Khan Academy, for example, contains thousands of short, simple educational videos, exercises, and assessments created largely by one person, with the eventual goal of systematically covering the entire standard K-12 curriculum. While I think a good educational library will require the involvement of many more people, the Khan example shows how much one person can accomplish with a small budget, and projects like Wikipedia show that there’s plenty of cognitive surplus to go around that a public library effort might usefully tap into.

Moreover, the markets for rights to previously authored content can potentially be made much more efficient than they are now. Most books, for instance, go out of print relatively quickly, with little or no commercial exploitation thereafter. And as others have noted, just trying to get permission to use a work digitally, even apart from any royalties, can be very expensive and time-consuming. But new initiatives like Gluejar aim to make it easier to match up people who would be happy to share their book rights with people who want to reuse them. Authors can collect a small fee (which could easily be higher than the residual royalties on an out-of-print book); readers get to share and adapt books that are useful to them. And that can potentially be much cheaper than acquiring the rights to a new work, or creating one from scratch.

As I’ve described above, then, a digital public library could proactively build an accessible collection of high-quality, up-to-date online books and other knowledge resources, by finding, soliciting, acquiring, creating, and adapting works in response to the information needs of its users. It would build up its collection proactively and systematically, while still being opportunistic enough to spot and pursue fruitful new collection possibilities. Such a digital library could be a very useful supplement to local public libraries, would be open any time anywhere online, and could provide more resources and accessibility options than a local public library could provide on its own. It would require a lot of people working together to make it work, including bibliographers, public service liaisons, authors, technical developers, and volunteers, both inside and outside existing libraries. And it would require ongoing support, as other public libraries do, though a library that successfully serves a wide audience could also potentially tap into a wide base of funds and in-kind contributions.

Whether or not the DPLA plans to do it, I think a large-scale digital free public library with a proactively-built, high-quality, broad-audience general collection is something that a civilized society can and should build.  I’d be interested in hearing if others feel the same, or have suggestions, critiques, or alternatives to offer.

April 9, 2011

Opt in for open access

Filed under: copyright,libraries,online books,open access — John Mark Ockerbloom @ 8:40 am

There’s been much discussion online about Judge Chin’s long-awaited decision to reject the settlement proposed by Google and authors’ and publishers’ organizations over the Google Books service. Settlement discussions continue (and the court has ordered a status conference for April 25). But it’s clear that it will be a while before this case is fully settled or decided.

Don’t count on a settlement to produce a comprehensive library

When the suit is finally resolved, it will not enable the comprehensive retrospective digital library I had been hoping for. That, Chin clearly indicated, was an overreach. The proposed settlement would have allowed Google to sell access to most pre-2009 books published in the English-speaking world whose rightsholders had not opted out. But, as Chin wrote, “the case was about the use of an indexing and searching tool, not the sale of complete copyrighted works.” The changes in the American copyright regime that the proposed settlement entailed, he wrote, were too sweeping for a court to approve.

Unless Congress makes changes in copyright law, then, a rightsholder has to opt in for a copyrighted book to be made readable on Google (or on another book site).  Chin’s opinion ends with a strong recommendation for the parties to craft a settlement that would largely be based on “opt-in”.  Of course, an “opt in” requirement necessarily excludes orphan works, where one cannot find a rightsholder to opt in.  And as John Wilkin recently pointed out, it’s likely that a lot of the books held by research libraries are orphan works.

Don’t count on authors to step up spontaneously

Chin expects that many authors will naturally want to opt in to make their works widely available, perhaps even without payment.  “Academic authors, almost by definition, are committed to maximizing access to knowledge,” he writes.  Indeed, one of the reasons he gives for rejecting the settlement is the argument, advanced by Pamela Samuelson and some other objectors, that the interests of academic and other non-commercially motivated authors are different from those of the commercial organizations that largely drove the settlement negotiations.

I think that Chin is right that many authors, particularly academics, care more about having their work appreciated by readers than about making money off of it.  And even those who want to maximize their earnings on new releases may prefer freely sharing their out of print books to keeping them locked away, or making a pittance on paywall-mediated access.  But that doesn’t necessarily mean that we’ll see all, or even most, of these works “opted in” to a universally accessible library.  We’ve had plenty of experience with institutional repositories showing us that even when authors are fine in principle with making their work freely available, most will not go out of their way to put their work in open-access repositories, unless there are strong forces mandating or proactively encouraging it.

Don’t count on Congress to solve the problem

The closest analogue to a “mandate” for making older books generally available would be orphan works legislation. If well crafted, such a law could make a lot of books available to the public that now have no claimants, revenue, or current audience, and I hope that a coalition can come together to get a good law passed. But an orphan works law could take years to adopt (indeed, it’s already been debated for years). There’s no guarantee of how useful or fair the law that eventually gets passed would be, after all the committees and interest groups are done with it. And even the best law would not cover many books that could go into a universal digital library.

Libraries have what it takes, if they’re proactive

On the other hand, we have an unprecedented opportunity right now to proactively encourage authors (academic or otherwise) to make their works freely available online. As Google and various other projects continue to scan books from library collections, we now have millions of these authors’ books deposited in “dark” digital archives. All an interested author has to do is say the word, and the dark copy can be lit up for open access. And libraries are uniquely positioned to find and encourage the authors in their communities to do this.

It’s now pretty easy to do, in many cases.  Hathi Trust, a coalition of a growing number of research institutions, currently has over 8 million volumes digitized from member libraries.  Most of the books are currently inaccessible due to copyright.  But they’ve published a permission agreement form that an author or other rightsholder can fill out and send in if they want to make their book freely readable online.  The form could be made a bit clearer and more visible, but it’s workable as it is.  As editor of The Online Books Page, I not infrequently hear from people who want to share their out of print books, or those of their ancestors, with the world.  Previously, I had to worry about how the books would get online.  Now I usually can just verify it’s in Hathi’s collection, and then refer them to the form.

Google Books also lets authors grant access rights through its partner program. Joining the program is more complicated than sending in the Hathi form, and it’s more oriented towards selling books than sharing them. But Google Books partners can declare their books freely readable in full if they wish, and can give them Creative Commons licenses (as they can with Hathi). Google has even more digitized books in its archives than Hathi does.

So, all those who would love to see a wide-ranging (if not entirely comprehensive), globally accessible digital library now have a real opportunity to make it happen. We don’t have to wait for Congress to act, or for some new utopian digital library to arise. Thanks to mass digitization, library coalitions like Hathi’s, and the development of simplified, streamlined rights and permissions processes, it’s easier than ever for interested authors (and heirs, and publishers) to make their work freely available online. If those of us involved in libraries, scholarship, and the open access movement work to open up our own books, and those of our colleagues, we can light up access to the large, universal digital library that’s now waiting for us online.
