Everybody's Libraries
Libraries for everyone, by everyone, shared with everyone, about everything

Forward to Libraries update, and some thoughts on sustainability and scale

Posted on October 3, 2016 by John Mark Ockerbloom

It’s been a while since I posted about Forward to Libraries, but if you’ve been following my GitHub repo, you may have noticed that it’s had a steady stream of updates and growth. If making connections across library collections, or between libraries and Wikipedia, interests you, the service is more comprehensive and wide-ranging than ever. Here’s an update:

Number of libraries

We now support forwarding to over 1,000 library systems worldwide, running dozens of off-the-shelf and custom-developed catalog and discovery systems. I’ve expanded coverage in all 50 states, Canada, the UK, and Australia, and have also added links to more countries on every inhabited continent. (Because the system currently focuses on Library of Congress headings, it works best with Anglo-American catalogs, but it yields acceptable results in some other catalogs as well, like the big Norwegian research library union catalog I just added today.) While major research libraries and big-city public libraries are well represented, I’ve also been trying to add HBCUs, community colleges, rural library networks, and other sometimes-overlooked communities. And I respond quickly to requests from users to add their libraries.

Wikipedia coverage

Forward to Libraries can be used to make links between library collections, and to and from my millions-strong Online Books Page listings. But it’s also known for its interoperation with Wikipedia, where it knows how to link between over half a million distinct Library of Congress name and subject headings and their corresponding English-language Wikipedia articles. The majority of these mappings are name mappings provided by OCLC’s VIAF, which in its most recent data dump includes over 485,000 VIAF identifiers that include both a Library of Congress Name Authorities identifier and an English Wikipedia article link. An additional 50,000 or so topical, demographic, and geographical LC subject headings are also mapped. These subject mappings derive from exact matches, algorithmic matching for certain kinds of heading similarities, and a manually curated mappings file that has grown over time to include more than 22,000 correspondences.
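
To make the three sources of subject mappings a little more concrete, here is a rough Python sketch of a layered lookup: curated mappings first, then exact title matches, then a similarity rule. It is not the actual Forward to Libraries code. The function names, the data shapes, and the single similarity rule shown (un-inverting headings like "Art, Medieval") are all assumptions made for illustration.

```python
from typing import Dict, Optional, Set


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different forms compare equal."""
    return " ".join(text.lower().split())


def uninvert(heading: str) -> str:
    """Turn an inverted heading such as 'Art, Medieval' into 'Medieval Art'.

    This is one assumed example of a 'heading similarity' rule.
    """
    if "," in heading:
        first, rest = heading.split(",", 1)
        return f"{rest.strip()} {first.strip()}"
    return heading


def find_article(heading: str, manual: Dict[str, str], titles: Set[str]) -> Optional[str]:
    """Return a Wikipedia article title for a subject heading, or None if unmapped."""
    # 1. Manually curated correspondences take priority.
    if heading in manual:
        return manual[heading]
    # 2. Exact match between the heading and an article title.
    if heading in titles:
        return heading
    # 3. Algorithmic match: compare normalized forms, including the un-inverted heading.
    #    (Rebuilt on every call for brevity; a real tool would build this index once.)
    by_normalized = {normalize(title): title for title in titles}
    for candidate in (heading, uninvert(heading)):
        hit = by_normalized.get(normalize(candidate))
        if hit is not None:
            return hit
    return None
```

With hypothetical inputs such as `manual = {"Cookery": "Cooking"}` and `titles = {"Medieval art", "Astronomy"}`, the heading "Art, Medieval" would resolve to "Medieval art" through the third step, while "Cookery" would be answered directly from the curated file.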

What you can do with this data

If you’re browsing a topic or an author’s works on the Online Books Page, you can follow links to searches for the same topic or author in any of the 1000+ libraries currently in my dataset.  (If your library’s not one of them, just ask me to add it.)  If the topic or the author is covered in one of the 500,000 corresponding Wikipedia articles the system knows of, you’ll also be offered a link to the relevant article.

If you’re browsing certain Wikipedia articles, you’ll also find links from them back to library searches in your favorite of those 1000+ library systems (or any other of those systems you wish to search).  Right now those links use templates that must be manually placed, so there are only about 2500 Wikipedia articles with those links, but any Wikipedia editor can add the templates to additional articles.  (A bot could potentially add more quickly, but that would require some negotiation with the Wikipedia community that I haven’t undertaken to date.)  If you’re involved in a Wikipedia-library collaboration project (like this one), you may want to add one of these templates when editing articles on topics that are likely to have relevant source materials in multiple libraries.  (The most common template used is the Library Resources Box, generally added to the External Links or Further Reading section of an article.)

If you’re interested in offering a similar library or Wikipedia linking service from your own catalog or discovery system, I’d be interested in hearing from you. You can either point to my forwarding service (using a standard linking syntax) or implement your own forwarder based on my code and data on GitHub. Right now it requires some effort and expertise to implement either method, but I’m happy to work with interested libraries or developers to make forwarding easier to implement.
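
For a sense of what the core of a home-grown forwarder involves, here is a minimal Python sketch: each library has a profile with search-URL templates, and forwarding means filling the heading into the right template. The profile format, the `example-public` identifier, and the `catalog.example.org` URLs are hypothetical placeholders, not the service's real data or linking syntax.

```python
from urllib.parse import quote_plus

# Hypothetical library profiles: one URL template per kind of search.
# A real deployment would load these from a maintained data file.
LIBRARIES = {
    "example-public": {
        "name": "Example Public Library",
        "subject": "https://catalog.example.org/search?type=subject&q={query}",
        "author": "https://catalog.example.org/search?type=author&q={query}",
    },
}


def forward_url(library_id: str, kind: str, heading: str) -> str:
    """Build the catalog search URL for a heading in the chosen library.

    Raises KeyError if the library or the kind of search is unknown.
    """
    template = LIBRARIES[library_id][kind]
    # URL-encode the heading so commas, spaces, and accents survive the trip.
    return template.format(query=quote_plus(heading))


print(forward_url("example-public", "author", "Austen, Jane, 1775-1817"))
```

Most of the ongoing work lies in keeping the templates current for each catalog system rather than in this small function.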

Scaling and sustaining the service

The Forward to Libraries service still runs largely as a one-person, part-time project. (The Wikipedia templates are largely placed by others, and the service fundamentally depends on regularly updated data sets from OCLC, the Library of Congress, and Wikipedia, but I maintain the code and coordinate the central referral database myself.)

Part-time, one-person projects raise some common questions: “Will they be sustained?” and “Can they scale?” I wasn’t sure myself of the answers to those questions when I started developing this service. Fortunately, I went ahead and introduced it anyway, and three-going-on-four years later, I’m happy to say that the basic service *is* in fact more sustainable and scalable than I’d thought it might be. The code is fairly simple, and doesn’t require a lot of updating to keep running. The main scale issues for the basic service have to do with the number of library systems and the number of topic mappings in the system, and those are both manageable.

The number of libraries turns out to be the more challenging factor to manage. Libraries change their catalogs and discovery systems on a regular basis, and when they do, search links that worked for their old catalogs often don’t work for the new ones. I have an automated tool that I run occasionally to flag libraries that return an error code to my search links; it’s not very sophisticated, but it does alert me to many libraries whose profiles I need to update, without my having to check all of them manually. (If you find any libraries where forwarding is no longer working, you can also use the suggestion form to alert me to the problem.)

The more pressing scaling problem at the moment is the user experience: right now, when you’re asked to choose a library, the program shows a list of links to all 1000+ libraries currently in the system. That can be a bit much to handle, especially for users whose data is metered. Updating the library choice form to show only local libraries after the user selects the state, province, or country they’re in will cut down on the data sent to the user; that may cost the user an extra click, but the tradeoff seems worth it at this point.
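
The automated check for libraries whose search links return an error code can be sketched in a few lines of Python. This is an illustration under assumptions, not the actual tool: the function names and the idea of keeping one test link per library are made up for the example, and only the standard library is used.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, List, Tuple


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for a URL, or 0 if no response was received."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error code such as 404 or 500.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and similar problems.
        return 0


def flag_broken(
    test_links: Dict[str, str],
    fetch: Callable[[str], int] = http_status,
) -> List[Tuple[str, int]]:
    """Return (library id, status) pairs for libraries whose test link failed."""
    flagged = []
    for library_id, url in sorted(test_links.items()):
        status = fetch(url)
        if status == 0 or status >= 400:
            flagged.append((library_id, status))
    return flagged
```

A check like this shares the limitation mentioned above: a catalog that has changed but still answers with a normal page (say, a "no results" screen returned with status 200) will not be flagged, so reports from users remain useful.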

The number of topic mappings, on the other hand, has been easier to manage than I’d thought. VIAF publishes updated data files for names about once a month, and I can run a script over them to automatically update my service’s name heading mappings when a new file comes out. Likewise, Wikipedia is now providing twice-a-month dump files of its English-language encyclopedia. I can download one of Wikipedia’s files in a couple of hours, and then run a script that flags any topical articles I map to that have gone away, changed their title, or redirected to other articles. I can then change my mappings file accordingly in under an hour. Library of Congress subject headings change as well, but they don’t change very fast. New and changed topical headings are published online about once a month, and I can generally add or change mappings from one of their updates within a few hours. I spend a few spare-time hours each month adding mappings for older subject headings, so if the current rate of LCSH growth holds, in theory I could cover *all* LCSH topical headings in my manual mappings file after some number of years. In practice, I don’t have to do this, especially as topical mappings also start to get collaboratively managed in places like Wikidata. (I’m not currently working with their data set, but hope to do so in the future.)
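
The check against a Wikipedia dump might look something like the following Python sketch. It assumes the dump has already been reduced to a dictionary from article title to redirect target (or `None` for an ordinary article); that data shape and the function name are assumptions for illustration, not a description of the real script.

```python
from typing import Dict, List, Optional, Tuple


def check_mappings(
    mappings: Dict[str, str],
    dump: Dict[str, Optional[str]],
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str, str]]]:
    """Compare heading-to-article mappings against titles extracted from a dump.

    mappings: subject heading -> mapped article title
    dump:     article title -> redirect target, or None for an ordinary article
    Returns (missing, redirected) lists for manual review.
    """
    missing = []
    redirected = []
    for heading, title in sorted(mappings.items()):
        if title not in dump:
            # The article has gone away (or was renamed without leaving a redirect).
            missing.append((heading, title))
        elif dump[title] is not None:
            # The title now redirects elsewhere; a renamed article typically
            # shows up this way, since the old title is usually left as a redirect.
            redirected.append((heading, title, dump[title]))
    return missing, redirected
```

The output is only a list of candidates: deciding whether a redirect target is still the right article for a heading is the part that takes the under-an-hour manual pass.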

Broadening the service

While I can maintain and gradually grow the basic service with my current resources, broadening it would require more.  Not every library, particularly outside the US, uses Library of Congress Subject Headings, so it would be nice to offer mappings to more subject ontologies and languages.  Similarly, not everyone likes to work with Wikipedia (often with good reason), and it’d be nice to support links to and from alternative knowledge bases as well.  The basic architecture of Forward to Libraries is capable of handling multiple library and web-based ontologies, but additional coding and data maintenance would be required.  There are also various ways to build on the service to engage more deeply with libraries and current topics of interest; I’ve explored some ideas along these lines, but haven’t had the time to implement them.

Things continuing as they are, though, I should be able to maintain and grow the basic Forward to Libraries service for quite some time to come.  I’m thankful to the people and the data providers that have made this service possible. And if you’re interested in doing more with it, or helping develop it in new directions, I’d be very glad to talk with you.

About John Mark Ockerbloom

I'm a digital library strategist at the University of Pennsylvania, in Philadelphia.
This entry was posted in data, discovery, libraries, metadata, wikipedia.