Everybody's Libraries
Libraries for everyone, by everyone, shared with everyone, about everything

The worth of the work

Posted on September 20, 2019 by John Mark Ockerbloom

A number of research libraries, including the one where I work, are increasingly interested in catalogs built on linked data rather than on MARC records. Linked data work has been going on for years, in the hopes of reaping benefits that include “decreasing redundant cataloging work, and increasing visibility of library resources and interoperability with non-library systems” (to quote from a Wikipedia summary of the W3C’s Library Linked Data Incubator Group Final Report from 2011).

One of the key concepts introduced in most linked data catalogs is the concept of a “work”, an entity that describes a literary or intellectual creation in the abstract, as manifested in one or more specific editions (the entities described in MARC records). Back in 2010, I discussed what works were, and how they were represented in various online systems. In a followup post, I discussed a basic model for works that I had started to use in my own Online Books Page. (I still use that model there, though to date I’ve created only a few dozen work records with it.)

When I wrote those posts, most library catalogs were still firmly MARC-based. MARC still dominates in practice, but BIBFRAME and other linked data initiatives have gained traction in libraries, making work information models and data increasingly important. Once they come into regular use, work-based catalogs may substantially change how we do cataloging work and how our users discover our resources. It’s a good time, then, to consider whether the linked data library catalogs and data models we’re beginning to adopt are working in the ways we want.

In considering this question, it’s worth asking: How do we want our work cataloging to work?  Here are my answers, and some of the concerns those answers raise for me:

The model for works should be simple and flexible, so that our users can understand it and use it in a wide variety of information-acquisition scenarios. It’s easy to create unnecessary complications in our models of works. In my 2010 posts, I noted that FRBR had a two-tier model of “works” and “expressions”. I recommended instead a general model of “works” that covered pretty much any grouping of information resources that shared a common set of characteristics that a user might be seeking. Depending on what was being sought, work groupings could be as tightly defined as hardcover and paperback editions of identical printed pages, or as loose and wide as all religious text compilations commonly called Bibles. (There could also be many groupings between these extremes.)
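To make this flexible model a little more concrete, here is a minimal Python sketch, assuming a work grouping can be represented simply as a set of attribute values that member editions must share. The `Edition` fields, the `WorkGrouping` class, and the sample records are hypothetical illustrations, not part of FRBR, BIBFRAME, or the Online Books Page.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edition:
    """A specific edition, roughly what a MARC record describes (hypothetical fields)."""
    record_id: str
    uniform_title: str
    language: str
    text_version: str  # e.g. a particular translation or revision
    pagination: str    # identical printed pages share this value
    binding: str


@dataclass
class WorkGrouping:
    """A 'work' as any grouping of editions sharing a chosen set of characteristics."""
    label: str
    criteria: dict[str, str] = field(default_factory=dict)

    def matches(self, edition: Edition) -> bool:
        # An edition belongs if it has every attribute value the grouping requires.
        return all(getattr(edition, attr) == value for attr, value in self.criteria.items())

    def members(self, editions: list[Edition]) -> list[str]:
        return [e.record_id for e in editions if self.matches(e)]


# Hypothetical sample data.
editions = [
    Edition("r1", "Bible", "English", "King James Version", "1200 p.", "hardcover"),
    Edition("r2", "Bible", "English", "King James Version", "1200 p.", "paperback"),
    Edition("r3", "Bible", "English", "Revised Standard Version", "1100 p.", "hardcover"),
    Edition("r4", "Bible", "German", "Luther Bible", "1300 p.", "paperback"),
]

# Tight: hardcover and paperback printings of identical pages.
tight = WorkGrouping("KJV, identical printed pages",
                     {"text_version": "King James Version", "pagination": "1200 p."})
# In between: all English-language Bibles.
english = WorkGrouping("English Bibles", {"uniform_title": "Bible", "language": "English"})
# Loose: everything commonly called a Bible.
loose = WorkGrouping("All Bibles", {"uniform_title": "Bible"})

for grouping in (tight, english, loose):
    print(grouping.label, grouping.members(editions))
# KJV, identical printed pages ['r1', 'r2']
# English Bibles ['r1', 'r2', 'r3']
# All Bibles ['r1', 'r2', 'r3', 'r4']
```

Because each grouping is just a set of criteria, tight, loose, and intermediate groupings can coexist over the same editions, without committing the model to a fixed number of tiers.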

I was glad to see early BIBFRAME models collapse the FRBR “Work” and “Expression” distinction into a single Work concept. But it now looks to me as though BIBFRAME’s Works might not be defined or handled as flexibly as I had hoped. SVDE, for instance, has found it necessary to introduce “Superworks”, and the Library of Congress has proposed “Hubs”, both of which appear intended to cover wider groupings than BIBFRAME’s Works do. It’s not clear to me that those new concepts will be intelligible to users, or that they will suffice to cover all the kinds of groupings that might be relevant to users’ searches.

Work catalog data should be creatable and maintainable by either humans or machines, as appropriate. A number of the linked data catalogs now being built create work entities automatically, using algorithms to cluster catalog records that seem to be related. Automated work clustering is indeed important at scale, particularly when you consider not only books but also articles. Projects like Unpaywall cluster millions of published articles with their preprints and other free alternatives, to aid in open access to research, and at that scale need to build most clusters automatically. But we can’t let machines have the last word on creating and clustering work entities. There may be many forms of work groupings that readers find important, and that machines can’t easily sort out. Human catalogers are often the best determiners of how to set up, maintain, describe, and annotate groupings that are relevant to human readers.
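As a rough illustration of both halves of this point, here is a minimal Python sketch of naive automated clustering in which a cataloger’s decision overrides the machine’s. It assumes records are matched on a normalized author surname plus title; that heuristic, the `cluster_key` and `cluster` functions, the override mapping, and the sample records are all hypothetical, and production systems use considerably more elaborate matching.

```python
from __future__ import annotations

import re
import unicodedata
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    no_punct = re.sub(r"[^\w\s]", " ", no_accents.lower())
    return " ".join(no_punct.split())


def cluster_key(record: dict) -> str:
    """Naive machine match key: normalized author surname plus title (hypothetical heuristic)."""
    surname = normalize(record["author"].split(",")[0])
    title = normalize(record["title"])
    # Drop a leading English article so "The Time Machine" and "Time machine" collide.
    title = re.sub(r"^(the|a|an) ", "", title)
    return f"{surname}|{title}"


def cluster(records: list[dict], overrides: dict[str, str] | None = None) -> dict[str, list[str]]:
    """Group record IDs into works; a cataloger's override wins over the machine key."""
    overrides = overrides or {}
    works: dict[str, list[str]] = defaultdict(list)
    for rec in records:
        work_id = overrides.get(rec["id"], cluster_key(rec))
        works[work_id].append(rec["id"])
    return dict(works)


# Hypothetical sample records.
records = [
    {"id": "m1", "author": "Wells, H. G.", "title": "The Time Machine"},
    {"id": "m2", "author": "Wells, H.G.", "title": "Time machine."},
    {"id": "m3", "author": "Wells, H. G.", "title": "The Time Machine: an invention"},
    {"id": "m4", "author": "Wells, H. G.", "title": "The War of the Worlds"},
]

print(cluster(records))
# {'wells|time machine': ['m1', 'm2'], 'wells|time machine an invention': ['m3'], 'wells|war of the worlds': ['m4']}
print(cluster(records, overrides={"m3": "wells|time machine"}))
# {'wells|time machine': ['m1', 'm2', 'm3'], 'wells|war of the worlds': ['m4']}
```

Here the machine key misses the subtitled record m3; a single cataloger override places it with the other Time Machine records, so the human judgment has the last word without redoing the automated work.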

Work identifiers and data should be maximally reusable. A major potential advantage of cataloging works distinctly from particular editions is that the information about those works can be shared broadly across all libraries that hold the work, and with all users that are interested in information about the work. But those advantages largely go away if every library catalog, vendor, and consortium mints and maintains its own work identifiers and data without coordination, or if the work identifiers and data are kept proprietary and have restrictions on their reuse, or if reusable identifiers can’t easily be created at scale by libraries or scholars interested in a work. Work identifiers should persist over the long term, resolve easily to usable metadata, and grow as comprehensively as our users need them to. The identifiers themselves should be reusable without restriction, and the data associated with them should carry minimal restrictions. (In particular, any data necessary to clearly define what a work identifier refers to should be open, so that others can use that identifier without confusion.)

Work cataloging should not waste people’s time. The systems we use to catalog works, if well-designed, should help catalogers do more with their time, not less. Shared work data can potentially cut down on the time required to catalog instances of works that someone has already cataloged. But if work-level linked-data cataloging tools and environments are overly cumbersome, requiring more screens and slower data entry even for routine items than in existing cataloging environments, the worth of the work comes into question. Similarly, work-aware catalogs should make it easier and quicker for users to find what they want, and not harder due to unwanted complexities in work representation and display. Linked data catalogs should also support easy reference to their work identifiers and associated library data by others who want to write about, cite, or associate additional data with those works.

Maybe works in linked data catalogs will have all of the characteristics I’m asking for above.  For those who have worked more directly than I have in designing, developing, and putting data into these new catalogs, are the things I’ve described above also the things you want out of works, or are there things I’m missing?  How well do you think what’s currently being developed is satisfying these wants, and where do you think we need more work or more discussion?   I’d be interested in hearing from (or reading) anyone who has useful thoughts on making the work that goes into works worthwhile.

 

About John Mark Ockerbloom

I'm a digital library strategist at the University of Pennsylvania, in Philadelphia.
This entry was posted in discovery, libraries, metadata, online books.