The concept of a work in the catalog web

My previous post showed four examples of web-based catalogs that describe books and other library resources in a FRBR-like manner. They’re oriented around “works” (like Shakespeare’s Hamlet) that are embodied in multiple editions (like the hundreds or thousands of publications of Hamlet). None of the catalogs I showed fully conform to the FRBR “Work, Expression, Manifestation, Item” (WEMI) data model, but they seem to work well enough that they have large audiences using them to access sizable collections of books.

In developing similar work-oriented features on the Online Books Page, I’ve been implementing a similar information model. It’s simpler and more general than the FRBR WEMI stack, but it can encompass the data model of all of the catalogs from my previous post, as well as the “classic” FRBR model. In this post, I’ll describe the basics of the model, and discuss why it’s a promising basis for future catalogs.

Works as concepts

In my posts about concept-oriented catalogs, I showed how it’s useful for catalogs to guide users to resources via high-level, familiar concepts. Often when users are looking for something in the library, they don’t have an exact item in mind; rather, they seek some work, like Shakespeare’s Hamlet, in general terms. It’s our job to help them get from the concept in their head to a suitable copy of what they seek.

FRBR’s WEMI stack is one conceptual model for guiding this process. But there are other models worth considering as well, particularly given the huge array of knowledge resources potentially available to users, the sizable trove of “edition-level” bibliographic records already in library and publisher catalogs, and the expectations for simplicity that we’re accustomed to in the Google age. These conditions all make the case for models that are as simple as possible in their essentials, and that allow us to reuse the records we already have without requiring massive retooling. The four catalogs I highlighted in the last post seem to be converging towards such a model, and it’s a little different from FRBR’s.

Works as groups

If you look closely at the data contained in Open Library, WorldCat.org, LibraryThing, and Google Books, it becomes clear that they’re drawing on lots of existing edition records– from libraries, publishers, and booksellers– to populate their databases. They create “works” by grouping these edition records, and then associating identifiers and other data with the groupings. A “work” in these systems is in essence an identifiable, annotated group of “editions” that share a common creative origin.

The grouping and annotation of works can be done automatically: Google Books and WorldCat.org, for instance, seem to create groupings through automated metadata pattern matching, with little or no human intervention. Or it can involve people: LibraryThing members do much of the grouping and annotation of work records in that catalog.
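As a rough illustration of what automated grouping by metadata matching could look like, here is a minimal sketch. It is not a description of what Google Books, WorldCat.org, or any other service actually does; the record fields (`id`, `author`, `title`), the sample records, and the normalization rule are all assumptions made up for the example.

```python
import re
from collections import defaultdict

def match_key(record):
    """Build a crude grouping key from an edition record's author and title."""
    def norm(text):
        # Lowercase, replace punctuation with spaces, collapse whitespace.
        return " ".join(re.sub(r"[^a-z0-9 ]", " ", text.lower()).split())
    return (norm(record["author"]), norm(record["title"]))

def group_editions(records):
    """Group edition ids whose normalized author and title agree."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record["id"])
    return dict(groups)

# Hypothetical edition records, as they might arrive from different sources.
editions = [
    {"id": "ed1", "author": "Shakespeare, William", "title": "Hamlet"},
    {"id": "ed2", "author": "SHAKESPEARE, WILLIAM.", "title": "Hamlet."},
    {"id": "ed3", "author": "Shakespeare, William", "title": "Macbeth"},
]
works = group_editions(editions)
```

A key this simple would miss editions published under variant titles, which is one reason human grouping of the LibraryThing kind can still be valuable.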

Here’s a simple example: The Chicago Manual of Style is represented on The Online Books Page as a work that groups together several successive editions. This is a simple enough grouping that some libraries simply catalog the entire set with a single serial record. However, I found it useful (and not difficult) to create separate records for each edition, each with the publication data and the subtly-changing titles used in each edition. I didn’t have to include properties common to all of them, such as the corporate author and the subjects, in my edition records. Instead, I put them in my work record, and the editions automatically inherit this metadata from the work.

Property inheritance, augmentation, and overriding

Editions inherit appropriate metadata properties from a work they’re included in, but they can also augment these properties. They can even override them when appropriate.

Consider, for instance, the various editions of Hamlet in this work display. Each one inherits some metadata, like creators (Shakespeare) and subjects (like “Revenge -- Drama”), from the work. Some editions also add to this metadata; for instance, a few add stage arrangers and illustrators. Even information like the display title can be inherited in brief edition records; however, in most of these edition records, I supply a display title that overrides the default from the work record.
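The three behaviors (inherit, augment, override) can be sketched in a few lines. This is only an illustration under assumed names, not the code behind The Online Books Page: the `Work` and `Edition` classes, the `overrides` and `additions` fields, and the sample edition title and illustrator are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Work:
    """A group-level record holding properties shared by its editions."""
    identifier: str
    properties: dict = field(default_factory=dict)

@dataclass
class Edition:
    """An edition record that belongs to a work."""
    work: Work
    overrides: dict = field(default_factory=dict)   # values that replace the work's
    additions: dict = field(default_factory=dict)   # values appended to the work's

    def effective(self, name):
        # An override replaces the inherited value; otherwise inherit from the work.
        base = self.overrides.get(name, self.work.properties.get(name, []))
        # Additions are appended to whichever base value applies.
        return list(base) + self.additions.get(name, [])

hamlet = Work("hamlet", {
    "title": ["Hamlet"],
    "creator": ["Shakespeare, William, 1564-1616"],
    "subject": ["Revenge -- Drama"],
})

# A brief record inherits everything; a fuller one overrides and augments.
brief = Edition(work=hamlet)
illustrated = Edition(
    work=hamlet,
    overrides={"title": ["Hypothetical Illustrated Hamlet"]},
    additions={"creator": ["Hypothetical Illustrator"]},
)
```

Here `brief.effective("title")` falls back to the work's title, while `illustrated.effective("creator")` returns Shakespeare followed by the added illustrator.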

Annotations, identifiers, and catalog links

Not all of the metadata associated with a work is inherited by its editions. The bibliographic notes on my Hamlet page, for example, provide useful context about the history of the work, but they aren’t particularly relevant to editions. So they aren’t inherited by them. Neither are the data structures I use to select, arrange, and comment on some of the more notable Hamlet editions on my work page.

In fact, this data doesn’t have to be in the work record at all. All that’s necessary to include information like this on my Hamlet page is to make annotations: associations between the work and the additional data. For this to work, I need three pieces of data: an identifier for the work, a property name (for example, “Bibliographic Notes”), and a value (which could either be directly expressed, such as with the actual text of a note, or indirectly, with something that identifies or points to that content).

If you’re familiar with Linked Data, this sort of annotation model practically screams “RDF Triples!” And indeed, RDF implements annotation data quite straightforwardly. But that’s just one way to do it. Within the confines of your own catalog, you can implement annotations however you like: as RDF, as fields encoded directly in the work record, as some rows in an SQL database, or whatever else works.
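For instance, one of the simplest non-RDF options is a plain list of triples kept apart from the work records themselves. The sketch below assumes made-up identifiers (`work:hamlet`, `edition:hamlet-example`) and placeholder note text; it only shows the shape of the data, not how any particular catalog stores it.

```python
from typing import NamedTuple

class Annotation(NamedTuple):
    identifier: str   # the work (or edition) being annotated
    prop: str         # property name, e.g. "Bibliographic Notes"
    value: str        # literal content, or an identifier pointing to content

# Hypothetical annotation store, managed separately from the work record.
annotations = [
    Annotation("work:hamlet", "Bibliographic Notes", "Placeholder text of a note."),
    Annotation("work:hamlet", "Notable Edition", "edition:hamlet-example"),
    Annotation("work:macbeth", "Bibliographic Notes", "Another placeholder note."),
]

def annotations_for(store, identifier, prop=None):
    """Return values attached to an identifier, optionally filtered by property."""
    return [
        a.value for a in store
        if a.identifier == identifier and (prop is None or a.prop == prop)
    ]

hamlet_notes = annotations_for(annotations, "work:hamlet", "Bibliographic Notes")
```

Because the store only refers to the work by its identifier, notes can be added or removed without touching the work record at all.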

The types of identifiers I associate with my work record can make a big difference in the ease of annotating the work. With an internal identifier, I can manually add my own bibliographic notes about Hamlet, but I can’t easily interoperate with other catalogs to share notes. With standardized identifiers, though, I can go farther afield more easily. For instance, the Library of Congress defines a standard name/title authority heading for Hamlet: “Shakespeare, William, 1564-1616. Hamlet”. I use this identifier with my Hamlet work to automatically link to records for online books about Hamlet. I can do this because those books use that same heading as a work identifier in their subject headings.

I can also link Hamlet to records of books about Hamlet in any other catalog that also uses these identifiers to access its data. Thus, I can automatically link to books about Hamlet in WorldCat.org, using the same Library of Congress authority heading I used above. From links like these, one can start to see how a decentralized, global catalog web can form– but I’ll leave extended exploration of that idea to later posts.
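The matching step behind this kind of link can be sketched as follows. The heading string is the one quoted above; everything else (the record layout, the sample titles, and the rule that a subject matches when it equals the heading or extends it with a " -- " subdivision) is an assumption for illustration.

```python
HAMLET_HEADING = "Shakespeare, William, 1564-1616. Hamlet"

def is_about(subject, work_heading):
    """True if a subject heading is the work heading, alone or with subdivisions."""
    return subject == work_heading or subject.startswith(work_heading + " -- ")

def books_about(work_heading, records):
    """Titles of records having at least one subject that names the work."""
    return [
        r["title"] for r in records
        if any(is_about(s, work_heading) for s in r["subjects"])
    ]

# Hypothetical records; any catalog sharing the heading could be searched the same way.
records = [
    {"title": "Hypothetical study of Hamlet", "subjects": [HAMLET_HEADING]},
    {"title": "Hypothetical sources of Hamlet", "subjects": [HAMLET_HEADING + " -- Sources"]},
    {"title": "Hypothetical study of Macbeth",
     "subjects": ["Shakespeare, William, 1564-1616. Macbeth"]},
]
about_hamlet = books_about(HAMLET_HEADING, records)
```

The point is that no catalog-specific key is involved: the shared, standardized heading alone is enough to connect the work to the records.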

Works that contain multitudes

Some works are more of a challenge than others. What if a reader comes to your catalog looking for the Bible? None of the four catalogs I profiled in my last post attempt to model this as a unified work. At most, they have work records for particular versions, like the King James Bible or the New International Version. But they don’t have a unified record covering “The Bible” as a whole. It would seem there are just too many versions and variations for this to work.

And yet every day people walk into bookstores looking for the Bible– including bookstores that stock hundreds of editions– and quickly come out with a suitable copy. Our library catalogs should be up to the same challenge. But we’ll need more than one level of work record to do it. Thus, my current work display for the Bible shows a work that groups together not only editions, but also other works. In particular, the “Bible” work currently groups the “King James Bible” work and the “Douai-Rheims Bible” work with other works and editions of the Bible.

These smaller work groups for particular Bible versions encompass subsets of what the higher-level “Bible” work group encompasses. The smaller groups inherit metadata from the larger group, and also pass along metadata to their own editions (and sub-groups, if any). I can adjust the organization of works and editions over time as I see fit. For instance, I currently list just one edition of the New International Version (NIV), so it’s directly included within the Bible work. If I later added other NIV editions, I might decide to group them all into a new “NIV” work encompassed by the Bible work. Or, if I add a lot more editions under my “King James Version” work, I might decide to add new work records representing editions of the 1611 King James text, the 1765 text, and so on.
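A nested arrangement like this can be sketched with a single record type that points to its enclosing work. This is a minimal illustration, not my actual implementation: the `Record` class, the identifiers, and the `scope` property are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Record:
    """A work or an edition; `parent` is the work that encompasses it."""
    identifier: str
    parent: Optional["Record"] = None
    properties: dict = field(default_factory=dict)

    def lookup(self, name):
        # Walk up through enclosing works until some record supplies the property.
        node = self
        while node is not None:
            if name in node.properties:
                return node.properties[name]
            node = node.parent
        return None

    def path(self):
        # Identifiers from the outermost work down to this record.
        node, ids = self, []
        while node is not None:
            ids.append(node.identifier)
            node = node.parent
        return ids[::-1]

bible = Record("bible", properties={"title": "Bible", "scope": "all versions"})
kjv = Record("kjv", parent=bible, properties={"title": "King James Bible"})
kjv_edition = Record("kjv-example-edition", parent=kjv)
niv_edition = Record("niv-example-edition", parent=bible)
before = niv_edition.path()   # the lone NIV edition sits directly in the Bible work

# Later regrouping: introduce an NIV work and move the edition under it.
niv = Record("niv", parent=bible, properties={"title": "New International Version"})
niv_edition.parent = niv
after = niv_edition.path()
```

Regrouping only changes one parent link; the edition record itself, and everything it inherits from the top-level work, stays as it was.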

Note that this nesting of works within works captures more specific versions of editions within more general ones. It is not meant to apply to parts of a work within a whole. So an edition of the Gospel of Matthew alone would not normally be encompassed by the “Bible” work, but by a distinct “Matthew” work. That work might have an annotation expressing a part-whole relationship with the “Bible” work.

That’s essentially what I do now, for instance, for single-testament Bible editions (including Jewish Bibles and standalone Christian New Testaments); I put them into separate works. These works (currently titled “Bible. O.T.” and “Bible. N.T.”) are not encompassed within the “Bible” work. But they are linked with it, via relationships that are prominently featured on the respective work pages. These relationships can be modeled as <identifier, property, value> triples, just like the other annotations I described in the previous section.

Putting it all together

So, to sum up the data model: We start with records (and identifiers) for editions, which basically correspond to the entities described in traditional library catalog records, or to FRBR Manifestations. We then introduce records for works, which group together sets of editions (as well as other works, perhaps) and have their own identifiers. Properties of works can be inherited by their editions (or their sub-works), though they can also be added to or overridden. And we can also annotate works (and editions) with all kinds of additional information, using logical <identifier, property, value> triples. These annotations can be managed independently of the basic records for works and editions, and they can be used to link together not only various records within our catalog, but also information maintained elsewhere in the global catalog web. There need be no arbitrary limits on the types or the sources of the annotations we include.

I’m using this general data model in my expansion-in-progress of The Online Books Page, but it also describes what I’m seeing in the four web-based catalogs I looked at in my last post. (Though none of those four at present seem to support works encompassed by other works as I’ve described above.) Moreover, this model is also compatible with both traditional library MARC catalogs and catalogs based on the standard FRBR model.

The FRBR WEMI stack, in particular, can be considered a special case, or application profile, for this general model. In particular, a standard WEMI entity set would include an edition record that represents a FRBR Manifestation, included in a “work” grouping that represents the manifestation’s FRBR Expression, included in another “work” grouping that represents the expression’s FRBR Work. FRBR also adds constraints on where different kinds of metadata get assigned in these records.
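One way to picture the "application profile" idea is as a structural check on the general model: an edition fits the WEMI shape when it sits inside exactly two nested work groupings. The sketch below uses hypothetical names (`Node`, `wemi_chain`, and the sample identifiers) and covers only the nesting; it does not attempt to model FRBR's constraints on where particular metadata belongs, nor FRBR Items.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    identifier: str
    kind: str                       # "edition" or "work" in the general model
    parent: Optional["Node"] = None

def wemi_chain(edition):
    """Return (manifestation, expression, work) identifiers if the edition
    sits in exactly two nested work groupings; otherwise return None."""
    chain = []
    node = edition
    while node is not None:
        chain.append(node)
        node = node.parent
    if [n.kind for n in chain] != ["edition", "work", "work"]:
        return None
    return tuple(n.identifier for n in chain)

# Hypothetical WEMI-shaped entity set.
work = Node("hamlet", "work")
expression = Node("hamlet-example-text", "work", parent=work)
manifestation = Node("hamlet-example-edition", "edition", parent=expression)

# An edition grouped directly under a work fits the general model,
# but not the stricter WEMI profile.
loose_edition = Node("hamlet-other-edition", "edition", parent=work)
```

Both editions are valid in the general model; only the first also satisfies the narrower profile, which is what lets "FRBR-ized" and "FRBR-like" records live side by side.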

Traditional MARC catalogs can also be represented. Bibliographic records would simply be encoded as edition records that have not (yet) been grouped into works. The catalog could then be made more “FRBR-like” incrementally, by adding work groupings and appropriate annotations over time. Ideally, a lot of this additional information would come from others, not just one’s own cataloging work– but that’s a topic for another time.

Moving forward

By no means should libraries immediately throw out their existing catalogs, or their plans to adopt FRBR, in favor of this model. (And I especially would not recommend doing so on the basis of a single blog post!) But I believe that over time, it will become increasingly useful to have hybrid catalogs, ones that can include traditional, “FRBR-ized”, and “FRBR-like” records, and that can integrate and interlink data from a wide variety of sources. Some of those sources (like the four web-based catalogs I discussed previously) will not always conform religiously to FRBR or other library standards.

Many libraries may have good reason to continue to do their own cataloging based on fairly strict, specific standards like MARC, RDA, and FRBR. But if I were designing a system to manage libraries’ catalog data, and I hoped to interoperate as much as I could with the ever-growing, ever-diversifying catalog web, I’d be strongly inclined to have the system use a more general underlying data model, like the one I’ve outlined here.

1 Response to The concept of a work in the catalog web

John Mark Ockerbloom says:

September 17, 2010 at 1:26 pm

Just to avoid any confusion, I don’t make any claim of originality for the basic ideas in this data model. A number of other people have been writing about them for quite some time now. In particular, Elaine Svenonius’s work on information organization has more than a little influence on what you see above. My main addition is simply to say “Hey, this model is actually working, in real catalogs being widely used right now, and seems to work well in my own implementations. So maybe we should look at it more closely as we’re redesigning our cataloging architecture.”
