This is the second of a series of think-aloud posts on what I’m calling “concept-oriented catalogs”: catalogs that, as Lorcan Dempsey aptly describes them, go “beyond the bibliographic record”. This post will present examples of concept-oriented catalogs, describe the concepts they use, and describe some of the features that make them work.
Reflections on the story so far
As I stated in my first post, a catalog helps a user get from some set of concepts (ideas, citations, words, people, places, etc.) of the information they’re seeking, to some useful knowledge resources (books, articles, web sites, songs, etc.). A concept-oriented catalog uses a variety of concepts, not just those of knowledge resources themselves, as first-class entities to help locate useful resources.
Note that this is essentially a functional view of a catalog, from a user’s perspective, as a means of knowledge discovery. Library catalogs also serve a number of other important functions. In particular, they also manage the inventory of resources the library has acquired and makes available. That function is necessarily resource-oriented. Some concept-oriented catalogs don’t need to be resource-oriented; Wikipedia, for example, maintains no inventory of the sites it links to from its concept-based articles. A concept-oriented library catalog, however, will probably also need to be resource-oriented to fulfill all its functions. That’s fine: the two qualities are not mutually exclusive.
Concepts can come from a variety of places: from libraries, from experts, and from ordinary readers. I’ll show examples of all three.
Fun with FRBR
I noted in the last post that current library catalogs are largely based around the entities that FRBR describes as manifestations and items. FRBR defines a number of other entities as well. These entities can also provide useful focuses for concept-oriented catalogs.
Fiction Finder, mentioned in my last post, is not the only example of a work-oriented catalog. A recent status report from the OpenLibrary Project indicates that they are moving to make their catalog work-oriented as well. The Amazon book catalog, and its aggregation of various editions of a book (and their reviews) on one catalog page, is oriented around expressions in the FRBR sense.
In practice, the line between works and expressions tends to be blurry in catalogs. But we’re definitely seeing more catalogs present search results at these higher bibliographic levels. (See, for instance, the Worldcat.org search results for War and Peace, which include various “view all editions and formats” links.) Catalogs that can’t easily and sensibly aggregate their manifestations of works or expressions will be at a competitive disadvantage relative to catalogs that can.
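To make the idea of aggregating manifestations a bit more concrete, here is a minimal Python sketch. It is not how WorldCat, OpenLibrary, or Amazon actually cluster editions; the edition records are made up, and the work_key function is a hypothetical and deliberately crude stand-in (normalized author plus title) for the much more careful matching a real catalog would need.

```python
from collections import defaultdict

# Made-up edition records for illustration only.
EDITIONS = [
    {"author": "Tolstoy, Leo", "title": "War and Peace", "format": "print"},
    {"author": "Tolstoy, Leo", "title": "War and  peace", "format": "ebook"},
    {"author": "tolstoy, leo", "title": "War and Peace", "format": "audiobook"},
    {"author": "Tolstoy, Leo", "title": "Anna Karenina", "format": "print"},
]


def work_key(author, title):
    """Hypothetical work key: lowercase and collapse runs of whitespace.

    This ignores translations, subtitles, and variant names, which a real
    catalog would have to handle.
    """
    def normalize(text):
        return " ".join(text.lower().split())
    return (normalize(author), normalize(title))


def group_by_work(editions):
    """Group manifestation-level records under a shared work-level key."""
    works = defaultdict(list)
    for edition in editions:
        works[work_key(edition["author"], edition["title"])].append(edition)
    return dict(works)


works = group_by_work(EDITIONS)
for key, members in works.items():
    formats = ", ".join(e["format"] for e in members)
    print(f"{key[1]}: {len(members)} edition(s) [{formats}]")
```

A search interface could then show one result per key, with a link out to all the editions and formats grouped under it.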
Numerous catalogs also provide special information for authors, whether persons or corporate bodies. I mentioned WorldCat Identities in my previous post; a commenter to that post noted the New Zealand Electronic Text Centre’s catalog, which also aggregates useful information on its authors, as well as other concepts.
Google Maps and other geospatial platforms provide the basis for many catalogs oriented around places, mediating access to information resources of all types, from local histories to violent incident reports to hospital ratings. Catalog interfaces can also be built around events, as demonstrated in MIT’s Simile Timelines widget and in Google’s Living Stories news tracker. Objects, concepts (which in FRBR-ese denote abstract subjects, rather than the broader use I’m making of “concept” in this post), and other kinds of subjects are the focus of the subject map-based catalogs I mentioned in my last post.
Catalogs need not limit their focus to one kind of concept. They often get more interesting when users can move between different kinds of concepts. For instance, if you look up the author Fanny Jackson Coppin on The Online Books Page, you’ll see that there are not only resources by her, but resources about her. If you look for the latter, you’ll also discover that Coppin was an African-American teacher in Philadelphia, and then be able to follow further links to more books about African American teachers, or Philadelphia, or other related subjects as you see fit. Or, if you started a search with African American teachers in mind, you’ll find that concept linked to Coppin. So users can move back and forth between particular subjects, and people that are relevant to them, as they browse the catalog.
What makes this sort of navigation work? Part of it involves analysis of existing bibliographic records: in this case, catalogs like The Online Books Page (as well as others like WorldCat Identities) analyze patterns of subject headings to relate particular people to particular subjects. Having common identifiers helps as well: the authority-controlled string “Coppin, Fanny Jackson” is used both for author and subject metadata, making it possible to link author- and subject-based concepts. Subject maps also require additional data beyond just bibliographic records. Mine use a variety of data sources, including authority records and a small but essential set of records I created for certain geographical entities.
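The linking described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the code behind The Online Books Page or WorldCat Identities: the records, titles, and the other author names are made up, and the function name link_people_and_subjects is hypothetical. It treats a heading as a person concept when the same authority-controlled string shows up as both an author and a subject, then relates that person to the other subject headings on the records about them.

```python
from collections import defaultdict

# Made-up records for illustration; titles and the other author names are hypothetical.
RECORDS = [
    {"title": "Hypothetical memoir",
     "authors": ["Coppin, Fanny Jackson"],
     "subjects": ["Teaching"]},
    {"title": "Hypothetical biography",
     "authors": ["Doe, Jane"],
     "subjects": ["Coppin, Fanny Jackson", "African American teachers",
                  "Philadelphia (Pa.)"]},
    {"title": "Hypothetical history",
     "authors": ["Roe, Richard"],
     "subjects": ["African American teachers"]},
]


def link_people_and_subjects(records):
    """Index records by author and subject, then relate people to subjects."""
    by_author = defaultdict(list)
    by_subject = defaultdict(list)
    for rec in records:
        for name in rec["authors"]:
            by_author[name].append(rec["title"])
        for heading in rec["subjects"]:
            by_subject[heading].append(rec["title"])

    # Simplifying assumption: a string used both as an author heading and as
    # a subject heading names the same person.
    people = set(by_author) & set(by_subject)

    subjects_for_person = defaultdict(set)
    people_for_subject = defaultdict(set)
    for rec in records:
        for person in people.intersection(rec["subjects"]):
            for heading in rec["subjects"]:
                if heading != person:
                    # Headings that co-occur on a record about the person.
                    subjects_for_person[person].add(heading)
                    people_for_subject[heading].add(person)
    return by_author, by_subject, subjects_for_person, people_for_subject


by_author, by_subject, subjects_for_person, people_for_subject = (
    link_people_and_subjects(RECORDS)
)
print(sorted(subjects_for_person["Coppin, Fanny Jackson"]))
print(sorted(people_for_subject["African American teachers"]))
```

The two derived indexes are what let a user move in either direction: from a person to related subjects, or from a subject to the people associated with it.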
Consult the experts
Librarians have lots of understanding and experience working with FRBR-like concepts. But users also find value in many other concepts. A recent research report from Project Information Literacy on information seeking patterns of college students included some interesting findings on favored starting points. Wikipedia, as I expected, was a very popular starting point for everyday research, but for course-related research, the most popular starting point, edging out not only Wikipedia but even Google itself, was course readings.
Course readings are highly focused guideposts for academic research. The list of readings is a set of knowledge resources chosen carefully by the instructor to educate students about the subject of the course. The readings themselves typically have bibliographies, or at least lists of references, that support their own assertions and suggest further avenues of research. Essentially, the syllabus and the readings represent a careful curation by subject experts of important knowledge resources for a particular topic. The value of that curation often extends well beyond a particular class. At Penn, many of the reading lists originally developed for a class get adapted into research guides on our library’s Web site, helping others researching in similar areas. And interesting things also start to happen as you aggregate multiple scholars’ reading and citation lists, as we’ve seen with services like PennTags (originally designed, and still frequently used, for class projects), CiteULike, and Google Scholar.
There are also many collections assembled and curated by non-faculty experts. The curation of many of these collections often involves a rich set of concepts appropriate to the focus of each collection. Consider, for instance, the Freedman Jewish Sound Archive, curated by a lawyer and his wife and now housed at Penn. The catalog for the archive is oriented around several important concepts: among them songs that are musical compositions, tracks that record performances of the songs, albums that contain the tracks, sheet music in which songs are published, and artists that have various kinds of relationships with each of those other concepts. Each of these concepts has an expressive metadata schema that includes relations to the other concepts, and to resources that can be read or listened to onsite or online. The catalog is not only a guide to knowledge resources, but a valuable knowledge resource in its own right.
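One way to picture a catalog built around several related concepts is as a handful of linked record types. The Python sketch below is my own simplification for illustration, not the archive’s actual schema: the class names, fields, roles, and sample data are all hypothetical. It models songs, tracks, sheet music, and artists, with artists attached to tracks through role-bearing credits.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Artist:
    name: str


@dataclass(frozen=True)
class Song:
    title: str  # the musical composition, independent of any recording


@dataclass(frozen=True)
class Credit:
    artist: Artist
    role: str  # hypothetical roles, e.g. "composer" or "performer"


@dataclass
class Track:
    song: Song   # the song this track records a performance of
    album: str   # title of the album that contains the track
    credits: List[Credit] = field(default_factory=list)


@dataclass
class SheetMusic:
    song: Song
    publication: str  # where the song was published


def tracks_of(song, tracks):
    """All recorded performances of a given song."""
    return [t for t in tracks if t.song == song]


def songs_linked_to(artist, tracks):
    """Map each song to the set of roles the artist has on its tracks."""
    roles = {}
    for track in tracks:
        for credit in track.credits:
            if credit.artist == artist:
                roles.setdefault(track.song, set()).add(credit.role)
    return roles


# Hypothetical sample data.
composer = Artist("Composer A")
singer = Artist("Singer B")
song = Song("Song X")
TRACKS = [
    Track(song, "Album 1", [Credit(composer, "composer"), Credit(singer, "performer")]),
    Track(song, "Album 2", [Credit(composer, "composer")]),
]

print([t.album for t in tracks_of(song, TRACKS)])
print(songs_linked_to(singer, TRACKS))
```

Even in this toy form, a question like “which albums include a performance of this song?” requires following relations between concepts rather than scanning a single flat list.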
Because of the complexity of these concepts and the relationships among them, we had to write a specialized interface to bring out the full expressiveness of the catalog. Simply providing a flat search index of tracks was not enough. But many of the special concepts used to provide depth for this catalog can also be mapped to some extent into more common bibliographic data structures and interfaces. And new technologies I’ll discuss later on can make it easier to build, and share data from, these sorts of specialized catalogs.
There are a lot more scholars, and expert curators, of all types, than there are professional librarians. And they often know more about knowledge resources in their areas of interest than we do. We can potentially build much more broad, conceptually rich, and carefully curated catalogs, if we develop effective ways to work with them.
Power to the people
Ordinary readers or scholars might not have the information science training or metadata expertise that professional librarians have. But they can still help us greatly in finding useful knowledge resources, just through their ongoing reading and commentary, if we have some way of tracking and aggregating what they do.
Social software gives us a way to do that, and its benefits have been widely discussed in recent years. Social software introduces the user as a first-class concept that can be associated with particular descriptions and knowledge resources. Implicitly or explicitly, users build up collections of resources that they have noted, possibly using tags to describe them, or posts to comment on them. The tags and posts are typically informal expressions, rather than controlled terms or highly structured records. Tags can often describe resources in more accessible language, or to greater specificity, than established controlled descriptive terms. And users often find useful resource recommendations from other users who share their interests.
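Here is a minimal Python sketch of how that kind of aggregation might work. It is an assumption-laden toy, not the method of any particular service: the users, resources, and tags are made up, the recommend function is hypothetical, and the scoring (count how many resources another user shares with you) is just one simple choice among many.

```python
from collections import Counter, defaultdict

# Made-up (user, resource, tag) triples for illustration.
BOOKMARKS = [
    ("alice", "r1", "frbr"),
    ("alice", "r2", "cataloging"),
    ("bob", "r1", "frbr"),
    ("bob", "r2", "metadata"),
    ("bob", "r3", "frbr"),
    ("carol", "r2", "cataloging"),
    ("carol", "r4", "tagging"),
    ("dave", "r5", "maps"),
]


def recommend(bookmarks, user):
    """Suggest resources noted by users whose collections overlap with ours."""
    by_user = defaultdict(set)
    for who, resource, _tag in bookmarks:
        by_user[who].add(resource)

    mine = set(by_user.get(user, set()))
    scores = Counter()
    for other, theirs in by_user.items():
        if other == user:
            continue
        overlap = len(mine & theirs)
        if overlap == 0:
            continue  # no shared interests observed
        for resource in theirs - mine:
            # Weight each new resource by how much the other user overlaps.
            scores[resource] += overlap

    # Highest score first; break ties by resource id for a stable order.
    return sorted(scores, key=lambda r: (-scores[r], r))


print(recommend(BOOKMARKS, "alice"))
```

The tags are carried along but unused here; a fuller sketch could also match users on shared tags rather than only on shared resources.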
Many social communities have built up around different user groups, types of resource, or styles of communication. PennTags (oriented around Penn scholars), Flickr (oriented around photos and other images), and Twitter (oriented around short real-time posts) are just a few examples. I’m far from the only person who relies heavily on the users I follow on Twitter for reading tips and current awareness. Longer-form posts that review and invite comments on books and other works can also be very helpful in finding useful knowledge resources.
Social software has been making its way into libraries for a few years now. Many library catalogs, ours included, now let users tag and annotate bibliographic records. The catalogs provide a smattering of new functions based on this tagging, but tagging is essentially an add-on, not fully integrated into the catalog.
Catalogs designed from the ground up to be socially powered can look quite different. We’re starting to see some of them now. One notable “social catalog” is LibraryThing, which has been developed by Tim Spalding, a staff of about a dozen professionals, and a membership of nearly a million readers. Tim gave a talk to librarians in October called “What is Social Cataloging?” In the talk, he describes and demonstrates the features and the concepts of his catalog (including many of the concepts I’ve mentioned in this post), and the many benefits that emerge when hundreds of thousands of readers collaboratively catalog their personal libraries. The video of the full talk runs just under an hour, and is well worth watching.
LibraryThing does not provide all of the functionality you’d find in a good research library catalog. But it provides many useful forms of guidance that most traditional catalogs lack, and it also suggests some ways in which libraries and their users can join forces in building comprehensive, user-focused, concept-oriented catalogs. (Much of the bibliographic metadata used in LibraryThing comes from traditional library catalog records; and LibraryThing in turn now sells a service that embeds some of the other information it aggregates back into traditional library catalog displays.)
Recap and coming attractions
In this post, I’ve shown how catalogs can use a wide variety of concepts to help users find resources. The concepts come from, and are maintained by, various groups of people, including librarians, scholars, collectors and other domain experts, and lots and lots of everyday readers. The catalog concepts may be derived in part from existing MARC bibliographic metadata (sometimes through automated analysis), but often draw from additional data sources. The catalogs may require new user interface designs, going beyond the standard search-box and list-of-hits paradigm, for users to take full advantage of their concepts.
The examples I’ve shown should demonstrate the versatility of concept-oriented catalogs, and also suggest some of the challenges of implementing them. In future posts, I hope to discuss the technologies, data models, system architectures, and social structures that can help make useful concept-oriented catalogs practical for libraries to build and maintain.