7 Responses to Open data’s role in transforming our bibliographic framework (Updated)

  1. MJ Ray says:

    Opening the data is a no-brainer for a cooperative – I explain why at http://www.software.coop/info/coopdev.html – and it would be brilliant to see OCLC do this and help build a library cooperative commonwealth. Please, pingback this post when you know where and how the LC and OCLC conversations will be taking place.

  2. Interesting, this is the first I had heard of OCLC considering that.

    What would be the best way to ‘join the conversation’, as you say? Oh, you say you don’t know either, heh.

    Really, any open license is good news. A “BY” license, while it sounds nice, can still be an odd barrier when it comes to data, especially because re-use of data often involves re-using data from _multiple_ databases (potentially dozens or, in the future, even hundreds). And because re-use doesn’t necessarily involve leaving the records intact, it can involve mixing and matching individual pieces of data from multiple sources, in a continual and iterative process of re-mixing. After such a process, you don’t really know what piece of data came from where without spending lots of resources on tracking it, and if you are tracking only to meet the needs of the license, the license has added significant cost to your project. So if you don’t know what piece of data came from where, and there are dozens or hundreds of sources, where exactly do you need to put the attribution? Do you need all dozens or hundreds attributed on every page (and every API response) from your app?

    Of course, it can all come down to the ‘terms as specified by the licensor’ for attribution. If it’s just on your About/splash page, “Some data from X”, that’s do-able.
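To make the tracking cost concrete, here is a minimal sketch of what field-level provenance could look like when re-mixing records. The source names, field names, and the `merge_records` and `attribution_line` helpers are all hypothetical, invented for illustration; no real license or data service prescribes this scheme.

```python
from collections import defaultdict


def merge_records(sources):
    """Merge field values from several sources and remember who supplied them.

    `sources` maps a (hypothetical) source name to a dict of field -> value.
    The first value seen for a field wins; every source that supplied that
    same value is recorded as a contributor for the field.
    """
    merged = {}
    provenance = defaultdict(set)
    for source_name, record in sources.items():
        for field, value in record.items():
            if field not in merged:
                merged[field] = value
            if merged[field] == value:
                provenance[field].add(source_name)
    return merged, {field: sorted(names) for field, names in provenance.items()}


def attribution_line(provenance):
    """Collapse per-field provenance into one 'About page' style credit."""
    names = sorted({name for srcs in provenance.values() for name in srcs})
    return "Some data from " + ", ".join(names)


# Placeholder data: two made-up sources describing the same item.
sources = {
    "Source A": {"title": "Example Title", "author": "Example Author"},
    "Source B": {"title": "Example Title", "date": "2011"},
}
merged, provenance = merge_records(sources)
credit = attribution_line(provenance)
```

Even in this toy form, every merge step has to carry the provenance map along. With dozens or hundreds of sources, that bookkeeping is the cost described above, and a single combined credit line is only workable if the licensor's terms accept it.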

  3. John Mark Ockerbloom says:

    Well, one good start is talking about it online in public, where LC, OCLC, and others can see what we say, and they and others can chime in.

    As far as more formal forums go, ALA Annual next month seems a promising place to have discussions. LC’s announcement says they’ll be having discussions there on their bibliographic transformation proposal, and OCLC always has multiple sessions at ALA. I don’t know offhand if any of the OCLC sessions are specifically about opening WorldCat data, but the issue is probably relevant to at least some of the OCLC ALA discussions. So if nothing else, it’d be helpful for people there to let them know that they value open bibliographic data, and will support them if they go that route. (I’m not at this point planning on going to Annual myself, but others reading this might be, and my own plans could still conceivably change.)

    Extending the discussion beyond the usual professional cataloging and ALA crowd would also be useful. Off the top of my head, I could see people involved in VIVO, ORCID, CrossRef/DOI, Wikipedia, LibraryThing, and publishing in general all potentially having something to contribute to open bibliographic data, and I’m sure there are plenty of others that could participate as well.

  4. Laura Smart says:

    If libraries are only allowed to use copies of OCLC bib records that they created, sans OCLC number, will it be a barrier for smaller/less technical institutions wishing to enter the linked data world? I’ve been pondering this lately. It seems to me like it would be more efficient for OCLC to make WorldCat available as linked data rather than giving people permission to use their contributions to WorldCat as linked data. There’s a world of difference. Why should the multitudes of small, less technically blessed libraries do the work to expose their stuff when OCLC could do it on their behalf? We achieved economies of scale with cooperative cataloging. Why not with linked data?

    It makes no sense for libraries without a lot of unique holdings and/or all their bib records in WorldCat to host their own linked data. How many linked data bib records of Feynman’s lectures does the world need? Holdings records, on the other hand, would have much more utility.

    I get it that it’s a big scary thing for OCLC to have WorldCat as-a-whole be linked data. I get that they can’t leap into doing it as Open Data, although that would be ideal. I think Jay Jordan’s notion of “doing a deal” with anybody wanting to use the full database is a better direction to go. The licensing terms aren’t quite there yet for Open Data, due to the attribution pearling problem Jonathan mentions. A two-party agreement, however, would overcome that issue. Of course, it would probably also limit the mining of the 2nd party’s exposed data. It would be a good start though.

  5. John Mark Ockerbloom says:

    Laura: It’s not yet clear to me whether the proposal only applies to WorldCat records originally created by a library, or WorldCat records for items held by a library. (In any case, the proposal is not for releasing the entire WorldCat database under an open license; I’ve updated my post to make that clear.)

    If members want to make their catalogs generally available as open linked data, which sounds like the motivation from Karen’s slides, then it would seem to me to require the latter case: members being allowed to release open data for anything in their holdings. This in itself could be very useful.

    Organizations like Hathi Trust, for instance, could release full catalog records for their digital volumes (the records I’m getting from them at the moment are stripped down somewhat to allow open reuse). Or, a few major research libraries could start an open data resource on nearly all significant scholarly journals. There are a number of other interesting and useful things one could do with a subset of WorldCat representing the holdings of even a few key members.

    If you or others know more details about the proposal, or can provide pointers to more information, I’d be grateful to hear about it.

  6. Laura Smart says:

    Now that I think about it, it seems like OCLC’s efforts to prevent WorldCat as-a-whole from being released as linked data will eventually be futile. If they let people expose their bib records (either original contributions only or the bib of any item they happen to hold) then at some future point there would be critical mass and the bulk of stuff-in-use would be out in the wild.

    I’m going to read into it more. I confess to skimming the record use agreement when it was released and I have to take a closer look. I’m recalling from somewhere that OCLC really doesn’t want the OCLC numbers included in any records a member wants to release. That would make any follow-your-nose RDF linking from member-exposed records back to WorldCat more difficult. And that doesn’t seem to be in OCLC’s best interest.

    I *love* the idea of research libraries releasing bib information on academic serials. I’d contribute our serials records into that (nobody should have to trace the publication of Comptes Rendus if somebody else has done it!). Of course there’s nothing to stop the journal publishers from exposing their title information themselves (and I believe they will be doing that, but at the more granular article level). If the publishers do it, where does a library fit in?

    Interesting times.

  7. Karen Coyle says:

    It seems to me that OCLC could release a modified version of WorldCat records as linked data that wouldn’t result in releasing what is essentially “cataloging data.” Most users of bibliographic data are not interested in much of the esoterica of the library catalog entry: they want authors, main title, dates, publisher. The stuff of citations. This is also purely factual information, generally taken from the piece. Releasing this wouldn’t rival OCLC’s cataloging revenue because it wouldn’t be enough for most libraries to accept in the place of a cataloging record, but it would be enough for folks wanting to create citations. Also, the big deal about WorldCat is not the bibliographic data but the holdings information. By releasing the base citation with an OCLC number as its base URI, OCLC creates better linking back to WorldCat and libraries.
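As a rough illustration of that idea, the sketch below reduces a fuller catalog record to a citation-level view and keys it by a URI built from the OCLC number. The field names, the sample record, the `citation_view` helper, and the `http://www.worldcat.org/oclc/` URI pattern are assumptions made for the example, not a description of anything OCLC has published.

```python
# Fields assumed to be enough for a citation; everything else is dropped.
CITATION_FIELDS = ("author", "title", "date", "publisher")

# Assumed base URI pattern for the example only.
BASE_URI = "http://www.worldcat.org/oclc/"


def citation_view(record):
    """Return only the citation-level fields, plus a URI from the OCLC number."""
    view = {field: record[field] for field in CITATION_FIELDS if field in record}
    view["uri"] = BASE_URI + str(record["oclc_number"])
    return view


# Placeholder record with made-up values, including cataloging detail
# (subjects, classification) that the reduced view leaves out.
full_record = {
    "oclc_number": "12345",
    "author": "Example Author",
    "title": "Example Title",
    "date": "2011",
    "publisher": "Example Press",
    "subjects": ["Example subject heading"],
    "classification": "EX 123",
}
view = citation_view(full_record)
```

The reduced view keeps what a citation needs and omits the cataloging detail, while the URI still gives a link back to the full record and its holdings.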

    The defect that I see in OCLC’s thinking is that they seem to assume that everyone who is interested in WorldCat data wants it for cataloging, or at least that is the fear. The greatest interest, however, in my opinion is from outside of the library world, or it would be if WorldCat were presented differently.
