Free the sources!

I gave a lightning talk this past Sunday when Mary [1] and I attended Wikipedia Day at the Columbia School of Journalism. Below is approximately what I said, with links to websites I showed during the talk, and a few footnotes. Our thanks to Wikipedia NYC and the Brown Institute for Media Innovation for hosting the event!

I’m glad to be here to celebrate Wikipedia’s birthday this weekend. (And I’m looking forward to the cake. [2]) Many of us are also celebrating some other things, like the recent Public Domain Day. And we’re not just celebrating famous characters like Mickey Mouse, but all kinds of cultural works and information resources that we write about in Wikipedia and use as sources for our articles.

And it’s not just works from 1928 like Steamboat Willie, but it’s also a lot of later works that are not so obviously in the public domain, like all the works as late as 1963 that didn’t renew their copyrights when required and works as late as 1989 published in the US without copyright notices.
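
The three rules of thumb just mentioned can be written down as a rough first-pass check. This is only a sketch: the `Issue` record and the `likely_public_domain` function are hypothetical names made up for illustration, the cutoff years are the ones given above (as of 2024), and it ignores many details that real copyright research has to consider, such as exact publication dates, works first published outside the US, and contributions renewed separately from the issue they appeared in.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    """Hypothetical record for one US-published serial issue."""
    year: int          # year of first publication
    has_notice: bool   # did the issue carry a copyright notice?
    renewed: bool      # is a copyright renewal on record?


def likely_public_domain(issue: Issue, current_year: int = 2024) -> bool:
    """Simplified first-pass screen, not a legal determination."""
    # A full 95-year term means works from year Y are free on January 1 of Y + 96.
    if issue.year <= current_year - 96:
        return True
    # Published in the US as late as 1989 without a copyright notice.
    if issue.year <= 1989 and not issue.has_notice:
        return True
    # Published as late as 1963 and never renewed when renewal was required.
    if issue.year <= 1963 and not issue.renewed:
        return True
    return False


# A 1962 issue with a notice but no renewal on record passes the screen;
# a regularly renewed 1962 issue does not.
print(likely_public_domain(Issue(1962, has_notice=True, renewed=False)))
print(likely_public_domain(Issue(1962, has_notice=True, renewed=True)))
```

A screen like this can only flag candidates. Whether a renewal actually exists still has to be looked up in the registration and renewal records.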


Wikipedians have long recognized the value of public domain resources in the work we do. And if we can build up a better, more comprehensive and more reliable understanding of all the things that are in the public domain, we can share more of it with the world, and use more of it in Wikipedia and other free and open projects.


I work at the Libraries at the University of Pennsylvania. Our collections have a lot of public domain source materials. A fair bit of our obviously public domain material has been digitized. But we also have a lot of non-obvious “hidden” public domain materials. In particular, we have a lot of serials: journals, magazines, newspapers, newsletters, and the like. They’re often great sources for knowledge and culture you can’t find anywhere else, and a lot of this content from the 20th century is public domain because the publishers didn’t bother to maintain their copyrights.


So, a while back we started what we call the Deep Backfile serials project. We wrote some code to identify serials we held that might be in the public domain. The table of serials that we compiled was big, and we weren’t likely to research all of it any time soon. But then the COVID pandemic hit and we had to close the library buildings. We realized that it was a great opportunity to have many of our staff, who were now working from home, research the copyrights of lots of these serials, so they could eventually be made available online not just during lockdowns, but afterwards as well.

To do this, we created a detailed questionnaire that let a librarian consult some designated sites about any serial in our list. Once they’d answered all the questions they could and submitted the questionnaire, an expert would review it, and we’d post what we found out about what was copyrighted in that serial, what seemed to be public domain, and what could be freely put online. [3]


Now some serials, like The New Yorker, had regular renewals, and pretty much all of their issues get the full 95 years of copyright. But for other serials, like, say, the Columbia Journalism Review, little or nothing was renewed in their early days, so in fact a number of their issues from the 1960s can freely go online (and some have).

It turns out there are a lot more serials like the Columbia Journalism Review than there are like The New Yorker. And we know that in part because while our library buildings were closed our librarians used that questionnaire to research over eight thousand serials.

I still have a few hundred of them left to review (regrettably, the only person regularly available for expert review was me), but everything we have reviewed we’ve published online as linked open data, with links to and from Wikidata, and to Wikipedia, and to any free and legal online copies of serial issues that we know about. And that’s a growing corpus, because digitizers like the Internet Archive and HathiTrust, as well as any number of smaller independent digitizers, have access to this information, and they can use it to make serial content available online free for all.


Now Wikipedia also has a lot of information about serials. In fact, when I ran a Wikidata query to find serials that had articles about them in English Wikipedia, I found well over ten thousand of them that were potentially or actually in the public domain, at least in part. And while Penn librarians have researched a lot of them, and I show what we’ve found out in this table, the majority of these serials described in Wikipedia don’t yet have expert-reviewed copyright information on them.
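
For anyone curious what a query along these lines might look like, here is a minimal sketch. It is not the query I actually ran. The helper name `build_serials_query` is made up for illustration, and the choice of Wikidata class (`Q1002697`, assumed here to be the item for "periodical") and of the inception property (`P571`) as a stand-in for a serial's start date are assumptions that should be checked before use. It only builds the query text, which could then be pasted into the Wikidata Query Service.

```python
# Assumed Wikidata item for "periodical"; verify the identifier before relying on it.
PERIODICAL_CLASS = "Q1002697"


def build_serials_query(latest_start_year: int = 1989, limit: int = 100) -> str:
    """Build SPARQL text for serials that have an English Wikipedia article
    and that began no later than latest_start_year (hypothetical helper)."""
    return f"""
SELECT DISTINCT ?serial ?serialLabel ?article ?inception WHERE {{
  ?serial wdt:P31/wdt:P279* wd:{PERIODICAL_CLASS} .
  ?article schema:about ?serial ;
           schema:isPartOf <https://en.wikipedia.org/> .
  ?serial wdt:P571 ?inception .
  FILTER(YEAR(?inception) <= {latest_start_year})
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {limit}
""".strip()


print(build_serials_query(latest_start_year=1963, limit=10))
```

One limitation of this sketch is that it requires a recorded inception date, so it would miss serials that have no start date in Wikidata.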


So, I hope I’m not going to regret this, but I’ve just taken that questionnaire that we used in the Penn Libraries, and I’ve now made it available for all of these serials described in Wikipedia.

So if you’re a Wikipedian interested in documenting and freeing these serials, you can fill out this questionnaire for any serial in this table you’re interested in. And I can review it, and publish what you’ve found as CC0 linked open data, and link it with Wikidata, so it’ll be available to anyone who’s willing and able to put public domain content from that serial online.


There’s a lot of work that can be done here, but I’m hoping there are a few Wikipedians here who are interested in some of these serials, and we can try putting them into this Deep Backfile open knowledge base, and perhaps scale it up over time as we have in the Penn Libraries to document and free a lot of new sources in the public domain.


If this interests you, the Meetup page for this Wikipedia Day event has a link under Lightning talks to the Deep Backfile knowledge base I’ve created for serials covered in Wikipedia, and a link for contacting me. Thank you!

Footnotes

  1. Mary Mark Ockerbloom has more experience editing Wikipedia than I do, and has been active in the Wikipedia community for years, co-leading a regular WikiSalon, and working on her own and as a Wikipedian in Residence for various organizations on topics like women writers, science communication, and countering disinformation. She’s currently available to work with new projects and organizations.
  2. As it turned out, Mary and I had to leave the event before the cake came out, so we could catch our train back home. But we hope it was good.
  3. At the time I gave this talk, the questionnaire link went to a full detailed form for a serial we hadn’t yet researched, with the title and ISSN pre-filled but the rest of the form blank. It might look different in the future after the serial is researched and included in our Deep Backfile knowledge base. To see an example of a blank detailed questionnaire, go to any serial in this table with “Unknown” in the “First renewal” column, and select the “Contact us” link at the right end of its table row.

About John Mark Ockerbloom

I'm a digital library strategist at the University of Pennsylvania, in Philadelphia.
