Free the sources!

I gave a lightning talk this past Sunday when Mary[1] and I attended Wikipedia Day at the Columbia School of Journalism. Below is approximately what I said, with links to websites I showed during the talk, and a few footnotes. Our thanks to Wikipedia NYC and the Brown Institute for Media Innovation for hosting the event!

I’m glad to be here to celebrate Wikipedia’s birthday this weekend. (And I’m looking forward to the cake.[2]) Many of us are also celebrating some other things, like the recent arrival of a new public domain day. And we’re not just celebrating famous characters like Mickey Mouse, but all kinds of cultural works and information resources that we write about in Wikipedia and use as sources for our articles.

And it’s not just works from 1928 like Steamboat Willie, but it’s also a lot of later works that are not so obviously in the public domain, like all the works as late as 1963 that didn’t renew their copyrights when required and works as late as 1989 published in the US without copyright notices.
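
The three rules of thumb above can be sketched in code. This is a rough illustration only, not the logic our project actually uses: the `Publication` class, its field names, and the `likely_public_domain` function are all hypothetical, and the sketch assumes a US publication with a 95-year term for works whose copyrights were maintained. It ignores complications such as works first published abroad and cutoffs that fall partway through a year, so a `True` here means "worth researching", not "cleared".

```python
from dataclasses import dataclass

# Assumed full term for US works of this era whose copyrights were maintained.
COPYRIGHT_TERM_YEARS = 95


@dataclass
class Publication:
    """Hypothetical record for one US-published work or serial issue."""
    year: int         # year of first US publication
    renewed: bool     # was the copyright renewed when renewal was required?
    had_notice: bool  # was it published with a copyright notice?


def likely_public_domain(pub: Publication, current_year: int) -> bool:
    """Apply three simplified rules of thumb; not legal advice."""
    # Rule 1: the full term has run out. Copyright lasts through the end of
    # the 95th year after publication, so 1928 works were free in 2024.
    if pub.year + COPYRIGHT_TERM_YEARS < current_year:
        return True
    # Rule 2: works as late as 1963 needed a renewal to stay copyrighted.
    if pub.year <= 1963 and not pub.renewed:
        return True
    # Rule 3: works as late as 1989 generally needed a copyright notice.
    if pub.year <= 1989 and not pub.had_notice:
        return True
    return False


# A 1928 work is past its full term in 2024, even though it was renewed.
steamboat = Publication(year=1928, renewed=True, had_notice=True)
# A hypothetical 1962 magazine issue with a notice but no renewal.
unrenewed_issue = Publication(year=1962, renewed=False, had_notice=True)
```

Calling `likely_public_domain(unrenewed_issue, 2024)` flags the 1962 issue as a candidate, which is the kind of "hidden" public domain material this talk is about.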


Wikipedians have long recognized the value of public domain resources in the work we do. And if we can build up a better, more comprehensive and more reliable understanding of all the things that are in the public domain, we can share more of it with the world, and use more of it in Wikipedia and other free and open projects.


I work at the Libraries at the University of Pennsylvania. Our collections have a lot of public domain source materials. A fair bit of our obvious public domain has been digitized. But we also have a lot of non-obvious “hidden” public domain materials. In particular, we have a lot of serials: journals, magazines, newspapers, newsletters, and the like. They’re often great sources for knowledge and culture you can’t find anywhere else, and a lot of this content from the 20th century is public domain because the publishers didn’t bother to maintain their copyrights.


So, a while back we started what we call the Deep Backfile serials project. We wrote some code to identify serials we held that might be in the public domain. The table of serials that we compiled was big, and we weren’t likely to research all of it any time soon. But then the COVID pandemic hit and we had to close the library buildings. We realized that it was a great opportunity to have many of our staff, now working from home, research the copyrights of lots of these serials so they could eventually be made available online, not just during lockdowns but afterwards as well.

To do this, we created a detailed questionnaire that let a librarian consult some designated sites about any serial in our list. Once they’d answered all the questions they could and submitted the questionnaire, an expert would review it, and we’d post what we found out about what was copyrighted in that serial, what seemed to be public domain, and what could be freely put online.[3]


Now some serials, like The New Yorker, had regular renewals, and pretty much all of their issues get the full 95 years of copyright. But for other serials, like, say, the Columbia Journalism Review, little or nothing was renewed in their early days, so in fact a number of their issues from the 1960s can freely go online (and some have).

It turns out there are a lot more serials like the Columbia Journalism Review than there are like The New Yorker. And we know that in part because while our library buildings were closed our librarians used that questionnaire to research over eight thousand serials.

I still have a few hundred of them left to review (regrettably, the only person regularly available for expert review was me), but everything we have reviewed we’ve published online as linked open data, with links to and from Wikidata, and to Wikipedia, and to any free and legal online copies of serial issues that we know about. And that’s a growing corpus, because digitizers like Internet Archive and HathiTrust, as well as any number of smaller independent digitizers, have access to this information, and they can use it to make serial content available online free for all.


Now Wikipedia also has a lot of information about serials. In fact, when I ran a Wikidata query to find serials that had articles about them in English Wikipedia, I found well over ten thousand of them that were potentially or actually in the public domain, at least in part. And while Penn librarians have researched a lot of them, and I show what we’ve found out in this table, the majority of these serials described in Wikipedia don’t yet have expert-reviewed copyright information on them.


So, I hope I’m not going to regret this, but I’ve just taken that questionnaire that we used in the Penn Libraries, and I’ve now made it available for all of these serials described in Wikipedia.

So if you’re a Wikipedian interested in documenting and freeing these serials, you can fill out this questionnaire for any serial in this table you’re interested in. And I can review it, and publish what you’ve found as CC0 linked open data, and link it with Wikidata, so it’ll be available to anyone who’s willing and able to put public domain content from that serial online.


There’s a lot of work that can be done here, but I’m hoping there are a few Wikipedians here who are interested in some of these serials, and we can try putting them into this Deep Backfile open knowledge base, and perhaps scale it up over time as we have in the Penn Libraries to document and free a lot of new sources in the public domain.


If this interests you, the Meetup page for this Wikipedia Day event has a link under Lightning talks to the Deep Backfile knowledge base I’ve created for serials covered in Wikipedia, and a link for contacting me. Thank you!

Footnotes

  1. Mary Mark Ockerbloom has more experience editing Wikipedia than I do, and has been active in the Wikipedia community for years, co-leading a regular WikiSalon, and working on her own and as a Wikipedian in Residence for various organizations on topics like women writers, science communication, and countering disinformation. She’s currently available to work with new projects and organizations. ↩︎
  2. As it turned out, Mary and I had to leave the event before the cake came out, so we could catch our train back home. But we hope it was good. ↩︎
  3. At the time I gave this talk, the questionnaire link went to a full detailed form for a serial we hadn’t yet researched, with the title and ISSN pre-filled but the rest of the form blank. It might look different in the future after the serial is researched and included in our Deep Backfile knowledge base. To see an example of a blank detailed questionnaire, go to any serial in this table with “Unknown” in the “First renewal” column, and select the “Contact us” link at the right end of its table row. ↩︎
Posted in citizen librarians, open access, publicdomain, serials, wikipedia

The public domain gets the last word

In 1857, work began on a revolutionary new dictionary covering the entire history of English word usage with example quotations. The first installment of A New English Dictionary on Historical Principles, covering A through Ant, appeared in 1884. The last, covering V-Z, was published in 1928. Its US copyright status has been murky, but as of tomorrow the entire first edition of what’s now known as the Oxford English Dictionary is definitively in the US public domain.

Posted in publicdomain

Extra! Extra!

Sometimes one work’s arrival in the public domain brings extras along with it. In two days, Ben Hecht and Charles MacArthur’s play The Front Page, which Peter Marks called “the best play about newspapering ever written”, joins the public domain. Assuming no other prior copyright dependencies, that also frees two films derived from it that have unrenewed copyrights: the 1931 film The Front Page, and the 1940 film His Girl Friday. Both are in the National Film Registry.


Move fast and disintegrate things

John Taine’s science fiction novel Green Fire is set in 1990, and some of what it describes fits that time, like television and mobile phones. Other aspects, like gender and social customs, read much more like 1928, the year it was published. It may be early, but in some ways unsettlingly on the nose, for its villain, a billionaire technocratic monopolist with a chip on his shoulder and little care for the devastation he creates. The book goes public domain in 3 days.


“The havoc of this nicety… on the work of imagination”

While Lady Chatterley’s Lover was eventually published intact, Djuna Barnes’s first novel Ryder never was. Barnes replaced passages she was forced to cut with asterisks, “showing plainly”, she wrote, “where the war, so blindly waged on the written word, has left its mark”. After the original text was lost in World War II, the cuts were never restored. Still, Len Gutkin calls what remains an “utterly sui generis 1928 comic novel”. It joins the public domain in 4 days.


Once freed of obscenity charges, soon freed of copyright claims

One of the longest-running “Can we publish this?” questions in literature concerns Lady Chatterley’s Lover. D. H. Lawrence first issued his erotic novel in Italy in 1928. The UK and US banned it, and also stymied its copyright. Later court rulings allowed uncensored (and unlicensed) editions. Recently international treaties gave Lawrence’s heirs an opening to reclaim US copyright on the 1928 edition (which my library owns). In 5 days, that claim (valid or not) expires.


New life for a century-old African American opera

Harlem Renaissance composer Harry Lawrence Freeman wrote his opera Voodoo in 1914. In 1928 he registered its copyright, and had it first performed on a radio program, and then staged in New York with a full orchestra.

Obscurity followed. Voodoo was never published, and not performed again until 2015, in a well-reviewed concert at Columbia University, which has Freeman’s scores and papers. Anyone anywhere can take it up freely when it joins the public domain in 6 days.


“On Christmas night all Christians sing”

“1928 was the annus mirabilis of the carol, particularly the Christmas carol,” write Jeremy Summerly and John Francis, not least for The Oxford Book of Carols. Instead of severe chant or sentimental Victoriana, editors Percy Dearmer, Martin Shaw, and Ralph Vaughan Williams aimed to include the best of English folk carol tradition in this collection of over 200 songs. The original 1928 edition, still sung from today, joins the US public domain on the octave of Christmas.


A public domain gift for difficult journeys

The US copyright system often makes it hard to determine the end of an artwork’s copyright. But we know it ends soon for N. C. Wyeth’s painting “Mary Rode on Thistles… and Joseph Waded the Stream Below”, which the Brandywine Museum of Art shows on its website. Wyeth’s painting was used for the frontispiece and dust jacket of Henry Van Dyke’s 1928 book Even Unto Bethlehem: The Story of Christmas. The painting and the book both join the US public domain in 8 days.


A film that took audiences for a ride

The first all-talking feature wasn’t a prestige film, and initially wasn’t even meant to be a feature film, but the gangster movie Lights of New York grew in the making, and its box office success helped convince studios to completely replace silent films with “talkies”. Online, you can peruse Warner Brothers’ newspaper-like pressbook promoting the movie, and read film historian J. B. Kaufman on how its production broke new ground. It joins the public domain in 9 days.
