I’ve been happy to hear from a number of people and institutions interested in the IMLS-funded project we now have underway to shed light on the hidden public domain of 20th century serials that I discussed in my last post. I gave a short six-minute presentation on the project at this fall’s Digital Library Federation Forum, and you can find my presentation slides and script on the Open Science Framework. (You can download either a PowerPoint file or a PDF as you prefer; both have the slides and the notes.)
With the help of Alison Miner and Carly Sewell, I’m now starting to add data to the inventory of serials that is one of the deliverables of the project. Right now, we’re putting up data from 1966 renewals, where serial contributions from 1938 and 1939 were renewed. But there’s quite a bit more data in the pipeline, and I hope that we’ll have all of the 1930s covered by the end of this month, and then advance rapidly into the 1940s in the new year. (Our goal is to get up to 1978’s renewals, which will finish off the 1940s and get to 1950, and from there on the Copyright Office’s online database can be consulted for serial renewals. We’re aiming to have that completed sometime in the spring.)
I’ve heard from various people who are interested in clearing copyrights for their own serials digitization projects– as well as some projects that are doing it already, like the Hevelin Fanzines project that was also discussed at the DLF Forum. As I mentioned in my previous post, we intend to publish suggested procedures for doing such copyright-clearing. We’ll be preparing drafts of such procedures in the new year, and we’ll let interested folks know when such drafts are available for comments and suggestions.
We’re also happy to hear suggestions about other aspects of the project. One suggestion we heard in an early presentation was that the inventory should be shared as downloadable structured data, and not just as a big web page, so that it would be easier for people to repurpose and automatically analyze the data for various purposes. That sounded like a good idea to me, and I got more excited about its possibilities when looking at all the work being done in projects like FictionMags and ISFDB, where volunteers have crowd-compiled detailed structured contents information for a large number of serials.
So the new data going into the inventory is going in as structured data files, and we’re also slowly refitting existing entries into such files. That’s meant a bit of a slower startup than we’d originally planned, but we believe this work will pay dividends in the future. Already it means that we can reuse the same data in multiple contexts– for instance, we can show first-renewal information for Adventure magazine on the Online Books Page issue listings, on a copyright information page, and in the big inventory page. Updating the underlying structured data file can change what appears in all of these contexts.
Moreover, data structures are expandable. When readers asked me to list Amazing Stories and Galaxy science fiction magazines on The Online Books Page, I had to determine which parts of their runs were public domain and thus could be listed without any further inquiries into permission. I looked up copyright renewals for these magazines and then recorded renewal data in the same sorts of structured data files so it could be reused. (Here, for instance, is the automatically generated copyright information page for Amazing Stories.) I also added structured data fields that allow the inclusion of name identifiers (in particular, Library of Congress Name Authorities), so that authors can be consistently identified, and then linked to other information about them, such as contact information for permissions. With these links, and with the links to full issue tables of contents compiled by other projects, it becomes easier to digitize nearly any story in the early years of the magazine, by checking to see whether there is still an active copyright on it, and by sending an inquiry for permissions if there is one.
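To give a rough feel for what an article-level record with a name identifier might look like, here’s a small sketch in Python. The field names, title, identifier, and renewal number below are all placeholders I made up for illustration, not the project’s actual schema or real renewal data:

```python
import json

# A hypothetical article-level renewal record. Every field name and
# value here is an illustrative placeholder, not the project's schema.
record = {
    "serial": "Amazing Stories",
    "contribution": {
        "title": "Example story title",      # placeholder, not a real renewal
        "author": {
            "name": "Example, Author",
            "lccn": "n00000000"              # LC Name Authority ID (placeholder)
        },
        "issue_date": "1939-04"
    },
    "renewal": {
        "renewal_id": "R000000",             # placeholder renewal registration number
        "renewal_date": "1966"
    }
}

# Serializing to JSON keeps the record readable and editable by people,
# while scripts can reuse the same file in multiple display contexts.
print(json.dumps(record, indent=2))
```

The point of the identifier field is that a script can follow it out to other linked information about the author, rather than relying on name strings alone.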
To be clear: I’m not going to compile lists of all renewals for all the serials in my inventory. That would take more time than I have– the scope of my 1-year grant only covers the first renewals for the serials that have them, up to 1950. But if I can create more detailed renewal lists for Amazing Stories using the defined structure, then others who are interested could create similarly structured renewal lists for other serials. And maybe those lists could be linked, shared or distributed from my inventory as well, if there’s interest.
So before I get much further into the project, I’d like to hear from folks who might be interested in using or compiling this sort of detailed renewal information. Is this sort of structured information useful to you? And if so, will the format and structure I’ve defined for the data files work? Or should it change (something that’s easier to do now than later), or be augmented?
I didn’t find any pre-existing schema that covered detailed article-level copyright renewal data, so I decided to roll my own for starters. There’s a variety of encoding schemes one could use for it, including XML, JSON, and the various RDF formats. I figured JSON would be easiest for laypeople and librarians to understand and edit in its native format, and it can be automatically translated into suitable XML or RDF schemes if desired. But if you know of good reasons for preferring a different native format, or know of schemes I should reuse or extend instead of reinventing the wheel, I’d be interested in hearing about them. (I’m especially interested if the format or schema is already in common use by the sorts of folks who compile serial contents information.) Alternatively, if what I have now is as good a starting point as anything else, I’d be happy to know that, and could then take the time to write up formal documentation for it.
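As a minimal sketch of why a JSON-native format doesn’t lock the data in, here’s one way a record could be mechanically translated into XML. The record contents and field names are again invented placeholders, and the conversion handles only nested objects and scalar values:

```python
import json
import xml.etree.ElementTree as ET

# An illustrative JSON renewal record (placeholder field names and values).
record_json = '{"serial": "Adventure", "first_renewal": {"issue": "1938-10-01", "renewal_id": "R000000"}}'

def json_to_xml(obj, tag):
    """Recursively convert parsed JSON (dicts and scalars) to an XML element."""
    elem = ET.Element(tag)
    if isinstance(obj, dict):
        for key, value in obj.items():
            elem.append(json_to_xml(value, key))
    else:
        elem.text = str(obj)
    return elem

root = json_to_xml(json.loads(record_json), "renewalRecord")
print(ET.tostring(root, encoding="unicode"))
```

A fuller converter would map fields onto an established XML or RDF vocabulary rather than echoing JSON keys as element names, but the mechanical part of the translation is straightforward.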
To see the existing files I have, you can go to the big inventory page and follow the “More details” links that you’ll see for certain serials in the list. These lead to copyright information pages for the serials in question, which in turn have links to JSON files at the bottom of each page. I also have most of the JSON files in a GitHub folder that’s part of my Online Books Page project repository.
If you work with this sort of metadata, or would like to, I’d love to hear from you. If we get this right, I hope this data will spark all kinds of useful work opening access to a wide variety of 20th-century serial publications.