March 30, 2009

How to find complete multi-volume works in Google Books

While Google’s agreement on copyrighted books has been the subject of much discussion lately, they’ve also been continuing to add public domain titles at a brisk pace.  For instance, they announced in February that they now had 1.5 million public domain volumes formatted for mobile devices.  And last week, they noted that they had completed their scans of hundreds of thousands of volumes of 19th century public domain books from Oxford’s Bodleian library.

If you look at the three example book links in their Oxford post, you’ll notice that each of them goes to a volume of a multi-volume edition.  Works from the nineteenth century and before were often originally published in multiple volumes, such as the “three-decker” format common for Victorian novels.  When such books are reprinted today, they’re usually printed as a single volume, but to read many of Google’s scanned titles in full, you’ll have to range over multiple volumes.

Unfortunately, as various readers have noted, it can be quite difficult to find readable copies of all of the volumes in a multi-volume edition.  For various reasons, they often don’t all come up when you do a search for a particular title.  This can make readers think there are no complete digital editions of a work they’re seeking, even when there are.

In working with people who have helped me fill requests for public domain books, I’ve compiled a series of techniques for finding complete multi-volume sets in Google Books.  I’d be happy to hear additional tips from readers.

  • First, do a search for full-view volumes of the work you’re looking for.  One good way to do this is to go to Google’s advanced book search page, select the “full view only” option, and enter author and title words in the appropriate blanks.
  • If you get a hit, check the start and the end of the scan, to verify which volumes are actually present. Sometimes you’ll find more than one volume in the scan, either because multiple volumes were bound together, or because Google combined volumes in its scan.
  • Go to the “about this book” page for the scan, and look in the lower regions to see if there is an “Other editions” section. This often includes links to other volumes, not just other editions. If there’s a “See more” at the bottom of such a section, click on it to see more volumes or editions.  (Sometimes Google will have multiple editions as well as multiple volumes for the same work.  It’s best when possible to compile volumes from the same edition.  You can do this by matching publishers and dates between volumes, though keep in mind that some multivolume editions came out over the course of multiple years.  Editions from different publishers, or from different times, may have inconsistent content, and might not divide into volumes at the same points.)
  • If the book is from the University of Michigan (as reported either in the “about this book” page or in the scanned front pages), check the Mirlyn catalog for the book. Sometimes this will turn up volumes scanned by Google that have been put in the Hathi Trust repository, or in Google Book Search itself, but that for some reason don’t show up in an ordinary Google Books search. Some other Hathi Trust libraries also have links to digitizations of their content; see this page for details.
  • If this didn’t turn up all the volumes you’re looking for, repeat the process above for the other volumes in your initial hit list. Sometimes those will have “Other editions” links to additional volumes that didn’t appear with the earlier hits.
  • If you manage to complete a set this way, consider sharing your success with other readers.  If you fill in my book suggestion form with the volumes you find,  I can list a neatly consolidated edition of all the volumes on The Online Books Page, and help other people avoid going through all the trouble you just did.  (Give the book’s title, URL for the first volume, and other information in the appropriate blanks, and then add URLs for subsequent volumes in the “Anything else we should know?” section of the form.)
  • Even if you only partially succeeded, if it’s a work you’re particularly interested in you can use my suggestion form to let me know what you’ve been able to find.  If I can’t easily find the other volumes myself, I can at least list what was found on my works-in-progress page. With luck, someone coming along later will find or digitize the remaining volumes, and I can list the set.
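The edition-matching step above can be sketched in code. This is only an illustration, assuming you have noted the title, publisher, year, and volume number of each scan by hand; the records, field names, and example.org URLs below are all hypothetical, and the functions `group_by_edition` and `missing_volumes` are made up for this sketch. Scans are grouped by title and publisher rather than by exact year, since one edition may have come out over several years.

```python
from collections import defaultdict

# Hypothetical notes taken from each scan's "about this book" page.
# None of these titles, publishers, or URLs refer to real scans.
scans = [
    {"title": "Example Novel", "publisher": "Smith & Co.", "year": 1853,
     "volume": 1, "url": "https://example.org/scan-a"},
    {"title": "Example Novel", "publisher": "Smith & Co.", "year": 1854,
     "volume": 3, "url": "https://example.org/scan-b"},
    {"title": "Example Novel", "publisher": "Jones", "year": 1860,
     "volume": 2, "url": "https://example.org/scan-c"},
]


def group_by_edition(records):
    """Group scans by (title, publisher), mapping volume number to scan."""
    editions = defaultdict(dict)
    for record in records:
        key = (record["title"], record["publisher"])
        # Keep the first scan found for each volume number.
        editions[key].setdefault(record["volume"], record)
    return dict(editions)


def missing_volumes(volumes, expected_count):
    """List the volume numbers from 1 to expected_count not yet found."""
    return [n for n in range(1, expected_count + 1) if n not in volumes]


if __name__ == "__main__":
    for (title, publisher), volumes in group_by_edition(scans).items():
        gaps = missing_volumes(volumes, expected_count=3)
        print(f"{title} ({publisher}): missing volumes {gaps}")
```

A set is complete for an edition when `missing_volumes` returns an empty list. Matching on publisher alone can still mix editions issued by the same publisher at different times, so the years are worth a manual check.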

Similar techniques can be used for compiling runs of historic serials, which are also present in Google, and can be of great interest to readers.

If you find these suggestions useful, I hope you’ll help me compile sets of your favorite public domain works, so we can take advantage of all this wonderful old material that Google and others are digitizing.


  1. If you could figure out an easy way to have your list of compiled volumes in a machine-readable form, listing which individual GBS ids and/or HathiTrust ids are part of a single set, I’d find a way to make use of this in my software that’s querying GBS/HathiTrust, to make sure to present the user with link(s) to a complete set, instead of just one volume of a multi-volume work.

    And likewise for serials, to as much of the run as is in Google/Hathi, instead of just the one arbitrary scanned copy I happen to find now.

    This is definitely something that’s problematic in the existing GBS and HT interfaces. It’s painstaking to compile this info, but if you’re doing it by hand anyway, please find a way to make it machine-readable!

    Comment by jrochkind — March 30, 2009 @ 10:53 pm

  2. Well, my catalog can be slurped down via OAI-PMH, in either Dublin Core or MODS formats. See http://onlinebooks.library.upenn.edu/webbin/OAI-onlinebooks?verb=Identify for details.

    The OAI data feeds don’t explicitly identify volume information in a machine-processable format. (I’m not aware of a standard way of doing this in the formats I’m using, but would be happy to hear suggestions. I’d prefer not to resort to significantly more complicated formats if avoidable.) The Dublin Core version of the metadata, though, includes all the URLs in dc:identifier fields. Any DC record that has multiple identifiers that are all Google can almost always be assumed to refer to successive volumes of a multi-volume work; and the GBS ID is embedded right in the URI.

    So you can probably scrape compiled volume data out of my records that way; it’s not the most elegant method, but I believe it will work. I have no corresponding method at present for serial volume ID harvesting short of HTML-scraping their cover pages (though those at least can be identified programmatically in the OAI-PMH feed). I’m happy to entertain suggestions for more elegant methods for harvesting this info for serials and books.

    Comment by John Mark Ockerbloom — March 31, 2009 @ 12:44 pm
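The scraping heuristic described in the comment above might look roughly like the following. This is a minimal sketch, not the catalog's own code: it assumes each record arrives as an `oai_dc` XML fragment, and that Google volume links take the form `http://books.google.com/books?id=...`, which the comment does not spell out. The sample record, its placeholder IDs, and the function name `google_volume_ids` are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs, urlparse

# Standard Dublin Core element namespace, in ElementTree's {uri}tag form.
DC_NS = "{http://purl.org/dc/elements/1.1/}"

# Hypothetical record; the title and IDs are placeholders, not real volumes.
SAMPLE_RECORD = """<oai_dc:dc
    xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Example Work</dc:title>
  <dc:identifier>http://books.google.com/books?id=AAAAAAAAAAAA</dc:identifier>
  <dc:identifier>http://books.google.com/books?id=BBBBBBBBBBBB</dc:identifier>
</oai_dc:dc>"""


def google_volume_ids(record_xml):
    """Return GBS ids in record order if the record looks like a volume set.

    Applies the heuristic from the comment: a record counts only when it
    has more than one dc:identifier and every one is a Google Books URL.
    Otherwise an empty list is returned.
    """
    root = ET.fromstring(record_xml)
    urls = [el.text.strip() for el in root.iter(DC_NS + "identifier") if el.text]
    ids = []
    for url in urls:
        parsed = urlparse(url)
        # Assumed URL shape: host books.google.com with an "id" query parameter.
        if parsed.hostname != "books.google.com":
            return []
        id_values = parse_qs(parsed.query).get("id")
        if not id_values:
            return []
        ids.append(id_values[0])
    return ids if len(ids) > 1 else []


if __name__ == "__main__":
    print(google_volume_ids(SAMPLE_RECORD))
```

Because the comment says the assumption holds "almost always" rather than always, any set recovered this way is worth a spot check before it is treated as a complete run of volumes.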

  3. Great tutorial. After all, it’s not that hard to get those free books.

    Comment by BromaCleanse — August 28, 2009 @ 3:40 pm

