Everybody's Libraries

June 20, 2010

How we talk about the president: A quick exploration in Google Books

Filed under: data,online books,sharing — John Mark Ockerbloom @ 10:28 pm

On The Online Books Page, I’ve been indexing a collection of memorial sermons on President Abraham Lincoln, all published shortly after his assassination, and digitized by Emory University. Looking through them, I was struck by how often Lincoln was referred to as “our Chief Magistrate”. That’s a term you don’t hear much nowadays, but it was once much more common. Lincoln himself used the term in his first inaugural address, and he was far from the first person to do so.

Nowadays you’re more likely to hear the president referred to in different terms, with somewhat different connotations, such as “chief executive” or “commander-in-chief”. The Constitution uses that last term in reference to the president’s command over the Armed Forces. Lately, though, I’ve heard “commander in chief” used as if it described the president’s relation to the country in general. As someone wary of the expansion of executive power in recent years, I find that usage unsettling.

I wondered, as I went through the Emory collection, whether the terms we use for the president reflect shifts in the role he has played over American history.  Is he called “commander in chief” more in times of war or military buildup, for instance?  How often was he instead called “chief magistrate” or “chief executive” over the course of American history?  And how long did “chief magistrate” stay in common use, and what replaced it?

Not too long ago, those questions would have simply remained idle curiosity. Perhaps, if I’d had the time and patience, I could have painstakingly compiled a small selection of representative writings from various points in US history, read through them, and tried to draw conclusions from them. But now I, like anyone else on the web, also have a big, searchable, dated corpus of text to query: the Google Books collection. Could that give me any insight into my questions?

It looks like it can, and without too much expenditure of time.  I’m by no means an expert on corpus analysis, but in a couple of hours of work, I was able to assemble promising-looking data that turned up some unexpected (but plausible) results.  Below, I’ll describe what I did and what I found out.

I started out by going to Google’s advanced book search. From there, I specified particular decades for publications over the last 200 years: 1810-1819, 1820-1829, and so on up to 2000-2009. For each decade, I recorded how many hits Google reported for the phrases “chief magistrate”, “chief executive”, and “commander in chief”, in volumes that also contained the word “president”. Because the scope of Google’s collection may vary in different decades, I also recorded the total number of volumes in each decade containing the word “president”. I then divided the number of hits for each phrase plus “president” by the number of “president” hits, and graphed the proportional occurrence of each phrase in each decade.
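For readers who want to repeat the calculation, the normalization step described above can be sketched in a few lines of Python. This is only an illustration: the function name `relative_frequency` is my own invention, and the counts below are placeholder values chosen to show the arithmetic, not the hit counts Google actually reported.

```python
from typing import Dict


def relative_frequency(phrase_hits: Dict[str, int],
                       president_hits: Dict[str, int]) -> Dict[str, float]:
    """Divide each decade's phrase-plus-"president" hit count by that
    decade's "president" hit count."""
    result = {}
    for decade, total in president_hits.items():
        if total <= 0:
            # No baseline volumes for this decade, so no ratio can be computed.
            continue
        result[decade] = phrase_hits.get(decade, 0) / total
    return result


# Placeholder counts for illustration only (NOT real Google Books figures).
example_phrase_hits = {"1810s": 50, "1820s": 120}
example_president_hits = {"1810s": 1000, "1820s": 2000}

example_freqs = relative_frequency(example_phrase_hits, example_president_hits)
for decade, freq in example_freqs.items():
    print(decade, round(freq, 3))
```

Dividing by the “president” baseline is what lets decades with very different numbers of scanned volumes be compared on the same graph.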

The graph below shows the results. The blue line tracks “chief magistrate”, the orange line tracks “chief executive”, and the green line tracks “commander in chief”. The numbers on the horizontal axis count decades after 1800: 1 is the 1810s, 2 is the 1820s, and so on up to 20 for the 2000s.
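If the axis numbering is hard to keep in your head, the mapping from axis number to decade is simple enough to write down. The helper below is a hypothetical convenience of my own (`decade_label` is not part of any tool I used); it just encodes the rule that axis value n stands for the decade starting in 1800 + 10n.

```python
def decade_label(axis_value: int, base_year: int = 1800) -> str:
    """Translate a horizontal-axis number (1 through 20) into a decade label."""
    if not 1 <= axis_value <= 20:
        raise ValueError("the decade graphs only use axis values 1 through 20")
    return f"{base_year + 10 * axis_value}s"


print(decade_label(1))   # first decade on the graph
print(decade_label(20))  # last decade on the graph
```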

Relative frequencies of "chief magistrate", "chief executive", and "commander in chief" used along with "president", by decade, 1810s-2000s

You can see a larger view of this graph, and the other graphs in this post, by clicking on it.

The graph suggests that “chief magistrate” was popular in the 19th century, peaking in the 1830s. “Chief executive” arose from obscurity in the late 19th century, overtook “chief magistrate” in the early 20th century, and then became widely used, apparently peaking in the 1980s. (By then, though, some corporate executives, as in “chief executive officer”, are in the result set along with the US president.)

We don’t see a strong trend with “commander in chief”.  There are some peaks in usage in the 1830s, and the 1960s and 1970s, but they’re not dominant, and they don’t obviously correspond to any particular set of events.  What’s going on?  Was I just imagining a relation between its usage and military buildups?  Is the Google data skewed somehow?  Or is something else going on?

It’s true that the Google corpus is imperfect, as I and others have noted before. The metadata isn’t always accurate; the number of reported hits is approximate when more than 100 or so; and the mix of volumes in Google’s corpus varies in different time periods. (For instance, recent years of the corpus may include more magazine content than earlier years, and reprints can make texts reappear decades after they were actually written. The rise of print-on-demand scans of old public-domain books in the 2000s may be partly responsible for the uptick in “chief magistrate” that decade.)

But I might also not be looking at the right data.  There are lots of reasons to mention “commander-in-chief” at various times.  The apparent trend that concerned me, though, was the use of “commander in chief” as an all-encompassing term.  Searching for the phrase “our commander in chief” with “president” might be better at identifying that. That search doesn’t distinguish military from civilian uses of that phrase, but an uptick in usage would indicate either a greater military presence in the published record, or a more militarized view among civilians.  So either way, it should reflect a more militaristic view of the president’s role.

Indeed, when I graph the relative occurrences of “our commander in chief” over time, the trend line looks rather different than before.  Here it is below, with the decades labeled the same way as in the first graph:

Scaled frequency of "our commander in chief" used along with "president", by decade, 1810s-2000s

Here we see increases in decades that saw major wars, including the War of 1812, the Mexican War of the 1840s, the Civil War of the 1860s, and the Vietnam War expanding in the 1960s. This past decade had the second-most-frequent usage (by a small margin) of “our commander in chief” in the last 200 years of this corpus. But it’s dwarfed by the use during the 1940s, when Americans fought in World War II. That’s not something I’d expected, but given the total mobilization that occurred between 1941 and 1945, it makes sense.

If we look more closely at the frequency of “our commander in chief” in the last 20 years, we also find interesting results. The graph below looks at 1991 through 2009 (add 1990 to each number on the horizontal axis; and as always, click on the image for a closer look):

Scaled frequency of "our commander in chief" used along with "president", by year, 1991-2009

Not too surprisingly, after the successful Gulf War in early 1991, usage starts to decrease.  And not long after 9/11, usage increases notably, and stays high in the years to follow.  (Books take some time to go from manuscript to publication, but we see a local high by 2002, and higher usage in most of the subsequent years.)  I was a bit surprised, though, to find an initial spike in usage in 1999.  As seen in this timeline, Bill Clinton’s impeachment and trial took place in late 1998 and early 1999, and a number of the hits during this time period are in the context of questioning Clinton’s fitness to be “our commander in chief” in the light of the Lewinsky scandal.  But once public interest moved on to the 2000 elections, in which Clinton was not a candidate, usage dropped off again until the 9/11 attacks and the wars that followed.

I don’t want to oversell the importance of these searches.  Google Books search is a crude instrument for literary analysis, and I’m still a novice at corpus analysis (and at generating Excel graphs).  But my searches suggest that the corpus can be a useful tool for identifying and tracking large-scale trends in certain kinds of expression.  It’s not a substitute for the close reading that most humanities scholarship requires.  And even with the “distant reading” of searches, you still need to look at a sampling of your results to make sure you understand what you’re finding, and aren’t copying down numbers blindly.

But with those caveats, the Google Books corpus supports an enlightening high-altitude perspective on literature and culture.  The corpus is valuable not just for its size and searchability, but also for its public accessibility.  When I report on an experiment like this, anyone else who wants to can double-check my results, or try some followup searches of their own.  (Exact numbers will naturally shift somewhat over time as more volumes get added to the corpus.)  To the extent that searching, snippets, and text are open to all, Google Books can be everybody’s literary research laboratory.
