The Language of Data: Analyzing the State of the Union

Each president’s State of the Union address is an attempt to set the tone for his term in office — what are the administration’s goals, plans and promises? Those questions dominate the 24-hour news cycle around this yearly address to Congress.

Nearly 50 million people tuned in to watch President Obama’s first State of the Union address in 2010. President Trump’s speech last year generated more than 4.5 million tweets, the most ever for a presidential address.

It’s been more than a century since President Woodrow Wilson’s first State of the Union address, and while the language presidents use to communicate has evolved, many of the words, phrases, and ideas have stood the test of time. 

Comparing those words over time is one way to hold presidencies up against each other and to assess how presidential communication has changed. But how can we compare one State of the Union address to another objectively? Data analysis can turn otherwise dry, static text into comparable statistics that show trends across words, presidents, and years.

Reading Levels

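One way to compare speeches is to assess their reading level with the Flesch-Kincaid grade level test. The Flesch-Kincaid readability tests estimate how difficult a passage is to understand from its average sentence length and its average number of syllables per word; the grade level version expresses the result as a U.S. school grade.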
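The article does not spell out the formula or say which tool produced its scores. For reference, the standard Flesch-Kincaid grade level is 0.39 × (words ÷ sentences) + 11.8 × (syllables ÷ words) − 15.59. The Python below is a minimal sketch of that formula, assuming a naive vowel-group syllable counter and punctuation-based sentence splitting; both heuristics and the function names `count_syllables` and `flesch_kincaid_grade` are illustrative assumptions, so dedicated readability tools will give somewhat different scores.

```python
import re

def count_syllables(word: str) -> int:
    """Heuristic (assumption): count runs of vowels, treating 'y' as a vowel.

    A trailing silent 'e' (as in "state") is dropped, except after 'l'
    (as in "table"), and every word counts as at least one syllable.
    """
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    # Naive sentence split on terminal punctuation; abbreviations like
    # "Mr." will be miscounted as sentence breaks.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)*", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)

# Six one-syllable words in two sentences: very short, simple text
# scores below first grade.
print(round(flesch_kincaid_grade("The cat sat. The dog ran."), 2))  # prints -2.62
```

Longer sentences and more polysyllabic words both push the score up, which is why the test is a rough proxy for how demanding a speech is to follow.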

Read the text-only version of Reading Level of State of the Union Addresses.

State of the Union Word Analysis

The data from speeches can be used to compare more than overall values like reading level. For example, it’s easy to predict that the economy will come up in a State of the Union address. But the data shows that presidents tend to talk about taxes more than jobs, and jobs more than banks. Among presidents since Wilson, Harry Truman talked about the economy the most: 2.9 percent of his speech was made up of words like “business,” “debt,” and “dollar.”
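A figure like Truman’s 2.9 percent is a ratio: words that fall into a topic category divided by all the words in the speech. The Python below is a minimal sketch of that calculation, not the article’s own method. The `ECONOMY_WORDS` set is hypothetical (the article names only “business,” “debt,” and “dollar” as examples), and the sketch matches exact lowercase words with no stemming, so “dollars” would not count toward “dollar.”

```python
import re
from collections import Counter

# Hypothetical category list for illustration only; the article's full
# economy word list is not given.
ECONOMY_WORDS = {"business", "debt", "dollar", "economy", "taxes", "jobs", "banks"}

def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())

def category_share(text: str, category: set[str]) -> float:
    """Percentage of all words in `text` that belong to `category`."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    hits = sum(counts[word] for word in category)  # missing words count as 0
    return 100 * hits / len(tokens)

def top_speaker(speeches: dict[str, str], category: set[str]) -> tuple[str, float]:
    """Return the (president, share) pair with the highest category share."""
    shares = {name: category_share(text, category) for name, text in speeches.items()}
    best = max(shares, key=shares.get)
    return best, shares[best]

# Two of the four words are "debt", so the share is 50 percent.
print(category_share("Debt and more debt.", {"debt"}))  # prints 50.0
```

Expressing counts as a percentage of each speech, rather than as raw counts, keeps long and short addresses comparable.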

Read the text-only version of State of the: Economy.

Truman is the only president to say “refugees” in his first State of the Union, and Reagan was the first president to say “terrorism” in his.

Read the text-only version of State of the: Policy.

President Trump and President George W. Bush are the only ones to mention Guantanamo in their first State of the Union addresses.

Read the text-only version of State of the: Military.

Personal words like “I,” “me,” and “us” make up a relatively large share of each address, though that share varies from president to president. President Ronald Reagan, for example, said “together” more than anyone else: it made up 0.29 percent of his address.

Read the text-only version of State of the: People.

Comparing first State of the Union addresses, President Trump said “very” almost twice as often as any other president, while President Nixon said “great” more than anyone else.

Read the text-only version of State of the: Superlatives.

How often does the rest of the world come up in this early-term address? Countries like Libya and Palestine have each been mentioned only once, while presidents have said “Europe” 54 times in total.
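Corpus-wide totals like the 54 mentions of “Europe,” and claims such as which presidents were the only ones to use a word, come from counting a term in each speech and then aggregating. The Python below is a minimal sketch under stated assumptions: the speeches are held in a hypothetical `speeches` dictionary mapping each president to the text of his first address, and only single-word terms are supported, so a phrase such as “Guantanamo Bay” or a variant such as “European” would need extra matching logic.

```python
import re
from collections import Counter

def word_counts(text: str) -> Counter:
    """Count lowercase word tokens in one speech."""
    return Counter(re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower()))

def total_mentions(speeches: dict[str, str], term: str) -> int:
    """Total occurrences of a single-word `term` across every speech."""
    term = term.lower()
    return sum(word_counts(text)[term] for text in speeches.values())

def speakers_mentioning(speeches: dict[str, str], term: str) -> list[str]:
    """Names of the speakers whose speech uses `term` at least once."""
    term = term.lower()
    return [name for name, text in speeches.items() if word_counts(text)[term] > 0]

# Placeholder texts, not real speech excerpts.
speeches = {"A": "Europe and Europe.", "B": "Libya, Europe."}
print(total_mentions(speeches, "Europe"))      # prints 3
print(speakers_mentioning(speeches, "Libya"))  # prints ['B']
```

If `speeches` is built in chronological order, the first name returned by `speakers_mentioning` is also the first president to use the term.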

Read the text-only version of State of the: World.

Overall, education comes up less often than the topics above. Of the words listed, President Franklin D. Roosevelt said “education” only once, and Wilson mentioned “schools” just once as well.

Read the text-only version of State of the: Schools.

View methodology.

Citation for this content: datascience@berkeley, the online master’s in data science from UC Berkeley