The Language of Data: Analyzing the State of the Union
January 28, 2019
Each president’s State of the Union address is an attempt to set the tone for his term in office — what are the administration’s goals, plans and promises? Those questions dominate the 24-hour news cycle around this yearly address to Congress.
It’s been more than a century since President Woodrow Wilson’s first State of the Union address, and while the language presidents use to communicate has evolved, many of the words, phrases, and ideas have stood the test of time.
Comparing those words over time is one way to hold presidencies up against each other and to assess how presidential communication has changed. How can we objectively compare one State of the Union address to another? Data analysis can be used to transform otherwise dry, static information into comparable statistics that show trends across words, presidents, and years.
One way to compare speeches is to assess their reading levels using the Flesch-Kincaid grade level test. The Flesch-Kincaid readability tests are designed to indicate how difficult a passage is to understand.
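For readers who want to try this themselves, here is a minimal Python sketch of the standard Flesch-Kincaid grade level formula: 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The function names are our own, and the syllable counter is a rough vowel-group heuristic rather than a dictionary lookup, so scores will differ slightly from those of dedicated readability tools. It is not necessarily the method used for the analysis described here.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables by counting runs of vowels (a heuristic, not exact)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Discount a trailing silent "e" ("state"), but keep "-le" endings ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Return the Flesch-Kincaid grade level of a passage."""
    # Treat runs of ., ! or ? as sentence boundaries (abbreviations will be miscounted).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # A word is a run of letters, optionally followed by an apostrophe suffix.
    words = re.findall(r"[A-Za-z]+(?:['’][A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


if __name__ == "__main__":
    sample = "The state of the union is strong. We will work together."
    print(round(flesch_kincaid_grade(sample), 2))
```

The result roughly corresponds to a U.S. school grade, so a higher score means a harder passage. Running the same function over the full text of each address would give one number per speech that can be compared across presidents and years.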
The data from speeches can be used to compare more than just overall values like reading level. For example, it’s easy to predict that the economy will come up in a State of the Union address. But data shows that presidents tend to talk about taxes more than jobs, and jobs more than banks. Also, among presidents since Wilson, Harry Truman talked about the economy the most — 2.9 percent of his speech was made up of words like “business,” “debt,” and “dollar.”
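A figure like "2.9 percent of his speech" can be reproduced by counting how many words in an address belong to a topic list and dividing by the total word count. The sketch below shows one way to do that in Python. The `ECONOMY_WORDS` set is a hypothetical stand-in containing only the three example words quoted above; the full list behind the published figures is not given here, and the tokenizer is a simple assumption of ours.

```python
import re
from collections import Counter

# Hypothetical topic list: only the three example words named in the article.
ECONOMY_WORDS = {"business", "debt", "dollar"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into words (letters plus optional apostrophe suffix)."""
    return re.findall(r"[a-z]+(?:['’][a-z]+)?", text.lower())


def word_share(text: str, vocabulary: set[str]) -> float:
    """Return the percentage of words in `text` that appear in `vocabulary`."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    hits = sum(counts[word] for word in vocabulary)
    return 100.0 * hits / len(tokens)


if __name__ == "__main__":
    sample = "Every dollar of debt weighs on every business in America."
    print(f"{word_share(sample, ECONOMY_WORDS):.1f} percent")
```

The same function works for a single word: passing `{"together"}` as the vocabulary gives that word's share of an address. Note that this matches exact word forms only, so "dollars" would not count toward "dollar" unless the list includes it.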
Words like “I,” “me,” and “us” make up a larger percentage of each address, but that percentage varies from president to president. For example, President Ronald Reagan said “together” more than anyone else: the word accounted for 0.29 percent of his address.