Erez Lieberman Aiden and Jean-Baptiste Michel: What We Learned From 5 Million Books

Google has digitized 15 million books, roughly 12 percent of the 129 million books ever published. In their TED Talk, What We Learned From 5 Million Books, Harvard researchers Erez Lieberman Aiden and Jean-Baptiste Michel say the useful information in these 15 million books can be condensed to 5 million books. From these 5 million books, the researchers and their team built a table of 2 billion phrases (or n-grams), which they use to study how cultural trends change over time.

Aiden and Michel are fellows at the Harvard Society of Fellows and Visiting Faculty at Google. Their research focuses on what they call culturomics, which they define as “the application of massive scale data collection and analysis to the study of human culture.” Their work has been featured in Science, Nature, The New York Times, The Boston Globe, Wired, and a variety of other venues.

Aiden and Michel demonstrate the Google Ngram Viewer, a tool that graphs how often a phrase appears in the n-gram data over time. For example, one graph shows that use of the word “influenza” spiked during periods when flu epidemics are known to have occurred. According to Michel, digitizing the historical record will transform our understanding of history, language, and culture.
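
To make the idea of an n-gram table concrete, here is a minimal Python sketch of how phrase counts per year might be tallied and turned into a usage-over-time series. It is an illustration only, not the researchers' or Google's actual pipeline. The function names (ngrams, build_table, relative_frequency), the three-sentence sample corpus, and the choice to divide each count by that year's total n-grams are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def ngrams(text, n):
    """Yield each run of n consecutive lowercased words in text."""
    words = text.lower().split()
    for i in range(len(words) - n + 1):
        yield " ".join(words[i:i + n])


def build_table(corpus, n):
    """Map year -> Counter of n-gram occurrences in that year's text."""
    table = defaultdict(Counter)
    for year, text in corpus:
        table[year].update(ngrams(text, n))
    return table


def relative_frequency(table, phrase):
    """Return {year: share of that year's n-grams equal to phrase}."""
    result = {}
    for year in sorted(table):
        total = sum(table[year].values())
        # Assumed normalization: divide by the year's total n-gram count.
        result[year] = table[year][phrase] / total if total else 0.0
    return result


# Made-up sample text, for illustration only (not real book data).
corpus = [
    (1917, "the harvest was good this year"),
    (1918, "influenza spread quickly and influenza closed the schools"),
    (1919, "the town recovered slowly"),
]

table = build_table(corpus, 1)
series = relative_frequency(table, "influenza")
for year, share in series.items():
    print(year, round(share, 3))
```

Plotting the values in series against the years would give a small-scale version of the kind of graph described above, with a spike in the year where the word appears most often relative to everything else written that year.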