Diverse Data Science Reading Recommendations for Students, Professionals and Beginners


The field of data science covers all kinds of ground, ranging from hyper-technical model building to philosophical and ethical questions regarding privacy and bias. The industry’s literature covers much of that same ground.

Featuring recommendations from datascience@berkeley faculty, this collection covers topics such as how technology can reinforce discrimination, how data can exclude groups and amplify bias, and the tradeoffs between usefulness and transparency.

The investigations and strategies found in these books can be a useful supplement for anyone working with data, including beginners just starting out in data science, professionals with decades of experience, and everyone in between.

The recommendations are organized into the following sections:

Big Data and Statistics
Data, Racism, and Inequality
Women and Data
Privacy and Data Ethics
Data and Business

Books About Big Data and Statistics

The Data Detective: Ten Easy Rules to Make Sense of Statistics by Tim Harford

Faculty Pick

Harford untangles the complicated world of statistics with 10 strategies for interpreting data in a way that addresses biases and knowledge gaps.

“A great text on how statistics and data science can help decision making. In particular, focus on Rule Six: Ask Who Is Missing. Oftentimes, folks are missing from data in a way that systematically excludes their lived experiences.” – Michael Rivera, assistant professor of practice, datascience@berkeley

Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are by Seth Stephens-Davidowitz

For fans of The Signal and the Noise by Nate Silver and Freakonomics by Stephen J. Dubner and Steven Levitt, Everybody Lies offers surprising data-informed insights into the economy, sports, gender and more.

“The empirical findings in Everybody Lies are so intriguing that the book would be a page-turner even if it were structured as a mere laundry list. But Mr. Stephens-Davidowitz also puts forward a deft argument: The web will [revolutionize] social science just as the microscope and telescope transformed the natural sciences.” – “How to Find Out What People Really Think,” The Economist


Naked Statistics: Stripping the Dread from the Data by Charles Wheelan

Wheelan describes key statistical concepts such as inference, correlation, and regression analysis using pop culture examples and non-technical language.

“While a great measure of the book’s appeal comes from Mr. Wheelan’s fluent style — a natural comedian, he is truly the Dave Barry of the coin toss set — the rest comes from his multiple real world examples illustrating exactly why even the most reluctant mathophobe is well advised to achieve a personal understanding of the statistical underpinnings of life.” – “A Crash Course in Playing the Numbers,” The New York Times


Books About Data, Racism, and Inequality

Algorithms of Oppression: How Search Engines Reinforce Racism by Safiya Umoja Noble

Faculty Pick

Noble examines data discrimination in search engines such as Google, where biased algorithms privilege whiteness and punish women of color.

“This is a great text on how online spaces reinforce, and magnify, racism and misogyny.” – Michael Rivera

Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor by Virginia Eubanks

Eubanks investigates how data mining, policy algorithms, and predictive risk modeling disproportionately hurt the poor and working class, as resources are both taken and given based on statistical profiles.

“Automating Inequality is riveting (an accomplishment for a book on technology and policy). Its argument should be widely circulated, to poor people, social service workers, and policymakers, but also throughout the professional classes. Everyone needs to understand that technology is no substitute for justice.” – “How Big Data Is ‘Automating Inequality,’” The New York Times

Race After Technology: Abolitionist Tools for the New Jim Code by Ruha Benjamin

Benjamin explores automation as a powerful driver of discrimination and White supremacy. Her concept, “The New Jim Code,” and accompanying guidance explore how discriminatory design is deepening social inequities and how to decode tech’s promises.

“This book is worthy of the widest readership, leaving us not only with a deeper understanding of the mutual and shifting roles of race and technology, but also, importantly, with the manageable and doable tools with which to create alternative, equitable, inclusive, and prosperous futures.” – “Domesticating the Techno-Racial Project,” Nature Machine Intelligence


Books About Women and Data

Invisible Women: Exposing Data Bias in a World Designed for Men by Caroline Criado Perez

Criado Perez examines how data that fails to take gender into account is amplifying bias and discrimination against women in our policy, healthcare, and education decisions.

“Essential reading for people of ALL genders and from all walks of life, and will likely affect how you think about the world, and about how women fit into it.” – “Invisible Women: Exposing Data Bias in a World Designed for Men,” Forbes


Data Feminism by Catherine D’Ignazio and Lauren F. Klein

Data Feminism explains how data science can be used to eliminate pervasive biases and improve outcomes for those often hurt by discriminatory data, providing an intersectional guide to using feminism and data as tools toward justice.

“Anyone who works with data — and all scientists do, of course — will benefit from reading this book. But the readers who may gain the most from it are those who are trying to use data in the public interest. Data Feminism does such a good job of integrating theories and projects across several fields that it will likely become a touchstone for teaching data science that goes beyond data ethics.” – “Using Data to End Oppression,” American Scientist

Brotopia: Breaking Up the Boys’ Club of Silicon Valley by Emily Chang

Chang’s exposé of the “bro” culture among venture capital firms and tech companies is less specific to data science but reveals recurring experiences for women in male-dominated workplaces.

“…Brotopia is more than a business book. Silicon Valley holds extraordinary power over our present lives as well as whatever utopia (or nightmare) might come next. ‘If robots are going to run the world, or at the very least play a hugely critical role in our future, men shouldn’t be programming them alone,’ Chang writes. ‘The scarcity of women in an industry that is so forcefully reshaping our culture simply cannot be allowed to stand.’” – “In ‘Brotopia,’ Silicon Valley Disrupts Everything but the Boys’ Club,” The New York Times


Books About Privacy and Data Ethics

The Ethical Algorithm: The Science of Socially Aware Algorithm Design by Michael Kearns and Aaron Roth

Faculty Pick

Kearns and Roth draw on the emerging science of socially aware algorithm design to propose solutions to the increasingly common privacy concerns and violations of basic rights caused by overreaching technology.

“This is a great introduction for folks just starting out in data science because it’s suitable to a general audience without [dumbing] down the technical aspects, which is a difficult balance to strike. The authors discuss the inevitable trade-offs between fairness, privacy, transparency, and usefulness when it comes to algorithmic decision making and present technical solutions on how to optimize these seemingly opposing forces at the same time.” – Kyle Hamilton, lecturer, datascience@berkeley


The Algorithmic Foundations of Differential Privacy by Cynthia Dwork and Aaron Roth

Faculty Pick

Dwork and Roth examine privacy-preserving data analysis and include an introduction to the problems and techniques of differential privacy.*

“If you are getting into the field of statistical privacy engineering, this would be a great start. The book builds its way from the basic terms and definitions into designing differentially private mechanisms and algorithms in an approachable manner.” – Daniel Aranki, assistant professor of practice, datascience@berkeley, and executive director, Berkeley Telemonitoring Project

*The Algorithmic Foundations of Differential Privacy is available for free (PDF, 1.3 MB) at the International Association of Privacy Professionals web site.

The Age of Surveillance Capitalism: The Fight for a Human Future at the New Frontier of Power by Shoshana Zuboff

Faculty Pick

Zuboff explains the concept of surveillance capitalism as “a new economic order that claims human experience as free raw material for hidden commercial practices of extraction, prediction, and sales.”

“Zuboff’s expansive, erudite, deeply researched exploration of digital futures elucidates the norms and hidden terminal goals of information-intensive industries. Zuboff’s book is the information industry’s Silent Spring.” – Chris Hoofnagle, professor of practice, cybersecurity@berkeley


Books About Data and Business

Product-Led Growth: How to Build a Product That Sells Itself by Wes Bush

Faculty Pick

The first book in Bush’s Product-Led series offers real-life examples, email scripts, and answers to some of the most persistent questions in product marketing.

“There is a lot of talk about product-led growth in the industry right now… As you [may] suspect, [it] requires analyzing and understanding data.” – Joyce Shen, lecturer, datascience@berkeley

Storytelling with Data: Let’s Practice! by Cole Nussbaumer Knaflic

A thorough guide to the fundamentals of data visualization and communicating information with data, Nussbaumer Knaflic’s book and accompanying exercises explain how to direct the audience’s attention, eliminate clutter, and more.

“Intended for anyone committed to improving their ability to communicate data and complemented by a web site that enables users to further hone their skills, this book is written in a fun, friendly and accessible manner and will be highly appreciated by visual learners and creative data-minded individuals.” – “Book Review: Storytelling with Data: Let’s Practice! by Cole Nussbaumer Knaflic,” LSE Review of Books

Data Smart: Using Data Science to Transform Information into Insight by John W. Foreman

Foreman’s book includes nine tutorials, all worked in Excel spreadsheets, on data science techniques such as linear programming, Naïve Bayes classification, and outlier detection.

“This book is set apart from many of the data mining books because of its hands-on exercises and the way the author uses those exercises to describe certain techniques and practices used in data science. The first chapter provides a primer on using Microsoft Excel because the exercises in the book use the spreadsheet.” – “Book Review: ‘Data Smart’ by John W. Foreman,” Seattle PI/BlogCritics


Citation for this content: datascience@berkeley, the online Master of Information and Data Science from UC Berkeley