Diverse Data Science Reading Recommendations for Students, Professionals and Beginners
April 12, 2022
The field of data science covers all kinds of ground, ranging from hyper-technical model building to philosophical and ethical questions regarding privacy and bias. The industry’s literature covers much of that same ground.
Featuring recommendations from datascience@berkeley faculty, this collection covers topics such as how technology can reinforce discrimination, how data can exclude groups and amplify bias, and the tradeoffs between usefulness and transparency.
The investigations and strategies found in these books can be a useful supplement for anyone working with data, including beginners just starting out in data science, professionals with decades of experience, and everyone in between.
For fans of The Signal and the Noise by Nate Silver and Freakonomics by Stephen J. Dubner and Steven Levitt, Everybody Lies offers surprising data-informed insights into the economy, sports, gender and more.
“The empirical findings in Everybody Lies are so intriguing that the book would be a page-turner even if it were structured as a mere laundry list. But Mr. Stephens-Davidowitz also puts forward a deft argument: The web will [revolutionize] social science just as the microscope and telescope transformed the natural sciences.” – “How to Find Out What People Really Think,” The Economist
Wheelan describes key statistical concepts such as inference, correlation, and regression analysis using pop culture examples and non-technical language.
“While a great measure of the book’s appeal comes from Mr. Wheelan’s fluent style — a natural comedian, he is truly the Dave Barry of the coin toss set — the rest comes from his multiple real world examples illustrating exactly why even the most reluctant mathophobe is well advised to achieve a personal understanding of the statistical underpinnings of life.” – “A Crash Course in Playing the Numbers,” The New York Times
Eubanks investigates how data mining, policy algorithms, and predictive risk modeling disproportionately hurt the poor and working class, as resources are both taken and given based on statistical profiles.
“Automating Inequality is riveting (an accomplishment for a book on technology and policy). Its argument should be widely circulated, to poor people, social service workers, and policymakers, but also throughout the professional classes. Everyone needs to understand that technology is no substitute for justice.” – “How Big Data Is ‘Automating Inequality,’” The New York Times
Benjamin explores automation as a powerful driver of discrimination and White supremacy. Her concept, “The New Jim Code,” and accompanying guidance explore how discriminatory design is deepening social inequities and how to decode tech’s promises.
“This book is worthy of the widest readership, leaving us not only with a deeper understanding of the mutual and shifting roles of race and technology, but also, importantly, with the manageable and doable tools with which to create alternative, equitable, inclusive, and prosperous futures.” – “Domesticating the Techno-Racial Project,” Nature Machine Intelligence
An explanation of how data science can be used to eliminate pervasive biases and improve outcomes for those often hurt by discriminatory data, Data Feminism provides an intersectional guide to using feminism and data as tools toward justice.
“Anyone who works with data — and all scientists do, of course — will benefit from reading this book. But the readers who may gain the most from it are those who are trying to use data in the public interest. Data Feminism does such a good job of integrating theories and projects across several fields that it will likely become a touchstone for teaching data science that goes beyond data ethics.” – “Using Data to End Oppression,” American Scientist
Chang’s exposé of the “bro” culture among venture capital firms and tech companies is less specific to data science but reveals recurring experiences for women in male-dominated workplaces.
“…Brotopia is more than a business book. Silicon Valley holds extraordinary power over our present lives as well as whatever utopia (or nightmare) might come next. ‘If robots are going to run the world, or at the very least play a hugely critical role in our future, men shouldn’t be programming them alone,’ Chang writes. ‘The scarcity of women in an industry that is so forcefully reshaping our culture simply cannot be allowed to stand.’” – “In ‘Brotopia,’ Silicon Valley Disrupts Everything but the Boys’ Club,” The New York Times
This book presents a set of solutions, grounded in the emerging science of socially aware algorithm design, to the increasingly common privacy concerns and violations of basic rights caused by overreaching technology.
“This is a great introduction for folks just starting out in data science because it’s suitable to a general audience without [dumbing] down the technical aspects, which is a difficult balance to strike. The authors discuss the inevitable trade-offs between fairness, privacy, transparency, and usefulness when it comes to algorithmic decision making and present technical solutions on how to optimize these seemingly opposing forces at the same time.” – Kyle Hamilton, lecturer, datascience@berkeley
A thorough guide to the fundamentals of data visualization and communicating information with data, Nussbaumer Knaflic’s book and accompanying exercises explain how to direct the audience’s attention, eliminate clutter, and more.
This book includes nine tutorials that use Excel spreadsheets to teach data science techniques such as linear programming, Naïve Bayes classification, and outlier detection.
“This book is set apart from many of the data mining books because of its hands-on exercises and the way the author uses those exercises to describe certain techniques and practices used in data science. The first chapter provides a primer on using Microsoft Excel because the exercises in the book use the spreadsheet.” – “Book Review: ‘Data Smart’ by John W. Foreman,” Seattle PI/BlogCritics