Advanced Course Spotlight: Behind the Data: Humans and Values
February 16, 2016
Students in the Master of Information and Data Science (MIDS) program at the UC Berkeley School of Information can elect to add the advanced course Behind the Data: Humans and Values to their course of study. The course examines legal, policy, and ethical issues that arise throughout the life cycle of data science, from collection and storage to processing, analysis, and use. In this elective, MIDS students can expect to master ethical and legal frameworks, policy analysis, and oral and written presentation.
We asked our students to reflect on the topics covered in this course and how the course has impacted the development of their data science skills. Here are a few of their responses:
This course expands your ability to think about the relationship between data and people. I believe this can lead to better, more meaningful analyses and products. It’s important to understand how complicated issues like data privacy and security actually are, and to learn about how data rights are viewed differently across borders, in ways that have real implications for businesses. For example, consider the legacy of genetic data: If you make your genome publicly available, could health insurance companies discriminate against your kids because your genes indicate a higher risk of developing cancer? Because we’re still developing these concepts, it becomes more important for data scientists to learn about them now while the industry is young. People are still beginning to understand the implications of things like data ethics and privacy. It’s more complicated than it sounds, and unlike, say, war or human rights, we haven’t had a thousand years of practice to learn what is “right” versus what is “wrong.” How we handle these situations matters, and this course provides the opportunity to talk about these and other issues with exceptional classmates.
This course forces you to examine the role of information, and your role as a data scientist, from a more holistic point of view. You are tasked with understanding how your decisions as a data scientist affect the conclusions you make, as well as issues related to information privacy. It’s not just about the stats or computer science but rather about understanding the human element of it as well. This is what this class teaches you: decision makers need to be aware of the technological and policy decisions they make, as data and technology are tools that promote both inclusion and exclusion. We need to be aware of all the biases we introduce in our work and how these affect our conclusions and the policies and decisions that follow. My rationale is simple: There are people behind all of these numbers. Whether you are working for the NSA on national security or at a hedge fund, responsibility and care are needed at all times.
The course serves as a constant reminder of the simple fact that there are few, if any, clear-cut issues when using data, especially when it relates directly to human behavior. This course was one of the reasons that I applied to the MIDS program. The ramifications of a specific study or publication are impossible to determine completely, all the more so given the persistence of the web, and so the best approach is always an honest and responsible attitude. I believe the MIDS program serves as a wonderful incubator for these discussions, since the program brings together students from a wide array of backgrounds and experiences.
This is a great course that broadens the competency and value of a data scientist beyond pure technical skills. Most data scientists are not exposed to this content formally, so it raised awareness of many issues we tend to take for granted, such as privacy and understanding the intended use of data that we provide to companies. How data is collected, managed, and used are key considerations for any data scientist. Understanding privacy, de-identification, and how to manage the rights to data and its intended use will only become more important in the future. Clearly there is more sensitivity when people are involved in data collection and when that data is used in areas such as health, insurance, and employment. Given that today’s competition hinges on the ability to exploit data, I would say this course becomes even more important, as its applicability is vast.