Ethical Data Science with Cathy O’Neil
<intro music>
Monica Manney
Welcome back to UVA Data Points. I'm your host, Monica Manney. If you haven't already, I'd recommend listening to the trailer episode of Data Points for a detailed description of the 4 + 1 Model of data science. In this first episode, we'll explore the area of Value from the model. To recap, here's how Raf Alvarado defines this area.
Raf Alvarado
By value, I mean: why are you acquiring data in the first place, right? What's the business proposition? What's the scientific motivation? What is it in the world that you're interested in studying or affecting, that you're acquiring data and doing analysis for? So we call that the area of value, because that's where the purpose of working with data comes from. It's also where data has an influence on the world, where it can either do good or harm. And so that's where ethics comes in, right?
Monica Manney
So value is focused on the purpose, motivation and impact of working with data. One of the leading experts in this area is Cathy O'Neil.
Cathy O'Neil
My name is Cathy O'Neil. I'm a mathematician and data scientist, and I wrote a book called Weapons of Math Destruction. My newest book is called The Shame Machine: Who Profits in the New Age of Humiliation. I also run a consulting company that audits algorithms.
Monica Manney
Cathy O'Neil has a gift for explaining complex ideas and informing the public on the ways in which data impacts their lives. So, whether you're a data scientist or not, you'll find value in Cathy’s research and insights. In this episode, Cathy is in conversation with Professor Brian Wright.
Brian Wright
My name is Brian Wright. I'm an assistant professor in the School of Data Science. My background is in Education and Economics. And I'm also the director of undergraduate programs. So I've been mostly building an undergraduate data science program here at UVA.
Monica Manney
Cathy and Brian's conversation covers data science education, algorithmic bias, the effects of social media, how and why to audit algorithms, and much more. To kick off the conversation, we asked Cathy how she defines the field of data science. Cathy and Brian take it from there.
Cathy O'Neil
How do I define data science? Well, data science is a craft rather than a science, but it should be a data-driven craft. When I say craft, what I mean is we have phases of work. At the beginning of a new data science project, you try to get to know your data by looking at the shape of it and understanding it — that's called exploratory data analysis. Sometimes I compare it to another craft I know well, which is knitting. What kind of yarn do I have? Do I have thick yarn that needs big needles, or thin yarn that needs small needles? Is it silk, or is it cotton? You need to know your material. Then, if you're a data scientist, you want to figure out your definition of success. If you're building an algorithm, what kind of predictions are you trying to make, and what is your definition of success? And when we optimize to that success, are we going to build in mistakes — or if not mistakes, unforeseen consequences, unexpected repercussions, feedback loops that are negative? So you have to always measure and be aware of what could happen, what could go wrong. For whom will this fail? That's my favorite question whenever you're building and designing a data project. In my opinion, in order to say you've done a data science project, you have to build the scaffolding, which is a monitoring system that makes sure, in an ongoing sense, that things aren't falling off the rails.
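To make Cathy's favorite question — "for whom will this fail?" — concrete, here is a minimal sketch in Python (an editorial illustration, not code from the conversation). It reports a model's error rate per subgroup instead of a single overall accuracy; the column and group names are hypothetical placeholders.

```python
import pandas as pd

def error_rate_by_group(df: pd.DataFrame, group_col: str) -> pd.Series:
    """Fraction of wrong predictions within each subgroup, worst first."""
    errors = df["prediction"] != df["actual"]
    return errors.groupby(df[group_col]).mean().sort_values(ascending=False)

# Toy example: group "b" fails twice as often as group "a".
df = pd.DataFrame({
    "prediction": [1, 0, 1, 1, 0, 0],
    "actual":     [1, 0, 0, 1, 1, 0],
    "group":      ["a", "a", "b", "b", "b", "a"],
})
print(error_rate_by_group(df, "group"))
```

The same pattern extends to any success metric: compute it per group, and the group where it degrades most is where the project fails first.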
Brian Wright
Yeah, I think having these conversations is really timely. We're building many programs at the school, one of which is this undergraduate path, and that's the new frontier, I think, in data science education. Even in my classrooms, we tell the students that we can give them this pipeline mentality, but the hard work in data science is usually on the front and the back end. The front end is thoroughly understanding the problem you're trying to address and, to your point, the audiences it could affect — that's a real qualitative thought process. And then, once things are built — we construct them in this kind of fabricated world, but then they live in the real world. I always say that much of the real talent behind being a data scientist is understanding how your algorithm is actually behaving once it's out in the world. And that's the hard part about education, I think: creating an environment where students can experience that authentically, because it's quite difficult to measure data drift and so on. So we're trying to build…
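Brian mentions how hard it is to measure data drift. One common approach, sketched here under the assumption of a single numeric feature, is a two-sample Kolmogorov-Smirnov test comparing the training distribution against what the model sees in production; the threshold is illustrative, not canonical.

```python
import numpy as np
from scipy.stats import ks_2samp

# Synthetic stand-ins: the "live" feature has a shifted mean.
rng = np.random.default_rng(0)
train_feature = rng.normal(loc=0.0, scale=1.0, size=5_000)
live_feature = rng.normal(loc=0.4, scale=1.0, size=5_000)

# KS test: are the two samples plausibly from the same distribution?
stat, p_value = ks_2samp(train_feature, live_feature)
if p_value < 0.01:
    print(f"Possible drift: KS statistic={stat:.3f}, p={p_value:.2e}")
```

In practice you would run a check like this per feature on a schedule, as part of the monitoring scaffolding Cathy describes above.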
Cathy O'Neil
It also makes people feel defensive and vulnerable, I'll add — at a sort of emotional level, that's what you're really dealing with. I developed a data journalism program at Columbia, so it's similar, but we were starting with people who didn't really consider themselves techies, right?
The first half, where you're teaching them skills like Python and MongoDB — techniques and skills — is so exciting, because they're like, oh, now I know how to do this, I know how to do this. They're developing skills that they can enumerate and quantify.
Brian Wright
I know kung fu.
Cathy O’Neil
I know, yeah. And then the second half, it's like: but sometimes you shouldn't do this. And sometimes you have to realize that you've made assumptions that, if you made them differently, you'd come up with really different answers. And sometimes, even though your data says this is true, it's wrong. The idea of adding all those qualifiers is hard for people to take, because they're like, wait, I thought I was good at this, and now you're telling me I'm bad at this? And we're like, no, we're not telling you you're bad at this. We're telling you: this is hard.
Brian Wright
Yeah. And it's hard because there are so many things about it that are as much an art as they are a science, right? We talk about what I think is the nature of it, which is building data intuition — having an intuition about what is going on, or that there are things you can't control, and what does that mean for how you're going to use this algorithm? Or should you use it at all?
Cathy O’Neil
Yeah, that's why I called it a craft. I'm not an artist, but I am a crafter. I think of it like this: if I were building something out of wood that was supposed to be a chair, I would want to test it all the time, to make sure it's stable and to make sure it doesn't tip. That would mean getting a bunch of people who are not me, and me, to sit in it in weird ways and try to tip it over. Likewise, when you're building a data science project, you want to make sure it's robust, but you also want to understand what you're doing. That's the kind of sensitivity analysis — however you do it — to make sure that it's not tipping.
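Cathy's chair-tipping analogy maps onto what practitioners call sensitivity analysis. A minimal sketch on synthetic data: nudge the training inputs slightly, refit, and measure how many individual decisions flip. All names, scales, and the synthetic data here are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data: outcome driven mostly by the first two features.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 3))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)

baseline = LogisticRegression().fit(X, y).predict_proba(X)[:, 1]

# "Try to tip the chair": perturb the inputs 20 times and refit.
flips = []
for seed in range(20):
    noise = np.random.default_rng(seed).normal(scale=0.1, size=X.shape)
    probs = LogisticRegression().fit(X + noise, y).predict_proba(X)[:, 1]
    # Fraction of cases whose predicted class flips under a small nudge
    flips.append(np.mean((probs > 0.5) != (baseline > 0.5)))

print(f"Average decision flip rate under perturbation: {np.mean(flips):.1%}")
```

A model whose individual decisions swing wildly under tiny perturbations is a chair that tips.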
Brian Wright
Yeah, I love that analogy. That's great. So let's talk about what you're doing right now. Tell us a little bit about where you're at.
Cathy O’Neil
Well, I'm doing what I call algorithmic auditing now. I started a company called ORCAA that does that.
And to be fair, there are really two different types of algorithmic auditing: cooperative algorithmic auditing and adversarial algorithmic auditing. By cooperative, I mean people hire me to look into their algorithms at companies — private companies — and I have to sign an NDA, so I can't even tell you who they are. Some of them are on my website. But usually, to be honest, the reason they do it is because they got in trouble. Because they're embarrassed, and they're like, people already think we're racist — usually it's accusations of racism — so we might as well clear our name if we can, and they hire us to help them. Or sometimes it's just: we don't want to be accused of that, and we think this is high risk, so we're going to get audited. And then there are lots of companies that should get their algorithms audited, but they don't want to, and they don't see why they should — there's no leverage to make them do it. These are the ghost clients, if you will. A data scientist who works at one of these places will call me up, and they'll be like, I read your book, Weapons of Math Destruction, I got really worried about what I do here, and I want to talk. And I'm like, okay, let's talk. And it's like a therapy session, because they're like, I can't sleep at night. And I'm like, yeah, there's a reason for that, because what you're doing is high impact, and it's probably problematic. And they often say, well, I'm not allowed to collect race information, so I don't even know whether there's really a problem — it just feels like there's a problem. And I'm like, yeah, there probably is a problem, to be honest. The first call is always awesome: we're feeling really good about this, and we have a way of inferring race, we can talk about that. That's a big part of my job.
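Cathy alludes to "a way of inferring race" when race isn't collected. A widely published technique for this is BISG (Bayesian Improved Surname Geocoding), which combines surname-level race probabilities from the Census with the racial makeup of a person's geography via Bayes' rule. The sketch below uses made-up numbers and is a sketch of BISG generally, not a claim about ORCAA's actual method.

```python
# P(race | surname), e.g. from the Census surname file (hypothetical values)
p_race_given_surname = {"white": 0.70, "black": 0.20, "hispanic": 0.10}

# P(race) within the person's census tract (hypothetical values)
p_race_given_tract = {"white": 0.30, "black": 0.60, "hispanic": 0.10}

# Overall population shares, to convert the tract term (hypothetical values)
p_race_overall = {"white": 0.60, "black": 0.13, "hispanic": 0.19}

# Bayes: posterior ∝ P(race | surname) * P(tract | race)
#                  ∝ P(race | surname) * P(race | tract) / P(race)
# (the P(tract) factor cancels when we normalize)
unnormalized = {
    race: p_race_given_surname[race] * p_race_given_tract[race] / p_race_overall[race]
    for race in p_race_given_surname
}
total = sum(unnormalized.values())
posterior = {race: p / total for race, p in unnormalized.items()}
print(posterior)  # probabilistic race estimate for one person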
Brian Wright
Okay.
Cathy O’Neil
On the second call, though, there are lawyers on the line. And the lawyers are like: we don't want to ask these questions, because we don't want to know the answer and still be using our product, and then go to court, and it's in discovery, and we're the next tobacco company, right?
Brian Wright
Right. So how do you deal with that? Where do you go?
Cathy O’Neil
I don't deal with that.
Brian Wright
You just don’t.
Cathy O’Neil
Okay, so let me tell you about the other half of my work, which is adversarial audits.
Brian Wright
I see.
Cathy O’Neil
I work with attorneys general; I work now with the Colorado Insurance Commissioner; I work with federal agencies that are interested in these things — and I'm not allowed to talk about some of it yet. But they're asking: how do we enforce anti-discrimination laws when there are algorithms? And I'm like, yeah, great question, I can help you with that. And that work is not well paid, just FYI — it takes a lot of time, and there are a lot of people to convince — but it is eventually going to pay off in the form of leverage for that first type of ghost client. Because the answer to the question, why should we ask these questions, is: because otherwise the regulator will get you in trouble for not doing so.
Brian Wright
And then have you seen progress in kind of regulatory…?
Cathy O’Neil
Yeah,
Brian Wright
…activity associated with this? Right? I think you mentioned something in New York, right?
Cathy O’Neil
Well, the New York City Council just passed a law a few months ago that requires all employers in New York City to audit their hiring algorithms. The rules haven't come out, so we don't exactly know what that means, but it's supposed to start in 2023. So yes, that is exciting for me — if they require a third-party auditor for the algorithm, I have one of the only companies…
Brian Wright
That's great for ORCAA.
Cathy O’Neil
Yeah, that's great. But it could be bad too, to be honest. If the rules are bad, a bunch of competitors could pop up that are willing to do crappy things and call it an audit.
Brian Wright
Oh, I see.
Cathy O’Neil
So one of the other things I do in the meantime is try to set standards for what an audit is. It can't be nothing, right?
Brian Wright
So it just can't be superficial, right? It has to be some type of real activity. But still, even if we're just getting started, this seems like progress, right? And then maybe, with continued activity, they could next-generation this thing.
Cathy O’Neil
A hundred percent, in the following sense. When I started researching my book, Weapons of Math Destruction — it came out in 2016, but I started researching in 2012 — nobody I knew cared. Everybody I talked to thought that algorithms must be perfect. And I was like, no, as a data scientist, I can assure you that I make lucky people luckier and unlucky people unluckier, and the way I decide whether someone is lucky or unlucky is based on their wealth, their race, and their gender. That's how we do it. And that's a simplification, but not much of one. So obviously, it's racist, sexist, and classist. Class, because it happens not to be a protected class, by the way, ironically — so it's not illegal to be classist, it's not illegal to discriminate against poor people — but some of those other things are illegal. I knew that eventually, down the line, this was going to be a thing and people would have to take it seriously. And people are starting to take it seriously.
Brian Wright
Well, I would certainly say that's true. I mean, thank goodness you had that foresight, right? To no small extent, I think your book created this whole modern idea of exploring this particular concern about the expansion of data science generally.
Cathy O’Neil
I don't think I invented any of this. But I do think that my book helped make it really easy to understand. And that is what I was intending. So I'm really happy about that.
Brian Wright
Yeah, I think we all are. So let's pivot a little bit. I wonder about your thoughts on data science education, since we're here at UVA. We talked about it a little, but tell me your thoughts: what should we make sure future data scientists are learning?
Cathy O’Neil
You know, if I were you, it would really be a struggle to build a curriculum that has any real connection to being a real-life data scientist — an industry data scientist — because you just don't have the data. May I say…
Brian Wright
No, that's totally fair.
Cathy O’Neil
So it's hard. There's the COMPAS algorithm — the COMPAS model dataset. It's like the most overused dataset ever; I'm glad it's being used. To be clear, I'm talking about a recidivism risk algorithm that was the focus of a ProPublica report a few years ago, which found that Black men had roughly twice as high a false positive rate as white men — which, they argued, meant that the algorithm itself was racist. And then Northpointe, which built the algorithm, came back with a white paper saying: we define racism differently, and according to us it's not racist. I think that's a really, really interesting case study for you guys to use, because at the very least it brings up the question: what's the definition of racist? We don't have one. That's one of the things I talk about constantly — what metric, what is too racist, what is acceptable racism, given that you've chosen a metric? But we haven't chosen a metric. Start with that. Anyway, that's a great use case, and there aren't that many. And yeah, you want to teach these kids how to think about the real-life consequences of algorithms, so you're going to have to rely pretty heavily on thought experiments. But the good thing is, thought experiments work really well.
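The ProPublica finding Cathy describes reduces to one computation: among people who did not reoffend, how often was each group scored high risk — the false positive rate. Here is a sketch against ProPublica's publicly released COMPAS data; the column names and the decile-5 cutoff follow their published analysis, but treat them as assumptions if you rerun this yourself.

```python
import pandas as pd

# ProPublica's released file from their COMPAS analysis repository
df = pd.read_csv("compas-scores-two-years.csv")
df["high_risk"] = df["decile_score"] >= 5  # their "higher risk" cutoff

# False positive rate: flagged high risk despite not reoffending
non_recidivists = df[df["two_year_recid"] == 0]
fpr_by_race = non_recidivists.groupby("race")["high_risk"].mean()
print(fpr_by_race)
```

Northpointe's rebuttal used a different metric — calibration, i.e. whether P(reoffend | score) is similar across groups — and the two definitions generally cannot both be satisfied at once, which is exactly the "we haven't chosen a metric" problem Cathy raises.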
Brian Wright
Well, it's funny, because that's actually how we originally met, right? I reached out because I wanted to get some use cases to put into our curriculum as good examples of how we can do things through an ethical lens, when we were starting to build...
Cathy O’Neil
You wrote to me, and I was like, No, I just want to visit. Can I visit?
Brian Wright
And I was like, yes, that sounds fantastic. So that's the way we're designing things — it's important to think about that. And I've thought a lot about getting the students off grounds, too — getting them embedded in companies that are struggling with these types of problems. I think that's going to be really important.
Cathy O’Neil
Oh, 100%. I was just talking to your dean about the capstone projects your master's students have, and I think that's wonderful. And you know, there is no dataset that's good — can I say that every dataset sucks? So if you get assigned a couple of actual projects that actual people have, then you have to reckon with a completely crappy dataset, and you have to say: okay, what does this missing data mean about the blind spots of my algorithm? What does this biased data mean for the unintended bias of my predictions? That's really how you get these kids to think.
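One quick way to operationalize Cathy's "blind spots" question, sketched with synthetic data: check whether missingness concentrates in particular subgroups, since the model is least informed exactly where the data is most missing. Column and group names are hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic data where group "b" is missing income far more often.
rng = np.random.default_rng(1)
n = 1_000
is_b = rng.random(n) < 0.3
miss_prob = np.where(is_b, 0.45, 0.10)  # "b" missing 45%, "a" 10%
income = np.where(rng.random(n) < miss_prob, np.nan,
                  rng.normal(50, 10, n))
df = pd.DataFrame({"income": income, "group": np.where(is_b, "b", "a")})

# Missing rate per subgroup: a blind spot if one group dominates.
missing_by_group = df["income"].isna().groupby(df["group"]).mean()
print(missing_by_group)
```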
Brian Wright
Yeah. And like I said, we're trying to make this kind of active learning the default approach — and then not present the solutions, but let them reason through to them, to a certain extent. At least, we're considering that that's probably the way we get there.
Cathy O’Neil
You know, I love that idea. And it goes back to our earlier conversation about the craft of data science and sensitivity analysis. One of the projects I've wanted to do — and I think you guys should do it, now that I think of it — is give five or six groups of students the same exact project with the same exact data, and then compare the answers, because they're going to be totally different. And then backtrack and delve into why they're different — not saying anybody was wrong — but why are they so different?
Brian Wright
That's very exciting — we'd do our own algorithmic audit of how they came to those conclusions.
Cathy O’Neil
Exactly.
Brian Wright
And then have them debate about why the results are different.
Cathy O’Neil
Sometimes it's a hyperparameter difference: we used three variables, they used two, different coefficients, whatever. And you're going to get different results. And they're both going to be perfectly okay as answers, but they're going to be different.
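Cathy's point — two teams making different-but-defensible choices and landing on different answers — is sometimes called model multiplicity. A sketch on synthetic data: two models with different reasonable choices (feature sets, model families) can score similarly overall yet disagree on individual cases.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Synthetic data with an interaction term so the "teams" differ.
rng = np.random.default_rng(7)
X = rng.normal(size=(1_000, 5))
y = (X[:, 0] + X[:, 1] * X[:, 2]
     + rng.normal(scale=0.8, size=1_000) > 0).astype(int)

# Team A: linear model on all features. Team B: trees on three features.
team_a = LogisticRegression().fit(X[:800], y[:800])
team_b = RandomForestClassifier(random_state=0).fit(X[:800, :3], y[:800])

pred_a = team_a.predict(X[800:])
pred_b = team_b.predict(X[800:, :3])

print(f"Team A accuracy: {(pred_a == y[800:]).mean():.2f}")
print(f"Team B accuracy: {(pred_b == y[800:]).mean():.2f}")
print(f"Disagreement on individual cases: {(pred_a != pred_b).mean():.1%}")
```

Both teams can defend their choices; the people scored differently are the ones who bear the difference.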
Brian Wright
Yeah, the subtleties on the front end can make a huge impact once these things go live. So having them understand that these are deliberate choices — you're making choices, and you have an amount of freedom. I think people come into this with the idea that machine learning or statistical learning models are very canonical, that there are these fixed things, but that's not really true at all.
Cathy O’Neil
It's not at all true.
Brian Wright
They're hyper-dynamic. You can change them almost at a whim, and it can make such a huge difference — and accidentally, right, unintended. That's the hard thing. Hey, can you tell us about your new book?
Cathy O’Neil
Yeah, I talk about shame as a profit motive. The traditional versions: age shaming would be the cosmetics industry; fat shaming would be the weight-loss industry. They're directly shaming people into buying a product that doesn't work, so that they come back for more — it's that simple. But I also talk about other shame machines that are profiting not in money but in power. Like the Catholic Church shaming the victims of child abuse, or the way we shame poor people in this country, or the way we shame people with addictions in this country. And sometimes that is propped up for profit: if you look at the Sackler family's emails to each other, "shame the addicts" was an actual phrase from one of them to another. And then there are the rehab centers, which are shame-based and don't even use medication-assisted treatment, which is much, much better than the stuff they do, which is basically shame.
Shame doesn't work — that's kind of my point. Punching-down shame, that kind of shame, doesn't work. So that's shame for profit, the shame-industrial complex, if you will. The newer instantiation of this notion of profiting by shame is the social media giants, where they're not shaming us directly to make us buy their product — they're creating the perfect system for us to shame each other.
Brian Wright
I see.
Cathy O’Neil
And thereby profiting off of us. So they're making us work for them for free. And it's crazy, because it's so successful. And the reason it's successful, by the way, is that it's actually really fun to shame people. When I say fun — it doesn't feel fun, exactly, but it lights up our pleasure centers.
Brian Wright
Sure.
Cathy O’Neil
And on top of that, the design of social media is such that our in-group — our friends and so on — congratulate us on being so righteous, so we get more pleasure-center bumps. So at the end of the day, we are conditioned to do that kind of thing even if it doesn't work, which it often doesn't, because we're often punching down — which is to say, punching at people who don't have an actual choice to conform, or who don't have a voice. I talk about punching down as a matter of choice and voice. If you're punching at somebody who doesn't have a choice, that's inappropriate. If you punch at somebody who cannot defend themselves, or cannot be seen to improve their behavior, that's also inappropriate. And that's almost always the case: when you see people shaming on Facebook or Twitter, the target is not going to be seen again. We don't know what's going to happen to them. They might change their tune tomorrow and behave better, but we're never going to know. So that's punching down, and it happens so much. Then you might ask: well, what's the point of that shame, if they're not going to conform, at least not visibly? The answer is, it's performative — mostly performative. And we are conditioned to do that performative shaming, and they make money from us doing it, because it's a spiral: we shame them, they get outraged, they shame us back, because we're often shaming people who don't even agree on the norm, so it's definitely not going to work. Anyway, the point is that social media has perfected the art of engaging shame — manufacturing shame — for profit. They're the new instantiation of the shame machine. But having said all that, the book isn't anti-shame.
Brian Wright
Okay.
Cathy O’Neil
I'm actually pro shame.
Brian Wright
Okay.
Cathy O’Neil
But I'm not pro inappropriate, punching-down shame — I'm pro punching-up shame, which is the opposite. You're punching up at somebody with a choice and with a voice. Typically, that means you're holding power to account. Every single civil rights movement was shame-based. Holding power to account is: shame on you — you say you believe this, but you're not acting that way; behave better, we're watching you. The very fact that they're being watched means they have the power to defend themselves and the staying power to be seen behaving better. And the point is that you're saying: you had a stated ideal that you're not living up to — hypocrite. That is punching up, and we need it. The problem is that we aim too low. And Facebook makes us aim too low.
Brian Wright
Now this makes sense. It's interesting — this is the duality we talk about. Social media is not going anywhere, right? We're not going to suddenly eviscerate these gigantic…
Cathy O’Neil
I really wish we could.
Brian Wright
I wish we could too. I have young girls that I'm terrified for. But there is this idea that maybe, if we think about it in the broadest sense possible, this could be the most effective platform for positive shaming…
Cathy O’Neil
If you think about all the energy we put into shaming — into punching down, punching at people we don't know and will never see again — and you just imagine taking 5% of that and using solidarity to punch up instead, it would be so powerful, especially if we punched up at Facebook, at the social media giants themselves, for profiting. I just want to tell you one story, and then we'll be done. So the story is, I was invited to Kyiv last September — it was the only trip I took before this one, essentially. And I didn't even know why I was there; some oligarch who was pro-Western, pro-democracy brought me there. And I met Zelensky, for example — he was there. And I was talking about algorithms and how they optimize us to fight with each other. I was like, I don't know why I'm here, but I'm here. And then in the Q&A afterwards, this woman said: I am a member of parliament. I used to be a technologist. When I was a technologist, I thought technology was going to save democracy. Now I realize that technology is destroying democracy. What can we do in the Ukrainian parliament about Russian propaganda that is undermining people's trust in democracy in Ukraine? And I'm like, dude, nothing — you can do fucking nothing. I'm sorry, I was not supposed to swear.
Brian Wright
It's fine.
Cathy O’Neil
You can do nothing. And it's outrageous that you can do nothing. It's outrageous that our politicians aren't putting a stop to this. And when Russia actually invaded Ukraine, Facebook made a big deal about how they were going to stop profiting from Russian state propaganda — as if that's some great thing. It's like, dude, you guys made how much money setting this all up? They played a major role in the table-setting of that invasion. And I will not forgive them for that.
Brian Wright
Well, and none of us should. I mean, that's one of many, right?
Cathy O’Neil
So, they've been doing this type of thing across... I mean, never mind the Rohingya in Myanmar — there was a genocide against them because of the Facebook situation. So yeah, we should shut them down, to be honest.
Brian Wright
Yeah. I mean, maybe it's federal regulation. Maybe it's — you know, we don't have the laws...
Cathy O’Neil
Yes. That's what Congress is for. And to be honest, I spoke to a Senate committee about this. It is bipartisan — there is a bipartisan push to do something, although I'm not really sure what; it's not clear.
Brian Wright
Well, it's important for us to, you know, have voices like yours pushing for this.
Cathy O’Neil
Thank you.
Brian Wright
No, because I don't think there are that many people who really think about it in the terms that they should, you know — they have kind of complicated…
Cathy O’Neil
It's profiting from propaganda. I don't know why that's so hard to grok. But I think it's hard for politicians to grok because that's how they make their money — they actually make money from appeals on Facebook.
Brian Wright
Yeah. For any 21st-century modern democracy to function, civic education is really important. But I think the frontier — a threshold we still have not totally crossed — would be data literacy. It's going to be something that people have to know universally, right? And so we're thinking about ways to expand, even into K through 12. We're working here in the state of Virginia to promote ideas about how algorithms work and data science education earlier on, because I think it'll be more important as we go further down the road.
Cathy O’Neil
I'm not going to disagree that everybody should have basic knowledge of this stuff, but I just want to caution that I don't think we should ever make it a requirement for the average person to understand how algorithms work. The average person should be protected — their human rights should be protected, if you will, or their constitutional rights — without them having to understand it and fight. That's a pretty important thing. But I'm not against literacy — I'm totally for literacy. And I do think it'll be helpful for people to know, even if it just means they'll know enough to say: I don't believe you.
Brian Wright
That's what I think — just having people be aware of how these systems work. Maybe it's more like an overview of how technology is impacting their lives, as an important part of general education.
Cathy O’Neil
When data science got popular, in 2012 or whenever it was, people were like, it's going to make things more fair, because it's an algorithm. That's not true — that was pure hype. But theoretically, we could make things more fair. We could choose values that we aspire to and embed them in code. We could do that. That's the most exciting thing, I think, about the future of data science.
Brian Wright
Well, that's what we're trying to do here — I hope so, at least; we're designing our classes that way. And thank you for being here and being a part of that. And if you can help us…
Cathy O’Neil
I’m excited about this place.
Brian Wright
Yeah, keep it coming.
Cathy O’Neil
Thanks. Thanks for emailing me, Brian.
Brian Wright
Hey, no problem! Thanks for responding so quickly and coming to visit. It's been a pleasure.
Cathy O’Neil
It has for me, too.
Brian Wright
All right.
Monica Manney
Thanks for listening to UVA Data Points. For more information on Cathy O'Neil, head to her website, mathbabe.com. And be sure to check out her books Weapons of Math Destruction and The Shame Machine. You can find these links and more in the episode show notes.
For more information on the UVA School of Data Science visit datascience.virginia.edu. And if you're enjoying UVA Data Points, write us a review wherever you listen to podcasts. And if you have an episode idea, email us at
[email protected]. Our next episode, which will focus on the area of Design, features a conversation between Raf Alvarado, who you heard from in our trailer episode, and Alison Bigelow, who's with the UVA Department of Spanish, Italian and Portuguese. They discuss their exploration of the Kiche Mayan book of creation. This episode will release on October 1, but keep an eye out for bonus episodes later this month.
We'll see you next time.
<outro music>