[Speaker slides were not presented in the video recording]
Julia Angwin: Hi there. I am going to talk about something that doesn’t sound like it should be at a tech conference, but is. So I want to tell you about what I’ve learned about forgiveness in my studies of algorithmic accountability. But I’m going to start with just a little bit of my own history and experience that I think will be relevant.
So, this is me. I grew up in Silicon Valley, in Palo Alto, at a time when the personal computer was really exciting. This is my first computer. And I thought I would be a programmer when I grew up. I actually didn’t know there were any choices other than hardware or software. I thought like, you pick one and then you’re cool. And I was like, software seemed more interesting. So I was studying to do that. And then I took a wrong turn somehow and fell in love with journalism at my college newspaper, and thought well, I’ll just do this for a little while, it seems fun.
So I ended up at The Wall Street Journal in 2000. They hired me to cover “the Internet.” And I was like, “Anything in particular about the Internet?”
They were like, “Nah. Just everything. You seem like you know computers.”
So I was like okay, that sounds like a great assignment. So I spent thirteen years there covering technology. And I was in the New York office so I had the weird angle of mostly covering the AOL-Time Warner merger.
So I want to talk about what I learned about forgiveness in my time as a reporter. So, for fourteen years, I was covering technology for The Wall Street Journal. I wrote a lot of stories, right. Most of them weren’t that interesting, but some of them were. One of the biggest stories I worked on was the AOL-Time Warner merger, because AOL was a “tech company” of its time. And that merger consumed ten years of my life.
And really, as you probably all remember, it was based on accounting fraud, right. Like they were actually doing this crazy thing. AOL would ask some guys to provide the cafeteria services at AOL, and instead of paying them, they would say, “We’re going to pay you double and then you buy ads.” And that was the game. That’s how they got all this ad revenue. Because Wall Street was only valuing them on ad revenue, not on profit.
So that was fun times. And in the end I think they paid a $300 million fine. It was a big story. I got to be part of a Pulitzer Prize-winning team. Super good.
But what’s weird is that in all my reporting there were only two times where people that I wrote about went to jail. So, one was a spammer—this guy was called the Buffalo Spammer. It was the early days of spam, so it was exciting. I went to his house and knocked on his door, talked to his mother, you know, etc. And as part of my reporting and then additional…you know, the New York Attorney General charged him, and he actually went to jail. He got the maximum sentence of three and a half years.
The other guy who I wrote about who went to jail was an AOL executive, actually, who did an embezzlement scheme. I don’t have his photo—he seems to have removed himself from the Internet, but he was also a black man. And he also got a prison sentence for doing some small-time embezzlement; he was the head of HR.
And then I think about where the former AOL executives I wrote about who did all those round-trip deals ended up—they’re doing cool stuff. Steve Case, funding a lot of things. Bob Pittman, running a giant radio network. Dave Colburn, the guy who actually did all the deals that were round-tripped, settled with the SEC for four million and now he’s investing in tech companies and in Israel.
So you know…look, this isn’t a particularly unique story. But let’s just say this is a very American story about forgiveness. Who is forgiven in this case? What was the unique factor about these people? I don’t know, they were white men. I don’t know, they were really powerful, right? And I find it really depressing and sad that the two people I wrote about as a tech reporter who went to prison were both black men, because as you know they were probably the only two black men I ever encountered in my whole time covering technology, right?
So, we already know that our society hands out forgiveness and punishment unequally. Right? I’m not telling you anything you don’t know, I’m just telling it to you in the form of a personal story.
So, flash forward: I decided to write a series about algorithmic accountability at my new employer, ProPublica. As we all know, algorithms are very important in our lives, right. This is…if you haven’t seen it, The Wall Street Journal’s Blue Feed, Red Feed is a delightful app that you can visit every day; this is from last night. And it shows you sort of the top stories that would be trending in a conservative news feed and in a liberal news feed. So of course yesterday was spectacular. The conservative news feed: Trump, economy growing 3%. And the liberal news feed: Mueller’s charges, right. So it’s like, very different stories are being presented.
So I decided I wanted to do some accountability studies about algorithms in our lives. And it’s hard to study the newsfeed in a quantitative way, and I also wanted something with higher stakes. So I started with an algorithm that is used in the criminal justice system to predict whether a person is likely to commit a future crime. This is, literally, Minority Report software basically, that is used throughout the United States for sentencing, parole, pretrial release, a lot of different stages of the criminal justice system.
So, how many people were aware that that’s even happening? Okay, good. Yay. When I started looking into it two years ago, it wasn’t as well-known that this was even being done. And so I thought well, I’m going to look into this and see if I can actually figure out if the software is biased.
So, I went and did a freedom of information request in Florida. And it took five months and some legal wrangling but we did get all of the scores that they had assigned everyone who was arrested during a two-year period. So first thing we did was put those scores up in a histogram. What you see is that on the left is black defendants, and on the right is white defendants. And so the distribution of scores 1 through 10—which is basically 1, least risky; 10, most risky—is very even for the black defendants. But for the white defendants it’s strangely skewed towards low risk, right?
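[A minimal sketch of the kind of histogram described above, assuming a hypothetical CSV of the FOIA’d scores with illustrative race and score_decile columns; this is not the published chart.]

```python
import pandas as pd
import matplotlib.pyplot as plt

scores = pd.read_csv("broward_scores.csv")  # hypothetical file of the FOIA'd scores

# One panel per group, sharing the y-axis so the shapes are comparable.
fig, axes = plt.subplots(1, 2, sharey=True)
for ax, race in zip(axes, ["African-American", "Caucasian"]):
    subset = scores[scores["race"] == race]
    # Decile scores run 1 (least risky) through 10 (most risky).
    ax.hist(subset["score_decile"], bins=range(1, 12), rwidth=0.9)
    ax.set_title(race)
    ax.set_xlabel("Risk score (decile)")
axes[0].set_ylabel("Number of defendants")
plt.show()
```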
So my first thought was, “Huh. That’s weird.” But, you can’t really say it’s 100% biased until you test whether it’s accurate, right? What if every one of the white defendants was Mother Teresa, right, and they never did anything wrong… it was just some weird, like, jaywalking ticket or something.
So, we went and did six months of scraping the criminal records of every one of those defendants—that’s 18,000 people; it was a complete nightmare—and joining those data sets to make sure we had the match of a person’s score with their true recidivism: did they actually go on to commit a crime in the next two years? And also, what was their prior record like?
And what we found was that there was a disparity, right. We did a logistic regression, which is just a statistical technique that allows you to control for all other factors. When you control for all other factors than race, you see that black defendants were 45% more likely to be given a high-risk score. And that’s controlling for the outcome too, right, which is, like, committing or not committing a crime in the future—in the next two years.
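[A minimal sketch of the kind of regression described above, assuming a hypothetical joined table with illustrative column names (score_decile, race, priors_count, two_year_recid, and so on); this is not ProPublica’s published analysis code.]

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("scored_defendants.csv")  # hypothetical joined dataset

# Outcome: did the defendant receive a high-risk score? (cutoff is illustrative)
df["high_score"] = (df["score_decile"] >= 5).astype(int)

# Control for everything other than race -- including the true outcome
# (two_year_recid) and prior record -- and see whether race still predicts
# getting a high score.
model = smf.logit(
    "high_score ~ C(race) + age + C(sex) + priors_count"
    " + C(charge_degree) + two_year_recid",
    data=df,
).fit()

print(model.summary())
# The exponentiated race coefficient is an odds ratio; the "45% more likely"
# figure in the talk corresponds to an odds ratio of roughly 1.45.
```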
So that meant there was some disparity here. And when you looked at it in a chart, basically at false positives and false negatives, you see that the difference is really stark. The false positive rate for African American defendants was twice as high, right. They were twice as likely to be given a high-risk score but not actually go on to commit future crimes. So like, falsely be given a higher-risk score than a white defendant. And similarly, a white defendant was twice as likely to get an unjustified low-risk score despite the fact that they turned out to have been far more risky.
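[A minimal sketch of the false positive / false negative comparison described above, using the same hypothetical dataset and an illustrative cutoff for “high risk.”]

```python
import pandas as pd

df = pd.read_csv("scored_defendants.csv")   # same hypothetical dataset as above
df["high_score"] = df["score_decile"] >= 5  # illustrative cutoff for "high risk"
df["recid"] = df["two_year_recid"].astype(bool)

def error_rates(group: pd.DataFrame) -> pd.Series:
    no_new_crime = group[~group["recid"]]
    new_crime = group[group["recid"]]
    return pd.Series({
        # labeled high risk, but did not reoffend within two years
        "false_positive_rate": no_new_crime["high_score"].mean(),
        # labeled low risk, but did reoffend within two years
        "false_negative_rate": (~new_crime["high_score"]).mean(),
    })

print(df.groupby("race").apply(error_rates))
```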
And so, that disparity is really a question of forgiveness, right? We have decided that some people are just more forgivable up front, right, despite the fact that the facts on the ground were exactly the same. That was a surprising outcome for me, because I think we tend to think of bias as bias against, right, the bias against black defendants. But really, what this was was a bias for white defendants. And it’s sort of a distinction without a difference, but it’s interesting to think about, and that’s why I like to frame these conversations around forgiveness.
But you know, you could say this is a one-off. So anyways, we did another analysis— Oh, I forgot to show you—sorry—what it looks like in practice. So here’s a black defendant and a white defendant; same crime: petty theft. Brisha, high risk; Vernon, low risk. So Brisha, 18 years old, walking down the street, grabbed a kid’s bicycle from a yard. Tried to get on it, ride it. Got a few yards down. The mother came out, said, “That’s my kid’s bike.” She gave it back. But in the meantime the neighbor had called the police, so she was arrested. And Vernon stole about $80 worth of stuff from the CVS.
So they get these risk scores when they’re arrested. And basically, when you look at it, it was completely the opposite, right. So Vernon got a low-risk score despite the fact that he had already committed two armed robberies and one attempted armed robbery, and he had already served a five-year prison sentence. And he went on to commit grand theft; he stole thousands of dollars of electronics from a warehouse and he’s now serving a ten-year prison term.
Brisha had some prior arrests, but they were juvenile misdemeanors, and so the records are sealed. But I can tell you that misdemeanors are not usually armed robberies. So, let’s just say it’s a smaller crime. And she doesn’t go on to commit any crimes in the next two years. So this is what a false positive and a false negative look like in real life. That’s what forgiveness…unfair forgiveness, really, in that one case, looks like. Well, then you could argue we should forgive everybody. But that’s a separate issue. So anyway, this is what we found for this one thing.
So then I was like, okay, I want to try another one. This was fun. So we did another analysis. I was like, what’s another algorithm that predicts an outcome? Well, weirdly, car insurance. So, the car insurance premium that you pay is actually meant to predict your likelihood of getting in an accident, right. So I was like, I want to compare that to true risk. That’s my new game. Predicted risk, true risk. That’s what I do.
So, once again it was an enormous amount of work to get all the data. Consumer Reports actually bought a proprietary data set that I analyzed with them. And we found a similar issue, which was there was a difference in the way risk was allocated. An example is… This is a guy, Otis Nash. He lives in East Garfield Park in Chicago. Which is…there’s really no way to describe it, it’s pretty much a bombed-out, bad neighborhood in Chicago on the West Side, and it’s dangerous and almost entirely minority. And he pays $190 a month for car insurance. He’s never had any accidents, he’s a great driver, and he has Geico. But he’s struggling. He’s… $190 a month for somebody who works as a security guard is no joke. He works six days a week and he can barely afford it.
So then, there’s this guy Ryan across town. He pays $55 a month for the same plan from Geico, right. And he has actually just recently gotten in an accident, and has the same coverage. And you know, the real difference between these two is their ZIP code. So insurance companies actually have one factor that they use to price your insurance that is separate from your driving record, and it’s called the “ZIP code factor.” And they basically assign a risk score to each ZIP code that is independent of how you drive. And when you look at it— Now, Ryan and Otis are never going to be exactly the same. They’re not the same age, they don’t have exactly the same risk factors. But when you control for all the risk factors, every single one of our charts looks like this.
So, the chart is basically predicted risk versus true risk, and you can think of predicted risk as essentially your premium. And the red straight line is for minority neighborhoods. So, for minority neighborhoods the prices track risk; they just keep going straight up in a nice linear relationship. And the blue line that goes down? That’s where the white neighborhoods are. They basically go up, and then all of a sudden, as they get riskier, the price goes down. Inexplicably, right?
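[A minimal sketch of the premium-versus-risk comparison described above, assuming a hypothetical ZIP-level dataset with illustrative columns (minority_share, avg_premium, avg_loss_per_car); this is not the actual Consumer Reports/ProPublica analysis.]

```python
import pandas as pd
import matplotlib.pyplot as plt

zips = pd.read_csv("zip_level_insurance.csv")  # hypothetical ZIP-level dataset
zips["group"] = zips["minority_share"].map(lambda s: "minority" if s >= 0.5 else "white")

fig, ax = plt.subplots()
for group, sub in zips.groupby("group"):
    # Bin ZIP codes by true risk (average payout per insured car) and plot the
    # average premium in each bin; unbiased pricing should give both groups
    # roughly the same upward-sloping line.
    sub = sub.sort_values("avg_loss_per_car")
    bins = pd.qcut(sub["avg_loss_per_car"], 10, duplicates="drop")
    binned = sub.groupby(bins)[["avg_loss_per_car", "avg_premium"]].mean()
    ax.plot(binned["avg_loss_per_car"], binned["avg_premium"], marker="o", label=group)

ax.set_xlabel("True risk (average loss per insured car)")
ax.set_ylabel("Predicted risk (average annual premium)")
ax.legend()
plt.show()
```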
And so once again we have this strange measure of discount applying to white neighborhoods that is not explainable by risk. And the insurance industry to this day—we published this earlier this year—has yet to respond. They said they would come out with a “big paper” explaining why this was true, and as of yet they have not. I’m speaking at their convention next week, and I’m anxiously awaiting the rebuttal. Maybe they want to present it to me on stage.
But you know, this is again this weird, unexplained forgiveness for one set of people, baked into an algorithm, right. And so I guess what I want to say is, all of you might be in the position to build algorithms. Maybe that’s what you’re going to do next, or maybe you’re going to be auditing them. We’re all in a world of automated decision-making. There will be more and more decisions that are going to be automated. And so I would just like to leave you with this thought: we talk about bias, and bias is important to think about, but think about forgiveness, too. Because in some ways, what we have done, at least in the things I’ve studied, is just mete out forgiveness unevenly—some people get impunity. They’re not held to the same standard—that theoretical standard that we apply to everyone else. And so take that with you. And I’d be happy to take any questions.
Sarah Marshall: Questions for Julia. While people are getting their confidence up, I’m really keen to know the make-up of your team: who are you working with to help you scrape eighteen thousand records, etc.?
Julia Angwin: Oh yeah, right. Yeah, I almost did a talk on the future of journalism. Which was about… I couldn’t decide, because I do feel like I’m building a new kind of journalism here. I have two programmers working for me, and a researcher. So we have a real team, and each one of these projects takes a year. And I think that as we go towards— You know, I thought the talk on the earthquake was so important. Because journalists are going to need to do much more validation, verification, forensic analysis, right. And so we do need to build more of these, basically, quantitative teams. And so I’m trying to pioneer that a little bit in my way.
Audience 1: Thank you very much. Actually, my question was delving into that a little bit deeper. How do you process all the data, and how do you actually merge the different databases? Like, if you could just explain in a little bit more detail—I’m sure it’s a very complicated process—but a little bit more detail about how, like, the ABCs of it go.
Angwin: Right. So, one reason that you don’t see so much work like this, including from academics, is because it’s a nightmare. So for instance, in both cases the special sauce that we brought was to match the predicted risk to the true risk. And what that means really is a giant database join, right. And those are super messy. And in both cases… You know, one took six months, one took nine months. And there’s really no getting around the fact that you have to do a lot of it by hand. We tried to automate, and we tried to do probabilistic matching and all that stuff. But truthfully, the standard that we’re held to as journalists is that it can’t just be a probabilistic match. It can’t be, like, 80% right. It has to be right.
And so in the end we ended up doing a lot of hand matching of records. Which was…horrible. And one thing I’ve been thinking about a lot is how to build more capacity for that. Because I don’t think… Most newsrooms can’t do this, right. ProPublica is like, you know, this utopian universe of journalism, nonprofit funded, really. Doing well and invested in this type of work. But that’s not true of most newsrooms. And so I’ve been thinking about the fact that this is something maybe Mechanical Turk could be brought to bear on. I’m actually trying to work with this amazing coding group at San Quentin prison in California. They actually have a coding academy and they need— So I’m trying to work with them to teach the inmates maybe to help with this type of matching? I think that there are a lot of untapped opportunities for this type of work that I’ve been trying to explore, because I do think this is the gating factor for this type of work.
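[A minimal sketch of the probabilistic-matching-plus-hand-review workflow described above, with hypothetical file and column names; this is not the actual pipeline.]

```python
import pandas as pd
from difflib import SequenceMatcher

scores = pd.read_csv("risk_scores.csv")        # hypothetical: name, dob, score
records = pd.read_csv("criminal_records.csv")  # hypothetical: name, dob, charges

def name_similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

auto_matches, needs_review = [], []
# Block on date of birth so we only compare plausible candidate pairs.
for dob, group in scores.groupby("dob"):
    candidates = records[records["dob"] == dob]
    for _, person in group.iterrows():
        for _, cand in candidates.iterrows():
            sim = name_similarity(person["name"], cand["name"])
            if sim >= 0.95:
                auto_matches.append((person["name"], cand["name"], sim))
            elif sim >= 0.75:
                # Anything in the gray zone goes to a human reviewer -- as the
                # talk notes, "80% right" isn't good enough for publication.
                needs_review.append((person["name"], cand["name"], sim))

print(f"{len(auto_matches)} automatic matches, {len(needs_review)} for hand review")
```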
Audience 3: Hi. I work for the New York City Department of Education. And we’ve actually been using your articles to teach students about algorithmic bias—
Angwin: Oh yay!
Audience 3: —so thank you for writing such important journalism that our students can use. But I was curious, how might we think about empowering the next generation of students to make ethical decisions? Some of this can feel a little bit hopeless when you see some of it, and how do we make them feel empowered?
Angwin: Oh, I love that question, because I am a strangely hopeful person despite my weird job of doing only unhopeful things. And so I do believe, you know… Like, the criminal risk score algorithm is a good example. If they fixed… So, after our story came out, a bunch of mathematicians and computer scientists came out with all these papers studying our data set and coming up with some theoretical conclusions. And essentially, they all said, you know, you could fix this algorithm if you were to balance the error rates. Like, if you were to choose to optimize your algorithm to balance the error rates— They’ve chosen to optimize it another way, which is for predictive accuracy. Meaning it’s “correct” in its predictions 60% of the time for both black and white defendants.
But when it’s wrong, it’s wrong in this completely disparate way, right. So you could actually fix it that way, and all that would happen—the only “bad” outcome—is that the algorithm would be more accurate for black defendants than for white defendants. Which makes sense, because there are actually more black defendants in our criminal justice system.
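[A toy illustration of the trade-off those researchers described, using a standard identity from that fairness literature (e.g., Chouldechova 2017) and made-up numbers: when two groups have different base rates of reoffending, a score with the same predictive accuracy and the same false negative rate for both groups cannot also have the same false positive rate.]

```python
def fpr_from(base_rate, ppv, fnr):
    # Identity relating false positive rate to predictive accuracy (PPV),
    # false negative rate, and the group's base rate of reoffending:
    # FPR = base_rate/(1-base_rate) * (1-PPV)/PPV * (1-FNR)
    return base_rate / (1 - base_rate) * (1 - ppv) / ppv * (1 - fnr)

# Same PPV (0.6) and same false negative rate for both groups, but different
# made-up base rates of reoffending: the false positive rates cannot match.
for name, base_rate in [("group A", 0.50), ("group B", 0.35)]:
    print(name, round(fpr_from(base_rate, ppv=0.6, fnr=0.35), 2))
# group A 0.43
# group B 0.23
```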
So, what’s weird is there is a hopeful outcome, right. There’s like something you could do. Now, I would also, though, like to step back and say I’m not entirely sure we should be predicting anyone’s criminality in the future. I can’t even predict my husband and I’ve been married to him for a very long time. It’s like predicting human behavior, like, we can’t even get our maps to get us to the right place most of the time. So is this really where we want to bring computers to bear, is predicting human behavior? I feel like this is maybe like a future thing that we’re not going to be so good at, yet.
But I do think that algorithms are going to be better than people, right. In a lot of ways. But we have to learn how to hold them accountable, and we have to build systems around that. I’m perfectly sure that a car is going to drive better than me. I’m a pretty bad driver, right? So I feel like there is— I don’t want to be a Luddite about it, I just want to say we need systems of oversight and accountability before we can move forward, because otherwise it really will be the Wild West.