https://www.youtube.com/watch?v=Yei7SS8nEqw

[Speaker slides were not presented in the video recording]

Julia Angwin: Hi there. I am going to talk about something that doesn’t sound like it should be at a tech conference, but is. So I want to tell you about what I’ve learned about forgiveness in my studies of algorithmic accountability. But I’m going to start with a little bit of just a small history of me and my own experience that I think will be relevant.

So, this is me. I grew up in Silicon Valley, in Palo Alto, at a time when the personal computer was really exciting. This is my first computer. And I thought I would be a programmer when I grew up. I actually didn’t know there were any choices other than hardware or software. I thought like, you pick one and then you’re cool. And I was like, software seemed more interesting. So I was studying to do that. And I took a wrong turn somehow and I fell in love with journalism, my college newspaper, and thought well, I’ll just do this for a little while, it seems fun.

So I ended up at The Wall Street Journal in 2000. They hired me to cover “the Internet.” And I was like, “Anything in particular about the Internet?”

They were like, “Nah. Just everything. You seem like you know computers.”

So I was like okay, that sounds like a great assignment. So I spent thirteen years there covering technology. And I was in the New York office, so I had the weird angle of mostly covering the AOL-Time Warner merger.

So I want to talk about what I learned about forgiveness in my time as a reporter. So, for fourteen years, I was covering technology for The Wall Street Journal. I wrote a lot of stories, right. Most of them weren’t that interesting, but some of them were. One of the biggest stories I worked on was the AOL-Time Warner merger, because AOL was a “tech company” of its time. And that merger consumed ten years of my life.

And really, as you probably all may remember, it was based on accounting fraud, right. Like, they were actually doing this crazy thing. AOL, they had like a cafeteria—they would ask the guys to provide the cafeteria services at AOL. And instead of paying them, they would say, “We’re going to pay you double and then you buy ads.” And that was the game. That’s how they got all this ad revenue. Because Wall Street was only valuing it on ad revenue, not on profit.

So that was fun times. And in the end I think they paid a $300 million fine. It was a big story. I got to be part of a Pulitzer Prize-winning team. Super good.

But, what’s weird is that in all of my reporting, there are only two times where people that I wrote about went to jail. So, one was a spammer. So, this guy was a spammer—he was called the Buffalo Spammer. It was the early days of spam, so it was exciting. I went to his house and knocked on his door, talked to his mother, you know, etc. And as part of my reporting and then additional…you know, the New York Attorney General charged him, and he actually went to jail. He got the maximum sentence of three and a half years.

The other guy who I wrote about who went to jail was an AOL executive, actually, who did an embezzlement scheme. I don’t have his photo—he seems to have removed it from the Internet, but he was also a black man. And he also got a prison sentence for doing some small-time embezzlement; he was the head of HR.

And then I think about where the former AOL executives I wrote about who did all those round-trip deals are now—they’re doing cool. Steve Case, funding a lot of things. Bob Pittman, running a giant radio network. Dave Colburn, the guy who actually did all the deals that were round-tripped, he settled with the FTC for four million and now he’s investing in tech companies and in Israel.

So you know…look, this isn’t a particularly unique story. But let’s just say this is a very American story about forgiveness. Who is forgiven in this case? What was the unique factor about these people? I don’t know, they were white men. I don’t know, they were really powerful, right? And I find it really depressing and sad that the two people I wrote about as a tech reporter who went to prison were both black men, because as you know, they were probably the only two black men I ever encountered in my whole time covering technology, right?

So, we already know that our society hands out forgiveness and punishment unequally. Right? I’m not telling you anything you don’t know, I’m just telling it to you in the form of a personal story.

So, flash forward to when I decided to write a series about algorithmic accountability at my new employer, ProPublica. As we all know, algorithms are very important in our lives, right. This is…if you haven’t seen it, The Wall Street Journal’s Blue Feed, Red Feed is a delightful app that you can visit every day; this is last night. And it shows you sort of the top stories that would be trending in a conservative news feed and in a liberal news feed. So of course yesterday was spectacular. The conservative news feed, Trump: economy growing 3%. And the liberal news feed: Mueller’s charges, right. So it’s like, very different stories are being presented.

So I decided I wanted to do some accountability studies about algorithms in our lives. And it’s hard to study the newsfeed in a quantitative way, and I also wanted something with higher stakes. So I started with an algorithm that is used in the criminal justice system to predict whether a person is likely to commit a future crime. This is, literally, Minority Report software, basically, that is used throughout the United States for sentencing, parole, pretrial release, a lot of different stages of the criminal justice system.

So, how many people were aware that that’s even happening? Okay, good. Yay. When I started looking into it two years ago, it wasn’t as well-known that this was even being done. And so I thought well, I’m going to look into this and see if I can actually figure out if the software is biased.

So, I went and did a freedom of information request in Florida. And it took five months and some legal wrangling, but we did get all of the scores that they had assigned everyone who was arrested during a two-year period. So the first thing we did was put those scores up in a histogram. What you see is that on the left is black defendants, and on the right is white defendants. And so the distribution of scores 1 through 10—which is basically 1, least risky; 10, most risky—is very even for the black defendants. But for the white defendants it’s strangely skewed towards low risk, right?
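[A minimal sketch, in Python, of the tabulation described above: counting how the 1–10 scores are distributed for black and white defendants. The file and column names (compas-scores-two-years.csv, race, decile_score) follow the data set ProPublica later published and are assumptions here, not part of the talk.]

    import pandas as pd

    # Scores obtained via the public-records request; file and columns assumed.
    df = pd.read_csv("compas-scores-two-years.csv")

    # Count defendants at each decile score (1 = least risky, 10 = most risky),
    # split by race, then normalize so the two distributions are comparable.
    counts = df.groupby(["race", "decile_score"]).size().unstack(fill_value=0)
    print(counts.div(counts.sum(axis=1), axis=0).round(3))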

So my first thought was, “Huh. That’s weird.” But, you can’t really say it’s 100% biased until you test whether it’s accurate, right? What if every one of the white defendants was Mother Teresa, right, and they never did anything wrong… It was just some weird, like, jaywalking ticket or something.

So, we went and did six months of scraping the criminal records of every one of those defendants—that’s 18,000 people; it was a complete nightmare—and joining those data sets to make sure we had the match of a person’s score with their true recidivism. Did they actually go on to commit a crime in the next two years? And also, what was their prior record like?
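[A rough sketch, in Python, of the join described above: attaching each defendant’s score to whether a new offense shows up in the following two years. The file layout and column names here are hypothetical, and in practice much of this matching was done by hand.]

    import pandas as pd

    scores = pd.read_csv("scores.csv")    # defendant_id, name, dob, score_date, decile_score
    arrests = pd.read_csv("arrests.csv")  # name, dob, arrest_date

    # Join on identifying fields, then flag any new arrest within two years
    # (730 days) of the date the score was assigned.
    merged = scores.merge(arrests, on=["name", "dob"], how="left")
    days = (pd.to_datetime(merged["arrest_date"]) - pd.to_datetime(merged["score_date"])).dt.days
    merged["recid_in_window"] = days.between(1, 730)

    # One row per defendant: did any qualifying arrest occur in the window?
    two_year_recid = merged.groupby("defendant_id")["recid_in_window"].any()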

And what we found was that there was a disparity, right. We did a logistic regression, which is just a statistical technique that allows you to control for all other factors. When you controlled for all factors other than race, you saw that black defendants were 45% more likely to be given a high-risk score. And that’s controlling for the outcome too, right, which is, like, not committing a crime or committing a crime in the future—in the next two years.
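[A minimal sketch, in Python with statsmodels, of the regression described above: does race still predict a high-risk score once age, sex, prior record, and the actual two-year outcome are controlled for? Column names follow ProPublica’s published data but should be treated as assumptions; the 45% figure quoted above corresponds to an odds ratio of roughly 1.45 on the race term.]

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("compas-scores-two-years.csv")
    df["high_risk"] = (df["decile_score"] >= 5).astype(int)       # medium/high score
    df["black"] = (df["race"] == "African-American").astype(int)

    # Logistic regression: high-risk score as a function of race,
    # controlling for age, sex, prior offenses, and the true outcome.
    model = smf.logit(
        "high_risk ~ black + age + C(sex) + priors_count + two_year_recid",
        data=df,
    ).fit()
    print(np.exp(model.params["black"]))  # odds ratio for black defendants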

So that meant there was some disparity here. And when you looked at it in a chart, basically when you look at false positives and false negatives, you see that the difference is really stark. The false positive rate for African American defendants was twice as high, right. They were twice as likely to be given a high-risk score but not actually go on to commit future crimes. So, like, falsely given a higher-risk score than a white defendant. And similarly, white defendants were twice as likely to get an unjustified low-risk score despite the fact that they turned out to have been far more risky.
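[A minimal sketch, in Python, of the error-rate comparison described above: the false positive rate (labeled high risk, no new crime) and false negative rate (labeled low risk, new crime) computed separately for each group. Same assumed columns as the earlier sketch.]

    import pandas as pd

    df = pd.read_csv("compas-scores-two-years.csv")
    df["high_risk"] = df["decile_score"] >= 5
    df["recid"] = df["two_year_recid"] == 1

    def error_rates(g: pd.DataFrame) -> pd.Series:
        # False positive: flagged high risk but did not reoffend.
        fpr = (g["high_risk"] & ~g["recid"]).sum() / (~g["recid"]).sum()
        # False negative: flagged low risk but did reoffend.
        fnr = (~g["high_risk"] & g["recid"]).sum() / g["recid"].sum()
        return pd.Series({"false_positive_rate": fpr, "false_negative_rate": fnr})

    print(df.groupby("race").apply(error_rates).round(3))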

And so, that disparity is really a question of forgiveness, right? We have decided that some people are just more forgivable up front, right, despite the fact that the facts on the ground were exactly the same. That was a surprising outcome for me, because I think we also think of bias as bias against, right, the bias against black defendants. But really what this was was a bias for white defendants. And it’s sort of a distinction without a difference, but it’s interesting to think about, and that’s why I like to frame these conversations around forgiveness.

But you know, you could say this is a one-off. So anyways, we did another analysis— Oh, I forgot to show you what—sorry—what it looks like in practice. So here’s a black defendant and a white defendant; different race, same crime: petty theft. Brisha, high risk; Vernon, low risk. So Brisha, 18 years old, walking down the street, grabbed a kid’s bicycle from a yard. Tried to get on it, ride it. Got a few yards down. The mother came out and said, “That’s my kid’s bike.” She gave it back. But in the meantime the neighbor had called the police, so she was arrested. And Vernon stole about $80 worth of stuff from the CVS.

So they get these risk scores when they’re arrested. And basically, when you look at it, it was completely the opposite, right. So Vernon got a low-risk score despite the fact that he had already committed two armed robberies, and one attempted armed robbery, and he had already served a five-year prison sentence. And he went on to commit grand theft; he stole thousands of dollars of electronics from a warehouse and he’s now serving a ten-year prison term.

Brisha had some prior arrests, but they were juvenile misdemeanors, and so the records are sealed. But I can tell you that misdemeanors are not usually armed robberies. So, let’s just say it’s a smaller crime. And she doesn’t go on to commit any crimes in the next two years. So this is what a false positive and a false negative look like in real life. That’s what forgiveness…unfair forgiveness, really, in that one case, looks like. Well, then you could argue we should forgive everybody. But that’s a separate issue. So anyway, this is what we found for this one thing.

So then I was like okay, I want to try another. This was fun. So we did another analysis. I was like, what’s another algorithm that predicts an outcome? Well, weirdly, car insurance. So, your car insurance premium that you pay is actually meant to predict your likelihood of getting in an accident, right. So I was like, I want to compare that to true risk. That’s my new game. Predicted risk, true risk. That’s what I do.

So, once again it was an enormous amount of work to get all the data. Consumer Reports actually bought a proprietary data set that I analyzed with them. And we found a similar issue, which was there was a difference in the way risk was allocated. An example is… This is a guy, Otis Nash. He lives in East Garfield Park in Chicago. Which is…there’s really no way to describe it, it’s pretty much a bombed-out, bad neighborhood in Chicago on the West Side, and it’s dangerous and almost entirely minority. And he pays $190 a month for car insurance. He’s never had any accidents, he’s a great driver, and he has Geico. But he’s struggling. $190 a month for somebody who works as a security guard is no joke. He works six days a week and he can barely afford it.

So then, there’s this guy Ryan across town. He pays $55 a month for the same plan from Geico, right. And he has actually just recently gotten in an accident, and has the same coverage. And you know, the real difference between these two is their ZIP code. So insurance companies actually have one factor that they use to price your insurance that is separate from your driving record, and it’s called the “ZIP code factor.” And they basically assign a risk score to each ZIP code that is independent of how you drive. And when you look at it— Now, Ryan and Otis are never going to be exactly the same. They’re not the same age, they don’t have exactly the same risk factors. But when you control for all the risk factors, every single one of our charts looks like this.

So, the chart is basically predicted risk to true risk. And you can think of predicted risk as essentially your premium. And the red straight line is for minority neighborhoods. So, for minority neighborhoods the prices track risk; they just keep going straight, like a nice linear relationship. And the blue line that goes down? That’s where the white neighborhoods are. And that’s basically: they go up, and then all of a sudden, as they get riskier, the price goes down. Inexplicably, right?
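[A minimal sketch, in Python, of the comparison described above: how closely premiums (predicted risk) track average payouts (true risk) in minority versus white ZIP codes. The Consumer Reports data set was proprietary, so the file and columns here (zip_premiums.csv, avg_premium, avg_loss, minority_zip) are hypothetical stand-ins.]

    import pandas as pd
    import statsmodels.formula.api as smf

    zips = pd.read_csv("zip_premiums.csv")

    # Fit premium against average loss separately for each neighborhood type.
    # If prices tracked risk equally, the slopes would be similar; the pattern
    # described in the talk is that prices stop tracking risk at the risky end
    # for white ZIP codes.
    for label, grp in zips.groupby("minority_zip"):
        fit = smf.ols("avg_premium ~ avg_loss", data=grp).fit()
        print(label, round(fit.params["avg_loss"], 3))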

And so once again we have this strange sort of discount applying to white neighborhoods that is not explainable by risk. And the insurance industry to this day—we published this earlier this year—has yet to respond. They said they would come out with a big “paper” explaining why this was true, and as of yet they have not responded. I’m speaking at their convention next week. And I’m anxiously awaiting the rebuttal. Maybe they want to present it to me on stage.

But you know, this is again this weird, unexplained forgiveness for one set of people, baked into an algorithm, right. And so I guess what I want to say is like, all of you guys might be in the position to build algorithms. Maybe that’s what you’re going to do next, or maybe you’re going to be auditing them. We’re all in a world of automated decision-making. There will be more and more decisions that are going to be automated. And so I guess I would just like to leave you with this thought, which is: we talk about bias, and bias is important to think about, but think about forgiveness, too. Because in some ways, what we have done, at least in the things I’ve studied, is just mete out forgiveness unevenly—some people get impunity. They’re not held to the same standard—that theoretical standard that we apply to everyone else. And so take that with you. And I’d be happy to take any questions.


Sarah Marshall: Questions for Julia. While people are getting their confidence up, I’m really keen to know the make-up of your team: who are you working with to kind of help you to scrape eighteen thousand records, etc.?

Julia Angwin: Oh yeah, right. Yeah, I almost did a talk on the future of journalism. Which was about… I couldn’t decide, because I do feel like I’m building a new kind of journalism here. I have two programmers working for me, and a researcher. So we have a real team, and each one of these projects takes a year. And I think that as we go towards— You know, I thought the talk on the earthquake was so important. Because journalists are going to need to do much more validation, verification, forensic analysis, right. And so we do need to build more of these basically quantified teams. And so I’m trying to pioneer that a little bit in my way.

Audience 1: Thank you very much. Actually, my question was delving into that a little bit deeper. How do you process all the data, and how do you actually merge the different databases? Like, if you could just explain in a little bit more detail—I’m sure it’s a very complicated process—but a little bit more in detail how, like, the ABC of it goes.

Angwin: Right. So, one reason that you don’t see so much work like this, including from academics, is because it’s a nightmare. So for instance, in both cases the special sauce that we brought was to match the predicted risk to the true risk. And what that means really is a giant database join, right. And those are super messy. And in both cases… You know, one took six months, one took nine months. And there’s really no getting around the fact that you have to do a lot of it by hand. Like, there’s…we tried to automate, and we tried to do probabilistic matching and all that stuff. But truthfully, the standard that we’re held to as journalists is that it can’t be just a probabilistic match. It can’t be, like, 80% right. It has to be right.

And so in the end we ended up doing a lot of hand matching of records. Which was…horrible. And one thing I’ve been thinking about a lot is how to build more capacity for that. Because I don’t think… Most newsrooms can’t do this, right. ProPublica is, like, you know, this utopian universe of journalism, nonprofit funded, really. Doing well and invested in this type of work. But that’s not true of most newsrooms. And so I’ve been thinking about the fact that this is something where maybe Mechanical Turk could be brought to bear. I’m actually trying to work with this amazing coding group at San Quentin prison in California. They actually have a coding academy and they need— So I’m trying to work with them to teach the inmates maybe to help with this type of matching? I think that there are a lot of untapped opportunities for this type of work that I’ve been trying to explore, because I do think this is the gating factor for this type of work.
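[A rough sketch, in Python, of the candidate-matching step described above: generate likely record pairs by exact date of birth plus fuzzy name similarity, auto-accept only very confident matches, and queue the rest for hand review. File and column names are hypothetical.]

    from difflib import SequenceMatcher
    import pandas as pd

    scores = pd.read_csv("scores.csv")    # name, dob, ...
    records = pd.read_csv("records.csv")  # name, dob, ...

    # Candidate pairs: same date of birth, then score the name similarity.
    pairs = scores.merge(records, on="dob", suffixes=("_score", "_record"))
    pairs["name_sim"] = [
        SequenceMatcher(None, a.lower(), b.lower()).ratio()
        for a, b in zip(pairs["name_score"], pairs["name_record"])
    ]

    auto_accept = pairs[pairs["name_sim"] >= 0.95]  # treat as matches
    hand_review = pairs[(pairs["name_sim"] >= 0.70) & (pairs["name_sim"] < 0.95)]  # humans decide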

Audience 3: Hi. I work for the New York City Department of Education. And we’ve actually been using your articles to teach students about algorithmic bias—

Angwin: Oh yay!

Audience 3: —so thank you for writing such important journalism that our students can use. But I was curious, how might we think about empowering the next generation of students to make ethical decisions? Some of this can feel a little bit hopeless when you see it, and how do we make them feel empowered?

Angwin: Oh, I love that question, because I am a strangely hopeful person despite my weird job of doing only unhopeful things. And so I do believe, you know… Like, the criminal risk score algorithm is a good example. If they fixed… So, after our story came out, a bunch of mathematicians and computer scientists came out with all these papers studying our data set and coming up with some theoretical conclusions. And essentially, they all said, you know, you could fix this algorithm if you were to balance the error rates. Like, if you were to choose to optimize your algorithm to balance the error rates— They’ve chosen to optimize it another way, which is for predictive accuracy. So meaning it’s “correct” in its predictions 60% of the time for both black and white defendants.

But when it’s wrong, it’s wrong in this completely disparate way, right. So you could actually fix it that way, and all that would happen—the only “bad” outcome—was the algorithm would be more accurate for black defendants than white. Which makes sense, because there are actually more black defendants in our criminal justice system.
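[A minimal sketch, in Python, of the kind of fix the researchers described: instead of one cutoff for everyone, pick a per-group cutoff so that false positive rates come out roughly equal, accepting that the score then behaves a bit differently across groups. Same assumed columns as the earlier sketches; the 20% target is arbitrary and for illustration only.]

    import pandas as pd

    df = pd.read_csv("compas-scores-two-years.csv")

    def fpr(group: pd.DataFrame, cutoff: int) -> float:
        # Share of people who did NOT reoffend but score at or above the cutoff.
        non_recid = group[group["two_year_recid"] == 0]
        return (non_recid["decile_score"] >= cutoff).mean()

    TARGET_FPR = 0.20  # arbitrary target false positive rate
    for race, grp in df.groupby("race"):
        # Lowest cutoff whose false positive rate stays at or under the target.
        ok = [c for c in range(1, 11) if fpr(grp, c) <= TARGET_FPR]
        print(race, min(ok) if ok else "no cutoff meets target")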

So, what’s weird is there is a hopeful outcome, right. There’s, like, something you could do. Now, I would also, though, like to step back and say I’m not entirely sure we should be predicting anyone’s criminality in the future. I can’t even predict my husband, and I’ve been married to him for a very long time. It’s like predicting human behavior, like, we can’t even get our maps to get us to the right place most of the time. So is this really where we want to bring computers to bear, predicting human behavior? I feel like this is maybe a future thing that we’re not going to be so good at, yet.

But I do think that algorithms are going to be better than people, right. In a lot of ways. But we have to learn how to hold them accountable, and we have to build systems around that. I’m perfectly sure that a car is going to drive better than me. I’m a pretty bad driver, right? So I feel like there is— I don’t want to be a Luddite about it, I just want to say we need systems of oversight and accountability before we can move forward, because otherwise it really will be the Wild West.

