Anne L. Washington: So, my colleagues today are going to go against everything I'm talking about, that we're not talking enough about data science. But we just heard two amazing talks that ground us in a lot of different ways.

So, people are talking about data science. And usually it's out there, it's a big buzzword. Most people approach it from the point of view of computation and computer science, or they're thinking perhaps of statistics. Now, as a computer scientist who works in policy, I'm very aware that these conversations are kind of privileged and technical. Because if we go even further, maybe we'll add a few more words, but nobody knows what these things are.

And it's being used at highly important, practical moments in public policy, and I was realizing there is no way for the technical experts to have conversations with policymakers in a way that they could actually understand each other. So during this year that I have been off, I've been thinking about how to teach both people who are trained in the technical parts of data science and also policymakers how we could have a common language. And then that way we could have these conversations so we could talk together. And the point of these data science conversations is to get a level above all the technical jargon and dig deeply into what the reasoning is for our claims. How can we explain ourselves? And that way people will know both what the evidence is and what is supporting it.

So let's get back to thinking about what data science was in the very beginning. Data science was a thing for science. Imagine what happened with a big sky survey for astronomers: there's something clicking every moment, taking pictures of the sky. And they teamed together and needed to efficiently understand what they were looking at. So what's interesting about this example is that data science was for science. People were all trained similarly, so they came from a disciplinary similarity. They were STEM-aware; they already knew about science, technology, engineering, and math: a homogeneous group. And so they were just trying to be efficient in what they were doing. It was a pretty straightforward thing.

Well, now data science has moved into marketing. And in marketing, you know, you're trying to figure out which clients are valuable, which is very important. You're trying to figure out which customers are which, and how you can make money off of them. But again, it's still a single group. We know the famous case where Target was trying to figure out who was going to have a baby next, and they disrupted some people's expectations of ethics in that process. And this is what happens when people can't justify what is happening within the technology. There's a kind of "ugh factor" that happens when people are on the receiving end of it. But still, this is marketing. It was uncomfortable, but basically there was no direct harm done.

When you start talking about data science in the public sector, we're at a different level. If it's a marketing problem and I don't get a coupon, or I do get a coupon... so what, right? But this is different. There is always going to be some constituent in public policy who votes, who may or may not have gotten treatment or something that they think is important. The consequences are much bigger, and the politicians and people who work in the public sector have to really think about how they're going to balance these interests together.

At the same time, in the public sector there's less money, there's a great need. And data science brings incredible efficiencies. There's just no doubt that it brings a lot of benefits into large organizations. But they have to balance that.

So the benefits of data efficiency are just tremendous, right. You can improve your operations. You can figure out your processes. You can predict future or past trends. We know all these things. It can do a lot at once, and that looks really good.

But there are some things that it does that we have to think about differently. Once you have efficiency, you're creating a clear path towards a specific target. Computers do this beautifully. And computer scientists like me do it beautifully. Like, you know, a dog on a bone. Like, "let me optimize this," right. And we're going to go straight after our target.

What we have to think about are the risks of efficiency, and we saw this beautifully in what the previous speakers were saying. You need to know where you're going on that road and where you're not going. You have to think about what you're optimizing for, and what you're not optimizing for. There will be second and third place winners as you move through these ideas. And so it's important to start to think about all of these different things that are going on when we're talking about data science and public policy.

And that has started to happen. This is the court case... We just heard in Ravi's talk about criminal justice systems, and my talk touches on that a little bit. There was the case of Loomis, who had this issue that these scores were used against him... well, they were used in his case, and he was unable to know exactly what they meant for his case.

At the same time, in the summer of 2016 the US Congress was considering whether these types of scores should be used in federal prison, and there was a lot of talk about that. And in comes ProPublica. So, ProPublica came into this debate and did a very important article talking about whether these scores were fair or not. And fair in all the beautiful ways Suchana already demonstrated to you. There are lots of ways to think about fairness.

Now, what was at issue for ProPublica was fairly simple, and this also gets back to what Ravi was talking about. Who needs services? And how do you understand that? So let's just take a hypothetical, because you're thinking about public safety. Let's say that Mr. Elliott, who lives in EliteTown, gets arrested. Now, Mr. Smith, who lives in Sad City, also gets arrested.

Now, they're going to look at these scores and try to figure out who might need more services. Now, it turns out EliteTown has a lot of social services and it's really taken care of. So the risk score for Mr. Elliott might be very low. Whereas Sad City doesn't have a lot going on, and there might be a greater need for Mr. Smith to have services because they're not in his community. So he might get adjusted differently for that.

Now, what happens when the needs assessment gets used as a risk assessment? Suddenly Mr. Smith in Sad City is seen as a higher risk. Now, in the case of the ProPublica article, what they found was that indeed Mr. Elliott who lived in EliteTown... (Those aren't their real names.) Mr. Elliott, who lived in EliteTown, actually had a five-year record in a different state. And Mr. Smith never committed another crime. But Mr. Smith had the higher risk score. So you start to see that once you start breaking down the reasoning of how these things are built, you can understand it in some different ways.
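The Elliott/Smith hypothetical above can be sketched in a few lines of code. This is purely an invented illustration (the names, numbers, and the `needs_score` function are all made up for this talk's hypothetical, not from any real scoring system), but it shows how one number reads completely differently depending on the question you think it answers:

```python
# Invented sketch of the Elliott/Smith hypothetical: a score built to
# route services toward people who NEED them reads very differently
# when it is reinterpreted as how RISKY someone is.

def needs_score(base_risk, services_in_community):
    """Hypothetical score: same underlying risk, but adjust upward
    when the person's community lacks services, so they get help."""
    return base_risk + (0 if services_in_community else 3)

# Same underlying risk for both men; only the community differs.
elliott = needs_score(base_risk=2, services_in_community=True)   # EliteTown
smith = needs_score(base_risk=2, services_in_community=False)    # Sad City

print(elliott, smith)  # 2 5
# Read as a NEEDS assessment: Mr. Smith should get more services.
# Read as a RISK assessment: Mr. Smith now looks like the more
# dangerous person, even though his underlying risk is identical.
```

The design point is that the adjustment is perfectly reasonable for allocating services and perfectly misleading for judging dangerousness; nothing in the number itself tells you which question it was built to answer.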

What was fascinating about this argument about COMPAS scores between Northpointe and ProPublica is that ProPublica said, "Ding!" Northpointe said, "Dong!" And they went back and forth. What was fascinating is that data scientists chose to write about it. This is just a few of the articles that I've been reading this year. A new one comes out about every six weeks. And they are having an argument about how we understand the same data set when we look at it differently. What's important to us? What are our arguments? All around that, but they are using the exact same data. And this got me thinking about how we can start having these conversations with each other, looking at the reasoning that I read in the COMPAS articles.
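That "same data, different answers" dynamic has a concrete statistical shape. As a hedged sketch (the records below are invented for illustration, not the actual COMPAS data, and the two metrics only roughly stand in for each side's argument), the same tiny data set can look fair under one measure and unfair under another:

```python
# Invented toy data set showing the shape of the disagreement: on the
# exact same records, one fairness metric comes out equal across the
# two groups while another does not.

# Each record: (group, flagged_high_risk, reoffended)
records = (
    [("A", True, True)] * 2 + [("A", True, False)] * 2
    + [("A", False, True)] * 3 + [("A", False, False)] * 3
    + [("B", True, True)] * 1 + [("B", True, False)] * 1
    + [("B", False, True)] * 1 + [("B", False, False)] * 7
)

def precision(group):
    """Of those flagged high risk, what share actually reoffended?
    (Roughly the 'a given score means the same thing for everyone'
    sense of fairness.)"""
    flagged = [r for r in records if r[0] == group and r[1]]
    return sum(r[2] for r in flagged) / len(flagged)

def false_positive_rate(group):
    """Of those who did NOT reoffend, what share was flagged anyway?
    (Roughly the 'who is wrongly burdened' sense of fairness.)"""
    negatives = [r for r in records if r[0] == group and not r[2]]
    return sum(r[1] for r in negatives) / len(negatives)

for g in ("A", "B"):
    print(g, precision(g), false_positive_rate(g))
# Precision is 0.5 for both groups, yet the false positive rates
# differ (0.4 vs 0.125): "fair" by one metric, "unfair" by the other,
# on identical data.
```

The divergence is driven by the two groups having different base rates of reoffending, which is why both sides could keep publishing rebuttals without either side's arithmetic being wrong.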

So, one intervention for this is to imagine reasoning about the populations that are at risk. Who might win? Efficiency will always tell you who will win. In the ProPublica case it was very clear, and everybody agreed: the algorithm found violent offenders who were likely to reoffend. There was no doubt about that. But who might lose? This is where ProPublica argued that there were certain populations that would lose, and it was this idea of understanding who was less likely to come out ahead.

So part of this is thinking about reasoning about priorities, right. The primary priority of Northpointe... they were hired, as we have heard from several of my colleagues. They were hired to build a system that did one thing. And they were like, "We did it. We did the one thing." There's no doubt that's true. But there are secondary priorities that have to be taken care of. And those priorities can be a variety of things. One might be the law, for instance. Or public policy. Or regulation. And the reason I say that is that a lot of these tools are sold to multiple industries, and therefore they might really not know the subtleties of regulation in a particular industry.

It's also important to consider reasoning about impact. Most efficiency algorithms are going to go straight to an immediate gain in efficiency for a particular operation. People are saving money. Everyone's happy. There's some bottom line argument that's pretty immediate. When we're talking about risks overall, the conversation that needs to happen is about long-term impacts or social impacts, and understanding this balance between different groups.

So as I work on this curriculum development that could be used in other places, the point is to begin a conversation about data science, a conversation where everyone can participate, not just people who know certain skill sets. Really to understand who can win, who's at risk, and how we think about the long-term impacts, and to have this as an integrated experience so that we can all come to better solutions for data science that will impact both our society and our governing. Thank you.