Dan Taber: So on behalf of my group I’m extremely excited to be presenting our product, which is known as AI Blindspot. I actually wanted to start by just reflecting a little bit. Looking back on March, we started this program in the middle of March with an initial two weeks. And right after that, there was an amazing flurry of activity, where major tech companies were making all these bold declarations about what they were doing to address AI ethics.

In the span of two days, you had Microsoft announcing they were going to add an ethics checklist to their product release. You had Google announcing their now-infamous ethics panel. You had Amazon announcing a partnership with the National Science Foundation to study AI fairness.

But, as I was watching these headlines come out, you got the sense that this is what was really going on:

Because right about the same time, I was starting to have conversations with several tech companies, with people who were involved in ethics initiatives. Even the ones who were really trying to do their best to implement changes to study bias in AI systems, frankly a lot of them just didn’t know what they were doing, as they openly admitted, because there were just no structures or processes for assessing bias in AI systems. There were a lot of tools that had come out in the past year, including those that came out of Assembly, like the Data Nutrition Label, and others like Model Cards for Model Reporting from Google. But you need both: the tools, and the structures and processes. And the structures and processes are what we decided to address with AI Blindspot.

AI Blindspot is a discovery process for spotting unconscious biases and structural inequalities in AI systems. There are a total of nine blindspots, and when I say blindspot I’m referring to oversights that can happen in a team’s natural day-to-day operations during the course of planning, building, and deploying AI systems.

The nine blindspots are laid out in this diagram on the left. It all starts with purpose right in the middle, because everything in AI systems should always come back to a purpose and what it’s being designed for. And then if you start at the lower left and go clockwise around, the other blindspots are representative data, abusability, privacy, discrimination by proxy, explainability, optimization criteria, generalization error, and right to contest. And again, these are all cases where oversights can lead to bias in AI systems that in most cases is gonna harm vulnerable populations due to unintended consequences.

But AI Blindspot, it’s not just a fancy diagram with lots of nice colors. We wanted to turn it into an actual tool that teams could use. So we created these blindspot cards. We created them because we wanted to design something that was a little bit more accessible. There are also a lot of impact assessment tools coming out, and I can say from personal experience that they’re pretty cold and technical. We wanted to create something lighter and more accessible, that teams would be a little bit less intimidated by.

By the way, this photograph is courtesy of our professional photographer Jeff and our professional hand model Ania.

The spread layout of a blindspot card

I’ll walk you through the layout of the cards. The left represents the front side of the card, and the right represents the back side. So it starts in the front with a description of what this blindspot is, doing our best to phrase it in non-technical language so we can reach different audiences.

And then on the back side, we have a “have you considered?” section that talks about some of the steps you can take to address this blindspot. So in the case of explainability, examples could include surveying individuals, the users, on whether they actually trust recommendations made by your AI system; considering different types of models that are maybe more explainable than others; factoring in the stakes of the decision: are you just recommending a movie to somebody, or are you deciding whether somebody’s going to get a home loan or not? And then potentially modeling counterfactual scenarios that enable people to see what would have to change in order to achieve a more desired outcome.
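As a rough illustration of that last idea, here’s a minimal sketch of a counterfactual check. The toy loan model, the feature names, and the search range are all hypothetical, purely for illustration; a real system would use a proper counterfactual-explanation method over an actual trained model.

```python
# A hypothetical, toy stand-in for an AI system: approve a loan if income
# comfortably exceeds debt. (Illustrative only, not a real scoring model.)
def loan_model(income, debt):
    return income - 0.5 * debt >= 50_000

def counterfactual_income(income, debt, step=1_000, limit=200_000):
    """Search for the minimum income at which the decision flips to approve."""
    candidate = income
    while candidate <= limit:
        if loan_model(candidate, debt):
            return candidate
        candidate += step
    return None  # no counterfactual found within the search range

decision = loan_model(income=40_000, debt=20_000)
needed = counterfactual_income(income=40_000, debt=20_000)
print(decision, needed)  # False 60000
```

The point is what the output communicates to the applicant: not just “rejected,” but “at an income of $60,000 with the same debt, you would have been approved,” which is the kind of actionable explanation the card is pointing at.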

And then we provide a case study to give a real-world example of where this blindspot arises and could potentially harm, or in many cases already has harmed, vulnerable populations due to oversights the companies made.

And then there’s the “have you engaged with?” section that highlights specific people or organizations that you may want to consult with due to their expertise, either within your own company or organization, or outside.

And then there’s a “take a look” section that provides a QR code that’ll take you to different resources that’ll help you address this blindspot.

And then this shows our web site. It’s amazing actually that this video I recorded this morning is now out of date, because Jeff keeps making so many changes to the web site. But it shows the cards, shows Ania’s hands, and then enables users to just explore the different blindspot cards. And if you click on one, like explainability here, it’ll show you the same content as the card, and it’ll show you the actual resources behind that QR code. You can take the links to different places to learn more about this blindspot or how to address it.

And then there’s a “what is missing?” button, where you can provide suggestions. It’s a good thing we added this, because we’ve actually gotten feedback already. We got our first feedback from somebody at the University of Washington who I think mostly had good things to say, fortunately.

So with that we want to give an example of a case study of how this could be applied in the real world. This is a semi-fictional case study. It may or may not have been informed by an actual incident that happened at a major tech company I may have mentioned earlier in the presentation.

But, so hypothetically let’s say there’s a tech company that has a lot of internal data on their historical hiring practices. And they want to use AI to identify candidates for software engineering jobs. So they go to their data science team and they say, “Okay, we want you to build a model that’ll help us screen through resumes so we can fill these software engineering jobs.”

So the data science team does that. They build a model, they deploy it. But then they realize they’re just getting white men recommended. So what happened there? And more specifically, how could AI Blindspot have prevented this? So I’m going to give examples of one card from each of the three stages: the planning, building, and deploying stages.

The Purpose card, with text "AI systems should make the world a better place. Defining a shared goal guides decisions across the lifecycle of an algorithmic decision-making system, promoting trust amongst individuals and the public."

Again, it all starts with purpose, and really asking yourself what are we trying to accomplish here? This would involve talking to the team about why you want to use AI. Are you just trying to get through resumes faster? Or are you trying to identify better candidates? Or are you trying to increase diversity? And then really asking yourself, is AI really designed to achieve all three of those goals?

If you just want to get through resumes as fast as possible, then AI may be able to help you with that. But if you want to identify better candidates, you would have to question your historical hiring practices. And certainly if you want to increase diversity, AI may not be the right tool for that, so we would encourage teams to really question if AI is even suited to their purpose. But in this case let’s say that the team says okay, it’s number two. We really want the best candidates, and we really think AI can do that.

The Discrimination by Proxy card, with text "An algorithm can have an adverse effect on vulnerable populations even without explicitly including protected characteristics. This often occurs when a model includes features that are correlated with these characteristics."

So we move on to the building stage and address the issue of discrimination by proxy. That refers to situations where you may not include features like race or gender or other protected classes in your model, but you may have other features that are so highly correlated with race or gender, such as historically black colleges or all-women’s colleges, or sports like lacrosse that white men are more drawn to, that it ultimately leads to discrimination. And we would encourage the team to consult with social scientists or human rights advocates who are just more knowledgeable about historical biases and can help you identify certain features that may be problematic and could lead to discrimination.
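One simple, concrete version of this check is to measure how correlated each candidate feature is with a protected attribute before the feature ever goes into the model. The sketch below uses entirely synthetic data and hypothetical feature names ("played_lacrosse", "years_experience"); the 0.3 threshold is an arbitrary illustration, not a standard.

```python
import random

random.seed(0)

# Synthetic applicants: gender is the protected attribute; "played_lacrosse"
# is deliberately generated to correlate with it, "years_experience" is not.
n = 1_000
gender = [random.random() < 0.5 for _ in range(n)]  # True = male, in this toy data
played_lacrosse = [(random.random() < 0.6) if g else (random.random() < 0.1)
                   for g in gender]
years_experience = [random.randint(0, 15) for _ in range(n)]

def correlation(xs, ys):
    """Pearson correlation of two equal-length numeric/boolean sequences."""
    m = len(xs)
    mx, my = sum(xs) / m, sum(ys) / m
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

features = {"played_lacrosse": played_lacrosse, "years_experience": years_experience}
for name, values in features.items():
    r = correlation(gender, values)
    flag = "PROXY RISK" if abs(r) > 0.3 else "ok"
    print(f"{name}: r={r:+.2f} {flag}")
```

On this synthetic data the lacrosse feature gets flagged while years of experience does not. Correlation screening alone won’t catch every proxy (combinations of individually weak features can still encode a protected class), which is exactly why the card points you to human experts as well.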

The Generalization Error card, with text "Between building and deploying an AI system, conditions in the world may change or not reflect the context in which the system was designed, such that training data are no longer representative."

So let’s say the team has done that and now we move on to the deploying stage. And in this case I’m even going to give the company the benefit of the doubt and say that they actually want to increase diversity and they realize that AI can’t do that, so they realize they have to go back and fix their recruiting pipeline first by getting more diverse candidate pools. And then maybe they think okay, now AI can help us increase diversity.

But that’s not actually the case, because that brings up the issue of generalization error: if you have a history of not recruiting diverse candidates and now you do recruit diverse candidates, the model that was built on historical data is not going to be set up to evaluate new candidates with different backgrounds. So you’d have to consider something like an anomaly detector that enables you to identify circumstances, like candidates with more unique backgrounds, where AI’s just not suited, or where you do need a human to review.
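A minimal sketch of that kind of anomaly check: flag any candidate whose features fall far outside the training distribution, so a human reviews them instead of the model trained on historical hires. The data, feature names, and 3-sigma threshold here are all hypothetical; in practice you’d reach for a real novelty-detection method.

```python
def fit_stats(rows):
    """Per-feature mean and standard deviation of the training data."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [(sum((x - m) ** 2 for x in c) / len(c)) ** 0.5 or 1.0
            for c, m in zip(cols, means)]
    return means, stds

def needs_human_review(candidate, means, stds, threshold=3.0):
    """True if any feature is more than `threshold` std devs from the training mean."""
    return any(abs(x - m) / s > threshold
               for x, m, s in zip(candidate, means, stds))

# Historical hires: (years_experience, num_internships) -- a narrow,
# homogeneous pool, as in the scenario above.
history = [(5, 2), (6, 2), (4, 1), (5, 3), (7, 2), (6, 1)]
means, stds = fit_stats(history)

print(needs_human_review((5, 2), means, stds))   # False: typical profile, model can score it
print(needs_human_review((20, 0), means, stds))  # True: unlike the training data, route to a human
```

The design choice is the routing, not the detector: out-of-distribution candidates aren’t auto-rejected by a model that never saw anyone like them, they’re escalated.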

These are just suggestions, but this gives you some ideas of ways that teams could work through the planning, building, and deployment of AI systems to identify their blindspots and brainstorm how to address them.

So with that said, what’s next for us as a team? We have a lot of ideas of potential use cases for this, some of which we got from peers in our cohort. I can see a lot of potential uses in different settings, such as a product manager leading a design sprint, who could use the blindspot cards to help work through the design thinking process.

It could be a new Director of Data Science at a startup where there aren’t really structures or processes for how data scientists go about their job, and blindspot cards could potentially help guide data scientists’ work.

On the other hand it could be a city task force that’s responsible for auditing AI systems but has a less technical background, and similarly needs some sort of guidance on what blindspots to be looking for.

And there could be other potential uses as well. So our plan for next steps is to engage with users, doing user studies to figure out what the best audiences are, and then hopefully getting testimonials from organizations where the blindspot cards have been helpful in helping them assess bias in their AI systems.

And then in the grand scheme, we’re hoping that this could be part of some certification process through an organization like IEEE, possibly setting up some standards where, say, if an organization has processes like AI Blindspot combined with tools like the Data Nutrition Label, they could certify themselves as using AI responsibly. Those are long-term goals that won’t happen anytime soon, but we see a lot of potential for what this could do.

But we wanted to close with the joker card. That’s one of the blindspot cards. All of you, by the way, should pick up a set of blindspot cards; we have them at our table outside. But the joker card kind of represents the idea of the unknown unknowns. You know, we’ve identified these nine blindspots, but there are other blindspots too that we probably didn’t think of. We’ve identified potential use cases, but there may be other ones that we haven’t thought of yet that maybe some of you in the audience think up. So, definitely come talk to us if you have ideas for where and how this could be applied, because we really see potential to help those organizations I was talking about at the beginning that really want to evaluate their systems for bias as best they can and just don’t know how to do it. So with that, I thank you very much.