Moritz Hardt: So I’m a computer scientist, and I called this response “Occupy Algorithms: Will Algorithms Serve the 99%?” And more than a discussion of what’s been said so far, this is a kind of research proposal for what I would like to see happening at the intersection of CS and this audience.

So, what Frank is getting at, in my opinion, is the question: have algorithms in finance failed to deliver on the promise of greater efficiency, transparency, and fairness? And what I want to address in this response is, gee, what have computer scientists done about it, if anything? And I picked out three works that I think are very closely related to what Frank talked about.

The first one’s about ensuring fairness in classification, a pressing theme in Frank’s article. The second one is on thwarting gaming and manipulation. And gaming has come up, I think, in every talk and every response so far, so this is also related. And finally I want to talk about the role of computational complexity in financial markets and financial products, and the difficulty it introduces into the regulatory process and the pricing process. And I hope you all remember what computational complexity is from Bob’s talk, and what the word intractable means.

So I should say this is not an exhaustive list. There are many more fascinating papers in the CS literature; these are just some that I thought were really related. And each gets just a two-minute teaser. I only have a few minutes, so I probably won’t make any true statements. But you can look at the papers if you actually want to see the truth.

So, the first paper I want to mention, we called it “Fairness Through Awareness.” This was joint work with Cynthia Dwork, Omer Reingold, Toni Pitassi, and Rich Zemel from about two years ago. The problem we were getting at, from a mathematical perspective, was: how do we prevent a classifier, or classifying algorithm, from discriminating based on protected attributes such as, you know, race, or religion, or gender? The running example in Frank’s article would be the credit loan application, where you’re concerned about, for example, steering minorities into less-advantageous credit card offers. So this was also a motivating example for us, but the targeted advertising example that we saw yesterday is really good and fitting here as well.

So, let me first recall an insufficient solution concept, just because a lot of companies will actually tell you that that’s what they’re doing. I call this blindness to the protected attribute. People will tell you, “Oh, we don’t even look at the race attribute. How could our algorithm be discriminating if race doesn’t show up anywhere in the data and we never looked at that attribute?” And as every machine learning person knows, there are lots of redundant encodings of race in the data. You don’t need to see the attribute to be able to reliably predict it. So this doesn’t work given our understanding of machine learning. It’s just not a solution.
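
To make the failure of blindness concrete, here is a minimal sketch. It assumes a hypothetical pandas DataFrame `df` with a protected column `"race"` and some other feature columns; nothing here is from our paper, it’s just an illustration of redundant encoding:

```python
# Minimal sketch: "blindness" fails because the protected attribute is
# redundantly encoded in the remaining features. Assumes a hypothetical
# pandas DataFrame `df` with a protected column "race".
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

X = df.drop(columns=["race"])  # the "blind" features: race is never seen
y = df["race"]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# If this accuracy is well above the base rate, the other features
# redundantly encode race, and blindness offers no protection.
print("race predictable with accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```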

So what we ended up doing is we came up with a new notion, which we called individual-based fairness. It’s a formalization of the idea that any two individuals who are similar with respect to the specific classification task you’re interested in should receive similar outcomes, or rather, similar distributions over outcomes. And this is just an informal statement of what our formal notion says. But what’s important here, what should strike you as important, is that it requires the definition of a task-specific metric, a task-specific similarity measure between individuals, that allows you to compare individuals.

So this metric is really a central part of the definition, and I want to emphasize it a little bit more. What is this metric? We think of it as externally imposed and open to the public. It’s something that people thought about, and as such it provides transparency. This is something that people who are interested in the issue of fairness could discuss and influence. So this is something Frank likes, because it gives you a way to regulate the classification process.

On the other hand, the metric does not dictate classification. There could be many classifiers, many very different classifiers, that are all consistent with the same metric. So in particular it still allows the bank to optimize its secret, proprietary strategy. And this is something that Tal really thinks is important for competition. So this metric gives you a way to mediate between these standpoints and get a tradeoff.

But one open problem that we always face, and that I as a computer scientist can’t fully solve, is: in any concrete setting, who gets to define the metric, and how? This is really something very interdisciplinary, where we need to rely on social insight and on people from various disciplines here in this room. That’s an open problem I have for you. This is where my work as a computer scientist ends. If you give me something like this, if you can think about what the metric might be in the various settings you’re interested in, I can give you a useful algorithmic framework.

So just as a cartoon of this: there is the set of individuals. Take two individuals; they are related by the metric, so they’re at a certain distance. And the framework would make sure that the distributions over outcomes they receive are similar to the extent that they’re similar with respect to the metric. This is just a cartoon. You can look in the paper for the actual definition.
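
For readers who want something slightly less cartoonish, here is a rough sketch of the constraint in Python. The function names are illustrative, and total variation distance stands in for whatever distributional distance the framework fixes; see the paper for the precise Lipschitz formulation:

```python
# Rough sketch of the Lipschitz-style fairness constraint: individuals
# that are close under a task-specific metric d must receive close
# distributions over outcomes. All names here are illustrative.
import numpy as np

def total_variation(p, q):
    """Statistical distance between two distributions over outcomes."""
    return 0.5 * np.abs(np.asarray(p) - np.asarray(q)).sum()

def violates_fairness(x, y, d, outcome_dist):
    """True if the pair (x, y) witnesses a violation: their outcome
    distributions differ by more than the metric distance allows."""
    return total_variation(outcome_dist(x), outcome_dist(y)) > d(x, y)
```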

So I want to move on to the second thing I mentioned. This is about gaming classifiers and how you might be able to prevent it at an algorithmic level. One thing that I really like is a piece of wisdom from financial policymaking called Goodhart’s Law, and this started my interest in this question a long time ago. Goodhart’s Law says that “any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.” If you translate this into machine learning jargon, what it says is: if you have a classifier that relies on some statistical pattern, and you expose the classifier to the public, the pattern will go away. The patterns will change. And there’s a corollary that banks always cite: “Well gee, because of this we need to make sure everything we do is very secret, because once we expose our statistical insight it disappears. It becomes useless and the classifier becomes easy to game.” So this is the case they’re making for opacity. And it’s not entirely false.

What I’m interested in, as a math guy, is: well, is that really true? Is Goodhart’s Law a law in the sense of, you know, the law of large numbers, something you can prove under some reasonable formalization? Or is it not actually reliably true? Can we somehow come up with classifiers that you can expose to the public and that are still hard to game?

This is an ongoing investigation with Brendan Juba and Omer Reingold. We take a kind of game-theoretic point of view. We factor in the cost of changing one’s attributes in the classification process. For example, getting a college degree is much harder than buying a bunch of books. So we take the cost of changing one’s attributes into account, as well as the individual’s budget for any given classification task. People are only willing to expend so much effort or money.

And once we have these game-theoretic ingredients, we can talk about designing decision boundaries, or classifiers, in such a way that a typical negatively-classified person does not have the means or the budget to force a positive classification by any change of attributes. And this is true even if the classifier is exposed. That’s the important point: it allows you to investigate which classifiers are hard to game even if you expose them. I think this is a starting point for getting at some of the issues we’re talking about here from a formal perspective.
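
As a toy illustration of these ingredients, here is a brute-force sketch assuming small discrete attribute domains. The helper names are hypothetical, and this is not the construction from our work, just the shape of the question it asks:

```python
# Toy sketch of the game-theoretic setup: an individual can change
# attributes at a cost, subject to a budget. A classifier is gameable
# for that individual if some affordable change flips a negative
# decision to positive, even with the classifier fully exposed.
from itertools import product

def can_game(x, classifier, costs, budget, domains):
    """x: current attribute tuple; costs[i](old, new): cost of moving
    attribute i from old to new; domains[i]: values attribute i can take."""
    for candidate in product(*domains):
        total_cost = sum(costs[i](x[i], v) for i, v in enumerate(candidate))
        if total_cost <= budget and classifier(candidate) == 1:
            return True  # an affordable manipulation exists
    return False  # no change within budget forces a positive outcome
```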

So finally I want to mention a fascinating paper that unfortunately I didn’t coauthor. But my adviser did, along with a bunch of other great people. It’s by Sanjeev Arora, Boaz Barak, Markus Brunnermeier, and Rong Ge, and it’s about intractability in the financial derivative market. The slides are actually by Sanjeev.

So here is a quick recap of the financial crash of 2008. Basically what happened is that the derivative market’s total value had become far greater than the world’s GDP. And so the hard lesson we learned is that even small errors in pricing derivatives have a huge effect on the economy, just because the market is so enormous.

What these guys were interested in is the intrinsic computational complexity of pricing financial derivatives. And here’s an informal statement of the fascinating theorem that they proved. They showed that evaluating the fair value of a financial derivative such as a CDO is an intractable problem. I don’t want to quiz the audience on what intractable means, but according to Bob it means that you cannot solve it in polynomial time. So even with a significant increase in computational power, even in twenty years, you cannot solve this problem.

Moreover, and this is even more troubling, the designer of the CDOs can tamper with the product in such a way that it is computationally hard to detect the tampering. These two results rely on the computational intractability of the Dense Subgraph problem. This is a beautiful, well-defined combinatorial optimization problem that people have studied for a long time and believe to be quite hard.

And I want to just briefly show you where the Dense Subgraph problem comes in. So this is a little bit of CS. This is basically the point of view of the derivative designer. He has a bunch of asset classes; there are N asset classes. And he knows that a bunch of them are lemons. Because he’s the designer of the derivative, he might have some information about which asset classes are lemons. And then there’s the buyer, and the regulator, who’s like, “Well gee, which of these assets are lemons? I don’t know, but I want to find out.”

Now, what the designer of the derivative can do is create products, these CDOs, which are basically just bundles of asset classes. This is indicated by the vertices on the bottom and the vertices on the top. And basically what the designer wants to do is pack as many lemons into as few products as possible, to get rid of the lemons. And of course this has a pretty devastating effect on the expected default rate of these tampered products.

And this dense subgraph, this packing between lemons and products, is exactly what’s hard to find. If the graph is large enough, then this dense subgraph contained in the bipartite graph is actually hard to detect. So the regulator will say, “Oh no, it’s the Dense Subgraph problem. I can’t detect this.”
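
To show the shape of the construction, here is an illustrative sketch of the designer’s side as a bipartite graph, with toy parameters I made up; the actual paper pins down exactly how dense the planted subgraph must be for detection to be hard:

```python
# Illustrative sketch: asset classes on one side, CDO products on the
# other. The designer concentrates the lemons he knows about into a few
# products, planting a dense subgraph in the bipartite graph. Detecting
# such a planted dense subgraph is believed to be computationally hard.
import random

def build_tampered_cdos(n_assets=1000, n_products=100, bundle_size=30,
                        n_lemons=50, n_bad_products=5, seed=0):
    rng = random.Random(seed)
    lemons = set(rng.sample(range(n_assets), n_lemons))
    products = []
    for j in range(n_products):
        if j < n_bad_products:
            # Tampered product: half its assets are drawn from the lemons.
            bundle = rng.sample(sorted(lemons), bundle_size // 2)
            bundle += rng.sample(range(n_assets), bundle_size // 2)
        else:
            # Honest product: assets drawn uniformly at random.
            bundle = rng.sample(range(n_assets), bundle_size)
        products.append(set(bundle))
    return products, lemons
```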

So this is just a little bit of combinatorial optimization, and it tells you something about how hard it is to price financial products. And not just on an intuitive level: fundamentally, mathematically, there’s nothing you can do about it. You can’t figure out what the fair price is.

So let me wrap up. Here’s my proposal, again: Occupy Algorithms. What I want to suggest is that you should help us computer scientists find formal definitions and models of the problems that you care about. Because, as Bob said yesterday, we’re not going to solve a problem that’s not well defined, that’s not in a mathematical model. On the other hand, finding models is, you know, often most of the work. That’s extremely difficult, and we need people who have the domain knowledge to tell us what the right models might be. We can do the formal part, but we need insight into what a good model should do. So this is an interdisciplinary thing.

And if we do that, we can make social constraints a first-order problem in algorithm design. This is actually not unrealistic; it has happened in the past. For example, there’s a lot of literature on privacy-preserving algorithms, which I find quite fascinating, and on truthful mechanism design, where the goal is to cope with strategic behavior. So there are areas in CS where, yes, social constraints are first-order constraints, and we can create more of these cases. With that I want to thank you for your attention.

Further Reference

The Governing Algorithms conference site with full schedule and downloadable discussion papers.

