danah boyd: I commend the EU Parliament for taking up the topic of algorithmic accountability and transparency in the digital economy. In the next ten years we will see data-driven technologies reconfigure systems in many different sectors, from autonomous vehicles to personalized learning, predictive policing, and precision medicine. While the advances that we will see will create phenomenal new opportunities, they will also create new challenges, and new worries. And it behooves us to start grappling with these issues now so that we can build healthy sociotechnical systems.

I want to focus my remarks today on a provocative statement: I believe that algorithmic transparency creates false hope. Not only is it technically untenable, but it obfuscates the politics that are at stake.

Algorithms are nothing more than a set of instructions for a computer to follow. The more complex the technical system, the more difficult it is to discern why algorithms interact the way they do. Putting complex computer code into the public domain for everyone to inspect does little to achieve accountability. Consider, for example, the Heartbleed vulnerability that was introduced into OpenSSL code in 2011 but wasn't identified until 2014. Hundreds of thousands of web servers relied on this code for security. Thousands of top-notch computer scientists worked with that code on a regular basis. And none of them saw the problem. Everyone agreed about what the right outcome should be, and tons of businesses were incentivized to make sure there were no problems. And still, it took two and a half years for an algorithmic vulnerability to be found in plain sight, with the entire source code publicly available.

Transparency does not inherently enable accountability, even when the stars are aligned. To complicate matters more, algorithmic transparency gets you little without data. Take for example Facebook's News Feed. Such systems are designed to adapt to any type of content and evolve based on user feedback, for example clicks and likes. When you hear that something is personalized, this means that the data that you put into the system is compared to other data already in the system shared by other people, and that the results you get are relative to the results that others get. People mistakenly assume that personalized means that decisions are based on your data alone. But quite to the contrary, the whole point is to put your data in relationship to others'. Even if you require Facebook to turn over the News Feed algorithm, you'd know nothing without the data. Asking Facebook for the data would be a violation of user privacy.
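The relational nature of personalization can be illustrated with a toy sketch. This is nothing like Facebook's actual ranking system, and the users and like-histories below are hypothetical; the point is only that every score in "your" feed is computed from other people's data:

```python
# Toy "personalized" feed: your ranking depends entirely on other users' data.
from math import sqrt

# Hypothetical like-histories: user -> {item: 1 if liked}.
likes = {
    "alice": {"a": 1, "b": 1},
    "bob":   {"a": 1, "c": 1},
    "carol": {"b": 1, "d": 1},
}

def cosine(u, v):
    """Cosine similarity between two sparse like-vectors."""
    dot = sum(u[i] * v[i] for i in set(u) & set(v))
    return dot / (sqrt(sum(x * x for x in u.values())) *
                  sqrt(sum(x * x for x in v.values())))

def rank_feed(user):
    """Score items the user hasn't seen by the likes of similar users.
    Without the other users' rows, there is nothing to rank."""
    seen = likes[user]
    scores = {}
    for other, their_likes in likes.items():
        if other == user:
            continue
        sim = cosine(seen, their_likes)
        for item in their_likes:
            if item not in seen:
                scores[item] = scores.get(item, 0.0) + sim
    return sorted(scores, key=scores.get, reverse=True)

print(rank_feed("alice"))  # only items liked by *other* users can appear
```

Inspecting `rank_feed` alone tells you almost nothing about what alice will see; the outcome lives in the `likes` table, which is exactly the data a regulator could not demand without violating everyone else's privacy.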

Your goal isn't to have transparency for transparency's sake. You want to get to accountability. Most folks think that you need transparency to achieve accountability in algorithms. I'm not sure that's true. I do know that we can't get closer to accountability if we don't know what the values are that we're aiming for. We think that if the process is transparent, we could see how unfair decisions were being made. But we don't actually even know how to define our terms.

Is it more fair to give everyone equal opportunity, or to combat inequity? Is it better for everyone to have access to content shared by their friends, or should hate speech be censored? Who gets to decide? We have a lot of hard work ahead of us to define our terms, work that in many ways is separate from the hard work of understanding algorithmic processes and from the hard work of dealing with our social issues. If we can't define our terms, we're not going to be able to succeed in algorithmic accountability.

Personally I'm excited by the technical work that is happening in an area known as fairness, accountability, and transparency in machine learning. An example remedy in this space was proposed by a group of computer scientists who were bothered by how hiring algorithms learned the biases of those who were in the training data. They renormalized the training data so that protected categories like race and gender couldn't be discerned through proxies. To do so they relied heavily on legal frames in the United States that define equal opportunity in employment, making it very clear what the terms of fairness are. And they could computationally protect them through the same technical mechanisms as the law. This kind of remedy shows the elegant marriage of technology and policy to achieve agreed-upon ends.
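As a much-simplified sketch of this family of remedies (not the cited researchers' actual method, and the "zip-code score" feature below is hypothetical), one way to renormalize training data is to residualize each feature against the protected attribute, so the cleaned feature no longer works as a linear proxy:

```python
# Hedged sketch: strip the linear component of a feature that is
# predictable from a protected attribute, leaving only the residual.

def residualize(feature, protected):
    """Return `feature` with its linear dependence on `protected` removed.
    Both arguments are equal-length lists of floats."""
    n = len(feature)
    mean_f = sum(feature) / n
    mean_p = sum(protected) / n
    cov = sum((f - mean_f) * (p - mean_p)
              for f, p in zip(feature, protected)) / n
    var_p = sum((p - mean_p) ** 2 for p in protected) / n
    slope = cov / var_p if var_p else 0.0
    # Subtract the part of the feature explained by the protected attribute.
    return [f - slope * (p - mean_p) for f, p in zip(feature, protected)]

# Hypothetical proxy: a zip-code score correlated with group membership.
protected = [0.0, 0.0, 1.0, 1.0]
proxy     = [1.0, 2.0, 5.0, 6.0]
cleaned = residualize(proxy, protected)
# By construction, `cleaned` has zero covariance with `protected`.
```

Note what made this tractable: the law supplied a precise definition of the protected categories, so "fair" became something a program could check. Without that legal clarity, there would be nothing to residualize against.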

No one, least of all a typical programmer, believes that computer scientists should be making the final decisions about the tradeoffs that decide societal values. But at the end of the day, it's computer scientists who are programming those values into the system. And if they don't have clear direction, they're going to build something that upsets somebody, somewhere in the world. Take for example scheduling software. Programmers have been told to maximize retailer efficiency by spreading labor out as much as possible. This is the goal that they are told to optimize for. But it means that workers' schedules are all over the place, that children suffer, that workers do double shifts without sleep, etc. The problem isn't the algorithm. It's how it's deployed. What maximization goals it uses. Who got to define them. And who has the power to adjust the system. If we're going to deploy these systems, we need to clearly articulate what the values are that we believe are important. And then we need to hold those systems accountable for building to those standards.
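A toy illustration of how the chosen objective, not the algorithm itself, drives the outcome (the shifts, workers, and cost model below are all hypothetical, not any real retailer's system): the same exhaustive search produces a worker-hostile or worker-friendly schedule depending solely on which goal its deployers wrote down.

```python
# Same algorithm, two value choices: efficiency alone vs. efficiency
# plus a rest constraint. Hypothetical two-shift, two-worker example.
from itertools import product

shifts = ["late_close", "early_open"]  # a back-to-back "clopening" pair
workers = ["w1", "w2"]

def schedule(forbid_clopening):
    """Exhaustively pick the cheapest assignment of workers to shifts."""
    best, best_cost = None, float("inf")
    for assign in product(workers, repeat=len(shifts)):
        if forbid_clopening and assign[0] == assign[1]:
            continue  # same person closing late then opening early: no sleep
        # Efficiency-only cost: fewer distinct workers is cheaper.
        cost = len(set(assign))
        if cost < best_cost:
            best, best_cost = dict(zip(shifts, assign)), cost
    return best

print(schedule(forbid_clopening=False))  # one worker covers both shifts
print(schedule(forbid_clopening=True))   # costs more, but workers can sleep
```

The search procedure never changes; only the constraint does. Deciding whether `forbid_clopening` is on is a question of values, and it is answered by whoever defines the objective, not by the code.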

The increasingly widespread use of algorithms makes one thing crystal clear: our social doctrine is not well-articulated. We believe in fairness, but we can't even define it. We believe in equity, but…not if certain people suffer. We believe in justice, but we accept processes that suggest otherwise. We believe in democracy, but our implementation is flawed. Computer scientists depend on clarity when they design and deploy algorithms. If we aren't clear about what we want, accountability doesn't stand a chance.