danah boyd: I commend the EU Parliament for taking up the topic of algorithmic accountability and transparency in the digital economy. In the next ten years we will see data-driven technologies reconfigure systems in many different sectors, from autonomous vehicles and personalized learning to predictive policing and precision medicine. While the advances that we will see will create phenomenal new opportunities, they will also create new challenges and new worries. And it behooves us to start grappling with these issues now so that we can build healthy sociotechnical systems.
I want to focus my remarks today on a provocative statement: I believe that algorithmic transparency creates false hope. Not only is it technically untenable, but it obfuscates the politics that are at stake.
An algorithm is nothing more than a set of instructions for a computer to follow. The more complex the technical system, the more difficult it is to discern why algorithms interact the way they do. Putting complex computer code into the public domain for everyone to inspect does little to achieve accountability. Consider, for example, the Heartbleed vulnerability that was introduced into OpenSSL code in 2011 but wasn’t identified until 2014. Hundreds of thousands of web servers relied on this code for security. Thousands of top-notch computer scientists worked with that code on a regular basis. And none of them saw the problem. Everyone agreed about what the right outcome should be, and plenty of businesses were incentivized to make sure there were no problems. And still, it took two and a half years for an algorithmic vulnerability to be found, sitting in plain sight in publicly available source code.
Transparency does not inherently enable accountability, even when the stars are aligned. To complicate matters more, algorithmic transparency gets you little without data. Take, for example, Facebook’s News Feed. Such systems are designed to adapt to any type of content and to evolve based on user feedback, such as clicks and likes. When you hear that something is personalized, this means that the data you put into the system is compared to data already in the system shared by other people, and that the results you get are relative to the results that others get. People mistakenly assume that personalized means that decisions are based on your data alone. Quite to the contrary: the whole point is to put your data in relationship to others’. Even if you required Facebook to turn over the News Feed algorithm, you’d know nothing without the data. And asking Facebook to turn over that data would violate users’ privacy.
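To make the relational point concrete, here is a minimal, hypothetical sketch of user-based collaborative filtering. This is emphatically not Facebook's actual News Feed code; it only illustrates why any "personalized" ranking is meaningless without other people's data. The names `cosine`, `rank_for`, and the toy `likes` table are all invented for this example.

```python
# Sketch: "personalized" ranking is relational. The score a user sees for
# an item depends on the interaction data of *other* users, so the
# algorithm alone, without the full dataset, explains nothing.
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two users' like-vectors."""
    items = set(u) | set(v)
    dot = sum(u.get(i, 0) * v.get(i, 0) for i in items)
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_for(user, likes):
    """Score items the user hasn't seen, weighted by similar users' likes."""
    me = likes[user]
    scores = {}
    for other, theirs in likes.items():
        if other == user:
            continue
        sim = cosine(me, theirs)
        for item, val in theirs.items():
            if item not in me:
                scores[item] = scores.get(item, 0.0) + sim * val
    return sorted(scores, key=scores.get, reverse=True)

likes = {  # 1 = clicked/liked (toy data)
    "alice": {"a": 1, "b": 1},
    "bob":   {"a": 1, "b": 1, "c": 1},
    "carol": {"d": 1},
}
print(rank_for("alice", likes))  # -> ['c', 'd']
```

Item "c" ranks first for Alice only because Bob, whose history resembles hers, liked it: her own data alone could never produce that result.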
Your goal isn’t to have transparency for transparency’s sake. You want to get to accountability. Most folks think that you need transparency to achieve accountability in algorithms. I’m not sure that’s true. I do know that we can’t get closer to accountability if we don’t know what values we’re aiming for. We assume that if the process were transparent, we could see how unfair decisions are being made. But we don’t actually even know how to define our terms.
Is it more fair to give everyone equal opportunity, or to combat inequity? Is it better for everyone to have access to content shared by their friends, or should hate speech be censored? Who gets to decide? We have a lot of hard work to do in defining our terms, and in many ways we need to separate the hard work of understanding algorithmic processes from the hard work of grappling with our social issues. If we can’t define our terms, we’re not going to succeed at algorithmic accountability.
Personally, I’m excited by the technical work happening in an area known as fairness, accountability, and transparency in machine learning. An example remedy in this space was proposed by a group of computer scientists who were bothered by how hiring algorithms learned the biases of those represented in the training data. They renormalized the training data so that protected categories like race and gender couldn’t be discerned through proxies. To do so, they relied heavily on legal frames in the United States that define equal opportunity in employment, making it very clear what the terms of fairness are; those categories could then be protected computationally, mirroring the protections the law provides. This kind of remedy shows the elegant marriage of technology and policy to achieve agreed-upon ends.
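The remedy described above is a published line of research; the sketch below is only a simplified, hypothetical version of the pre-processing idea: repair each feature so its distribution no longer differs across protected groups, so that group membership can't be recovered through proxies. Real methods match full distributions; this version equalizes only group means, and the function name `equalize_group_means` and the toy data are invented for illustration.

```python
# Hedged sketch of fairness pre-processing: shift each feature so every
# protected group has the same mean. A model trained on the repaired
# data can no longer use that feature's group-level differences as a
# proxy for the protected category. (Simplification of published
# distribution-matching methods.)
from collections import defaultdict

def equalize_group_means(rows, protected_key, feature_keys):
    """Return rows with each feature shifted to the overall mean per group."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for r in rows:
        g = r[protected_key]
        counts[g] += 1
        for f in feature_keys:
            sums[g][f] += r[f]
    overall = {f: sum(s[f] for s in sums.values()) / len(rows)
               for f in feature_keys}
    repaired = []
    for r in rows:
        g = r[protected_key]
        fixed = dict(r)
        for f in feature_keys:
            group_mean = sums[g][f] / counts[g]
            fixed[f] = r[f] - group_mean + overall[f]
        repaired.append(fixed)
    return repaired

data = [  # toy data: group A scores average 11, group B scores average 7
    {"group": "A", "score": 10.0},
    {"group": "A", "score": 12.0},
    {"group": "B", "score": 6.0},
    {"group": "B", "score": 8.0},
]
out = equalize_group_means(data, "group", ["score"])
# After repair both groups average 9.0, so "score" no longer proxies "group".
```

Note that the legal frame does the real work here: deciding which categories are protected, and what "equal" means, is policy, not code.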
No one, least of all a typical programmer, believes that computer scientists should be making the final decisions about the tradeoffs that encode societal values. But at the end of the day, it’s computer scientists who are programming those values into the system. And if they don’t have clear direction, they’re going to build something that upsets somebody somewhere in the world. Take, for example, scheduling software. Programmers have been told to maximize retailer efficiency by spreading labor out as much as possible. This is the goal they are told to optimize for. But it means that workers’ schedules are all over the place, that children suffer, that workers do double shifts without sleep, and so on. The problem isn’t the algorithm. It’s how it’s deployed. What maximization goals it uses. Who got to define them. And who has the power to adjust the system. If we’re going to deploy these systems, we need to clearly articulate the values we believe are important. And then we need to hold those systems accountable for building to those standards.
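A hypothetical sketch of where those values actually live: in the objective function. The two scoring rules below evaluate the same toy schedule, one optimizing only for staffing efficiency, the other also penalizing "clopening" (closing one day, then opening the next morning). None of this is any real product's code; all names and numbers are invented for illustration.

```python
# Sketch: the same schedule, scored under two different value systems.
# Changing the objective, not the optimizer, is what changes outcomes.
from collections import defaultdict

# A shift is (worker, day, kind), kind being "open" or "close".
def clopenings(shifts):
    """Count close-then-open pairs: a worker closes day d and opens day d+1."""
    closes = {(w, d) for w, d, kind in shifts if kind == "close"}
    return sum(1 for w, d, kind in shifts
               if kind == "open" and (w, d - 1) in closes)

def efficiency_only(shifts, coverage_needed):
    """Retailer's objective: penalize any gap between staffing and need."""
    covered = defaultdict(int)
    for _, d, _ in shifts:
        covered[d] += 1
    return -sum(abs(covered[d] - need) for d, need in coverage_needed.items())

def worker_aware(shifts, coverage_needed, penalty=3):
    """Same coverage term, minus a penalty for each clopening."""
    return efficiency_only(shifts, coverage_needed) - penalty * clopenings(shifts)

shifts = [("ana", 0, "close"), ("ana", 1, "open"), ("ben", 1, "close")]
need = {0: 1, 1: 2}
print(efficiency_only(shifts, need))  # -> 0: coverage is met exactly
print(worker_aware(shifts, need))     # -> -3: same schedule, but Ana clopens
```

An optimizer maximizing the first score happily schedules Ana's sleepless turnaround; the second score makes that cost visible. Who chooses the penalty, and whether it exists at all, is exactly the question of who gets to define the goals.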
The increasingly widespread use of algorithms makes one thing crystal clear: our social doctrine is not well articulated. We believe in fairness, but we can’t even define it. We believe in equity, but…not if certain people suffer. We believe in justice, but we accept processes that suggest otherwise. We believe in democracy, but our implementation is flawed. Computer scientists depend on clarity when they design and deploy algorithms. If we aren’t clear about what we want, accountability doesn’t stand a chance.
An adapted version of these comments was published as “Transparency ≠ Accountability.”