This discussion follows from Lucas Introna’s presentation of his draft paper “Algorithms, Performativity and Governability,” and responses from Matthew Jones and Lisa Gitelman. A recording of Gitelman’s response is unavailable, but her written comments are available on the Governing Algorithms web site.
Solon Barocas: Thanks so much. I will offer Lucas the opportunity to respond, if he cares to?
Lucas Introna: Yeah, I just want to be clear that I’m not saying that the details of the algorithms are irrelevant. In a way they can matter very much, and you know, in a certain circumstance, in a certain situated use, it might matter significantly what the algorithm does, but we can’t say that a priori. So we need to both open up the algorithms and understand them as much as possible, but we must not be seduced into believing that if we understand them, we therefore know what they do. That’s the shift, that’s the dangerous shift.
So for example I think it’s really relevant that I know that the Turnitin detection system essentially uses a certain technique for identifying sequences of character strings. And because I know that, I can understand how certain editing procedures by students, when they write over their text, might make them either detectable or not detectable. And that helps me to understand the sort of performativity that might flow from the actual use of it. So I do think knowing the algorithm is necessary, but what it does in use is irreducible to that, of course.
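[As an illustration of the kind of character-string technique Introna refers to, the following is a minimal k-gram fingerprinting sketch in Python. Turnitin’s actual matching method is proprietary; the window size, the normalization, and the example strings here are illustrative assumptions only. The point is that detection depends on contiguous character windows surviving an edit, which is why “writing over” copied text changes what such a system can see.]

```python
# Minimal sketch of character k-gram fingerprint matching, the general family
# of technique described above. Turnitin's real algorithm is proprietary; the
# window size (k) and the example strings are assumptions for illustration.

def fingerprints(text: str, k: int = 8) -> set[int]:
    """Hash every k-character window of the whitespace-stripped, lowercased text."""
    normalized = "".join(text.lower().split())
    return {hash(normalized[i:i + k]) for i in range(len(normalized) - k + 1)}

def overlap(submission: str, source: str, k: int = 8) -> float:
    """Fraction of the submission's k-grams that also appear in the source."""
    sub, src = fingerprints(submission, k), fingerprints(source, k)
    return len(sub & src) / len(sub) if sub else 0.0

source = "The quick brown fox jumps over the lazy dog."
verbatim = "The quick brown fox jumps over the lazy dog."
rewritten = "A fast brown fox leaps over a sleepy dog."  # lightly "written over"

print(overlap(verbatim, source))   # 1.0: easily flagged
print(overlap(rewritten, source))  # much lower: may slip under a threshold
```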
I think the point about the proxies is really valid. And I didn’t sort of make the point at the end but I think there is a real issue with the fact that we will only know… Yeah. We’re all in the same space of ignorance, as it were. And we will only know what we govern when we engage with it. And when we engage with it, we will of course also be enacting changes and there would be a response to that, etc. So in a sense governance is experimentation in a certain way. So there’s a certain experimentation that is implied in governing, which I think was a good point you [Matthew Jones] made.
Barocas: Great. So I should have mentioned that people can line up. I’ll take questions from the floor. And just in the interest of time I’m actually going to take two at a time, and we’ll let the respondents handle them in one go. So, please.
Audience 1: Okay, two at a time. This is related to knowing the algorithm and being governed by algorithms. I just wanted to point out sort of an analogue. Right now it’s potentially possible that we can catch every single time you speed going down the highway. Every single time you go over 65, I can put in a black box recorder and ticket you. The second your parking meter goes off, I can ticket you. And that’s going to be very, very possible when cameras are three dollars and you have these progressive things.
Here’s the question. I was talking to you, Solon, about this. What is our desired sphere of obscurity? Or non-scrutinizability? Do we want to guarantee ourselves a certain amount of lawbreaking? One of the things the algorithms— More data is available about us. You say you can read into a person’s personal preferences because it’s now exposed. Before we had obscurity because we just didn’t know. Now, do we want to try and guarantee some level of obscurity for people, some freedom like that? It’s not necessarily the algorithm’s fault, it’s because the data is now available for the algorithms to use. This can be used in many spheres, not just search. We can now put cameras and watch everybody working, all your work email is monitored, we can do app monitoring. We do— But companies choose not to look at it because they don’t want to know. And that’s what kind of creates this sphere now. But I don’t know how we would address that. How do we create you know, a sphere of obscurity?
Lev Manovich: Lev Manovich, professor of computer science, CUNY Graduate Center. So, my question is about what I see as maybe the key kind of dimension of this day so far, with transparency versus opacity, and I think your notion of flux connects to that. So as it was already pointed out, right, no real software system involves a single algorithm, right. There are hundreds of algorithms. Plus servers, plus databases. So that’s one challenge. The second challenge, the systems are very complex, right. So Gmail’s about 50 million lines of code. You know, Windows, hundred million lines of code. So, no single programmer can actually examine it.
But I want to point out a third challenge, which I think is kind of the elephant in the room, because I haven’t heard anybody address it so far. Most…a very large proportion of contemporary software systems—search engines, recommendation systems, reservation systems, pricing systems…they’re not algorithms in a conventional sense where there’s a set of instructions you can understand. So even if you publish those algorithms…it doesn’t do you any good, because we use what in computer science is called supervised machine learning. Meaning that there is a set of inputs, it goes into a black box, and the black box produces output, and in some cases there’s a formal model. In most cases, because those black boxes turn out to be more efficient when we don’t produce a formal model, right, you don’t know how a decision has been made. [indistinct] with neural networks [indistinct] it becomes much worse.
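[A minimal sketch of the supervised-learning “black box” Manovich describes, in Python. The library (scikit-learn), the synthetic data, and the model choice are illustrative assumptions, not any real production system; the point is only that what could be published is a mass of learned parameters, not a readable list of instructions.]

```python
# Supervised learning as a black box: inputs go in, a prediction comes out,
# and no human-readable rule explains any single decision. scikit-learn and
# the synthetic data below are stand-ins, not any real system.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))                     # 1000 examples, 20 features
y = (X[:, 0] * X[:, 3] - X[:, 7] > 0).astype(int)   # some hidden regularity

model = RandomForestClassifier(n_estimators=200).fit(X, y)

# "Publishing the algorithm" here would mean publishing thousands of learned
# split points spread across 200 trees:
print(sum(est.tree_.node_count for est in model.estimators_))

# A new case gets a verdict, but not an explanation:
print(model.predict(rng.normal(size=(1, 20))))
```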
So basically, millions of software systems in our society are these black boxes where there is nothing to see, right. Even if you tried to make them transparent. And that I think was kind of the elephant in the room which I hope you can address. So things are much more serious and dark than we imagined.
Introna: Yeah. So yeah, the issue of the sphere of obscurity I think is a really important one. Because one of the areas of research I’m interested in is privacy. And one of the classical arguments for privacy is that we need privacy for autonomy, because in a sense if we have obscurity, if we have spaces where we’re not observed, we feel free to act in the ways in which we would want to act. But if we are aware— I mean, this is the Foucauldian point. The point of the panopticon is that if we’re always observed, then we internalize that observation to the point that we observe ourselves on behalf of the others. And normalize ourselves. And so in a sense, as we become tracked, profiled—
I mean, I felt— You know, the point about Amazon…if I go to Amazon and I want to buy a book and I go to the bottom and I look at “other people who bought this book also looked at these titles,” I look at those titles and I think hmm, maybe I should be reading those things. Maybe there’s something in them that I’m missing. And I’m starting to conform, to become, the person in that strange category of people who read you know, X and Y, etc. So there’s a certain sense in which I become normalized through these systems, and I do think there is a point that we need a zone of obscurity. In the EU there’s a whole new data protection regulation, and there’s a whole issue of what they call the “right to be forgotten,” yeah? And I think it tries to speak to that, but it’s deeply problematic.
I think the point about machine learning is obviously absolutely correct. I mean, one of the areas of research that Helen and I have done is looking at facial recognition systems. And one of the things that the research has shown is that facial recognition systems are better at identifying dark-skinned people than white-skinned people. And you know, you can imagine how that might play out in terms of race and so forth. And so we asked the programmers, the people, why. And they said, “Well, we don’t know,” right. So we have these sets, and these algorithms learn through being exposed to these sets. You know, we can open the box, but there are just these layers of variables and you know, we don’t know why, but for some reason they are better at identifying dark-skinned people than— We have a hypothesis, we have some suggestions why that might be the case, but we just don’t know. Yeah, so I do agree that that’s a really serious issue.
Jones: I’ll just say one thing I think is challenging in thinking through the issue of obscurity: many people who have strong intuitions about personal privacy outside the realm of thinking about these algorithms don’t really have very good intuitions about how easily that obscurity can be destroyed by [traces?]. And I think it means that in thinking about obscurity and privacy we also need to think about what consent is when people don’t have an imagination of what is possible because of extremely powerful algorithms. And I think part of a discussion, and indeed part of an informational role that people like the group here can have, is to begin to understand that there’s something very different about consent, in all sorts of ways. That we might all agree on the sort of privacy we want, but that it’s easily violated through our consenting to things that seem to us perfectly trivial but have turned out not to be. Things that were trivial fifteen years ago aren’t today.
Gitelman: Yeah, maybe I’ll riff on that for one second, on the question of the sphere of obscurity. Because I’ve always wondered about the speeding question. You know, because we’ve had turnpikes for a long time, and E‑ZPass for a little while. And I think, you know, it actually would not take an algorithm to catch us all speeding; it’d just take a subtraction problem. You know, because we go a certain distance and we get through with our ticket on the turnpike in too short an amount of time. So I guess what I’m saying is that as much as I’m, you know, embracing all the intricacies of this conversation about algorithms from its very multidisciplinary perspectives, we also can’t let the problem of algorithms get mystified to the extent that it keeps us from seeing things we can see without the question of algorithms, too, just a little subtraction problem.
Jones: Yeah, my first calculus textbook in fact used the mean value theorem to catch speeders, I remember.
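[Gitelman’s “subtraction problem” and Jones’s mean value theorem point describe the same calculation: if the average speed between two timestamped toll records exceeds the limit, the driver must have exceeded the limit at some instant in between. A small Python sketch follows; the plaza distance and the timestamps are invented for illustration.]

```python
# The turnpike "subtraction problem": average speed between two toll records.
# By the mean value theorem, an average above the limit means the limit was
# exceeded at some moment. The distance and times below are hypothetical.
from datetime import datetime

LIMIT_MPH = 65
MILES_BETWEEN_PLAZAS = 52.0   # assumed distance between entry and exit plazas

entry = datetime(2013, 5, 17, 9, 0, 0)   # ticket issued at entry plaza
exit_ = datetime(2013, 5, 17, 9, 42, 0)  # ticket collected at exit plaza

hours = (exit_ - entry).total_seconds() / 3600
average_mph = MILES_BETWEEN_PLAZAS / hours

print(f"average speed: {average_mph:.1f} mph")   # about 74 mph
if average_mph > LIMIT_MPH:
    print("the driver must have been speeding somewhere between the plazas")
```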
Barocas: Right, the next two.
Daniel McLachlan: Hi, I am Daniel McLachlan. I’m a technologist at The Boston Globe. It seems like in a lot of these discussions of algorithms and governance, a lot of the concerns that come up are concerns that exist about large organizations and large bureaucracies even without, or sort of before, algorithms enter into the discussion. And the increase in the usage and the power of algorithms seems to have two main effects. I mean, the first is obviously that it theoretically allows you to catch every speeder. It sort of multiplies the power of the bureaucracy. But on the other hand, I’m interested in teasing out what your thoughts are on how the at least notional transparency of the algorithm as an object, as opposed to a kind of tangle of roles and rules enacted by people in an organization, changes how those organizations behave and how people envision them. Does it make it…you know, does it help or does it hurt?
Daniel Schwartz-Narbonne: Hi, I’m Daniel Schwartz-Narbonne. I’m a post-doc here at Courant. I already introduced myself. So, a couple of things. First of all, when you had your bubble sort algorithm, are you sure it shouldn’t be a less-than-or-equal-to in the for loop? [audience laughs]
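[The slide’s bubble sort code is not reproduced in the transcript, so the following generic Python sketch only shows where the “less-than or less-than-or-equal” question usually bites: the inner comparison looks one element ahead, so the loop bound has to stop one short of the end.]

```python
# A generic bubble sort sketch (not the code from the slide). The inner loop
# compares a[j] with a[j + 1], so its bound must leave room for j + 1;
# a <= style bound here would read past the end of the list.

def bubble_sort(items: list) -> list:
    a = list(items)
    n = len(a)
    for i in range(n - 1):
        for j in range(n - 1 - i):      # strict bound keeps j + 1 in range
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a

print(bubble_sort([5, 1, 4, 2, 8]))  # [1, 2, 4, 5, 8]
```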
And second of all, a lot of the stuff that people have been talking about has been…you know, these are problems that have already existed, right. Law is an algorithm. When the IRS decides, you know, whether they will allow this particular tax dodge or not, and then the lawyers come up with some new way around it, they’re actually playing off an algorithm that’s simply implemented in a human head instead of on a computer. And I think the real difference is not, you know, are we dealing with algorithms or not. The real difference is the relative cost of doing various things.
So, there’s a lot of stuff where we never really worried about it because it wasn’t…practical, right. We didn’t worry about the huge amount of information that was in some government database because you literally had to send some guy to go photocopy it to get it out, and so that was not a risk to your privacy. And now that it’s on the Web and you can scrape it, that same data is now a risk to privacy because the cost of getting it is a lot lower. And in general the costs are dropping, and just an example with the deanonymization— I don’t know if people are familiar with deanonymizing the Netflix data. But Netflix released their data in order to allow people to have a competition to improve their recommender algorithm, and it turned out that you can actually figure out who people are simply from a list of what movies they watched, at what times, and what rankings they gave them, and then use this to predict other movies, by looking at things like people’s blogs. So the ability to collect all this data has become huge. And that I think is really the big question we have to look at, you know, as the cost of doing things is changing. But the fundamental question of dealing with algorithms doesn’t seem to me to have really changed from when we were dealing with the law.
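[A toy illustration of the linkage idea behind the Netflix deanonymization result (Narayanan and Shmatikov), not their actual algorithm. The records, names, and matching tolerance below are invented: each anonymized row is scored by how many of a public profile’s (movie, rating, approximate date) entries it matches, and a handful of matches is enough to single a person out.]

```python
# Toy linkage attack: match anonymized rating histories against entries gleaned
# from a public source such as a blog or IMDb reviews. All data here is invented.
from datetime import date

anonymized = {
    "user_17": [("Movie A", 5, date(2005, 3, 1)), ("Movie B", 2, date(2005, 4, 9))],
    "user_42": [("Movie A", 4, date(2006, 1, 2)), ("Movie C", 5, date(2006, 2, 8))],
}

# What the target said publicly about the same movies.
public_profile = [("Movie A", 5, date(2005, 3, 3)), ("Movie B", 2, date(2005, 4, 10))]

def score(record, profile, day_slack=3):
    """Count (movie, rating) matches whose dates fall within a few days."""
    hits = 0
    for movie, rating, day in profile:
        for m, r, d in record:
            if m == movie and r == rating and abs((d - day).days) <= day_slack:
                hits += 1
    return hits

best = max(anonymized, key=lambda uid: score(anonymized[uid], public_profile))
print(best)  # "user_17": a short list of dated ratings is enough to re-identify
```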
Introna: Yeah, those are two really good points. I think your point is almost an answer to the person before you. And that is, what’s the difference, what changed? We have always had bureaucracies and we’ve always been concerned about these things. But with algorithms, the cost of doing it has been reduced so significantly for exactly that reason—the issue of electronic voting, for example. Why are we so concerned? People would say, “Well, you know, when we had paper voting people could also rig the election, so why have we got all these hugely complex processes to try and verify the algorithm for the electronic voting, you know? We don’t have these huge processes when we do paper voting.” Well yeah, but you can’t really rig the election with just a couple of people getting together and— You know, it’s quite costly to go and find the ballot papers, get hold of them illegally, then put all the crosses on and get them all in the box. It’s quite a complex process. Whereas if you get to the algorithm, you could change the election. I mean, you could change a million votes in…a click. So you know, the cost is really the issue. And because that is the case, it really matters where these algorithms sit, what they do, etc. So I think your point is almost an answer to his, the relative cost point.
Jones: I would just say, one of the things that I didn’t comment on in Lucas’ paper but I did in my written response is that it’s enormously helpful in making us look very carefully—in not fetishizing the algorithm. That is, in many cases the things we’re going to claim are differences of scale. And the place where we need to look is the material and social conditions under which these algorithms are being deployed. And that’s where the continuity with, say, bureaucratic or legal procedure— And I think that’s enormously important, analytically and very practically. It’s also important to get at those moments precisely when there has to be something different about how we think of them, those moments in which the fact that it is something being done with computers and [indistinct] is qualitatively distinct from what might’ve happened with bureaucratic procedure. I suspect there are fewer of those than we expect. Because it’s easy to get caught up in the technological determinist narrative of the necessity of these sorts of things. But I think it focuses our attention, on the one hand, on what it is that enables algorithms and the material conditions, and then, for those special examples, on what is distinct about them. And I think that’s important, both analytically and then very practically.
Nick Seaver: Hi, I’m Nick Seaver again. Thanks for a bunch of really interesting papers. I have a question about another thing that is an old question but sometimes feels new: expertise. And it was really interesting to me to hear how all three of you touched on this knowledge question. I think, Matthew, you stated it very straightforwardly: “even if they handed us the algorithm, we wouldn’t know it in any way that really matters.” And what that gets at, it seems to me, is the question that sort of animates a lot of this discussion, in that we’ve got two different camps of expertise, right. You’ve got people who know algorithms, and people who know society, law, whatever, ethics on the other hand. And we want to somehow bridge this gap between these two sets of people. And personally I wonder whether that assumption is unfounded, and whether you’ve got interesting ethical and legal thinking that happens on the side of them, and interesting sort of algorithmic thinking that happens on the side of us. But I’m also wondering what that means when we, speaking as a “we” on the like ethics, whatever, side, want to talk about algorithms: what do we make of these claims to expertise about what they are like, and about how they work? And how do we sort of reconfigure our question if, say, it turns out that the bubble sort algorithm per se may not actually be what we mean when we say “I care about the Google algorithm” or something like that. How do we redefine our questions in response to this sort of presumed expertise of others?
Helen Nissenbaum: Hi, I’m Helen Nissenbaum, NYU MCC and ILI. This is more an invitation to reflect on what I’m calling a defense that says “it’s better than nothing.” And that is, in order to run certain experiences through the algorithms we have, we have to perform the reduction that Lisa talks about. And then we find that the results are good—like, you know, take Turnitin. All these millions of documents it’s able to adjudicate. And it’s true that it may not capture all the plagiarists, but it’s gonna capture many of them, so isn’t that better than nothing? And even, say, with the facial recognition study that you mentioned, Lucas, you might say, well, okay, so it recognizes darker faces better than lighter faces, but at least it’s recognizing faces, so what’s the problem? And I think it also gets to the question that Sasha was asking last night of Claudia Perlich, and that is, Claudia was saying, well, we only get say a 4% bump in accuracy by doing this entire backend machinery and then targeting. But that’s good enough for me. You know, that can make my business run.
So I think these are issues of justice at root, but I still don’t know how we’re gonna defend ourselves against this rejoinder that these algorithms, as imperfect as they are, are better than nothing.
Introna: Yeah. That’s what my colleagues tell me all the time when I don’t want to use Turnitin. I think it’s… You know, I— Well… We can use Turnitin after we’ve had the debate on why we’re doing this. Which we don’t do; we just use the technology. And that’s the problem for me. The “better than nothing” is… What’s interesting about the defense of Turnitin that some of my colleagues offer is that they say the reason we use this is because with the non-native speakers, when they plagiarize, when they copy, we can identify it easily, because we can see there’s a change in the style of the writing. But the native speakers have the linguistic ability to write over the stuff they copy in such a way that it becomes indistinguishable from the text around it. And therefore the reason we should use Turnitin is actually because it’s fairer, you know. It’s fairer because it catches everybody equally.
It seems to me one of the things there—and some people have touched on this—is the idea that there’s sort of a mathematical or computational objectivity. And that this computational objectivity is somehow valuable enough so that, you know, it’s better than nothing; we do catch some of them. Yes, but what about the ones we don’t catch? And the consequences for the people who are caught, against those who are not caught… I mean, in most university systems you get expelled, right? So is this a matter of justice? And is it justice for all?
So my response to my colleagues is let’s first have a debate. Let’s understand the limitations of the system. Let’s understand what it does and what it doesn’t do. And if we have that debate, and we then use it, and we can use it in a formative way, if we use it in a way that is not punitive, that’s not legalistic and we say “Let’s use it to identify the students that copy. Let’s talk to them about copying and why they copy. And let’s use that as an opportunity to educate them in terms of the sort of writing that we expect from them,” etc., now that’s a completely different sociomaterial configuration that we’re putting together. So yes, I think it can serve a purpose, but that purpose needs to be understood within the way in which it operates within those situated practices.
And similarly, you know, yes, we want to catch people who speed. But do we understand how that technology operates? Do we understand the conditions under which it operates? Have we had a discussion of what we’re really trying to do here? Are we really just— Are we trying to help people drive safely, or are we simply trying to make money? And in the UK, most local authorities will tell you speeding fines are a serious source of income for them, and they want speed cameras. The more they have the better, because the more money they make. This is not about road safety.
So you know, I think what we need to understand is the sociotechnical practices within which it operates. Why it operates in the way it does. So yes, better than nothing, but.
Gitelman: I guess I would agree with that. I think we could transpose that “better than nothing” rejoinder into a kind of acceptance of “good enough,” and then press the conversation there: if you say it’s good enough for detecting faces or cheats, what’s good, right, and what’s enough? To really kind of push those issues there, the good enough can become a question of optimization. So broaden that discussion and try and get people engaged, not letting a single vendor, say, answer the question. I think by and large it’s an argument, or a discussion, that we can persuade people to have. I mean, I think that we could make some rhetorical adjustments to the “better than nothing” that might make for a more productive channel there.
The they/we question…I mean, that’s the other kind of strategic, rhetorical question that I think is really hard to address. I mean, I do, just over the last day and a half, have a kind of intuitive response that the they/we…you know, is something that we need to run from, to find ways around. And I mean, really, this is Helen’s incredible talent, with her co-organizers, of putting so many different people in the room together who don’t make a single they and a single we. And to somehow sort of go forward with that and think strategically about how that happens and how that can happen in more settings.
Jones: Yeah. I would…just combining the two questions. I mean, I guess I’d third what has just been said: considering any situation in which something is being judged to be good enough, or better than nothing, it’s not that an algorithm is necessarily neutral, but it’s probably not the right place to look when making that decision. And that’s asking for the kind of expertise of people who look at sociomaterial conditions.
But the expertise of the people who actually build algorithms I think is also useful. It’s the people in between, who celebrate them without much understanding, that are the sort of— Because if you ask the people who build the algorithms, or you ask data mining—you know, industry or machine learning people—what you get is refreshing candor about limitations. The whole field’s about, like, you know, we don’t know… You know, “How does this work? We don’t know.” Or these complicated models. That refreshing candor, that conversation, is a rich resource for saying this is the wrong kind of thing to be doing if we want to regulate this sort of system. Even if we agreed that it was the value we wanted to have.
So I think actually getting into these different pockets of expertise, and away from sort of rather unreflective celebration or denunciation, is going to be a more powerful way to think about this.
Katherine Strandburg: So, Kathy Strandburg from NYU Law School. I just had two comments. One was, I thought maybe there was something that could be added to the list of what we are concerned about with algorithms. Because I think in many but not all cases of concern about algorithms, in addition to the secrecy concern and the automation concern…and maybe even more important in many cases, it is the fact that in many applications algorithms are using probabilistic inference to make decisions that have implications for individuals. It seems to me that’s something we haven’t really talked about, and that might be okay in some circumstances and not in others.
The second thing I wanted to do was suggest that one concept that might be helpful to us in thinking about this whole area is a conception from economics of “credence goods.” So, sometimes in the case of algorithms we are in a situation where we’re getting output and we don’t know how to evaluate whether this output is good or not. So Google says “These are the top ten search results,” and you know, we don’t know whether we’d like some other arrangement of search results better. We can only say, okay, it seemed alright. And that’s actually an area that we’ve— In so many cases I think we’re in that situation. And that actually is a situation that, at least in the law, we’ve dealt with quite a bit, but we’re not thinking there about… You know, nobody’s too bothered about the fact they don’t understand exactly what’s going on inside their television set, right? Because, you know, you see the TV show or you don’t see the show, and it’s working or it isn’t.
So instead of thinking about technologies like that, I think we should be thinking about people like lawyers and doctors, people who are providing things that even after you get them you can’t really tell whether they were good or not. And legally, we deal with those things in a couple of different ways. So one is certain kinds of regulatory regimes. But one of the big ways that we deal with this kind of issue is through professional ethics. And I’m wondering if the fact that that isn’t really happening, or that we don’t know how to make it happen, with some of these things that really are the equivalent of credence goods is part of what’s disturbing us.
So for example, I think the Turnitin example is interesting because if the output is plagiarism or not, and we don’t know anything about what they’re doing, then it’s a credence good. Once we know they’re counting a certain number of characters, we might or might not think that’s a good way of measuring plagiarism, but we’re in a situation where we can decide whether we think it’s good—we can evaluate it.
So…I’ve gone on for too long. But anyway, I think maybe that point about whether we can evaluate the output is an important one.
Barocas: So I’d love to take another question but I’m afraid we probably have to end there with questions. But please, panel.
Introna: Yeah. Thank you. Yeah, I absolutely agree with you about probabilistic inference. I think that’s a really important point. And indeed this is something I think where people are really concer— When you go to Google, you can go to the dashboard, right. And you can look at who they think you are. So you go to the dashboard and there’s an option; you can see the categories under which they have classified you. So I went there and I discovered that I was a woman. And I was younger than I am. So I thought that’s not a bad classification. [audience laughs] But clearly that’s what they use to serve me ads. So maybe that’s not such a great idea. So I do think that’s a really important point.
The issue of evaluation…yeah. I just think professional ethics is not really the way to go. I mean, not that I think there’s a problem with professional ethics. But one of my areas of research is business ethics. And one of the directions in which business ethics has gone for a long time now is the whole notion of codes of ethics, and the idea that organizations have codes of ethics and that employees sign up to the codes of ethics and so forth. The problem is that those very codes of ethics become a way of avoiding doing ethics, right. So we can say we have a code of ethics, and yet, you know, the practices don’t conform. But if you question the practices you’ll always be referred back to “Well, we have a code of ethics.” So I think professional ethics is a complex thing, and I don’t think it’s a sort of simple… Well, I’m not suggesting you’re saying it’s simple, but I think it’s a very complex route, and it may even become a way of avoiding addressing the issues that we want to address.
Barocas: Okay. Well I think we’re exactly on time. Please join me in thanking the panel for a good session.