Suchana Seth: So today I wanted to talk to you about what it means to speak of fairness in the context of machine learning algorithms, and how we decide what is fair.
I think all of us have seen a steady feed of news coverage over the last couple of years about all the different ways in which algorithms can be biased. We have heard stories about how algorithms used in criminal sentencing procedures can end up biased against certain racial or ethnic communities. We have heard stories about how, for instance, Google's photo tagging software fails to identify people from certain communities as even human, and ends up labeling them as gorillas. We have heard stories about how Twitter bots can end up racist or misogynistic. We know that instances of bias can creep into hiring algorithms.
So that’s a whole lot of bad news. Now the question is what are we doing in the industry, or what is the machine learning research community doing, to combat instances of algorithmic bias? So I think there is a certain amount of good news, and it’s the good news that I wanted to focus on in my talk today.
The first piece of good news is that we can identify, measure, and correct for instances of algorithmic bias. One particular example that I wanted to share with all of you is about an algorithm called word2vec. I don't know how many of you are familiar with word2vec, but it's an algorithm that's used to power things like machine translation, and it's used to power some of the search results that you see. What this algorithm does is look at large bodies of text, for instance text from news articles, and pick up patterns that it can leverage to power things like automated translation.
Now, it’s interesting that this algorithm picks up on a lot of gender stereotypes, instances of bias. For instance, it would pick up something like this idea that man is to woman as professor is to homemaker, just to give sort of a very canonical example. And I’m sure we can all anticipate all the ways in which this might result in biased outcomes down the algorithm chain. So when researchers were looking at ways to combat this kind of bias they found that we can take an algorithm like word2vec and we can in some sense compensate for the bias that it learns from the data. So even though in this instance we cannot actually correct the data, what we can do is sort of measure and compensate for the amount of bias that’s present in the data.
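To make this concrete, here is a minimal sketch of the kind of measure-and-compensate strategy the researchers described, written in plain NumPy on a tiny set of toy vectors (the toy embeddings and the word list are my own illustration, not the actual word2vec model): estimate a gender direction from a few gendered word pairs, measure how far occupation words lean along it, and then subtract that component to neutralize the bias.

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

def gender_direction(emb, pairs=(("he", "she"), ("man", "woman"))):
    # Average the difference vectors of a few gendered word pairs to
    # estimate a "gender direction" in the embedding space.
    diffs = [normalize(emb[a] - emb[b]) for a, b in pairs]
    return normalize(np.mean(diffs, axis=0))

def bias_score(emb, word, direction):
    # Projection of a word vector onto the gender direction: values far
    # from zero indicate a gendered association.
    return float(np.dot(normalize(emb[word]), direction))

def neutralize(emb, word, direction):
    # Remove the component along the gender direction, so the word ends
    # up (approximately) equidistant from the gendered poles.
    v = emb[word]
    return v - np.dot(v, direction) * direction

# Toy 3-d vectors purely for illustration; real word2vec vectors are
# typically 300-dimensional and learned from large text corpora.
emb = {
    "he":        np.array([ 1.0, 0.1, 0.0]),
    "she":       np.array([-1.0, 0.1, 0.0]),
    "man":       np.array([ 0.9, 0.2, 0.1]),
    "woman":     np.array([-0.9, 0.2, 0.1]),
    "professor": np.array([ 0.6, 0.8, 0.3]),   # leans toward "he"
    "homemaker": np.array([-0.6, 0.8, 0.3]),   # leans toward "she"
}

g = gender_direction(emb)
for w in ("professor", "homemaker"):
    print(w, "bias before:", round(bias_score(emb, w, g), 3))
    emb[w] = neutralize(emb, w, g)
    print(w, "bias after: ", round(bias_score(emb, w, g), 3))
```

With real word2vec vectors, that same projection step is what lets us quantify how much bias is present before deciding how much of it to remove.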
And it’s worth noting that algorithms like this, and strategies for combating bias like this, allow us to make bias transparent in a way that perhaps wasn’t possible before. They allow us to add to our arsenal of computational social science tools and look at the ways in which human systems are biased that we wouldn’t have been able to quantify earlier.
Now, the next piece of good news that I wanted to share is that the machine learning research community has come up with many different ways of making algorithms fairer. We can start by asking how we can make the data less biased. We can look at ways of choosing fairer inputs. To give you just one example: some time back, data scientists at Uber came up with this interesting correlation between the level of battery in our smartphone and our willingness to pay a surge price and accept a ride on Uber. Figures, right? Your phone’s dying, you want to go home quick. Makes sense. But the question is, do we really want Uber to be using a variable like the level of battery in our smartphone to predict how willing we would be to pay for things? Maybe, maybe not? Who gets to decide that?
So maybe it makes sense to invest some effort in choosing fairer inputs for the machine learning algorithms that we are using. We also have ways of auditing algorithms that let us ask: is this algorithm being fair in its outcome for every possible person? Is it being fair in its outcome for every possible group?
So these are again pieces of good news, because we have all these many different possible definitions of fairness that we can now use to start combating algorithmic bias. But choosing the right definition of fairness is not that easy. First, because many of these definitions of fairness are mutually exclusive; they don’t play nice with each other. This tension stems from the fact that the cost we associate with false positives and the cost we associate with false negatives are different. When an algorithm makes a prediction it can go wrong in different ways. It’s not going to be 100% accurate, and when it makes mistakes, depending on the kind of mistake it makes and the cost associated with that mistake, there are different kinds of fairness metrics that we could use, and not all of these fairness metrics can be applied simultaneously. So that’s the first challenge in figuring out the right kind of fairness metric to use in a given application.
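As a rough illustration of why these metrics can’t all hold at once, here is a sketch on made-up predictions and group labels (not real data): it computes each group’s positive-prediction rate, false positive rate, and false negative rate, and the numbers are chosen so that demographic parity (equal positive-prediction rates) is satisfied while the error rates, which equalized-odds style metrics care about, are not.

```python
import numpy as np

def group_metrics(y_true, y_pred, group):
    # Compare positive-prediction rate, false positive rate, and false
    # negative rate for each value of a protected attribute.
    out = {}
    for g in np.unique(group):
        m = group == g
        yt, yp = y_true[m], y_pred[m]
        pos_rate = yp.mean()  # what demographic parity compares
        fpr = ((yp == 1) & (yt == 0)).sum() / max((yt == 0).sum(), 1)
        fnr = ((yp == 0) & (yt == 1)).sum() / max((yt == 1).sum(), 1)
        out[g] = {"positive_rate": pos_rate, "fpr": fpr, "fnr": fnr}
    return out

# Made-up labels and predictions for two groups, A and B: equal
# positive-prediction rates, but the errors fall on different groups.
y_true = np.array([1, 1, 0, 0, 0, 0,  1, 1, 1, 1, 0, 0])
y_pred = np.array([1, 1, 1, 0, 0, 0,  1, 1, 1, 0, 0, 0])
group  = np.array(["A"] * 6 + ["B"] * 6)

for g, m in group_metrics(y_true, y_pred, group).items():
    print(g, {k: round(v, 2) for k, v in m.items()})
# A gets the extra false positives, B gets the extra false negatives,
# even though both groups receive positive predictions at the same rate.
```

Which of those gaps matters more depends on whether false positives or false negatives are costlier in the application, which is exactly the tension described above.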
The next question is how we decide what the right trade-off between fairness and accuracy is, because fairness often comes at the cost of how accurately we can predict something. If fairness dictates that we choose not to use certain variables to predict something, then we might lose out on some accuracy we would have gotten had we used those variables.
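One way to see that trade-off concretely is to train the same model with and without the contested feature and compare accuracy. Here is a minimal sketch with scikit-learn on synthetic data; the synthetic setup and the choice of column 0 as the stand-in for a sensitive-correlated feature are assumptions for illustration, not something from the talk.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data; column 0 stands in for a sensitive-correlated feature
# (something like the battery-level variable in the Uber example).
X, y = make_classification(n_samples=2000, n_features=6, n_informative=4,
                           shuffle=False, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Model A: uses every feature, including the contested one.
full = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Model B: drops column 0, trading some accuracy for a fairer input set.
restricted = LogisticRegression(max_iter=1000).fit(X_tr[:, 1:], y_tr)

print("accuracy with the feature:   ", round(full.score(X_te, y_te), 3))
print("accuracy without the feature:", round(restricted.score(X_te[:, 1:], y_te), 3))
```

The size of that accuracy gap is what gets weighed against the fairness gain when choosing which inputs to keep.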
And then, most importantly to my mind, there is this question of who gets to choose what’s fair. There’s this cautionary tale that I like to keep in mind when I think about this issue. Sometime in 2009, a group of people decided to play around with the search engine results you get when you search for Michelle Obama. They decided to attach her name to some unsavory pictures and make sure that those results got pushed up in the search rankings.
And in this case Google began to intervene at some point and say, “Okay, no. We have to downgrade these search results,” until popular opinion, interestingly enough, veered around to a debate about where we draw the line between removing offensive search results and protecting free speech. At that point Google sort of backed off and chose not to intervene any further.
Now, two years down the line when there was a terror attack in Norway, a bunch of people used a very similar strategy to discredit the terrorist’s sort of brand image, if you want to call it that. They decided to game the search engine results by associating some unflattering pictures with this terrorist’s name. And interestingly enough, in this case Google chose not to intervene at all.
So what’s worth noting here is that we did have the technical tools at our disposal to have intervened in both of these cases, but in one case we did and the other case we didn’t, and the choice in this case was sort of…you know, controlled by this platform, by Google. So this question of what is fair and who gets to decide is something that we should be thinking about very very hard.
I think we are making some progress in tackling these issues and in trying to understand what kind of accountability structures work best to answer them. Professional bodies like IEEE and ACM are trying to come up with standards for combating algorithmic bias. We have instances of regulation like GDPR in the European Union that are trying to grapple seriously with issues like this. Recently we had organizations like Microsoft, Amazon, and IBM come together to form the Partnership on AI, which is looking at issues of AI ethics and AI governance. So I would say that we are making a certain amount of progress. There are still a bunch of open questions, but I think the most important thing we need to address is how we get the right stakeholders in the room, and how we get them to contribute to this conversation. Thanks.