Ethan Zuckerman: So, this is the moment in the program where we have a seamless transition from one tricky topic to another tricky topic. In this particular case, we’re transitioning from research that in many cases makes us deeply uncomfortable, research that we often haven’t been able to take on for a complicated mix of ethical, social, and legal reasons. We now have a panel that looks primarily at legal barriers to research. This is a panel that really asks the question: why can’t we do that? Why aren’t we allowed to take on certain particularly pressing research questions?

To moderate this panel, we have someone from whom I have heard “Why can’t we do that?” dozens of times over the last five years, because he is my doctoral student: Nathan Matias, who is doing absolutely groundbreaking work on discrimination, harassment, and online behaviors that are dangerous and detrimental to communities, and on helping communities figure out how to find their way through them.

So I’m very, very happy to hand you over to Nathan Matias, who’s going to lead us through this next part of the conversation. He in turn is looking for something to advance the slides, and I will hand it to him and hand the stage to him at the same time.

J. Nathan Matias: Thank you very much, Ethan. As Ethan said, quite often when we’re asking these difficult questions, we might not even know where the line is. But in other cases, when researchers work to advance public knowledge, even on uncontroversial topics, we can still find ourselves forbidden from doing the research or from disseminating it. This happens especially when the research we’re doing, or the work of spreading it, comes into tension with the businesses involved in the issues we study and in the very work of sharing knowledge. At such moments we can find ourselves tied to the mast, not of our principles, as Cory Doctorow said earlier today, but of laws that were set up to protect those interests and that can get in the way of important work and public knowledge.

Here today, we’re going to hear from speakers whose work touches up against laws related to cybercrime and laws related to copyright. In the area of cybercrime we have laws like the Computer Fraud and Abuse Act, which, although it was designed to protect people and companies against certain kinds of unlawful access to computers, has also turned out to be a powerful protection against accountability as researchers try to understand the power of AI and the impact machine learning can have, not just on our everyday lives but also on fundamental principles and values of equality, fairness, and non-discrimination.

And even when we’ve produced that knowledge, when we’ve completed our research and published it, we can discover that not everyone is able to access it. When we do work in the public interest, when we add to knowledge, the people most able to access it are often those with the most resources: the people and institutions who are able to pay. At those moments, we basically have three choices.

Our first option is to just give up and walk away from the challenge of advancing public knowledge. Our second option, which is what many of us do, is to disobey quietly, disregarding the rules in lots of small and everyday ways and hoping we won’t get caught. And it takes great courage to see the third option, which is to try to solve the issue, not just for ourselves but for whole fields and societies. I’m excited that today we’re going to hear from two researchers who have done remarkable work to do just that.

First, we’ll hear from Dr. Karrie Karahalios, a professor of computer science at the University of Illinois. Karrie has been a pioneering researcher of the ways social technology is shaping our lives as societies, designing systems and helping expand our theories of how our relationships with each other are mediated through social networks and communication technologies. She’s also an expert on algorithmic accountability and a leader in early-stage efforts across academia to understand the role machine learning systems are playing in our everyday social lives. Just three weeks ago she joined researchers at the University of Michigan and Northeastern University, along with First Look Media and the American Civil Liberties Union, to file a lawsuit aimed at clarifying whether researchers can and should be able to do important research on algorithmic accountability and discrimination.

After Karrie, we’re going to hear from Alexandra, a Kazakhstani researcher who has studied everything from neuroscience and computer science to the history of science, and who is best known for her work on a system called Sci-Hub, which provides over fifty-one million scholarly articles online for free by pooling the access credentials of academics from many universities and making those articles freely available to download.

As we think about the work Karrie has been doing to better study the impact of social technologies and algorithms on our lives, and the legal boundaries that forbid that work, and as we think about the legal challenges that make it difficult for everyone to access precious knowledge coming out of research, we’ll also have a conversation about the challenges of taking that step to solve a wider problem for a wider field and community, as well as the tensions we find ourselves in with business and with law, at the boundaries of law and research. Karrie. Please welcome Karrie.

Karrie Karahalios: Oh, we have Alexandra. 

Matias: We were having difficulties getting Alexandra connected, but it now looks like those have been resolved. So actually, let me ask you to please welcome Alexandra Elbakyan to share her work with us.

Alexandra Elbakyan: Okay, so, Sci-Hub. Let me start with the problem. Today the dissemination of research results is very limited because the prices to read research papers are very high. These prices make research papers inaccessible to individual readers, and even people who should normally have access to research papers, such as scientists, students, and medical doctors, often don’t have access.

So I created Sci-Hub, a website where you can get these papers without payment. And [?] is the only problem. Websites like Sci-Hub are currently prohibited by government, which means that operating such websites is currently not lawful, because research papers are considered a kind of private property, and hence their free dissemination can be considered a kind of theft.

So let me say something about the prehistory of the website. I myself experienced this problem while working on my research projects in Kazakhstan. I searched online and found many places where people helped each other get literature and circumvent firewalls. Many such places were in English and Russian; for example, you can see such a forum in Russian on the slide.

So, in 2009 I became a member of the neuroscience.ru online forum. This forum had a separate topic where everyone asked how to get such-and-such a paper, and other people who had access helped. So I got the idea to make this more automatic and more convenient; for example, uploaded papers could be sent automatically by email. I asked the forum administrator whether I could join the team of developers, but he was not very fond of the idea, particularly because of corporate law.

In 2011 I became a member of another research forum, and they had implemented what I had been thinking about two years earlier: research papers were automatically uploaded to the forum and sent to email. There were some very elaborate rules, and the system was implemented in Perl. But the main developer had abandoned it, so when I found it, it was barely working.

This is another project of that kind. It’s called Супер-мега скацивалка, or “Super-mega downloader” in English. It was created by students of a big Russian university. You could enter a DOI, and it would download the paper using university access and give you the PDF directly. Here the developer says it was successful for 41% of requested papers. It was a closed, non-public project, and to access it you needed to complete a Russian [problem?], so only people with Russian roots could access it.

So in 2011 I created Sci-Hub. The first iteration was drafted in a couple of days. I took open-source software that was created to circumvent ordinary website blocks and modified it to work with universities. If not for open source, I would not have been able to implement the idea so quickly. What Sci-Hub did was allow users to browse research websites as if they were browsing from university computers, and hence they could download via the university’s subscription. And at first Sci-Hub didn’t rely on any CrossRef DOI.

A few months later, I was contacted by the Library Genesis admins. They had just created a section on their website to store research papers and had started uploading some papers there, but not in very large amounts.

At first Sci-Hub was used primarily in former Soviet Union countries, and I blocked access from the United States and parts of Europe to keep the project safe. After one year of [?], people in China and Iran became aware of the project, and traffic became so high that our university access started being blocked. So I had to block China and Iran from the website, too. But now all these countries are unblocked and working.

So in 2014 I started to proactively download papers from research websites and upload them to Library Genesis’ collection and to Sci-Hub’s own storage; by that time, the project had its own storage. Once a paper has been downloaded, it’s basically free, and you don’t need to download it again. So for major publishers we now have more than 90% of papers freed, and technically the problem of paywalls is solved. What is left is to make all this legal.

“…the professed aim of all scientific work is to unravel the secrets of nature”
James Clerk Maxwell [presentation slide]

So why does science have to be open? There are many reasons, but the most important is that the nature of science is about discovering secrets, not keeping them.

And when science is open, how can that benefit research? Let’s see. Today’s system concentrates science in big organizations, so small companies and individuals simply cannot get subscriptions, which means they cannot make creative contributions. Research is concentrated in big institutions, and these institutions tend to comply with standards. Hence science has become standardized, and new ideas cannot develop. So once science is open, perhaps there will be big progress. Okay. Done.

Matias: Thank you, Alexandra. Next we’ll hear from Karrie, and then we’ll have a conversation before opening it up to the floor. Please welcome Karrie Karahalios.

Karrie Karahalios: So, I’m going to talk about the need for auditing algorithms. A lot of this work started for me around 2011 or 2012, when, along with colleagues Christian Sandvig, Kevin Hamilton, and Cedric Langbort, we founded the Center for People & Infrastructures. We decided we really, really wanted to investigate the algorithms that shape people’s lives and sociotechnical systems.

One of our first projects was a case study of a surveillance system with cameras in a train station in Europe. One of the difficulties we encountered was this: imagine a machine learning system where an operator always stops and pauses the camera on a specific type of person, whether black or white. In this case we were worried that operators might always stop when somebody was black. What happens when a system learns this from its training data, builds that bias into a machine learning algorithm, and keeps growing? Here, people were really worried that black people would be singled out for observation.

In reality, though, what we started seeing was infrastructures and systems where black people weren’t seen at all. I don’t know if you remember this case. Desi and Wanda made a YouTube video where they’re looking at a new HP camera that was supposed to track people. And it tracked Wanda perfectly. Wanda would go to the left, she would go to the right, she would move in and out, and the camera did a beautiful job. Desi, who happened to be black, would come in, move around, jump, move back and forth; absolutely nothing would happen. He went so far as to claim facetiously that HP cameras were racist. He said this in a very, very humorous tone, but they touched on a really big, important problem.

Another case that happened here in Boston was Street Bump, in July 2012. This was an interesting system: you had smartphones in cars and taxis, and with the accelerometers on these phones you could tell where there were potholes. This was lauded as a huge crowdsourcing success, and in many ways it was.
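To make the mechanism concrete, here is a minimal sketch of how accelerometer readings might be turned into pothole reports. It assumes each phone logs timestamped vertical acceleration together with a GPS fix; the `Sample` type, the `detect_bumps` function, and the threshold and refractory values are all hypothetical illustrations, not Street Bump’s actual algorithm.

```python
from dataclasses import dataclass
from typing import List

GRAVITY = 9.81  # m/s^2; a phone at rest reads roughly this on its vertical axis


@dataclass
class Sample:
    t: float    # seconds since the start of the trip
    z: float    # vertical acceleration in m/s^2, gravity included
    lat: float  # GPS latitude at time t
    lon: float  # GPS longitude at time t


def detect_bumps(samples: List[Sample], threshold: float = 4.0,
                 refractory: float = 1.0) -> List[Sample]:
    """Flag samples whose deviation from gravity exceeds `threshold` (m/s^2).

    `refractory` (seconds) suppresses repeated flags from the same jolt.
    Both defaults are illustrative guesses, not values from any real app.
    """
    bumps: List[Sample] = []
    last_flag = float("-inf")
    for s in samples:
        jolt = abs(s.z - GRAVITY)
        if jolt > threshold and s.t - last_flag >= refractory:
            bumps.append(s)  # report this location as a candidate pothole
            last_flag = s.t
    return bumps
```

Note what the sketch cannot see: it only produces reports where someone is driving with the app running, so a map of its output reflects where app users drive as much as where potholes are.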

In 2014 it was revealed, though, that some poor parts of Boston were not served by this, and potholes were only getting fixed in rich, wealthy suburbs and not in some of the poor neighborhoods. What was really great about this information coming out was that the problem was solved. They said they fixed it. I don’t know how they fixed it, but they said they fixed it.

Another, much more recent case, from 2016: Amazon Prime same-day delivery. One of the starkest examples is again here in Boston. When you look at Roxbury, pretty much all parts of Boston are getting same-day delivery from Amazon Prime; Roxbury is not. The same thing happened in a bunch of cities. When this information was revealed and made available to the public, and it was also made clear to Amazon that this was unacceptable, it was immediately remedied in Boston, New York City, and Chicago. Hopefully it’ll be remedied across the country. But this information needs to get out there for change to happen.

Something else you’ve probably seen very recently in the newspapers, around the same time, is predictive policing. Again, this needs to be audited. We need to see if there’s bias in who’s being predictively observed, who’s maybe being stopped by police, or which zones are being predictively observed. Are they poor areas, or are they wealthier areas?

And probably most recently, there’s the game Pokémon Go: why is it that there are fewer PokéStops in poor neighborhoods and so many in rich, wealthy areas? There are lots of reasons for this, and again crowdsourcing plays a role here. We need to think about the bigger consequences. What happens when we have self-driving cars and the car has to decide whether the owner of the car dies, the person on that corner, or the person on that other corner? How do we make sure we look at these algorithms and objectively audit them so that we can then find solutions?

So in that vein, we decided with our center to explore fair housing. Are racial minorities less likely to find housing via algorithmic matching systems? There’s an interesting study by Edelman that found that if you are black and you rent out an apartment on Airbnb, you will get less money for your unit than a white host with a similar unit, controlling for location and the characteristics of the unit. Does this work both ways?

So it turns out that, looking at this work, there is precedent and full support for auditing in housing. The Civil Rights Act of 1964 decreed that race simply could not be considered in some situations. The Fair Housing Act of 1968 said that you could not discriminate based on race, color, religion, sex, or national origin. Unfortunately, it took twenty more years to add disability and familial status; I say unfortunately because it took too long. And later, in 1987, the Housing and Community Development Act declared that the Department of Housing and Urban Development can enforce the FHA. Specifically, on top of that, it can carry out special projects, including the development of prototypes to respond to new or sophisticated forms of discrimination against protected persons.

So, how did they do this? They did what we today call a traditional audit. They would pair two people together, say one white person and one minority person, and match them on family and economic features. They would have them successively visit realtors, and then they would record the outcomes, what happened.
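The logic of a paired audit can be sketched in a few lines. Because each pair of testers is matched on everything except race, the quantity of interest is the within-pair difference in outcomes. The sketch below assumes the outcome is the number of units each tester was told about, and uses an exact sign test, one standard choice for paired data; it is an illustration, not HUD’s estimation procedure, and `paired_audit` is a hypothetical name.

```python
from math import comb
from typing import List, Tuple


def paired_audit(pairs: List[Tuple[int, int]]) -> Tuple[float, float]:
    """Analyze matched-pair audit results.

    Each pair is (units shown to the white tester, units shown to the
    minority tester). Returns the mean within-pair difference (white minus
    minority) and a two-sided exact sign-test p-value. Tied pairs carry no
    information about direction, so the sign test drops them.
    """
    if not pairs:
        raise ValueError("need at least one pair")
    diffs = [w - m for w, m in pairs]
    mean_diff = sum(diffs) / len(diffs)

    nonzero = [d for d in diffs if d != 0]
    n = len(nonzero)
    if n == 0:
        return mean_diff, 1.0
    k = sum(1 for d in nonzero if d > 0)  # pairs favoring the white tester
    # Under the null hypothesis each untied pair favors either tester with
    # probability 1/2, so the count follows Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(min(k, n - k) + 1)) / 2 ** n
    return mean_diff, min(1.0, 2 * tail)
```

For example, ten hypothetical pairs in which the white tester is shown more units in every untied pair give a small p-value, while perfectly balanced pairs give a p-value of 1.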

US Department of Housing and Urban Development, Housing Discrimination Against Racial and Ethnic Minorities 2012 [chart appears on page 11]

The first of these paired-testing experiments happened in 1977, and the most recent one in 2012. One of the things they found was that blacks, Hispanics, and Asians were told about fewer units than white people. Since then we’ve had lots and lots of online systems that find housing for us: there’s Zillow, there’s Trulia, there’s homes.com, so many that I can’t even begin to list them all in this ten-minute session. But we wanted to explore what happens on these online sites, these sociotechnical systems that might have crowdsourcing implications as well.

So we wrote this paper about auditing algorithms and methodologies for doing so, and we tried to apply it to fair housing. The first methodology we came up with was getting the code. But this can be really, wicked hard. People don’t want to give you their code; we’ve tried. And even if you were to get it, it would sometimes be really hard to make sense of it.

Researchers, on the other hand, were very willing to give us their code. This was a very famous 1996 paper by David Forsyth and Margaret Fleck, and they gladly gave us the code immediately. They became famous for writing an algorithm called Finding Naked People. If you look at the code, there are lines in there that, they told me, they had to add because their Finding Naked People algorithm did not find black people. They actually had to go in there and make it find more people. Some might argue that doing this raises ethical concerns, and I’m not going to get into that here. But getting the code is hard, and making sense of the code is hard.

Another option is to ask the users, and we’ve done this a lot in our own work. In housing, it’s very hard to get a large group of people, a good sample, to actually go and look for housing for you at scale: nationally, in the hundreds of thousands. We’ve done this in the field of algorithmic awareness, and I can tell you that we faced many, many challenges. It took us over a year and a half to get a non-probability sample, and even that is not ideal.

Another approach is to collect data manually, and that is also extremely difficult. In the case of housing it may even be impossible to find this data online, because people aren’t going to provide it. We’ve tried having people crowdsource it using Mechanical Turk and similar systems. Not only is it unreliable, but people do not provide it.

Latanya Sweeney, however, did an amazing study where she looked at searches for people’s names online. She did this manually, I want to add. She would enter the names of white people and get results like “Carrie Swigart: found.” You put in the name of somebody who happens to not be white, and it says “Keisha Bentley: arrested” instead of “found.” She did this with a thousand people manually, by hand. And one of the things we’re going to see a bit later is how doing this manually might even be a violation of the Computer Fraud and Abuse Act on some sites.

You can scrape everything. Again, this can be a violation of the Computer Fraud and Abuse Act. But we’re all... I won’t say we’re all doing this, but many researchers are doing this today. I mean, people are scraping left and right. It’s something you see in hackathons every day. It’s something you see in computer science classes. It’s something you see in data mining. And it’s predominant at conferences like KDD, WWW, CIKM, and so forth.

Our study of bias in social media feeds came to an end in April 2015 because the API was deprecated. We had the option to scrape at that point. We chose not to, because we wanted to maintain a good relationship with Facebook. We decided the project could come to a reasonable end and we could work on other things. But we could have continued the exact same work if we’d scraped the site; the interface would have looked almost identical. We had lots of people in computer science departments using our tool, and lots of people in art and design departments internationally. But we shut it down, and we intentionally shut it down because of the terms of service.

And last but not least (actually it’s the second to last, but not least): sock puppets. Sock puppets, as you might imagine, is an approach where you create fake identities, put them into a system, and see what outcomes you get across these multiple sock puppets. So you might make an account and say this person is a 46-year-old white woman who makes this much income. Make another account and say this person is black, 45, and makes this much income. Then let these little bots loose on the Internet and measure the differences you get in terms of advertisements, housing, and so forth.

This approach can work really, really well. Christo Wilson and his colleagues have used it very effectively to find price discrimination. They found, for example, that buying something on your mobile phone can cost more than buying it on a desktop. He’s also found price steering based on the cookies you leave on your computer: sometimes, the more times you visit a site, the more expensive things might be compared with clearing everything and starting from scratch. They used three hundred real people, but that wasn’t enough to get the work done, so they also made thousands of additional sock puppets to find statistically meaningful results.
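A sock-puppet audit ultimately comes down to comparing outcomes between groups of accounts that differ in one attribute. As a minimal sketch, assuming you already have prices recorded by two such groups (say, mobile and desktop accounts), a permutation test asks how often randomly relabeling the accounts produces a gap in mean price at least as large as the observed one. This is a generic technique chosen for illustration, not necessarily the analysis Wilson and colleagues used; `permutation_test` and its parameters are hypothetical.

```python
import random
from typing import Sequence


def _mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def permutation_test(a: Sequence[float], b: Sequence[float],
                     n_resamples: int = 10_000, seed: int = 0) -> float:
    """Two-sided permutation test for a difference in mean price.

    `a` and `b` are prices observed by two groups of accounts that differ in
    a single attribute. Returns an approximate p-value.
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    observed = abs(_mean(a) - _mean(b))
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # randomly reassign prices to the two groups
        diff = abs(_mean(pooled[:len(a)]) - _mean(pooled[len(a):]))
        if diff >= observed:
            count += 1
    # The add-one correction keeps the estimate away from an impossible p = 0.
    return (count + 1) / (n_resamples + 1)
```

Small price gaps need many accounts per group before the test can distinguish them from noise, which is consistent with audits like the one described above needing thousands of sock puppets.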

And finally, you could do a collective audit. This is hard, and it would be a dream, but I pose it as something we should all strive for in the future: lots of people working together to provide common information so that we can do the type of audit we’ve been talking about.

So, we got really, really excited when this paper was cited by a White House report on big data. The report named five national priorities essential for the development of big data technologies, and one of them was algorithm auditing. Others included the right to appeal an algorithmic decision, ethics councils, and clear transparency in how algorithms are created. Very specifically, what they said about audits is that they wanted to “promote academic research and industry development of algorithmic auditing and external testing of big data systems to ensure that people are being treated fairly.” And we got super excited and said, this is great, let’s get to work.

And there was one block, and that’s the Computer Fraud and Abuse Act. It was enacted in 1986. Some say it was a response to the movie WarGames, which came out just a few years before. At the time it was a very different environment from the one we’re living in today. Keep in mind that we didn’t have web browsers then, and most people did not have broadband or even access to email.

And so, in response, to continue our work without ambiguity in the law, we sued the US government together with the ACLU and colleagues. This happened, as Nathan said, just a few weeks ago, and we’re now waiting for a response from the government to see what happens.

Specifically, the Computer Fraud and Abuse Act prohibits unauthorized access to a protected computer. A protected computer includes any government computer and any computer used in interstate or foreign commerce or communication, which covers any website accessible on the Internet. Even the term “protected computer” can be ambiguous for many. The most confusing phrase is “exceeds authorized access,” and this is specifically the point we’re targeting in the lawsuit. Many courts to date have repeatedly held that violating terms of service exceeds authorized access.

A first violation carries a maximum one-year prison sentence and a fine. Subsequent violations can result in a prison sentence of up to ten years and a fine. Unfortunately, there’s no requirement of intent to cause harm, or of actual harm stemming from the prohibited conduct, so your intentions don’t really play a role here. This echoes what Cory was saying earlier this morning about the DMCA.

So why are terms of service such a big deal? I may be overstepping my bounds here, but I imagine many of us are violating terms of service every day without even realizing it. One of the obvious provisions is a ban on automated collection. The top example here is from Facebook: you cannot use automated means such as harvesting bots, robots, spiders, or scrapers. On the bottom we’ve got Pokémon Go, which says essentially the same thing, including but not limited to the PokéStop database and other information about users and gameplay.

In addition to automated collection, they don’t want you to be able to create the sock puppets we talked about. Twitch’s terms, for example, say you can’t manipulate identifiers to disguise the origin of any content transmitted through the Twitch service. So if you want to do something regional, you can’t. You also can’t impersonate or misrepresent your affiliation with another person or entity.

Even more confusing, in this case they specifically ban manual collection of data. Here again it’s the spider, the deep link, no scrapers or other automated means, methodology, algorithm, or device, or any manual process, for any purpose. I don’t even know what that means; I can’t begin to explain it. But one might interpret it to mean that just sitting down in front of a computer with a pen and paper and writing something down would be a violation.

Another thing you cannot do is reverse engineering; bunnie talked about that earlier this morning. And another reason this is so frustrating is that terms of service are not static. For example, this company, FrontApp, “reserves the right to update and modify the Terms of Use at any time without notice.” And just by going to that website, you agree to the terms of service.

Again, and I apologize for the many Pokémon Go references, in this case, unless you opt out, you basically give up your right to a trial by jury or to participate as a plaintiff or class member in any purported class action or representative proceeding. Just by using it. And who knows to opt out? How many of you read the terms of service before you played Pokémon Go?

So, why is this so important? Researchers, journalists, essentially everybody is affected by this. As an instructor, a lecturer, a professor, I care a lot about protecting my students in classes, in hackathons, and in research. Violating terms of service can in many cases lead to unpublishable work. You can spend years and years on a dissertation and possibly not be able to publish it. There are inconsistencies in who publishes and who doesn’t, or in who allows one person to publish but not somebody else using the same methodologies. Your IRB may not approve your work. Your reputation could be at stake. Then there’s finding employment: let’s say you want to work at Facebook, but something gets in the way, maybe because of a study you did. And research funding comes up a lot, so I won’t belabor that point.

But before I end, I want to talk briefly about the importance of norms. Norms are crucial here because, as I mentioned earlier, lots of computer scientists scrape and lots of computer scientists use bots, and they’ve done this for social good. And discussions are happening. Last year there was an amazing event here called Freedom to Innovate, and at the time I was a little frustrated with ACM and how its professional code of conduct essentially forbade violating terms of service.

I’m ecstatic to announce that, because of Amy Bruckman (from Georgia Tech, for those of you who don’t know) and her peers, there now exists, as of just a few weeks ago, the new ACM SIGCHI Ethics Committee. The goal of the committee is not to resolve ethical issues, since it’s not a court, but to facilitate shared understanding emerging from the community. It addresses the fact that what the community does matters and what the community cares about matters.

That said, going back and forth, it was hard to come up with a system of goals. CFAA violations are happening. Researchers do not want to articulate this, but they are happening, and discussions are happening online and in blogs. One of the things Amy writes in her blog, which I highly recommend, is that documenting what you do may not be so smart with respect to terms of service, because, as her friend Mako Hill noted, that could get people into more trouble: it asks people to document their intent to break terms of service. Under some circumstances, breaking terms of service is ethical, yet not strategic.

And so these discussions are happening offline, and they’re happening in blogs. It’s nice that the community is starting to come together to discuss this. Like I said, researchers have always faced ethical questions, and we have frameworks to address them in the face-to-face world. Through conversations with industry, with the government, and with researchers, we can find frameworks to address them in the online world as well. The approach we’re finally taking right now is policy change and policy creation. We’ve tried talking to individual companies. We’ve tried asking for data. And we really hope this next step forward helps formalize, for us and for everybody, what they can do online.

Matias: Thank you very much for that, Karrie and Alexandra. We’ll take a little bit of time to chat with each other. And I’ll just note that in many cases Alexandra will be speaking to us in Russian, and we have a lovely translator who will then translate her words into English. So that will set the pace of our conversation.

I’m really fascinated by the different ways in which the two of you have been disobedient with respect to the law. Karrie, on one hand, you have these legal risks around the Computer Fraud and Abuse Act. You both have this situation where there are common practices: researchers are scraping, and people are constantly violating these terms, but maybe doing it under the radar. And in your case, Karrie, you’re disobeying by participating in this lawsuit to try to change the rules. You’re participating in professional societies like the Association for [Computing] Machinery to make this widespread practice perhaps carry fewer legal risks. On Alexandra’s side, there is a very different kind of response. The answer is to build on the breadth of other platforms, other efforts that people are making to share research, and to build a system that does it even more effectively and in a more widespread way.

I’m curious to hear both of you (maybe Karrie first and then Alexandra) talk about how you see the established institutions and your relationship to the rules, to the powers that be. Karrie, in your case, the companies, the law, the research organizations. And Alexandra, in your case, how you see the work of Sci-Hub in relation to the academic publishing industry.

Karahalios: So with respect to the university, companies, and colleagues: you complain a lot and you end up on committees to do something about it. For example, there are lots of complaints about whether institutional review boards should have any jurisdiction over legal issues, or only over ethical issues. And many would argue that the ethical is orthogonal to the legal, and that the IRB should stick to the ethical and not the legal. By doing that, I ended up on the IRB. It was, surprisingly (or maybe not surprisingly), very rewarding. I hope I had some impact. It was nice to see studies, specifically in computer science, that could get approved in a day versus six to nine months, the way it was before.

Matias: Karrie, for those of us who are less familiar with IRBs, can you tell us what that is and what role it plays in the story?

Karahalios: Sorry. Institutional review boards exist at many universities. Originally they were put in place so that anyone who received federal funding had ethical approval for any research they did with government money. In most research institutions, IRB approval needs to be obtained for any research study, regardless of whether it’s funded by the US government or not. So if somebody wants to do a study, before they can do any studies with human subjects, they have to file an application. This goes to the board, and it gets reviewed. Some boards meet every month; some have expedited reviews that might happen on a weekly basis. But you need to get IRB approval before you do any study. If you’re caught not having IRB approval, the entire institution could be shut down and all research stopped until the situation is remedied. And so it was nice to see an open discussion at the university level with the IRB, and to see how they were helping us move research forward.

With respect to working with individual organizations, we’ve had mixed results. We wanted to work with one specific company, and we tried to be nice. We asked them, “We don’t have a lot of the data that we need. Instead of wasting your time and our time collecting this data, can you just give it to us, and we’ll promise not to touch your servers ever again?” They said no. That left people not wanting to ask again, because once you get that no, it makes it even worse to actually go and collect the data after officially having received that answer.

Moving on to bigger corporations, I’ve had great discussions with data scientists at Facebook. They’re one of the best groups that I’ve ever interacted with, and they’ve been very supportive of our work. However, the plan to get the data to us did not come to pass. And that had nothing to do with the data scientists; it had to do with the organization as a whole and legal issues in a big corporation that I honestly don’t understand.

So I guess what I’m trying to say is that we try to do our due diligence, and we feel like we’ve knocked on many different doors. And we needed a new one so we could stop having to knock on all of these doors.

Matias: Which is where the legal case comes in. So I’m curious, Alexandra. Often when people discuss your work with Sci-Hub, their first reaction is to rehash the open access debates and think immediately about their own institutions, their own context, their own countries and cultures. But it strikes me that Sci-Hub is very much an outsider project, for people who find themselves outside of those structures. Do you have a vision for changing the publishing industry? Or do you see yourself as just serving the people who are accessing and sharing articles through your resource?

[Alexandra responds here in Russian. Her subsequent remarks are given via the interpreter.]

Matias: Alexandra, if you could allow us some pauses in the middle of your responses, so our translator has a chance to catch up with you, that would be great, if you’re able to give that a try.

Elbakyan: So, I think that if Sci-Hub continues to exist, the scientific publishing industry will have to adjust, because publishers will not be able to earn the huge profits they are earning right now from subscriptions. If you ask my personal opinion about open access, I’m for it. I’m trying to promote it.

Matias: Thank you. 

Interpreter: That’s all I got. 

Matias: I’m curious about a dynamic that’s in play in both of these situations, where a large number of people are already doing the thing. And in some sense both of you, Karrie and Alexandra, have become more visible on these issues because you’re trying to address the problem in a systemic or large-scale way. How has that been... Maybe, Alexandra, to start out: did you expect that attention? And has that changed how you do your work at Sci-Hub?

Elbakyan: Well, I have to say that from the very beginning of Sci-Hub’s existence there was a lot of attention paid to it, but you might not know it because it was largely covered by local media and maybe not so much by international media.

Matias: And Alexandra, you’ve also faced legal action in the United States from US publishers. How does that affect you as someone not living in the United States? How does that affect Sci-Hub as well?

Elbakyan: Well, for obvious reasons there are a lot of challenges, like domain registration, for example. Our sci-hub.org domain was closed down through legal action, but for some other resources it’s not even necessary to take legal action. And obviously, I’m not going to the United States or Europe. I’m trying to be cautious about that.

Matias: Well, we’re very grateful that you’re able to speak to us from where you are right now. I’m curious, Karrie: part of your legal strategy involves inviting other researchers who’ve found themselves limited from doing this work, or who have faced those challenges, and actually asking them to come forward. Are you able to tell us more about what that entails and why people might find that difficult, or why they should?

Karahalios: Yeah. Well, it’s similar to Amy Bruckman’s blog post in a way. My colleague Christian Sandvig put out a call on Facebook asking people to share their stories. When I talk to the media, they always ask me to find more people. And there are many, many people; it’s not my job to out them. However, I know many, many colleagues who have done this work, have gotten cease-and-desist letters, and in most cases have stopped doing the work. But people don’t want to come out and admit this. They’ll talk about it in small circles, but they definitely don’t want to announce it to the public for fear that something might come of it.

And I can tell you that, at least in our university, six of my eight PhD students are not US citizens. I work with many faculty members who are not US citizens. For any of you who have had students who struggle to get a visa every year, or whose visa gets revoked for unexplained reasons, you know it’s very, very hard to admit to doing anything like this. It’s hard to ask a student to do anything like this.

Matias: Thank you. We have time for just one or two questions. So if you’re able, please come up to the microphones in the middle and on the sides, and I’ll acknowledge you. You’ll want to allow some time for the video and the translation.

Audience 1: Thank you. So I had a question about these terms of service. Because I feel like Pokémon Go isn’t telling you not to scrape their data because they don’t want you to identify racial bias in the stops, right? It’s because they’re protecting their commercial interests. And I’m curious what sort of changes you think could be made to those commercial interests so that you don’t need to worry about this, so that companies might not need to be as strict about enforcing the terms of service, or might have different terms of service. Basically, trying to get at the problem a few steps back.

Karahalios: I’m sor­ry, could you rephra— Can you—

Matias: So for example, if the Computer Fraud and Abuse Act relies on terms of service, then theoretically, if you could just get companies to change their terms of service, that might allow researchers to audit them and maybe hold them accountable. Is that a viable strategy?

Audience 1: Yeah, and then also sort of changing the commercial forces that motivate the companies to set the terms of service.

Karahalios: Yeah. You know, that’s an excellent point. One of the things that we’ve noted is that lots of terms of service are literally cut and pasted from other terms of service. So there seems to be a template that people start out with and then just add to. That’s an approach we have not tried: I have not talked to a company and asked them to change their terms of service. I’ve talked to lots of scientists, and I’m told that the lawyers do not want to touch the terms of service, that the terms of service need to be there. So I might ask a lawyer about that question. They’re probably better qualified to answer it than I am. My attempts to talk to data scientists always end up going to the lawyers, and they stop there.

Matias: One more question.

Audience 2: So first of all, Alexandra, thanks for the great work that you’ve done for the community. Now, I’ve been wondering whether you get any support from academia itself for your website, and in particular from top institutions like MIT, the ones that actually do have access. Do you feel like academia is supporting your actions? And since I anticipate that the answer is no, do you think there is something that these institutions could and should do? And that extends to the other panel members as well: do you think universities could somehow leverage their power to help in these cases and to promote the causes that we care about?

Elbakyan: I’ll try to take pauses while I’m answering this. First of all, I think that MIT has poor access to publications, or allows poor access to publications. At least that was the frequent user complaint. Second, as far as support is concerned, I think that there is support on the part of the users who make donations, but obviously there is no official support, for obvious legal reasons.

Matias: Well, please join me in thanking Alexandra and Karrie for a fascinating conversation and panel.

Ethan Zuckerman: Thanks so much, Nathan; thanks, Karrie; and thanks, Alexandra, for joining us.
