Moderator: Okay. Welcome to the colloquium tonight. The colloquium tonight is entitled "The End of the Virtual: Digital Methods," by Richard Rogers, and it's a very important aspect that Richard will be talking about, namely what happens when material, analog material, moves into the digital space, but even more importantly what happens when the material that more and more we deal with is born digital. So how do the methods actually change? How do we need to think about research methods when the material is all digital? How do we research what's happening on the Internet? How do we research the cultural aspects of the Internet, of people connecting, issues of people combining different aspects? How do we come up with different methods that really make sure that we can research that appropriately? But also, can we just transfer existing methods, scholarly methods, into the digital realm? Or do we need to develop new methods? Is it something that also the new methods might translate back into the more analog realm?

So these are all the questions, and many more, that Richard will address. Let me briefly introduce him. Richard is a professor at the University of Amsterdam, professor of media studies. And he's the chair of the New Media and Digital Culture program at the University of Amsterdam. He's also the director of govcom.org, and that's a group that's responsible for the Issue Crawler. Some of you might have seen that, a very interesting visualization tool for the Internet, and other political tools. And he's also one of the founders of the Digital Methods Initiative, and that's reworking the Internet and methods for Internet research.

He has published quite a few books. One of them is Information Politics on the Web, from MIT Press, 2004. That was also awarded the best book of the year award by the American Society for Information Science and Technology. And he's working on a new book that's called Digital Methods, hence the title of tonight's talk. And that also is going to appear at MIT Press. So please join me in welcoming Richard Rogers.


Richard Rogers: What I'm going to do today is situate digital methods as an approach, as an outlook, in the history of Internet-related research. I'd like to divide up the history of Internet research largely into three eras, the first being where we thought of the Web as a kind of cyberspace. And these particular periods that I'm going to tell you about, they're transhistorical; they overlap. But I think there've been some changes over the last ten, twenty years in how we do research with the Internet. So this is what I'd like to highlight: the changes in the dominant ways of thinking.

So, in the early days we had arguably this idea of the Web as cyberspace, where the dominant form of Internet-related research was kind of cybercultural studies. And one of the interesting things about cybercultural studies was looking at and promoting the Internet as being something very, very different. In fact as being a kind of other realm, a virtual realm, something that stood apart. And it was also promoted and thought of as being quite transformative: it would transform identity, it would transform corporeality, it would transform ideas of politics, etc.

Now, around 1998 with the Steve Jones volume Doing Internet Research, and 1999, 2000 with a couple of important monographs by virtual ethnographers, in particular by Slater and Miller, they in some sense sought to debunk all of the various claims of the Internet as being transformative. So in marched the ethnographers first and later the social scientists. And they surveyed, and they visited Internet cafes. And what they did, in some sense, was ground Internet-related research.

And interestingly enough, the move that they made in doing user studies was to go offline. So they interviewed, they observed, and what they found was that all of the various transformative qualities were a little bit different than one had previously thought. So, one's identity is not just rooted in the online but is in fact also rooted in the offline. All of these things are a bit mixed.

Now, this went on for some time and it's still going on. The social scientific impact on Internet-related research has been great. But what I would like to argue is that something happened sometime around 2007, 2008. This is when I saw a number of the developments that went on and came up with a term called "online groundedness." And online groundedness is a term that I coined in order to try to think about research that takes online data about the real and does research about society using the Internet. Right, so no longer is the Internet this realm apart, this virtual space, this cyberspace. No longer do we go offline in order to find out about what's going on online. But rather nowadays, arguably, we've moved into a period where online data sets (the Web as data) serve as a means to study not just online culture but rather culture, and society.

So this is the move that I'm making with digital methods. So let me just get directly to an example so you know what I mean. It was in August 2007 when I read quite an innocent article in a Dutch newspaper. Investigative journalists wrote that they were researching hate, basically. And ever since Cass Sunstein's observation in Republic.com, the Internet has always been the site for hate and extremism research.

In any case, Dutch investigative journalists asked the question of whether or not Dutch culture is hardening. And in order to answer that question, they didn't go native. So they didn't embed themselves like journalists do studying hooliganism, for example, in writing a book about hooliganism. They didn't go native. They didn't visit the social history library and the special pamphlets collection to look up handbills and things like this. They didn't interview extremism experts.

They went online. They in fact went to the Internet Archive and looked at web pages. And looked at the history of about a hundred different web sites. They compared right-wing web sites (right-of-center web sites) with extremist web sites. And they looked, and they saw that over time the language on the right-wing sites was beginning to approximate the language on extremist sites. So right-of-center web sites themselves, in their word choice and in the issue language that they would use and the slogans, etc., were becoming more and more extremist. And thereby, on the basis of studying web sites, they concluded that Dutch culture is hardening.

Now for those of us who have spent the last, I dunno, ten years hearing about and thinking about the Web as a virtual realm, the Web as cyberspace, as something with an asterisk on it, for those of us who've only gone to the Web to study online culture, this was radical. Right, so using the Web to make a finding about what's going on in society.

Now, interestingly enough, they grounded their claim. And now this is the tricky point, and this is where a lot of people get a little bit...well, start asking questions. They grounded their claim (that Dutch culture is hardening) with online data, the data of web sites. So this is why I came up with this term, online groundedness. They used the online as the baseline, as the means of calibration. Which is radical.

So I'm going to give you a few other examples of this, just so you've seen them, or just so you can think about them in these terms. Now, you will have heard of Google Flu Trends. Google Flu Trends is very, very interesting because it uses search query log data. Those folks searching for flu and flu-related symptoms online, their locations are found, and then the places of flu are thereby plotted. So they're using online data, data gained through what I would call registrational interactivity, data gained through search engine logs, to find out where flu's at.

Now the interesting thing about Google Flu Trends is that immediately there was an outcry. So hang on: this method is very, very different from the traditional method. The traditional method is that we rely on emergency room reports and other traditional data collecting that then is fed to the Centers for Disease Control, and they come out with officially where flu is at. And where other disease is as well. Google Flu Trends, interestingly (and this is why there was such a great deal of interest around it), anticipates flu; they're approximately seven days ahead of the Centers for Disease Control.

However, before they could make claims about how well their data work, they had to check it against the CDC data, right. So they had to ground their claims in the traditional data. So it's an interesting project because it finds out what's happening in society and culture through Web data, yet it doesn't use the Web as the baseline. So it's not grounding the findings online. I mean, this was last year. Google Flu Trends has now expanded to something like fifteen, seventeen countries.
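A minimal sketch of this kind of query-log epidemiology, for the technically curious. This is not Google's pipeline; the log format, the flu-term list, and the CDC series are all assumptions for illustration, but the two steps are the ones just described: derive a signal from queries, then ground it against the traditional data.

```python
# Sketch only: derive a flu signal from query logs, then ground it
# against CDC counts. Log columns (query, region, week), the term list,
# and the CDC series are assumptions for illustration.
import csv
from collections import defaultdict

FLU_TERMS = {"flu", "influenza", "fever", "flu symptoms"}  # illustrative

def weekly_query_counts(log_path):
    """Count flu-related queries per (region, week) in a query-log CSV."""
    counts = defaultdict(int)
    with open(log_path, newline="") as f:
        for row in csv.DictReader(f):
            if any(term in row["query"].lower() for term in FLU_TERMS):
                counts[(row["region"], row["week"])] += 1
    return counts

def correlation(signal, cdc):
    """Pearson correlation between the query signal and CDC counts over
    the (region, week) keys both series share: the grounding step."""
    keys = sorted(set(signal) & set(cdc))
    xs = [signal[k] for k in keys]
    ys = [cdc[k] for k in keys]
    n = len(keys)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)
```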

This project's different. I don't know if you saw this; 2009, I think. The day before Thanksgiving, a series of graphics were published in The New York Times. And what you see here is where people were querying a particular popular recipe site, allrecipes.com. I don't know if you use it; I think it's probably the most popular in the US. There's also Epicurious, the other one, and then there's the BBC; anyway, this is the biggest one. And what you see here is of course a map of the US, and the darker, more purple areas mark a higher incidence of queries for a recipe. And so this is sweet potato pie, the day before Thanksgiving. People looking for macaroni and cheese, which I liked. Sweet potato. Corn casserole: you see the Corn Belt. Green beans. Turkey brine. Yams.

So what you see here is a kind of geography of taste, if you will. A geography of taste, a geography of preference. And when I was looking at this I thought to myself, well, how else would you do this? How else would you chart a geography of taste? I mean, we could get supermarket data. We could interview. We could survey. And then I thought to myself, are those types of activities actually fundable? Quite difficult.

However (you know, there's a lot of validity checking to be done, etc.), what you have before you suddenly is a means by which one can do research about preference, distributed preference, using online data. Which you probably couldn't do, at least as quickly, in any other way.

Now, what I want to do very, very briefly is contrast digital methods (which I will go into in more detail in a minute) with the other paradigm, if you will, that came out of the social sciences, beginning with anthropologists and then later with a very important research program in Britain, the Virtual Society Program, from 1997 to 2002, which I write about in a little booklet called The End of the Virtual. These are the sort of standard ways in which one does Internet-related research, with "virtual methods." And what I would like to argue is that a lot of these virtual methods are in some sense being ported onto, or transferred onto, the Internet without necessarily the needed sensitivity to digital culture. And increasingly what's happening with these kinds of methods is that what results are not necessarily findings, or grounded findings, but rather indicators. So we live in an age where the output of a lot of Internet-related research using virtual methods is "indicators."

And so what I want to talk a little bit about is how the methods might change, or perhaps even should change. Or at least how other methods can live alongside virtual methods. Now, one of the things that interests me is fact-checking. Because fact-checking... I mean, not only in the US context. Which presidential debate was it when factcheck.org went down the next day, or that evening, because everyone was checking factcheck.org, and then Soros took over the domain for the night, and it all became quite messy? And I'm interested in it not just because of fact-checking and its traditional association with the blogosphere, but rather as an everyday sort of method, right. Either for investigative journalism more formally, or for a lot of different work that we do.

It's interesting that traditionally we ask at the end of the interview, if it's gone well, who else do we interview? And we ask the second person about what we found out from the first person. And this is how we snowball. Now, when we think about the online being mixed into this, we can look up people in advance. So I don't know if you've looked up, you know, me before this talk or whatever. But then the question is, does the order of checking now change? After the interview, do you look the person up again to check the veracity or the context of what the person said in the interview, right? So where's the baseline? Where's the grounding going on?

So, what I would like to talk about is how the methods, or at least the sort of philosophy or theory of methods, might change if we begin to take the online a lot more seriously. If we begin to take online data, Web data, more seriously. Now what I would like to do is introduce to you a kind of methodological philosophy which I have called "digital methods." And digital methods has a number of principles.

And the first one, or the major one, is to follow the medium. To follow the medium, and to think that the medium itself has methods built in, has in-built methods. And so to think about what the medium has to offer in terms of methods. And specifically, digital methods has a particular outlook or approach. Like many software projects, what it does is it looks for what natively digital objects are available. Links, tags, date stamps, edits, reversions, whatever; loads of them. It looks at what kinds of natively digital objects are on offer online.

And then it asks the question of, how do the dominant devices handle these objects? What do search engines do with links, for example? And then subsequently the question is, how can you repurpose the methods of the medium for social and cultural research? So it is a question of looking at, how do we repurpose a search engine? How do we repurpose Facebook? How do we repurpose Wikipedia? How do we repurpose...you name it. What can we build on top of these things? Or beside them? Or how can we learn from how they handle the natively digital objects?

And then the tricky part comes. When we make our findings, the question is, are they grounded in the online? So we're constantly in some sense playing epistemological chicken. Do we need to go offline to ground them? Or can we ground them in the online? And how confident are we when we ground them in the online?

So what I'm gonna do is take you through digital methods from the ground up, if you will. From some of the more basic elements of the Web, the natively digital objects: links, tags, etc. How do you study links and make findings about them for social and cultural research? So I'll go from the micro to the macro. So, from the link: how do search engines treat links, and how can you learn from them? And what else can you do with them? The web site: I treat the web site as an archived object, and ask myself the question, how does the Internet Archive, how does the Wayback Machine, treat web sites? And how can we repurpose how they treat web sites for other purposes of research? Engines, etc.

And what I'm gonna do (I mean I have a couple in parentheses), I won't have time to treat them all, but I'm gonna go through the link, the web site, the engine. I'll just tell you that I also study spheres: the blogosphere, the websphere, the newssphere, the tagosphere, the imagesphere, the videosphere. I see spheres as engine-demarcated spaces.

The webs. The Web these days is no longer singular but rather plural, largely because of geolocation technology, so that we have the emergence of national webs. You're in France typing google.com and you get redirected to google.fr. You're sent home by default. So with geolocation technology we now have the rise of webs.

I'll talk about social networking sites and introduce you to a research practice called post-demographics. How do you study Wikipedia? How do you repurpose Wikipedia? How do you repurpose Twitter? These are some of the things that the digital methods research program does. Each of these particular levels, if you will, has PhD candidates associated with it, who will be attending the MIT 7 conference in a couple of weeks.

The link. How are links normally studied, and how else can we study them using the insights from digital methods? Well, links traditionally have been studied in two or three ways. From hypertext theory, of course, you will know that links have been thought of as sort of paths, where the surfer authors one's own story through the Web. It's a bit old-fashioned. I mean it's old-fashioned not only because surfing is dead; there's no longer habitual visitation of web sites. People no longer surf. However, they do WWILF. This is a sort of British term, WWILFing; I don't know if you've heard of it. It stands for "what was I looking for?" WWILFing.

And also this speaks to these ideas of the cognitive impact of the Web and of engines. And also because engines increasingly organize our paths, right. So it's not the surfer with that will, but rather the engine as an ordering device. Nevertheless, links are also traditionally studied through small worlds and path theory, where what's studied is the optimal route, the optimal path between two nodes. It's interesting: it was Barabási, in Linked: The New Science of Networks, who wrote that Bill Clinton asked Vernon Jordan to get Monica Lewinsky a job after the incident, because Vernon Jordan was the closest distance of anyone to the Fortune 500 CEOs. They calculated this; he was something like 2.2 handshakes away. So this is path. That's the path.

And then of course there's classic social network analysis, where what's studied is not the path but one's position. Is one central, is one peripheral, is one highly between, etc. And therefore are you a broker, or are you...

What does the medium do? How does the medium treat links? And what can we learn from it? Well, Google, as the dominant device of the medium, treats links as reputation markers, as relevance markers. So what we did is we decided to capture links. And this is a picture from 1999. This is one of the earlier maps that we made, where we're looking at how sites link to one another, on a very micro, very fine-grained level. You know, you've seen these sort of massive link maps, right? And you're like, what do they say?

Well, if you zoom in, what they tell you about is a kind of micropolitics of association, if you will. And it's very normal, as well. So, who links to whom, and who doesn't: the missing links. So this is a classic one. The multinational, in yellow, links to Greenpeace; Greenpeace doesn't link back. No way. And then both the multinational corporation and the large NGO link to government (those are all sort of government or international organizations). And government does not link back, no way. And this is all very normal.

This is an output of the Issue Crawler, issuecrawler.net. It's software that I developed; it recently had its ten-year anniversary. It's a crawler. So you insert URLs. It crawls them. It grabs all the outlinks of each of the URLs you've inserted. And then it does hyperlink analysis and outputs a variety of visualizations. This one's the cluster map. And what we're mapping here is the Armenian NGO space. So we inputted all these Armenian NGOs; they're in blue and red. And you see the network they organize, where the blue and red ones are quite interlinked, and then they also link to a lot of international organizations. A lot of UN organizations. And a lot of donors and funders. So all of the Armenians link to all the funders and donors, and all the funders and donors don't link back.
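The crawl-and-co-link step just described can be approximated in a few lines. This is a sketch of the general co-link idea, not the Issue Crawler's actual code; the regex link extraction is deliberately crude, and a real crawler would parse HTML properly and crawl more than one hop.

```python
# Sketch of the co-link idea, not the Issue Crawler's actual code:
# fetch each seed page, collect the external hosts it links to, and
# keep hosts that at least `threshold` seeds link to.
import re
import urllib.request
from collections import defaultdict
from urllib.parse import urlparse

def outlink_hosts(url):
    """Return the set of external hosts a page links to (crude regex
    extraction; a real crawler would parse the HTML properly)."""
    html = urllib.request.urlopen(url, timeout=10).read().decode("utf-8", "ignore")
    hosts = {urlparse(h).netloc for h in re.findall(r'href="(https?://[^"]+)"', html)}
    return hosts - {urlparse(url).netloc}

def colink_network(seed_urls, threshold=2):
    """Map each linked host to the seeds linking to it; keep the hosts
    with at least `threshold` in-links. This co-link logic is what
    demarcates an issue network."""
    received = defaultdict(set)
    for seed in seed_urls:
        for host in outlink_hosts(seed):
            received[host].add(seed)
    return {host: linkers for host, linkers in received.items()
            if len(linkers) >= threshold}
```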

This is another map. On the left is the Fatah network, on the right is the Hamas network. We took all Fatah-related URLs, crawled them, and what you see in the Fatah case is a sort of civic web: links to newspapers, media sources. Links also to local NGOs as well as international NGOs. Hamas is kind of underground, in a very, very different sort of way. It links only to RSS readers. That's a very, very different style of linking, indicating a very, very different style of communication. And one can also draw comparisons across various groups. If you compare Hamas to Hezbollah, they'll have the same sort of linking behaviors: all to RSS feeds, for subscribers.

[indistinct question from audience member]

Location-free. Hamas-related web sites and Fatah-related web sites.

[indistinct question from other audience member]

Well no— I mean... So, Hamas, and also a lot of the organizations of that RSS ilk, have about ten, fifteen, twenty, twenty-five web sites, and then they're in a variety of languages and a variety of countries. A variety of top-level domains, country domains. And when you crawl them, what you find is that they only link to one another, and only link to RSS readers. They don't link to anything else. Whereas all the Fatah-related web sites disclose a very, very different kind of network, a very different infoculture, if you will, linking to newspapers, to local NGOs, to international NGOs.

What else can you do with links? This is work that I did for the OpenNet Initiative, the Internet censorship researchers at the Berkman Center and the University of Toronto. Those folks asked me to try to come up with a way to contribute to Internet censorship research using link analysis. And this particular piece of work was inspired by an observation in the handbook for cyber-dissidents put out by Reporters Without Borders, rsf.org, the Paris-based organization (I think it was 2005), where the Saudi Minister of Information boasted that they were blocking or censoring 400,000 web sites. And the OpenNet Initiative, in their traditional methodological way, a traditional sampling operation, was checking 2,000 web sites per country. And so I was like, well, if they're boasting that they're blocking 400,000 and you're only checking 2,000 per country, how do we build out the list? How do we discover previously unknown censored web sites?

What I did is I took one of the categories of their web sites, put it into the Issue Crawler, crawled the web sites, and then annotated the map. And so what you see here are nodes in red that are blocked, censored, in Iran (this is for Iran). In blue, sites that are not blocked. And then in red with those little pins on them, sites that we discovered were blocked: previously unknown censored web sites.

How did we do it? Very, very simple. We ran them through one of the tools that we built, which just checks proxies. And this is the tricky thing, right: can we ground this just through this kind of tool, or do we need to go to Iran and sit at a computer there and know for sure that it's blocked? In any case, the researchers in Toronto were checking the BBC and kept continually finding that the BBC was not blocked. And on our link map, the BBC page that was linked to as being most relevant according to the network actors was actually the Persian-language page. The regular BBC site gets a response code of "okay," whereas the Persian-language one is blocked in Iran.
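The proxy check mentioned here reduces to comparing responses fetched directly and through a proxy assumed to be inside the country under study. A bare-bones sketch with a hypothetical proxy address; as the talk itself notes, grounding a "blocked" verdict takes more than this.

```python
# Sketch: compare a direct fetch with a fetch through an in-country
# proxy. The proxy address is a placeholder, and a real study would
# repeat this over time and from multiple vantage points.
import urllib.request

IN_COUNTRY_PROXY = {"http": "http://proxy.example.ir:8080"}  # hypothetical

def status(url, proxies=None):
    """HTTP status for a URL, optionally via a proxy; None on failure
    (timeouts and resets are themselves signals worth recording)."""
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler(proxies or {}))
    try:
        return opener.open(url, timeout=15).getcode()
    except Exception:
        return None

def check(urls):
    for url in urls:
        direct = status(url)
        proxied = status(url, IN_COUNTRY_PROXY)
        if direct == 200 and proxied != 200:
            print(f"possibly blocked: {url} (direct={direct}, proxied={proxied})")
```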

I'm just gonna move along, and if there are questions we can probably take them at the end. The web site. How is the web site normally studied? The web site is normally studied in sort of usability circles. I mean there's a debate, or maybe the debate's over, between the "don't make me think" school of thought versus the poetics of navigation. Actually, it's a neverending debate; I guess it's not over.

Also the color. I don't know if you know that the Web is blue, or predominantly blue, if you do a sort of color analysis of the Web. And it's interesting because even in sector-specific areas (so, medical sites, environmental sites) you'd think they would be predominantly green, but there's a lot of blue in there.

Eye tracking. I don't know if people are familiar with this work. This is a very famous heatmap from eye tracking. The more red it is, the more attention to that particular spot. You see immediately a sense of the real estate of a web page. This is a Western Web visitor.

Site optimization; SEO. And also trying to detect optimization. So first of all there's optimization, and then there's manipulation. And then there's whether you can detect manipulation; that's quite tricky, actually. People say, "Oh you know, search engine results, they're all manipulated anyway." Well...show me. It's quite difficult.

Site features. Now this is a classic from a lot of social science, and not even just social science, where one makes a sort of code book with a long list of site features, and you go through a number of sites and check off whether or not each has a feature. And then you try to draw conclusions. And some of the ones that I'm most critical of are ideas that the more interactivity a set of sites has, the more participation, and then the more democracy...these sorts of things. Anyway, site feature analysis is one of the more dominant forms of analysis.

Now, I showed you this heatmap. I don't know if you remember the day when Google moved its menu to the upper left. I thought that was a sort of concrete outcome of heatmaps.

How else to study the web site? Now, following the digital methods principles or protocol, you think, okay, what's the dominant device? And for this one, arguably it's the Internet Archive, and the way you get to the Internet Archive is through the Wayback Machine. So think about how the Wayback Machine organizes web sites. Well, I showed you a picture earlier. You type a URL, hit return, and you see the history of a web site as sort of columns: which snapshots are available.

And so what strikes the user of the Wayback Machine, for those accustomed to using search engines, is that you type in a URL, not keywords. And so you type in a URL, you hit return, and then you get the pages from the past of this particular URL. So in some sense the Wayback Machine has a particular inbuilt historiography. It organizes the history of the Web as single-site histories, a kind of biographical approach if you will.

So what I thought to do was, well, how can we follow the medium? How can we learn from the dominant device that treats web sites? And then how can we repurpose it for the purposes of social research? So I'm gonna show you the outcome; it's a three-and-a-half-minute video. What you can do is capture a site's history, and you can replay it like time-lapse photography, showing, in the classic biographical tradition, the life and times of a web site as also encapsulating the life and times of the Web.
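The capture step of this time-lapse method can be sketched against the Internet Archive's public CDX API, which lists snapshots per URL; the collapse parameter below, which returns one capture per year, is part of that API. Screenshotting the snapshots and assembling the film are left to other tools.

```python
# Sketch: list one Wayback Machine snapshot per year for a URL via the
# Internet Archive's CDX API, as raw material for a site-history film.
import json
import urllib.request

def yearly_snapshots(site):
    api = ("http://web.archive.org/cdx/search/cdx"
           f"?url={site}&output=json&collapse=timestamp:4"  # one per year
           "&fl=timestamp,original&filter=statuscode:200")
    rows = json.load(urllib.request.urlopen(api))
    return [f"http://web.archive.org/web/{ts}/{orig}" for ts, orig in rows[1:]]

for snapshot in yearly_snapshots("google.com"):
    print(snapshot)  # feed these to a screenshot tool, then to a video encoder
```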

So let me just show you... I want to preface this very, very briefly by asking you, do you remember the Google Directory? You know what a directory is. A directory is human editors organizing the Web according to subject matters, and then per subject matter there are a series of web sites. I mean, Yahoo pioneered this; then there was later DMOZ, the Open Directory Project. No? But it's interesting because as the years go by (and this is the subject of this short sketch), people don't remember that there were directories, because it's been taken over by the search engine. So I'm just gonna play this for you.

This is interesting; maybe you saw the Google anniversary timeline, the ten-year timeline. It was something that Google made. Anyway, this was specifically an alternative history to Google's history. And I wanted to point out something about the rise of the backend, just very, very briefly. Now, if you go to Yahoo these days, they still have the directory. It's become increasingly commercialized, less and less robust, and the Open Directory Project is similarly now becoming commercialized. Fewer and fewer expert volunteers.

Anyway, if you go to Yahoo, what's interesting, when talking about or thinking through the impact of the rise of the backend and the rise of algorithmic culture: this is the list of human rights organizations in Yahoo. And you'll notice (I don't know if you can see it, but perhaps) that by default they're listed by popularity. By default. Not in alphabetical order. So the egalitarian alphabetical-order listing, well known from the history of library science, encyclopedias, etc., has given way to the algorithm, to the hierarchy based on relevance, however it is measured in this case.

The engine. Second to last one. How are engines normally studied? Engines initially were studied in the famous articles by Lawrence and Giles in 1998 and 1999, one in Science and one in Nature, as being not that complete in their coverage. I don't know if you remember these. So they came out (it was on all the news channels) that engines only index something like 30% of the Web. The result of that was the creation of a few ideas that still pervade us. One is the Dark Web. So there is this other Web, the Dark Web. Which is also a sad Web, because it's dark because it's not linked to. So they're orphan sites. So there are all these particular kinds of aesthetics associated with the Dark Web.

But the other one, more of an info-political critique, was that engines not only provide information but also exclude. So they exclude by not including. They exclude by not indexing. And they also bury sites by not listing them very high up.

That's number one. Number two: oftentimes engines are studied according to (and this is Nicholas Carr and this interesting idea) the notion that they encourage attention deficits. Yet another thing the Web does, right. But anyway, engines in the way they are used encourage attention deficit. Why? Well, if you go to the studies of how engines are used, what you'll find is that increasingly over the last, I don't know, eight years now I think, people are looking at fewer and fewer engine result pages, and clicking on higher and higher results. And one of the things that Nicholas Carr asked (this was not in The Shallows but in the "Is Google Making Us Stupid?" piece in The Atlantic) was whether engines, in encouraging this kind of clicking behavior, are causing us to no longer be contemplative.

Googlization. So Siva Vaidhyanathan, a sort of colleague of mine whose last name I can never pronounce, is coming out with a book called The Googlization of Everything very soon. So Googlization is a term that was coined...well, it's a library science critique. And it was coined right around the time when Google came out with the books project. And that was it. That's when they crossed the line: you enter the library, and now we're gonna start talking about you in these kinds of terms. Googlization.

Googlization connotes globalization, hegemony, these sorts of ideas. And it thus turns the Web more generally, and certainly Google in particular, into an object of mass media critiques, right. So suddenly there's talk of media concentration. There's a political economy critique of the Web. There's a dominant engine. In fact, there's a dominant algorithm; Bing and Yahoo are basically trying to replicate PageRank. And all alternative algorithms are in decline. Even the highly touted Wolfram Alpha, which came out not so long ago, when everyone was like, okay, this is an old-school, kind of 50s-sounding name, you know, real old information retrieval. No.

Surveillance studies. Google, or search engines, are oftentimes studied, interestingly enough, as bringing into being a new subject: the data body. I don't know if this stuff was in the press here a lot. The 2006 AOL search engine log data release: at the information retrieval conference in Seattle in 2006, AOL Labs, being good scientists, gave a gift to the scientific community, which was logs. A lot of data. 500,000 users over three months (or was it six months?) of all their queries. And then each of the users was anonymized; a number was put to them. Ever since then, in search engine studies, this is an example of how not to anonymize, but anyway. They were anonymized with a number.

Now, just to give you a sense of these sorts of things. User 3110455: how to change brake pads; Florida state cham—; how to get revenge on an ex; how to get revenge on an ex-girlfriend; how to get revenge on a girlfriend; replacement bumper for Scion xB. The intimacy on the one hand, and all the amateur detective work that then followed: people were figuring out who these users were. The New York Times did it most famously, but then lots of other people as well.

So anyway, engines, by virtue of saving log files and sometimes releasing them and sometimes not doing so well in their anonymization practices, create another data body. Another collection of data that represents you, or is you, or can stand in for you, and can in some instances have greater agency, like in identity theft.

Now, I just want to touch really, really briefly on a couple of sort of solutions to this problem. I don't know if you'll ever use them or if you know about them.

Scroogle. Does anyone use Scroogle? Only those real geeky, kind of paranoid folks. Scroogle sits on top of Google and you can query it, and it doesn't place a cookie. It doesn't know your location. It's sort of a covert user's Google. And TrackMeNot: Helen Nissenbaum and colleagues at NYU, in Neil Postman's former department, made a Firefox extension that, in the background while you're querying Google, also sends random queries to Google.
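The obfuscation principle behind TrackMeNot fits in a few lines. The standalone loop below only illustrates the idea of decoy queries; the real tool is a browser extension, and the word list, pacing, and endpoint here are all placeholders.

```python
# Sketch of the decoy-query idea only; the real TrackMeNot is a browser
# extension. Word list, pacing, and the engine endpoint are placeholders.
import random
import time
import urllib.parse
import urllib.request

DECOY_WORDS = ["weather", "recipes", "football", "maps", "lyrics"]

def decoy_queries(n=5):
    for _ in range(n):
        q = " ".join(random.sample(DECOY_WORDS, 2))
        url = "https://www.google.com/search?" + urllib.parse.urlencode({"q": q})
        try:
            urllib.request.urlopen(
                urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"}),
                timeout=10)
        except Exception:
            pass  # the logged query is the point, not the response
        time.sleep(random.uniform(5, 60))  # randomized pacing
```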

Okay so, how do we do Google? So I've been spending a lot of time building stuff on top of Google. And Google doesn't like that. And they blocked me, a lot. And I'll show you why. Apart from in the evidentiary arena, this is I think the first fully documented case of the apparent removal of a site from Google results. So what you see before you is the rank for three web sites (the rank being: the top site for a results query gets rank 1). And you know, engines only serve a maximum of a thousand results. So it says 6,700,000, and then someone says, oh, it would take thirteen lifetimes to go through those results. No, they only serve a thousand results. So it would take you not very long.

This is the rank of a site in Google for a particular query. The green one is the New York City government. The red one is 911truth.org. And the blue one is The New York Times. The query is "9/11." Since about early 2007 we've been saving Google results for the query "9/11." And also a bunch of other queries too.
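The rank-saving exercise is simple to reconstruct in outline. In the sketch below, fetch_results is a stand-in for whatever engine access one has; the actual data behind these charts came from systematically saved result pages.

```python
# Sketch of longitudinal rank-watching. fetch_results is a placeholder
# for whatever engine access you have; run record_ranks once a day.
import datetime

WATCHED = ["nyc.gov", "911truth.org", "nytimes.com"]

def fetch_results(query):
    """Placeholder: return an ordered list of result URLs for `query`."""
    raise NotImplementedError("wire this to your engine access of choice")

def record_ranks(query, path="ranks.csv"):
    results = fetch_results(query)
    today = datetime.date.today().isoformat()
    with open(path, "a") as f:
        for site in WATCHED:
            rank = next((i + 1 for i, u in enumerate(results) if site in u), "")
            f.write(f"{today},{site},{rank}\n")  # blank rank = off the charts
```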

And what you see here, on September the 17th (this was in 2007), is that 911truth.org suddenly went from its top-five ranking, to 200, to off the charts. And it stayed there for about two weeks. And then it returned to the top.

So this opens up all sorts of questions. Why did this happen? The interesting thing is, if you go to 911truth.org, they also noticed. But 911truth.org, of course, if you are familiar with them, is quite a conspiracy-style organization. So they came up with this huge conspiracy theory of why it was that they were removed. So it's quite tricky to enter into that realm when you have all this kind of conspiracy talk around why it is that they were removed.

I think I know why. And that has to do with the web site template and the fact that 911truth.org is a franchise organization. So you could start one up: memphis.911truth.org. And when you start one up, you automatically link to all its other franchises. San Francisco, Boston, whatever. And around this time, around the anniversary of 9/11, I surmise that a number of franchises were started. So it looked like, suddenly, 911truth.org and all their franchises were getting a lot of links, an artificially high count, and so therefore they were demoted. That's my theory. It is not a conspiracy theory. However, we also blogged about it, more seriously than 911truth.org. So it could be that Google read our blog.

How else to repurpose Google? I want to very, very briefly show you a new tool. I built this I think about two years ago, and now it's pretty stable. It sits on top of Google; it's called the Lippmannian Device. It's named after Walter Lippmann. And in fact it answers a call that Lippmann made in, well, not his most famous book but the follow-up to Public Opinion, called The Phantom Public, which is my personal favorite (for Lippmann fans, it's probably your personal favorite as well), where he goes on not only to critique the means by which public opinion is formed, but also begins to call for what we ended up calling new equipment for interpreting and mapping societal controversies. I don't want to just throw around the word "democracy" too easily. New equipment. And in particular, to provide a means by which one can get a coarse sense of partisanship. Is an actor partisan or not?

And so we built the Lippmannian Device. It sits on top of Google, and it measures resonance. So I'll just show you immediately. This is a source cloud. And what it shows is the number of times a particular source mentions a particular name. The name in this particular case is Craig Venter. You may know him; he's the guy who supposedly wants to take out patents on life. The synthetic biology pioneer. He has a few really famous TED Talks. I mean, if you get into the hierarchy of TED Talks, Craig Venter is quite close to the top of them.

So what we did is we queried "synthetic biology," and we got the most important sources for synthetic biology. Then we queried each of them individually for this name. So you see a sort of huge distribution of who recognizes Venter, who mentions Venter, who purposefully does not. So you get a sense of the extent to which Venter is important, significant, per source.

Let me just show you how to do this. I'm going to show you very, very briefly, about the climate change skeptics. It's everyone's favorite. What we did is we tried to find out what the most important sources on climate change are, and then, do these sources recognize the skeptics? Can we figure out whether or not we can detect or diagnose skeptic-friendly sources, quickly? So we queried Google. In fact, we queried Scroogle; this is what Scroogle looks like. The reason why we queried Scroogle is that it doesn't give you personalized results. It gives you pure Google results, if you will. There's nothing pure about Google, and there's nothing organic about the results, nothing natural about them; they're all very highly synthetic. But anyway, it gives you depersonalized Google results.

And they kinda look like this. So what I did is I copied them. Select all, copy. I pasted them into a tool called The Harvester. The Harvester is a really fantastic tool because you can paste in all this stuff, including URLs, and then hit "harvest," and it just gives you a clean list of URLs. This is a working tool, which you can just use. You don't need logins.
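What The Harvester does is essentially URL extraction from pasted text. A one-function approximation, not the tool's actual code:

```python
# A one-function approximation of The Harvester: pull a clean, deduped
# list of URLs out of pasted text.
import re

def harvest(text):
    urls = re.findall(r"https?://[^\s\"'<>]+", text)
    seen, clean = set(), []
    for u in urls:
        u = u.rstrip(".,;)")  # strip punctuation that trails URLs in prose
        if u not in seen:
            seen.add(u)
            clean.append(u)
    return clean
```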

[inaudible question from the audience]

Yeah. I'll tell you at the end. digitalmethods.net. I'll tell you now.

You take all those URLs and put them in the top box. In the bottom box, you put the names of the most prominent climate change skeptics. We got these names... You can get them a variety of ways; we triangulated three sources, and those names found in at least two sources we retained. And there you have the graphics, the output.

Sallie Baliunas, you see, gets mentioned by hardly any of the top climate change sites. But marshall.org stands out. marshall.org is a major skeptic funder; it funds the skeptic conferences together with the Heartland Institute. I'll just show you these briefly. This is an interesting one: climatescience.gov jumps out. So you can get a sense of issue commitments, or partisanship, quite quickly, per source, using this technique. Using the Lippmannian Device.
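The Lippmannian Device's core loop is a matrix of site-restricted queries. The sketch below uses Google's Custom Search JSON API rather than the scraped results the tool itself relies on, and the API key and engine ID are placeholders; the counts are what size the tags in a source cloud.

```python
# Sketch of the Lippmannian Device's core loop using Google's Custom
# Search JSON API (the tool itself worked from scraped result pages).
# API_KEY and CX are placeholder credentials.
import json
import urllib.parse
import urllib.request

API_KEY, CX = "YOUR_KEY", "YOUR_ENGINE_ID"

def result_count(query):
    """Estimated result count for a query."""
    url = ("https://www.googleapis.com/customsearch/v1?"
           + urllib.parse.urlencode({"key": API_KEY, "cx": CX, "q": query}))
    data = json.load(urllib.request.urlopen(url))
    return int(data["searchInformation"]["totalResults"])

def resonance(sources, names):
    """{name: {source: mentions}} via site-restricted queries; the
    counts are what size the tags in a source cloud."""
    return {name: {src: result_count(f'site:{src} "{name}"') for src in sources}
            for name in names}
```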

Okay, the last one. Social networking sites. How are they often studied? Well, the number of times Erving Goffman is cited in relation to social media is quite large. The Presentation of Self, this kind of thing. That is one of the dominant approaches. Another one is to think of social networking sites as somehow reenacting different sorts of cultural clashes. My favorite is a story that was told in one of danah boyd's blog postings, about how the US military banned MySpace and did not ban Facebook. And MySpace was used primarily by the enlisted folks, whereas Facebook was used primarily by the officers. So again, you get this sort of class struggle enacted. There's also the distinction between friends and friended friends. There's also the impact of defriending, the amplification effects, these sorts of things.

How else might they be studied? Thinking through, following the digital methods principles: okay, follow the medium. What natively digital objects are available? How are they treated by the dominant devices? We came up with the notion of post-demographics. The natively digital object dominant in social media is the profile, if you will. Now what's interesting about profiles is that they provide all these different interests, kind of media interests. And then, profiles have friends.

So what we did (I mean this was more of an art project; it was in a few art magazines) is we created a means by which we can see what the interests are of the friends of Obama and of McCain, in this particular sense. We also did what the interests are of the friends of Islam and Christianity, for example. You can do a range. But anyway, just to give you a sense.

So this sat on top of MySpace until MySpace changed their query string. We can't tweak it anymore; they kind of just shut us down, basically. Nevertheless. What we did is we took, in this case, the top 1,000 friends of Obama and the top 1,000 friends of McCain. We aggregated their profiles. And we then ranked the interests and provided aggregate profiles of the friends of the politicians.

And then we also did a compatibility check: would the friends of Obama have similar interests to the friends of McCain? And we call this post-demographics. So it's in some sense the study of the organization of groups not according to age, gender, income, level of education, but rather according to the data that's regularly given online through social media: interests, movies, books, etc.
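In code, post-demographic profiling is little more than counting and comparing interests across two friend sets. A sketch with assumed input data, since the MySpace interface this ran on no longer exists:

```python
# Sketch with assumed inputs: each profile is a list of interest strings.
from collections import Counter

def top_interests(friend_profiles, n=10):
    """Rank the interests aggregated across one politician's friends."""
    counts = Counter(i.lower() for profile in friend_profiles for i in profile)
    return counts.most_common(n)

def compatibility(profiles_a, profiles_b, n=50):
    """Jaccard overlap of the two friend groups' top-n interests:
    the 'compatibility check'."""
    a = {i for i, _ in top_interests(profiles_a, n)}
    b = {i for i, _ in top_interests(profiles_b, n)}
    return len(a & b) / len(a | b)
```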

Anyway. I just wanted to mention really, really briefly: Obama's friends watch The Office, The Daily Show, Lost on TV. And the friends of McCain are into Family Guy, Project Runway, America's Next Top Model, Desperate Housewives.

So you get a real sense. You see quite a divide here, a cultural divide. And you can do this for other cultures. I mean, I did this also for Fatah and Hamas, oddly enough. And you see far more overlap. Same interests, same movies.

Okay, just to conclude: the idea of digital methods is to take Web data seriously. And to think about the Web, or the Internet, not as this separate realm, not as the virtual, not as something that has an asterisk, not something that you only study for its culture in and of itself, but rather to take Web data seriously as a means by which one can study society and culture more generally.

But how to do that? Well, one way of doing it is not to simply import the standard methods, or port them onto the Internet, because what you get are only indicators, and you get a lot of problems as well. Rather, I propose a research practice where you actually follow the medium and think about the methods in the medium. And I have laid out for you a practice whereby one looks at the natively digital objects, at how dominant devices handle them, and then at how you can learn from them and repurpose them in order to undertake social research.

And then the last sort of trick, and it's going to be endlessly tricky and endlessly debated, is whether or not you can ground your findings in the online, or whether you need to go offline in order to ground them. If we have another chance at some other time, venue, place, I'm happy to tell you about approaches to studying those other things: spheres, webs, Wikipedia, as well as Twitter. But for now, thank you.


Moderator: Thank you very much for a fascinating talk. Questions? Comments?

Audience 1: Richard, thanks very much. So, one... I'm curious: your chronology says around 2007 things changed. And indeed they did, in a lot of ways. And the tools you're showing here are one sign of those changes. The emergence of tools like...oh, stuff like Newsglobe, News Positioning System, MediaCloud; I mean, there are dozens of these things that scrape news, that look at the feeds, whether it's from the world's various wire services or whether it's the destination and target cities. There are a lot of really interesting ways to play with the data.

And I wonder if that isn't...you know, this is coincident with the rise of this critical discourse of Googlization. That oh, Google's so flat and so commercial, and so one-size-fits-all. And I wonder if it hasn't been relieved of a burden to actually be sharper, or to pretend to be more objective, or whatever that objectivity would be. In other words, isn't there a kind of relationship between the rise of all these highly specialized tools that allow us to make data dance, that give us quite a bit of independence about where we draw our data from, and, on the other hand, the kind of both demonization and flattening of something like Google? It's that relationship I guess I'm interested in.

Rogers: So... Yeah, thanks. I mean...does anyone wanna answer that? Because it's a really difficult question. I mean, first of all, Google has taken itself off the hook recently. And they've done so in an extremely clever way. I wrote a piece called "The Inculpable Engine," and it's about Google. And the reason why I call it the inculpable engine is because now we are coauthors of our results. So, with the rise of personalization, the results are now partly our own, of our own making. And then we studied it empirically, and that's another story. But in any case.

So, there is no longer one set of Google results that one can then critique for the new hierarchies. So I mean, this is how I started my work on information politics, as it is called. The book that came out in 2004, 2005. I started that book with the observation, right around 2003 I think, when I typed "terrorism" into Google. Terrorism. And what I got back was whitehouse.gov, cia.gov, fbi.gov, the Heritage Foundation... CNN and Al Jazeera. The top 20. And I said, oh gee, you know, it's just like the TV news. So Google was beginning to align itself with output sources which are quite familiar to us, and so then it could be critiqued. So no longer was the Web providing a diversity of viewpoints, etc., if one saw the Web as something that was most significantly in some sense organized, or even offered, by engines.

However, all those interesting critiques that could be made no longer apply as forcefully, because of personalized results. So anyway, I think Google, cleverly, has taken itself off the hook; it has become increasingly inculpable in terms of the critique of the results in a sort of infopolitical sense. I mean, it's become the object of critique in many, many other senses, but its core, what it does, apart from serving advertising, is serve info results. So it's becoming increasingly, I don't know, inculpable is the term that I use.

That's one thing. But then the other thing that struck me is the rise of the tools and all the visualization. So the rise of infoviz and dataviz, right. These are just huge, really huge areas. And they're only now beginning to be critiqued. I mean, there's a lot of pent-up critique waiting to burst out. I don't know, maybe it's well developed here. But in a lot of circles that I'm familiar with, people are dying to hate the rise of infoviz but they haven't really formulated it yet, you know.

Well, I mean there are a number of critiques of infoviz. One that is beginning to emerge for me is the amount of spuriousness, or the amount of... It's the celebration of amateur data analysis that's quite interesting. Gapminder. So with Gapminder you can take any two variables...any two. Any two... I may have made my point.

Okay. I mean, but the relation between Google and the rise of datav— I mean, Google also of course does a lot of dataviz and infoviz. I haven't thought through that relationship yet, but anyway.

Are there other questions?

Feel free to bring up anything. We have an Internet...guy.

Audience 2: Well, probably along the lines of the tools, and especially the critique of the visualization tools and infoviz and so on. As you mentioned, this has had a huge rise. And last year we had a visualization conference on visual interpretations, and actually also the critique of that, here—

Rogers: Oh, good.

Audience 2: —at MIT. And Johanna Drucker, you know, offers a very distinct critique of the data that's being fed into those tools, as on the one hand being already an interpretation, or not making it transparent where the interpretation part comes in. So you know, that's one of the questions here also. Sort of, to what extent can we see, also in the research, what the data is that's fed into these tools that then give us those results? That's one question.

Another question that I had, in terms of the dark side of the Web: what do we do with the other 70%?

Rogers: That’s no longer true. 

Audience 2: Yes. 

Rogers: It's no longer true. If you talk to Google engineers, they'll tell you that they've basically indexed it all. I mean, the Web that's not indexed is only one click away. So it's...pretty much all indexed. I mean, of course, it's just massive. But that's no longer the case, at least according to the web science I know. But...

Okay so, dataviz, infoviz... So let me just say a little bit about my research practice in relation to what one might think of when one thinks of dataviz and infoviz. So, number one is I make bespoke tools, or what Clay Shirky once called (I thought this was a very clever term some years ago, and people don't use it) situated software. So, it's software where the research questions, in some sense, and the approach are all built in. Now, that's very, very different from Many Eyes or whatever, where they're toolboxes, right.

So the standard way of thinking about it is: here are all these tools, go and visualize away. I mean, Many Eyes is kind of interesting because it gives everyone a little lesson in the kinds of data sets that match with certain visualization types. I think that's one major contribution of Many Eyes, actually teaching that. But anyway, my research practice is very bespoke, very situated, in the sense that the methods are built in. And the other thing that's different is that the tools do the data collection, the analysis, and the visualization all together. They don't separate the data collection (go out there and get the leaves and the acorns or whatever; bring them back; and lay them out). So that's the difference. These are all-in-one tools, and not all-purpose. So that's a really big difference. I mean, I only showed two tools. I showed the Issue Crawler, and I didn't really show you how it worked, I just showed you some output. And I showed you the Lippmannian Device, and I showed you how to use it.

But if you go to digitalmethods.net you'll notice that there are about thirty tools at different...yeah, some are very simple. In fact I find them all very simple. All very simple things. But anyway, they're open and usable, and we maintain them all.

Audience 3: [indistinct] —you just said— On the point of your data being used within the tool, the method and the data together. Does that mean that it's... Let me make sure I understood it. You could only do that with born-digital materials.

Rogers: Right.

Audience 3: You couldn't do that with data that you've digitized...

Rogers: Correct.

Audience 3: …and applied a tool to afterwards.

Rogers: I'm really glad you said that, yeah.

Audience 3: Is that right?

Rogers: Yeah.

Audience 3: Okay. 

Rogers: But I mean that's very— I mean, maybe that's something that I should make even more explicit. Thank you for that. So, all of this work that I presented to you is analysis of the natively digital. I use that term, and it sounds very provocative, "natively digital." But what that term does is make extremely clear, I hope, that it's not digitized. So a lot of the digital humanities work, arguably all of it, or most of it, relies on digitized books and digitized material. I mean, cultural analytics as an approach (Lev Manovich; you will have heard of him, perhaps), it's all digitized material. So, Rothko paintings, covers of Time magazine. We have this digitized material, which is a separate data set, and we import that into visualization tools. That's not the research practice that I do at all. I do the...well, I explained it. I hope.

Audience 4: Although increasingly in most cultural sectors, there are digitally-born films and video, as opposed to the stuff that's been ported back over. Causing no end of misery for the folks working on it. So, as a historian, maybe just kind of a naive question, but...I mean, the Pentium is what, '93, '94, Mosaic. So we're not even talking about a twenty-year window here. And you've mapped a trajectory of kinds of steps. Is what we're seeing here about the affordances of better bandwidth, faster processors, the ability to manipulate more material that we have access to more quickly? Is this about a generational shift, folks who've grown up in this era and have a kind of fluency and facility that some other folks lack? It seems to me that it's taking the contours of a kind of epistemological shift in terms of what constitutes knowledge, the way we're asking questions about it. But there are probably a bunch of other ways to think of it. Were you to look for causalities or factors that help to chart that movement, from say '93, '94, the emergence of the Web, to where we are now, how would you account for that in broad terms?

Rogers: So, I mean, one of the efforts— I could've rehearsed the argument I made, but one of the initial efforts was to try to show, over the last, I don't know, fifteen years, what's changed in thinking about what to do with the Internet in terms of research. And I think there has been a shift, and I think that it's happened fairly recently (and it's taken a long time), whereby we no longer necessarily first think about the Internet or the Web as being the realm of pirates, pornographers, you know, rumormongers, the jungle...all this stuff. But it's still there, you know. I mean, you hear it in the arguments in Congress often: "Oh, did you get that from some blogger or whatever." So the historically low epistemological value of the Internet in general, I think that is slowly starting to change. And that's very, very recent. So I would say, from a historical point of view, it's about the slow normalization (it still has the asterisk, it's still a little bit different), the slow normalization of this technology, in the history of technology and the history of individual technologies. Who's the most famous historian that talked about this, Thomas Hughes maybe? And then in science it was Kuhn, of course, with normal science.

So, the slow normalization of the Internet, and then suddenly people saying, okay, so the establishment's online. Online is also very normal, so we can go online and it's trustworthy. We can go to a governmental site; it's trustworthy. So that took a while, and now we're thinking, okay, we can use the data online. That's what's different. So it's not computing power, although that helps. I think it's the mindset that's changed. Or, well, I call it the end of cyberspace. Or I call it the death of cyberspace, in fact. I'm a little bit more dramatic about it.

I call it the death of cyberspace, and there are a number of reasons why it died. Shall I just...briefly? It started when there was a lawsuit by two Jewish groups, Jewish NGOs in Paris, against Yahoo, because Yahoo was making available on their web site pages for Nazi memorabilia. This was 2000. And they were sued in France; Yahoo USA, or Yahoo, was sued.

And what came out of that was geolocation technology, specifically. So French users would be located: okay, you're French, so you can't see these pages. And from there came the rise of what I call the national webs. I didn't get into it in my talk, but... So what we have, slowly, with the regulatory frameworks, the legal frameworks, etc. being applied to the Internet, is the slow and gradual but indeed steady death of cyberspace. So I think that's the difference. So cyberspace needs to die before we can use the Web as a data set.

Audience 5: I was wondering about how we access the Internet. I was thinking about wireless devices. So, how we access the net: would that impact your conceptualization of digital methods?

Rogers: Good question. Um...

I don't have an answer to that question. I mean, I've thought about it a bit—

Audience 5: [indistinct comment]

Rogers: No no, but I mean...um... So, since I'm at MIT... One thing that I will say is that it is very interesting to look at, in development studies circles, the debate between One Laptop Per Child versus the mobile phone, right. So obviously, if you have far more users of mobile phones than of computers, which we do, then one would think about the need to study the data generated through mobile phone use, broadly conceived, whether it's kind of mobile Internet-related or not. And, you know, thinking through specifically, okay, what if we applied these methodological principles to that situation? So that's the challenge. And that's how I'll answer your question. It's a challenge.

Audience 6: There might be another aspect to it, because there's currently a debate going on, spearheaded by Tim Berners-Lee, that the app-driven mobile phones are destroying the Web. So people are no longer using search engines, or the Web in general, in order to find information, but highly specialized apps to tap into specific pieces of information that lie on the Internet. So that might have an influence on the research methods, and also on how people perceive what's out there on the Web.

Audience 7: The commodified side of the tools [indistinct]

Audience 8: Or a different version of the death of cyberspace.

Rogers: But I mean, Tim Berners-Lee is the great protector of the Web, huh.

Audience 6: Right.

Rogers: So it's like [indistinct]. Yeah, no, I mean that is a... Is that a Wired article or something, "the apps are coming," or... I forget. It was...

Yeah, I mean, I don't really have a view on that. I mean, it's more like, similarly, right: if apps become dominant, or if particular types of apps do, then you can still apply the same principles. You can see the extent to which the principles will continue to work or not. So you know, follow the medium, etc. So I'll follow them, the apps. The rise of the apps.

Any other…anything else? 

Moderator: Otherwise, thank you very much, Richard, for a fascinating talk.

Rogers: Yeah, thank you. Thanks.
