I’ve learned so much already these couple of days at PopTech, and already some of the things that I’m learning are offering me a little bit of clarity. For example, Moran Cerf yesterday told us that there are actually two voices in our brains that are helping us to make decisions. In this case, I have me on the stage standing here right now, and then I have the me that decided that this slide would be a good way to start off the presentation.

Big data? I think all of us have prob­a­bly seen pre­sen­ta­tions already about big data. Big data in busi­ness, big data in sci­ence, big data in piz­za deliv­ery. Whatever those things are. But I think one of the things that we haven’t heard very much about is what is the sub­jec­tive expe­ri­ence like of liv­ing in this world of big data? What is it like to be us liv­ing in this ever more com­pli­cat­ed world?

I do a lot of traveling, and so maybe this is an example with some bias, but I think if we imagine the experience of being in an airport, we might start to understand this data experience. First of all because there are a lot of systems that are transparent to us that are happening around us. Our baggage is moving on carousels. There are security agents who are herding us through line-ups, and so on and so on and so on.

Second of all, there’s a loss of con­trol there. I think it’s one of the only places that we vol­un­tar­i­ly give up con­trol in our lives, in an air­port. We’re put into line-ups. We’re kind of direct­ed. We’re put on to these planes in these kind of data pack­ets that are then sort­ed into anoth­er air­port and offloaded, and so on and so on and so on.

Maybe more importantly, though, is this idea that we’re part of a system that we can’t possibly imagine the magnitude of. Right now, as we speak, there are more than a million people in the air. The graphic that you see behind us is this respiring system of airplanes landing and taking off at fifteen-minute intervals. There are thousands of airplanes in the air right now. So I really think that this idea of being caught in a system which is very complicated, and too complicated for us to understand, really mirrors this experience of big data.

The American novelist David Foster Wallace was very prescient about this. He was asked by his editor when he was writing Infinite Jest about why he put so many footnotes in the book. The footnotes in that book are really incredible. Sometimes the footnotes have footnotes, and occasionally those footnotes have footnotes, as well. And he said that one of the things he wanted to do was to mimic the information flood and data triage that he expected to be an even bigger part of life fifteen years hence. Infinite Jest was written sixteen years ago.

I’ve been really excited about this idea of big data since I had a conversation about three years ago with this man, László Barabási. We were working on a project for Wired magazine in the UK, where I was the editor of that somewhat unfortunately named section called Infoporn. I worked with Dr. Barabási on a project to help show some of the results from a project that he was working on about human mobility. And we had the really good luck of working with this data set which is one of the largest data sets of cell phone usage from an unnamed European country. The country has a name, I’m just not allowed to tell you what it is. And so the first thing we see is this graph, which is not a very exciting graph.

What this is, is this is a seg­ment of about fifty thou­sand of those peo­ple, and as the graph gets taller on the left, those peo­ple talk a lot on their phones. And as it gets short­er on the right, they don’t talk very much at all. And the peo­ple on the left, they talk eighty-four hours a week on their phone, and the peo­ple on the right are not real­ly talk­ing at all. 

So, on the left-hand side of this graph­ic, I took a whole bunch of these sec­tions and stacked them up so that we could just see the rich­ness of this data. There’s a lot of data, and for each per­son in that data set, we were able to see their call­ing his­to­ry over time, piece it togeth­er, see exact­ly how they were call­ing and who they were calling. 

But maybe more inter­est­ing­ly is the thing that hap­pened on the oth­er side of the page, for which I built these lit­tle cubes which I call mobil­i­ty maps.

And so what we’re see­ing here is we’re see­ing a cube that shows a sin­gle per­son­’s trav­el over about four days. This per­son is clear­ly a com­muter. They trav­el back and forth from one loca­tion to anoth­er. But in that data set, we were able to pro­duce these mobil­i­ty maps for every­body in that data set. Tens of thou­sands, hun­dreds of thou­sands of peo­ple, and see how their lives can be rep­re­sent­ed in this real­ly sim­ple form. And one of the things that sur­prised me and that sur­prised the Barabási group was that there was a lot of pre­dic­tive­ness in this data. And so I start­ed get­ting this idea that we could look at the data trails we were leav­ing behind, and we could start to con­struct things from them. So the next few projects that I worked on car­ried through that idea. 

This is a project called Just Landed. How many of you are on Twitter? Probably most people are on Twitter. So, you’ve read those tweets where somebody says, “I just landed in Hawaii. We’re stuck on the runway for twenty minutes. This is really irritating.” These rich white people tragedy quotes, right. So I thought it would be really interesting to take those kind of really self-serving things, these thinly veiled show-offs, and put them together into a map. Maybe we could recreate a human mobility system by seeing how people are showing off about their travel around the world.

Maybe a little less meanly, I also put together this project called GoodMorning!, which looks at everybody saying “good morning” to each other on Twitter. So, here’s everybody in the world in 2009, so more than three years ago, all saying “good morning.” The green people are getting up early and saying good morning, the orange people a little bit later, and the red people really late. When we look at the United States, you really see the red on the West Coast and the green on the East Coast.
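The color coding described above can be sketched in a few lines of Python. This is only an illustration, not the code behind GoodMorning!: the function name `greeting_color` and the two cut-off hours are assumptions, since the talk only says "early", "a little bit later", and "really late".

```python
from datetime import datetime, timedelta, timezone

# Hypothetical cut-off hours (local time); the talk gives no exact thresholds.
EARLY_BEFORE = 7   # before 07:00 counts as "early"
LATE_FROM = 10     # 10:00 or later counts as "really late"


def greeting_color(utc_time: datetime, utc_offset_hours: float) -> str:
    """Pick a color for a "good morning" tweet from the sender's local hour."""
    # Shift the UTC timestamp by the sender's offset to get local wall-clock time.
    local = utc_time + timedelta(hours=utc_offset_hours)
    if local.hour < EARLY_BEFORE:
        return "green"
    if local.hour < LATE_FROM:
        return "orange"
    return "red"


# Example: a tweet sent at 11:00 UTC by someone five hours behind UTC (06:00 local).
example = greeting_color(datetime(2009, 8, 20, 11, 0, tzinfo=timezone.utc), -5)
```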

And I’ve been carrying these ideas throughout my work ever since then. At the Times, working together with a statistician named Mark Hansen and with the rest of the really talented team at the R&D group, we built this project called Cascade. And what Cascade does is it looks at conversations about Times content on Twitter. And we’re able to recreate every single conversation that happened about every piece of Times content, in real time.

So, we’re look­ing at a sto­ry which is a cou­ple of years old here, but it’s an inter­est­ing one for a cou­ple of rea­sons. Just to explain what we’re see­ing, on the left-hand side is the very birth of this con­ver­sa­tion. And then now we’re about twenty-four hours into the con­ver­sa­tion. As degrees of sep­a­ra­tion get above, we’re going from one per­son to the oth­er, who tweets to the oth­er per­son, who tweets to the oth­er person. 
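To make "degrees of separation" concrete, here is a minimal sketch of how a chain of shares could be turned into degrees from the person who started the conversation. It is not Cascade's implementation: the input format (pairs of a user and whoever they heard it from) and the function name `degrees_of_separation` are assumptions made for illustration.

```python
from collections import deque


def degrees_of_separation(shares):
    """Map each user to their distance from whoever started the conversation.

    `shares` is a list of (user, heard_from) pairs; `heard_from` is None for
    the person who started the conversation. (Hypothetical input format.)
    """
    children = {}
    roots = []
    for user, heard_from in shares:
        if heard_from is None:
            roots.append(user)
        else:
            children.setdefault(heard_from, []).append(user)

    # Breadth-first walk outward from the originator, one degree at a time.
    depth = {root: 0 for root in roots}
    queue = deque(roots)
    while queue:
        current = queue.popleft()
        for follower in children.get(current, []):
            if follower not in depth:
                depth[follower] = depth[current] + 1
                queue.append(follower)
    return depth


# Made-up example: one person tweets to the other, who tweets to the next.
example_shares = [
    ("origin", None),
    ("a", "origin"),
    ("b", "a"),
    ("c", "b"),
    ("d", "origin"),
]
example_depths = degrees_of_separation(example_shares)
```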

But real­ly what we see is we see the archi­tec­ture of dis­cus­sion. We see some­thing that we’ve nev­er seen before. So excit­ing. This was one of my favorite projects to work on. I felt like an archae­ol­o­gist, expos­ing things for the first time that we’ve nev­er seen before. 

But something always sat really uneasily with me about this project and with the other work that I showed you, and that is that largely this work depends on opportunism. We’re dependent on taking people’s data without telling them about it. And even though Twitter is overtly public, I don’t think that makes me feel a lot better.

I was speak­ing at a con­fer­ence a cou­ple of years ago, and some­body said this for the first time, I think the first time that I heard it: data is the new oil. And peo­ple…they clapped. They were excit­ed about data being the new oil. I think they were think­ing about this, right?

[Image: J.R. Ewing from the TV show Dallas, laughing and holding a handful of money]

Whereas I was think­ing about this, but in the con­text of this:

We did­n’t do very well with oil. And to sug­gest the data can be the new oil I find frankly ter­ri­fy­ing. But maybe there is a piece of this anal­o­gy that works for us, because oil is com­posed of all these tiny microor­gan­isms, these pre­his­toric microor­gan­isms that have been com­pressed into this sort of valu­able resource.

Data consists of fragments of our lives. The valuable data that we’re talking about consists of fragments of our lives that are being compressed into this valuable resource. Now, maybe it’s the Canadian in me, but I’m not sure that I trust corporations to take charge of this type of resource. And I’m really interested in how we can do a better job with data than we did with oil.

Do you want to stop dif­fer­ent trans­mit­ted diseases?

Do you want to design bet­ter cities? Do you want to stop traf­fic jams?

The data to do so is there in pri­vate hands, and we need to iden­ti­fy some social con­sen­sus by which the data can be shared with the dif­fer­ent stake­hold­ers who can take advan­tage of that.
edge.org, “Thinking in Network Terms, A Conversation with Albert-László Barabási”

So, László Barabási, who we men­tioned before, in an inter­view last month had some real­ly inter­est­ing things to say about data and its val­ue to us. He says if we want to do all these great things with data, we have to come to a social con­sen­sus, because this data is valu­able and it’s owned by all of us col­lec­tive­ly. So how do we come to a social con­sen­sus to make sure that that data can be used for good and not nec­es­sar­i­ly only for prof­it. Well, three things, I think. 

Data ownership. We have to get people used to the idea of owning their own data. It is your data, you should own that data, and that’s not the way it works right now. So, at the Times, we put together this project called Open Paths [blprnt.com blog post] which allows you to store your location data securely and share it, if you wish, with researchers. It’s the “if you wish” that’s important there. So you can share this if you’d like to. So please download the app, start recording your location data. It’s really fun to explore. And then share that data, if you wish.

The second thing that we need to really be talking about is data and ethics, because I think ethics have been almost entirely lacking from this conversation. And it’s really important that as consumers of data services we start to make decisions based on ethics.

And then final­ly let’s get back to the first thing that I talked about, which is this sub­jec­tive expe­ri­ence of liv­ing in a data world. I’m real­ly real­ly real­ly con­vinced that the only way we can reach this con­sen­sus that we’re talk­ing about is by shar­ing with peo­ple and expos­ing to peo­ple what is hap­pen­ing in this data world. And that’s I think where the role of data art comes in. 

I come here today because I’m excit­ed about data but also because I’m ter­ri­fied. I’m ter­ri­fied that we are hav­ing progress with­out cul­ture in the world of data. And as we’ve seen with these failed indus­tries before, progress with­out cul­ture does not work. And there’s a lot of pow­er­ful peo­ple in this room, and if I can leave you with one thing, let’s try to bring cul­ture into our dis­cus­sion with data. And let’s try to not make the same mis­takes with this new resource that we have with the last ones.

Thank you.

Further Reference

This pre­sen­ta­tion at the PopTech site.