I’m going to talk about this bot @congressedits. My apologies to Eric. I think Eric saw me use these slides in DC at the DC Hack and Tell meeting a couple of months ago.

I’m not sure that @congressedits is cool in what it does, because really it’s just a copy of another bot. Back in August, this guy in the UK called Tom Scott created a bot that tweeted whenever somebody from the UK Parliament edited a Wikipedia page anonymously, and he basically used IFTTT to do that. It gave me the idea that it would be kind of cool to do the same for the US Capitol. If you want, later I can tell you what he did with IFTTT, but you’re probably familiar with it already. What he did was this: there were two IP addresses he knew about, and Wikipedia has a change feed URL for each one. You basically give it an IP address and it gives you a list of changes from that address. So he just hooked up those syndicated feeds to IFTTT and had them tweet.
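For anyone who wants to try the per-IP feed approach: MediaWiki exposes a contributions feed through its API. The sketch below assumes the `action=feedcontributions` module on English Wikipedia and builds one Atom feed URL per IP address, the kind of URL you could hand to IFTTT or any feed reader. It is not necessarily how Tom Scott built his feeds, and the IP addresses are documentation placeholders, not Parliament's.

```python
from urllib.parse import urlencode

# Assumption: the English Wikipedia API endpoint; other language editions
# follow the same pattern with a different subdomain.
API = "https://en.wikipedia.org/w/api.php"

def contributions_feed_url(ip: str, feed_format: str = "atom") -> str:
    """Build a MediaWiki contributions feed URL for one (anonymous) IP."""
    params = {
        "action": "feedcontributions",  # a user's contributions as a syndicated feed
        "user": ip,                     # anonymous edits are attributed to the IP
        "feedformat": feed_format,      # "atom" or "rss"
    }
    return f"{API}?{urlencode(params)}"

# Placeholder addresses from the RFC 5737 documentation ranges.
for ip in ["192.0.2.10", "198.51.100.20"]:
    print(contributions_feed_url(ip))
```

This works nicely for two addresses; it is the thousands-of-ranges case that makes one feed per IP impractical.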

I created @congressedits as a copy of that. The problem with Congress was that I didn’t know what the IP addresses were, but luckily I knew somebody who did: Josh Tauberer, who runs a site called GovTrack. He’s been running it for maybe a decade, and over the years he’s collected these IP address ranges. What he does is scrape [] and other places where you can get US legislation. I think over the years he must’ve developed some functionality so that when you were in the Capitol (well, not on the Capitol but in it), the site behaved somewhat differently.

So he had collected these ranges, and I just sort of tweeted at him one night, “Do you know what the IP address ranges are?” And he was like, “Yeah, just look in the source code on GitHub here,” so I got them.

The problem is that you can’t easily use IFTTT for thousands of IP addresses, and these ranges cover thousands of them; I didn’t really feel like spending that much time putting things into IFTTT. But I had done some previous work with Wikipedia, which I’m going to show you a quick video of (@28:40):
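With thousands of ranges, it is simpler to watch every anonymous edit and ask whether its IP falls inside a known range. A minimal sketch using Python's standard `ipaddress` and `bisect` modules, assuming the ranges arrive as inclusive (start, end) IPv4 pairs with a label and do not overlap. The `RangeMatcher` class is hypothetical, and the labels and ranges are placeholders, not the ones from GovTrack.

```python
import bisect
import ipaddress

class RangeMatcher:
    """Look up which labelled, non-overlapping IPv4 range an address falls in."""

    def __init__(self, ranges):
        # ranges: iterable of (label, start_ip, end_ip), inclusive on both ends.
        rows = sorted(
            (int(ipaddress.IPv4Address(start)), int(ipaddress.IPv4Address(end)), label)
            for label, start, end in ranges
        )
        self._starts = [start for start, _, _ in rows]
        self._rows = rows

    def match(self, ip):
        """Return the label of the range containing ip, or None."""
        try:
            value = int(ipaddress.IPv4Address(ip))
        except ipaddress.AddressValueError:
            return None  # not an IPv4 address, e.g. a registered username
        # Find the last range whose start is <= value, then check its end.
        i = bisect.bisect_right(self._starts, value) - 1
        if i >= 0:
            start, end, label = self._rows[i]
            if value <= end:
                return label
        return None

matcher = RangeMatcher([
    ("Hypothetical Chamber A", "192.0.2.0", "192.0.2.255"),
    ("Hypothetical Chamber B", "198.51.100.0", "198.51.100.127"),
])
print(matcher.match("198.51.100.42"))
print(matcher.match("203.0.113.5"))
```

Sorting once and bisecting keeps each lookup logarithmic in the number of ranges, which matters when every anonymous edit on every Wikipedia gets checked.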

This is a site I made that showed all the live edits to Wikipedia. To do it, I wrote a little node app that listens to all the IRC chat rooms where Wikipedia announces changes, so I’d already done a bit of work figuring out how to get the changes from Wikipedia. What this is showing here is that the blue heads are actual people: each one is a user who made that particular edit.
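Each announcement in those IRC rooms is a single line decorated with mIRC color codes. A minimal parsing sketch, assuming that once the colors are stripped the line looks like `[[Title]] FLAGS DIFF_URL * USER * (±BYTES) COMMENT`. That layout is my understanding of the classic recent-changes feed, not a formal spec, and the dict keys returned here are my own names, not the node library's.

```python
import re

# mIRC formatting: \x03 color codes with optional foreground[,background]
# digits, plus bold (\x02), reset (\x0f), reverse (\x16) and underline (\x1f).
IRC_FORMATTING = re.compile(r"\x03(?:\d{1,2}(?:,\d{1,2})?)?|[\x02\x0f\x16\x1f]")

# Assumed layout of a page edit once formatting is removed.
CHANGE = re.compile(
    r"^\[\[(?P<title>.+?)\]\]\s*"
    r"(?P<flags>[A-Za-z!]*)\s+"
    r"(?P<url>\S+) \* "
    r"(?P<user>.+?) \* "
    r"\((?P<delta>[+-]?\d+)\)\s?"
    r"(?P<comment>.*)$"
)

def parse_change(line):
    """Parse one colored recent-changes line into a dict, or return None."""
    match = CHANGE.match(IRC_FORMATTING.sub("", line))
    if match is None:
        return None  # log entries and other message types use a different layout
    change = match.groupdict()
    change["delta"] = int(change["delta"])  # byte difference, e.g. +42
    return change

# A hand-built sample line in the assumed format.
sample = ("\x0314[[\x0307Donald Rumsfeld\x0314]]\x034 M\x0310 "
          "\x0302https://en.wikipedia.org/w/index.php?diff=2&oldid=1\x03 "
          "\x035*\x03 \x0303192.0.2.7\x03 \x035*\x03 (+42) \x0310edit summary\x03")
print(parse_change(sample))
```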

There are a lot of bots. (@29:30)

These are all changes coming from bots, and similarly you can pause it, click on the bot picture, and see some information about the bot that made that change. It’s kind of cool because you can see discussion around the bot, too: things the user has decided to share, and other people talking about it.

The other thing is that you can see anonymous edits, too. (@30:15)

There aren’t as many, but you can see them: they’re the ones with a little red head with a question mark in it. These are users who haven’t logged in to Wikipedia and have made a change. When that happens, Wikipedia records the IP address the change came from, since there’s no user account to associate it with. That’s basically what I used to build @congressedits.
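The three kinds of heads in the video map onto a simple classification of each change. A sketch, assuming a change dict with hypothetical `user` and `flags` keys, where a `B` in the flags marks a bot edit (my understanding of how the recent-changes feed flags bots). Anonymity is inferred the way Wikipedia records it: the "user" is an IP address rather than an account name.

```python
import ipaddress

def is_anonymous(user: str) -> bool:
    """Anonymous edits are attributed to an IPv4 or IPv6 address, not a username."""
    try:
        ipaddress.ip_address(user)
    except ValueError:
        return False
    return True

def classify(change: dict) -> str:
    """Return 'anonymous', 'bot' or 'human' for one change."""
    if is_anonymous(change["user"]):
        return "anonymous"  # the red head with a question mark
    if "B" in change.get("flags", ""):
        return "bot"
    return "human"          # the blue head

for change in [
    {"user": "192.0.2.7", "flags": "M"},
    {"user": "2001:db8::1", "flags": ""},
    {"user": "ExampleBot", "flags": "B"},
    {"user": "ExampleEditor", "flags": "M"},
]:
    print(change["user"], classify(change))
```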

For @congressedits, what I could do was use a little node library I had created for the wikistream application, which makes it really easy to get the changes. Assuming you’ve installed the wikichanges library with npm, this example imports it, instantiates it, and then listens for all the changes from all the major language Wikipedias. What you get in the callback is a change object. I’ll just show you really quickly, because this is actually what I wanted to share with you: you guys write bots, and you might have some ideas for what to do with this data.

Every time there’s a change, that function gets this JavaScript object, which has the URL for the diff, so from there you can find out what actually changed. You also get the URL for the article itself, the title, the user (in this case an IP address), and some other information, like whether the edit is anonymous and what namespace it’s in. It’s not a lot, but combined with the Wikipedia API, the fields here can get you quite a bit of information about the change and the article.
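As one example of combining the diff URL with the API: a diff link of the form `https://en.wikipedia.org/w/index.php?diff=...&oldid=...` carries both revision IDs, which can be passed to the MediaWiki API's `action=compare` module to fetch the changed text. This sketch only builds the request URL (no network call). It assumes numeric `diff` and `oldid` parameters and the usual `/w/api.php` path on Wikimedia wikis; the revision IDs in the example are made up.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

def compare_url(diff_url: str) -> str:
    """Turn a diff link into a MediaWiki API request for the same comparison."""
    parts = urlsplit(diff_url)
    query = parse_qs(parts.query)
    params = {
        "action": "compare",
        "fromrev": query["oldid"][0],  # revision before the edit
        "torev": query["diff"][0],     # revision after the edit
        "format": "json",
    }
    # Same wiki as the diff link, pointed at its api.php endpoint (assumed path).
    return f"{parts.scheme}://{parts.netloc}/w/api.php?{urlencode(params)}"

print(compare_url("https://en.wikipedia.org/w/index.php?diff=1002&oldid=1001"))
```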

So I created the @congressedits bot. Darius was here at MITH this week and made a comment like, “It took twenty minutes to do something,” and somebody was like, “What are you trying to communicate when you say that?” But with this bot, because I’d done that work before, it really did take something like half an hour, and most of that was just getting the Twitter keys all in line.


This is an example of one of the tweets. It’s not a very interesting tweet in itself, because they all look very similar: this article was edited anonymously from the US House of Representatives or the Senate, depending on which IP address range matched.
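The tweet is just a template filled from the change object plus the label of the range that matched. A sketch, assuming wording along the lines described above and Twitter's 140-character limit at the time. The exact template, the 23-character count for a wrapped link, and the `format_tweet` helper are all assumptions for illustration, not necessarily what @congressedits does.

```python
TWEET_LIMIT = 140  # assumption: Twitter's limit at the time of the talk
LINK_LENGTH = 23   # assumption: fixed length Twitter counts for a wrapped link

def format_tweet(title: str, source: str, diff_url: str) -> str:
    """Build '<title> Wikipedia article edited anonymously from <source> <url>'."""
    suffix = f" Wikipedia article edited anonymously from {source} "
    # Budget for the title: whatever is left after the fixed text and the link.
    room = TWEET_LIMIT - len(suffix) - LINK_LENGTH
    if len(title) > room:
        title = title[: room - 1] + "…"  # truncate long titles with an ellipsis
    return f"{title}{suffix}{diff_url}"

print(format_tweet(
    "Donald Rumsfeld",
    "US House of Representatives",
    "https://en.wikipedia.org/w/index.php?diff=1002&oldid=1001",
))
```

Putting the diff URL in the tweet is what makes it interesting: followers can click straight through to what changed.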


The cool thing is that the tweet links to the diff, so if you follow the diff URL you can see what changed. In this case it’s quite funny, because a lot of the changes are... well, I think there’s a set of bizarre individuals in the Capitol building who, once they realized @congressedits was there and had a lot of followers, started adding these crazy things to Wikipedia. So in this case somebody’s saying Rumsfeld was an alien lizard.

Audience: You can’t prove he’s not.

“[Citation needed]” might’ve been a good thing to put after it, maybe.

But the coolest thing about this project for me was that I put the code on GitHub without really thinking about it, because that’s what I always do, and soon all these people had installed it and figured out IP address ranges they cared about tweeting. The first one was for the Government of Canada: this guy Nick Ruest created a very similar bot. Somebody in Germany created one, @bundesedit. I thought it was funny: following some of the diffs for these, I was really surprised at how constructive the edits were. I don’t know if that was me stereotyping Germans or what, but it was surprising how good the edits coming from there were. There are also ones for [], France, and Israel.

And then some people started creating cool ones that monitor IP address ranges for companies. One person did pharmaceutical companies. I don’t know exactly where they got the IP address ranges from, but they were able to figure some of them out. And then somebody did an oil edits one, for oil companies.


That banner is great. But it actually turned up some interesting stuff: it found an edit where Russian television edited the Wikipedia article about, I think, MH17, the plane that was shot down over Ukraine. The edit changed the article so that instead of saying Ukrainian separatists shot down the plane, it said the Ukrainian military, or something like that. So it actually turned into a news story, and it got written up in all these bizarre places.

I just thought I’d close with this. Somebody had an idea to create a similar bot that would tweet not when people edit from Congress, but when articles about Congress are edited. It’s a bit higher-volume, especially because we just went through an election, so there’s a lot of churn around those articles. But somebody was interested in it, and maybe five or six times that number of people follow it now. What I wanted to mention to you guys is that if you happen to use node (actually, there’s one for Python, too) there’s a library for getting the change stream. You could probably figure it out in whatever language you want: you basically just need to connect to some IRC chats and parse this kind of weird message that gets posted every time a change is made. But if you use node or Python, there’s already a library you can use to basically sit and watch all the changes to Wikipedia.
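The variant bot filters on which article changed rather than where the edit came from. A minimal sketch of that filter, assuming change dicts with a `title` key and a hand-maintained watchlist; the `TitleWatcher` class and the titles are illustrative. Titles are normalized the way English Wikipedia treats them (underscores read as spaces, first letter case-insensitive), and a set keeps each check constant-time even on a high-volume stream.

```python
def normalize(title: str) -> str:
    """Underscores count as spaces and the first letter is case-insensitive."""
    title = title.replace("_", " ").strip()
    return title[:1].upper() + title[1:]

class TitleWatcher:
    """Hypothetical filter: does a change touch one of the watched articles?"""

    def __init__(self, titles):
        self._titles = {normalize(t) for t in titles}

    def wants(self, change: dict) -> bool:
        return normalize(change["title"]) in self._titles

watcher = TitleWatcher(["United States Congress", "United_States_Senate"])
changes = [
    {"title": "United States Senate", "user": "ExampleEditor"},
    {"title": "Donald Rumsfeld", "user": "192.0.2.7"},
    {"title": "united States Congress", "user": "198.51.100.9"},
]
for change in changes:
    if watcher.wants(change):
        print("would tweet:", change["title"])
```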

And Wikipedia is an interesting knowledge base because it’s not like Twitter. There’s a lot of volume on Twitter, and I guess you can filter it down either with the sample stream or by putting your own text filters in there. But Wikipedia is kind of neat because it reflects the interests of this crazy Wikipedia culture, which is on top of current events and doing all this stuff, so the change stream is like a little feed of the attention that’s playing out on Wikipedia.

And if you have any questions about getting access to data, I’d be happy to help. And if you have ideas, I’d love to hear them. That’s it.