I’m going to talk about this bot, @congressedits. My apologies to Eric: I think he saw me use these slides at the DC Hack and Tell meeting a couple of months ago.

I’m not sure @congressedits is all that cool in what it does, because really it’s just a copy of another bot. Back in August, this guy in the UK called Tom Scott created a bot that would tweet whenever somebody from the UK Parliament edited a Wikipedia page anonymously, and he basically used IFTTT to do that. It gave me the idea that it would be kind of cool to do the same for the US Capitol. If you want, later I can tell you what he did with IFTTT, but you’re probably familiar with it already. What he did was this: there were two IP addresses that he knew about, and Wikipedia has a change feed URL for each of them. You basically give it an IP address and it gives you a list of changes. So he just hooked up those syndicated feeds to IFTTT and had them tweet.
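A per-IP feed of the kind described here can be requested from Wikipedia's MediaWiki API. The sketch below builds such a URL, assuming the API's `feedcontributions` module (with its `user` and `feedformat` parameters) on the English Wikipedia endpoint. The IP addresses are placeholders from the RFC 5737 documentation ranges, not the Parliament addresses, and `contributions_feed_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumption: English Wikipedia's MediaWiki API endpoint.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def contributions_feed_url(ip: str, feed_format: str = "atom") -> str:
    """Return a syndication feed URL listing recent edits attributed to `ip`.

    Assumes the MediaWiki API's `feedcontributions` module, which takes a
    username or an IP address in its `user` parameter.
    """
    params = {
        "action": "feedcontributions",
        "user": ip,
        "feedformat": feed_format,  # "atom" or "rss"
    }
    return API_ENDPOINT + "?" + urlencode(params)


# Placeholder addresses, one feed per address, as in the IFTTT setup.
for ip in ["192.0.2.10", "198.51.100.7"]:
    print(contributions_feed_url(ip))
```

This approach needs one feed per address. That is workable for two addresses, but it does not scale to thousands.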

I created @congressedits, which was a copy of that. The problem with Congress is that I didn’t know what the IP addresses were, but luckily I knew somebody who did. Josh Tauberer runs a site called GovTrack (he’s been running it for maybe a decade or so), and over the years he’s collected these IP address ranges. What he does is scrape [] and other places where you can get US legislation from, and I think he must’ve developed some functionality over the years where the site behaved somewhat differently when you were in the Capitol.

So he had collected these ranges, and I just sort of tweeted at him one night, “Do you know what the IP address ranges are?” And he was like, “Yeah, just look in the source code on GitHub here.” So I got them.

The problem is that these ranges cover thousands of IP addresses, and you can’t easily use IFTTT for thousands of IP addresses. I didn’t really feel like spending that much time putting things into IFTTT. But I had done some previous work with Wikipedia, which I’m going to show you a quick video of (@28:40):
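Rather than watching one feed per address, a bot can keep the ranges themselves and test each edit's IP against them. The following minimal sketch uses Python's standard `ipaddress` module. It assumes the ranges arrive as (first, last) address pairs. The RFC 5737 blocks are placeholders rather than the GovTrack ranges, and the helper names are hypothetical.

```python
import ipaddress
from typing import List, Tuple, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

# Hypothetical (first, last) address pairs. These RFC 5737 documentation
# blocks are placeholders, not real congressional ranges.
RANGES: List[Tuple[str, str]] = [
    ("192.0.2.0", "192.0.2.255"),
    ("198.51.100.16", "198.51.100.31"),
]


def to_networks(ranges: List[Tuple[str, str]]) -> List[Network]:
    """Collapse each (first, last) pair into the CIDR blocks that cover it exactly."""
    networks: List[Network] = []
    for first, last in ranges:
        networks.extend(
            ipaddress.summarize_address_range(
                ipaddress.ip_address(first), ipaddress.ip_address(last)
            )
        )
    return networks


def in_ranges(ip: str, networks: List[Network]) -> bool:
    """Return True if `ip` falls inside any of the given networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)


NETWORKS = to_networks(RANGES)
print([str(net) for net in NETWORKS])
print(sum(net.num_addresses for net in NETWORKS), "addresses in", len(NETWORKS), "blocks")
print(in_ranges("192.0.2.42", NETWORKS), in_ranges("203.0.113.5", NETWORKS))
```

A handful of CIDR blocks can stand for hundreds or thousands of addresses. This means each incoming edit needs only a few membership checks.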

This is a site I did that showed all the live edits to Wikipedia. To do it, I wrote a little node app that listens to all the IRC chat rooms where Wikipedia announces changes, so I’d already done a bit of work to figure out how to get the changes from Wikipedia. In what this is showing here, the blue heads are actual people: each one is the user who made that particular edit.

There are a lot of bots. (@29:30)

These are all changes coming from bots. Similarly, you can pause it, click on the bot picture, and see some information about the bot that made that change. It’s kind of cool because you can see discussion around the bot, too: things the user has decided to share, and other people talking about it.

The other thing is that you can see anonymous edits, too. (@30:15)

There aren’t as many, but you can see them: they’re the ones with a little red head with a question mark in it. These are users who made a change without logging into Wikipedia. When that happens, Wikipedia records the IP address the change came from, since there’s no user account to associate it with. That’s basically what I used to build @congressedits.
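Wikipedia records an IP address in place of a username for these edits. One simple way to spot them, then, is to check whether the user field parses as an IPv4 or IPv6 address. The following is a sketch of that one check under that assumption. It is not necessarily how Wikipedia or wikistream flags anonymous edits, and `is_ip_user` is a hypothetical name.

```python
import ipaddress


def is_ip_user(user: str) -> bool:
    """Return True if the recorded 'user' of an edit is an IPv4 or IPv6 address.

    Anonymous edits are attributed to an IP address rather than an account,
    so a user field that parses as an address suggests a logged-out edit.
    """
    try:
        ipaddress.ip_address(user)
    except ValueError:
        return False
    return True


for user in ["192.0.2.10", "2001:db8::1", "ExampleUser"]:
    print(user, is_ip_user(user))
```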

For @congressedits, I used a little node library that I created for the wikistream application, which makes it really easy to get the changes. Assuming you’ve installed the wikichanges library with npm, this is an example of importing it, instantiating it, and then listening for all the changes from all the major language Wikipedias. What you get in this callback is a change object. I’ll just show you really quick. This is actually what I wanted to share with you, because you guys write bots, and you might have some ideas for what to do with this data.

Every time there’s a change, that function gets this JavaScript object, which has the URL for the diff, so from there you can find out what actually changed. You can get the URL for the article itself. You get the title and the user (in this case it’s an IP address), plus some other information, like whether the edit is anonymous, what namespace it’s in, stuff like that. It’s not a lot, but combined with the API, what you have here can get you quite a bit of information about the change and the article.
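To make the shape of this data concrete, here is a Python sketch of a callback consuming change records represented as dicts. The field names (`url`, `pageUrl`, `page`, `user`, `anonymous`, `namespace`) and the `"article"` namespace label are assumptions modelled on the fields described above, not a verified wikichanges schema. `listen` is a hypothetical stand-in for a live listener, and the sample values are made up.

```python
from typing import Callable, Dict, Iterable, List, Optional

# A change record as a plain dict. Field names are assumptions modelled on
# the talk's description, not a verified wikichanges schema.
Change = Dict[str, object]


def summarize(change: Change) -> Optional[str]:
    """Return a one-line summary for anonymous edits to articles, else None."""
    if not change.get("anonymous"):
        return None
    if change.get("namespace") != "article":  # assumed label for the main namespace
        return None
    return f"{change['page']} edited by {change['user']}: {change['url']}"


def listen(changes: Iterable[Change], callback: Callable[[Change], None]) -> None:
    """Hypothetical stand-in for a live listener: pass each change to the callback."""
    for change in changes:
        callback(change)


sample: Change = {
    "page": "Example article",
    "pageUrl": "https://en.wikipedia.org/wiki/Example_article",
    "url": "https://en.wikipedia.org/w/index.php?diff=2&oldid=1",
    "user": "192.0.2.10",
    "anonymous": True,
    "namespace": "article",
}

summaries: List[str] = []


def on_change(change: Change) -> None:
    line = summarize(change)
    if line is not None:
        summaries.append(line)


# A logged-in edit to the same page is filtered out.
listen([sample, {**sample, "user": "SomeUser", "anonymous": False}], on_change)
print(summaries)
```

Keeping the filtering in a plain function such as `summarize`, separate from the listener, lets the same logic be tested on canned records before it is wired up to a live stream.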

So I created the @congressedits bot. Darius was here at MITH this week and mentioned that something took twenty minutes to do, and somebody asked, “What are you trying to communicate when you say that?” But because I’d done that work before, this bot really did take something like half an hour, and most of that was just getting the Twitter keys all in line.

This is an example of one of the tweets. It’s not a very interesting tweet in itself, because they all look very similar: an article was edited anonymously, and you see US House of Representatives or the Senate, depending on which IP address matched.
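Picking "US House of Representatives" or "the Senate" comes down to looking up which named set of ranges the edit's IP falls in, and then filling a tweet template. In the sketch below, the name-to-range table is hypothetical, with RFC 5737 placeholders standing in for the real ranges. The tweet wording is likewise an assumed template modelled on the description above, not the bot's verified configuration.

```python
import ipaddress
from typing import Dict, List, Optional

# Hypothetical name -> ranges table. The RFC 5737 documentation blocks stand in
# for the real House and Senate ranges, which are not reproduced here.
ACCOUNTS: Dict[str, List[str]] = {
    "US House of Representatives": ["192.0.2.0/24"],
    "US Senate": ["198.51.100.0/24", "203.0.113.128/25"],
}

NETWORKS = {
    name: [ipaddress.ip_network(cidr) for cidr in cidrs]
    for name, cidrs in ACCOUNTS.items()
}


def match_name(user: str) -> Optional[str]:
    """Return the name whose ranges contain `user`, or None if none match."""
    try:
        addr = ipaddress.ip_address(user)
    except ValueError:
        return None  # a username, not an IP address
    for name, nets in NETWORKS.items():
        if any(addr in net for net in nets):
            return name
    return None


def tweet_text(page: str, user: str, diff_url: str) -> Optional[str]:
    """Build the tweet for a matching anonymous edit (wording is an assumed template)."""
    name = match_name(user)
    if name is None:
        return None
    return f"{page} Wikipedia article edited anonymously by {name} {diff_url}"


print(tweet_text("Example", "198.51.100.1",
                 "https://en.wikipedia.org/w/index.php?diff=2&oldid=1"))
```

Because the table is just data, the same code can serve any other institution by swapping in a different set of names and ranges.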

The cool thing is that the tweet links to the diff, so if you follow the diff URL you can see what changed. This one is quite funny. I think there’s a set of bizarre individuals in the Capitol building who, once they realized that @congressedits was there and had a lot of followers, started adding these crazy things to Wikipedia. In this case somebody’s saying Rumsfeld was an alien lizard.

Audience: You can’t prove he’s not.

“[Citation needed]” might’ve been a good thing to put after it, maybe.

But the coolest thing about this project for me was that I put the code on GitHub without really thinking about it, because that’s what I always do, and soon all these people had installed it and figured out IP address ranges they cared about tweeting. The first one was the Government of Canada: this guy Nick Ruest created a very similar bot. Somebody in Germany created one, @bundesedit. I thought it was funny: following some of the diffs for these, I was really surprised at how constructive the edits were. I don’t know if that was me stereotyping Germans or what, but it was just surprising how good the edits coming from there were. Here’s one from [], France, Israel.

And then some people started creating some cool ones that monitor IP address ranges for companies. One person did pharmaceutical companies. I don’t know exactly where they got the IP address ranges from, but they were able to figure some of them out. And then somebody did an oil edits one, for oil companies.

That banner is great. But it actually turned up some interesting stuff: it found an edit where Russian television edited the Wikipedia article about, I think, MH17, the plane that was shot down over Ukraine. They changed the article so that instead of saying Ukrainian separatists shot down the plane, it said the Ukrainian military did, or something like that. So it actually turned into a news story, and it got written up in all these bizarre places.

I just thought I’d close with this. Somebody had an idea to create a similar bot that would tweet not when people were editing from Congress, but when articles about Congress were edited. It’s a little bit more high-volume, especially because we just went through an election, so there’s a lot of churn around the articles. But somebody was interested in it. Maybe five or six times that number of people follow it now. What I wanted to mention to you guys is that if you happen to use node (and actually there’s one for Python, too), there’s already a library you can use to sit and watch all the changes to Wikipedia. You could probably do it in whatever language you want: you basically just need to connect to some IRC chat rooms and parse the kind of weird message that gets posted every time a change is made.
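The parsing step looks roughly like this in Python. The IRC edit notices wrap their fields in mIRC color codes (`\x03` followed by color numbers). This sketch strips those codes and then matches the remaining text. The message layout and the sample line are assumptions reconstructed for illustration: real notices may vary, and log entries use a different layout. Connecting to the IRC server is left out, and `parse_edit` is a hypothetical name.

```python
import re
from typing import Dict, Optional

# mIRC color codes: \x03 optionally followed by a foreground and background number.
COLOR_CODE = re.compile(r"\x03(?:\d{1,2}(?:,\d{1,2})?)?")

# Assumed layout of a page-edit notice once colors are stripped:
# [[Title]] FLAGS DIFF_URL * USER * (+DELTA) COMMENT
EDIT_LINE = re.compile(
    r"^\[\[(?P<page>.+?)\]\] (?P<flags>\S*) ?(?P<url>\S+) \* "
    r"(?P<user>.+?) \* \((?P<delta>[+-]?\d+)\) ?(?P<comment>.*)$"
)


def parse_edit(raw: str) -> Optional[Dict[str, str]]:
    """Parse one colored IRC edit notice into its fields, or return None."""
    text = COLOR_CODE.sub("", raw).strip()
    match = EDIT_LINE.match(text)
    if match is None:
        return None  # e.g. log entries, which use a different layout
    return match.groupdict()


# A constructed sample in the assumed format, not a captured message.
raw = (
    "\x0314[[\x0307Example\x0314]]\x034 M\x0310 "
    "\x0302https://en.wikipedia.org/w/index.php?diff=2&oldid=1\x03 "
    "\x035*\x03 \x0303192.0.2.10\x03 \x035*\x03 (+42) \x0310fix typo\x03"
)
print(parse_edit(raw))
```

The resulting fields (page title, flags, diff URL, user, size change, comment) are roughly the raw material that a change object like the one described earlier would be built from.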

And Wikipedia is kind of an interesting knowledge base because it’s not like Twitter. There’s a lot of volume on Twitter, and I guess you can filter it down either by using the sample stream or by putting your own text filters in there. But Wikipedia’s kind of neat because it reflects the interests of this crazy Wikipedia culture, which is on top of current events and doing all this stuff. So the change stream is like a little feed of the attention that’s going on in Wikipedia.

And if you have any questions about getting access to data, I’d be happy to help. And if you have ideas I’d love to hear them. That’s it.


Help Support Open Transcripts
