I’m going to talk about this bot @congressedits. My apologies to Eric. I think Eric saw me use these slides in DC at the DC Hack and Tell meeting a couple of months ago.
I’m not sure that @congressedits is cool in what it does, because really it’s just a copy of another bot. So back in August, this guy in the UK called Tom Scott created a bot so that whenever somebody from the UK Parliament edited a Wikipedia page anonymously, it would tweet it, and he basically used IFTTT to do that. And it gave me the idea that it would be kinda cool to do that for the US Capitol. If you want, later I can tell you what he did with IFTTT but you’re probably familiar with it already. The thing that he did was, there were two IP addresses that he knew about, and their change feed URLs at Wikipedia. You can basically give Wikipedia an IP address and it’ll give you a feed of the changes made from it. So he just hooked up those syndicated feeds to IFTTT and then had them tweet.
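As a rough illustration of the per-IP feed idea, here is a minimal Python sketch that builds a contributions feed URL for one address. It assumes the MediaWiki API's `feedcontributions` action; the talk does not say which URL form Tom Scott actually used, the helper name `contributions_feed_url` is made up for this example, and `192.0.2.1` is a documentation-only address.

```python
from urllib.parse import urlencode


def contributions_feed_url(ip, lang="en"):
    """Build an Atom feed URL listing the edits made from one IP address.

    Assumes the MediaWiki API action `feedcontributions`; `lang` selects
    which language Wikipedia to ask.
    """
    query = urlencode({
        "action": "feedcontributions",
        "user": ip,
        "feedformat": "atom",
    })
    return f"https://{lang}.wikipedia.org/w/api.php?{query}"


# 192.0.2.1 is a placeholder address, not a real parliamentary IP.
print(contributions_feed_url("192.0.2.1"))
```

A feed like this works for a couple of known addresses, which is why it fits a service like IFTTT; it does not scale to whole address ranges.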
I created @congressedits, which was a copy of that. The problem with Congress is that I didn’t know what the IP addresses were, but luckily I knew somebody who did know what the IP addresses were. So this guy Josh Tauberer who runs a site called GovTrack—he’s been running it maybe a decade or so—but over the years he’s collected these IP address ranges. So what he does is he scrapes [] and other places where you can get US legislation from. And I think he must’ve developed some functionality over the years where when you’re on the Capitol, well not on the Capitol but in the Capitol, he had some functionality where it behaved somewhat differently.
So he had collected these ranges and I just sort of tweeted at him one night, “Do you know what the IP address ranges are?” And he was like, “Yeah, just look in the source code on GitHub here.” So I got them.
The problem with this is that there are thousands of IP addresses within these ranges, and you can’t use IFTTT for thousands of IP addresses very easily, and I didn’t really feel like spending that much time putting things into IFTTT. But I did do some previous work with Wikipedia, which I’m going to show you a quick video of (@28:40):
This is a site I did that showed all the live edits to Wikipedia, and to do it I wrote a little node app that listens to all the IRC chat rooms where Wikipedia announces changes. I’d already done a little bit of work to figure out how to get the changes from Wikipedia, and what this is showing here is, the blue heads are actual people. This is a user that made that particular edit.
There’s a lot of bots. (@29:30)
These are all changes that are coming from bots and similarly you can pause it and click on the bot picture and then see some information about the bot that made that change. It’s kinda cool because you can see discussion around the bot, too. You can see things that the user’s decided to share, other people talking about it.
The other thing is that you can see anonymous edits, too. (@30:15)
There’s not as many, but you can see these ones with a little red head with a question mark in it. These are users that haven’t logged into Wikipedia that have made a particular change. When you do that, Wikipedia records the IP address that the change came from, since they don’t have a user to associate it with. So that’s basically what I used to do @congressedits.
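The matching step behind this, checking the IP address recorded for an anonymous edit against a set of known ranges, can be sketched with Python's standard `ipaddress` module. The bot itself is a node app, so this is an illustration rather than its actual code, and the ranges below are documentation-only placeholders, not the real Congressional ranges.

```python
import ipaddress

# Hypothetical ranges (reserved documentation blocks), NOT real Capitol ranges.
RANGES = {
    "US House of Representatives": ["192.0.2.0/24"],
    "US Senate": ["198.51.100.0/24", "203.0.113.0/25"],
}

# Parse the CIDR strings once so each lookup is only a membership test.
NETWORKS = {
    name: [ipaddress.ip_network(cidr) for cidr in cidrs]
    for name, cidrs in RANGES.items()
}


def match_institution(ip_text):
    """Return the name of the institution whose range contains ip_text, or None."""
    try:
        ip = ipaddress.ip_address(ip_text)
    except ValueError:
        # Logged-in edits carry a username here, not an IP address.
        return None
    for name, networks in NETWORKS.items():
        if any(ip in network for network in networks):
            return name
    return None


print(match_institution("192.0.2.17"))
print(match_institution("203.0.113.200"))
```

With ranges rather than single addresses, one lookup table covers thousands of IPs, which is the part that would have been tedious to enter into IFTTT by hand.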
For @congressedits what I could do was I created a little node library for the wikistream application that makes it really easy to get the changes. Assuming that you’ve installed the wikichanges library with npm, this is an example importing it and then instantiating it, and then listening for all the changes from all the major language Wikipedias. The thing that you get in this callback is a change object. I’ll just show you really quick. This is actually what I wanted to share with you, because you guys write bots. You might have some ideas for what to do with this data.
Every time there’s a change, that function will get this JavaScript object, which has the URL for the diff so that from there you can get what actually changed. You can get the URL for the article itself. You get the title, the user (in this case it’s an IP address), and some other information like whether it’s anonymous, what namespace it’s in, stuff like that. Not a lot, but the stuff that you have here in combination with the API can get quite a bit of information about the change and the article.
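To give a feel for working with a change object like this, here is a small Python sketch that filters for anonymous article edits. The key names (`page`, `page_url`, `diff_url`, `user`, `anonymous`, `namespace`) are assumptions chosen to mirror the fields described above, not the library's documented schema, and the sample values are invented.

```python
def is_anonymous_article_edit(change):
    """True for edits to articles made without a logged-in account."""
    return bool(change.get("anonymous")) and change.get("namespace") == "article"


def summarize(change):
    """One-line description of a change, ending with the diff URL."""
    return f'{change["page"]} edited by {change["user"]}: {change["diff_url"]}'


# Invented sample; key names are illustrative assumptions.
sample = {
    "page": "Example article",
    "page_url": "https://en.wikipedia.org/wiki/Example_article",
    "diff_url": "https://en.wikipedia.org/w/index.php?diff=2&oldid=1",
    "user": "192.0.2.17",  # anonymous edits carry an IP address here
    "anonymous": True,
    "namespace": "article",
}

if is_anonymous_article_edit(sample):
    print(summarize(sample))
```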
So I created the @congressedits bot. Darius was here at MITH this week and made a comment like it took twenty minutes to do something and somebody was like, “What are you trying to communicate when you say that?” But this bot, because I’d done that work before, it really did take like half an hour or something. And most of that was just getting the Twitter keys all in line.
This is an example of one of the tweets, and it’s not a very interesting tweet in itself because they all look very similar, like this article edited anonymously. You see US House of Representatives or the Senate, depending on which IP address matched.
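A tweet of that shape could be assembled roughly as follows. This is a sketch, not the bot's code: the exact wording is modelled on the description above, the function name `compose_status` is hypothetical, and the 140-character default reflects Twitter's limit at the time while ignoring how Twitter counts shortened links.

```python
def compose_status(title, institution, diff_url, limit=140):
    """Build the tweet text, shortening the article title if it would not fit."""
    suffix = f" Wikipedia article edited anonymously from {institution} {diff_url}"
    room = limit - len(suffix)
    if room < 1:
        raise ValueError("institution and URL leave no room for a title")
    if len(title) > room:
        # Keep the fixed part intact and trim only the title.
        title = title[: room - 1] + "…"
    return title + suffix


print(compose_status(
    "Donald Rumsfeld",
    "US House of Representatives",
    "https://example.org/diff",  # placeholder URL
))
```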
The cool thing is the tweet is linked to the diff, so if you follow the diff URL you can see what changed. In this case, it’s quite funny because a lot of the changes are— I think there’s a set of bizarre individuals in the Capitol building that once they realized that @congressedits was there and had a lot of followers, they were adding these crazy things to Wikipedia. So in this case somebody’s saying Rumsfeld was an alien lizard.
Audience: You can’t prove he’s not.
“[Citation needed]” might’ve been a good thing to put after it, maybe.
But the coolest thing about this project for me was I put the code on GitHub without really thinking about it because that’s what I always do, and soon all these people had installed it and figured out IP address ranges that they cared about tweeting. The first one was the Government of Canada. This guy Nick Ruest created a very similar bot. Somebody in Germany created one for @bundesedit. I thought it was funny, following some of the diffs for these, I was really surprised at how constructive the edits were. I don’t know if that was me stereotyping Germans or what, but it was just surprising how good the edits were that came from there. Here’s one from , France, Israel.
And then some people started creating some cool ones that monitor IP address ranges for companies. So this one person did pharmaceutical companies. I don’t know exactly where they got the IP address ranges from, but they were able to figure some of them out. And then somebody did an oil edits one, so oil companies.
That banner is great. But it actually turned up some interesting stuff because it found an edit where the Russian television edited the Wikipedia article I think about MH-17, the plane that was shot down over the Ukraine and changed the article so that it was saying that instead of Ukrainian separatists shooting down the plane, it said Ukrainian military or something like that. So it actually turned into this kind of news story. And it got written up in all these bizarre places.
I just thought I’d close with— Somebody had an idea to create a similar bot but that would tweet not when people were editing from Congress, but when articles about Congress are edited. So it’s a little bit more high-volume, especially because we just went through an election, so there’s a lot of churn around the articles. But somebody was interested in it. Maybe five or six times that number of people follow it now, but that was what I was going to mention to you guys is that if you do happen to use node—actually there’s one for Python, too, for getting the change stream. And probably you could figure it out in whatever language you want, you basically just need to connect to some IRC chats and parse this kind of weird message that goes in every time a change is made. But if you do use node or Python, there’s a library there already you can use which basically you can sit and watch all the changes to Wikipedia.
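For anyone rolling their own in another language, the parsing step might look something like this Python sketch. The message layout assumed here (mIRC color codes wrapped around `[[title]] flags url * user * (delta) comment`) is a reconstruction and should be checked against live messages before relying on it; the sample line is invented, and connecting to the IRC server is left out.

```python
import ipaddress
import re

# mIRC color codes: \x03 optionally followed by "fg" or "fg,bg" digits.
COLOR_CODE = re.compile(r"\x03(?:\d{1,2}(?:,\d{1,2})?)?")

# Assumed layout once colors are stripped:
#   [[Title]] flags URL * user * (+delta) comment
MESSAGE = re.compile(
    r"^\[\[(?P<title>.+?)\]\] (?P<flags>\S*) (?P<url>\S+)"
    r" \* (?P<user>.+?) \* \((?P<delta>[+-]?\d+)\) ?(?P<comment>.*)$"
)


def is_ip(text):
    """True if text parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(text)
        return True
    except ValueError:
        return False


def parse_change(raw):
    """Turn one announcement line into a dict, or None if it does not match."""
    match = MESSAGE.match(COLOR_CODE.sub("", raw).strip())
    if match is None:
        return None
    change = match.groupdict()
    change["delta"] = int(change["delta"])
    change["anonymous"] = is_ip(change["user"])  # no account, so the user is an IP
    return change


# Invented sample line in the assumed format.
raw = (
    "\x0314[[\x0307Donald Rumsfeld\x0314]]\x034 M\x0310 "
    "\x0302https://en.wikipedia.org/w/index.php?diff=2&oldid=1\x03 "
    "\x035*\x03 \x0303192.0.2.17\x03 \x035*\x03 (+12) \x0310fixed typo\x03"
)
print(parse_change(raw))
```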
And Wikipedia is kind of an interesting knowledge base because it’s not like Twitter is. There’s a lot of volume on Twitter and I guess you can filter it down either using the sample stream or putting your own text filters in there. But Wikipedia’s kind of neat because it sort of reflects interests of this crazy Wikipedia culture, which they’re on top of current events and doing all this stuff, so the change stream is like a little feed that you can get off of that attention that’s going on on Wikipedia.
And if you have any questions about getting access to data, I’d be happy to help. And if you have ideas I’d love to hear them. That’s it.