If you use Netflix, you may have seen the weird way that they seem to categorize their content, like “Movies starring Gary Busey,” which is not the way I would choose to spend my Saturday night browsing entertainment options. Or “Wacky Cult Films” or “Medical Movies based on Books” or what have you. This is sort of interesting and bizarre. Where do these come from? Not just how does Netflix know what you want or what you might like, but how do they even know that Coma is a medical movie based on a book? Some of these seem obvious: okay, movies starring Gary Busey, that’s just metadata that you’d get from anywhere. But what makes a cult film “wacky?” How would you know it’s wacky? And yet these films seem wacky and cultish, so how did all that happen?

So about a year ago, Alexis Madrigal at the Atlantic and I started thinking about this. Really, Alexis started thinking about it first and then he roped me in. Alexis put together this story and I put together the strange software that lives inside of it. I just want to talk a little bit about why we did this, what we did, and what it meant to people. And then I’m going to make some aesthetic judgments about bots, deriving those judgments from the experience we had doing this.

One of the things that happened is that as Alexis started looking at these genres (Netflix calls them “altgenres,” so I’ll try to use that term), he was able to just scrape them all down, because they were sequentially numbered in URL query string variables. There turned out to be about 76,897 of them. They weren’t all in sequential order, but it was possible to set up a script, and his account didn’t get disabled for doing this, even after he went down to Netflix and interviewed them about this whole process and how they collected this information.
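The scrape worked only because the IDs were sequential query-string values. A minimal sketch of that kind of crawl follows, assuming a hypothetical `BASE_URL` and `genreId` parameter (Netflix's real URL scheme is not given here) and a hypothetical injected `fetchName` function that returns a genre name, or `null` when an ID is unused, since the numbering had gaps.

```typescript
// Hypothetical endpoint and parameter name; NOT Netflix's actual URL scheme.
const BASE_URL = "https://example.com/browse";

// Builds the URL for one ID by setting a query-string variable.
function urlForId(id: number): string {
  const url = new URL(BASE_URL);
  url.searchParams.set("genreId", String(id));
  return url.toString();
}

// Hypothetical fetcher: returns the genre name for a URL, or null if the ID is unused.
type Fetcher = (url: string) => Promise<string | null>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Walks IDs from `start` to `end` inclusive, skipping gaps in the numbering,
// and pauses between requests so the crawl stays slow and polite.
async function scrapeRange(
  start: number,
  end: number,
  fetchName: Fetcher,
  delayMs = 500,
): Promise<Map<number, string>> {
  const found = new Map<number, string>();
  for (let id = start; id <= end; id++) {
    const name = await fetchName(urlForId(id));
    if (name !== null) found.set(id, name);
    if (delayMs > 0) await sleep(delayMs);
  }
  return found;
}
```

Passing the fetcher in as a parameter keeps the sketch testable without a network; the delay and the map of ID to name are design choices for illustration, not details of the original script.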

So they had 76,000 of these altgenres, these ways of describing movies, and in some ways the process of writing this bot became about recreating, to some extent, what Netflix had already done, which seems like a very weird, kind of pointless act until you do it. But the process was kind of [interesting?]. We talked a little bit about corpora already, and for this bot (if it’s indeed a bot; it’s really a text generator) we had to create the corpus ourselves. First that involved gathering all of these altgenres, and they’re unstructured; we just get them as text strings. So we ended up using a concordance program that anyone can get, AntConc, to try to structure that data.
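The core view a concordance program offers is keyword-in-context (KWIC): every occurrence of a word, lined up with a few words of context on each side. The sketch below is an illustrative stand-in written for this purpose, not AntConc's code; the function names `concordance` and `sortByRight` are hypothetical, and the sample strings are altgenres quoted in this talk.

```typescript
// One concordance line: the keyword plus a window of words around it.
interface KwicLine {
  left: string; // up to `width` words before the keyword
  keyword: string; // the keyword as it appeared in the line
  right: string; // up to `width` words after the keyword
}

function concordance(lines: string[], keyword: string, width = 3): KwicLine[] {
  const target = keyword.toLowerCase();
  const hits: KwicLine[] = [];
  for (const line of lines) {
    const words = line.split(/\s+/).filter((w) => w.length > 0);
    words.forEach((word, i) => {
      // Case-insensitive match, so "Based" and "based" land in the same list
      if (word.toLowerCase() === target) {
        hits.push({
          left: words.slice(Math.max(0, i - width), i).join(" "),
          keyword: word,
          right: words.slice(i + 1, i + 1 + width).join(" "),
        });
      }
    });
  }
  return hits;
}

// Sorting on the right-hand context groups similar continuations, which is
// how a recurring slot (for example "based on ___") starts to stand out.
function sortByRight(hits: KwicLine[]): KwicLine[] {
  return [...hits].sort((a, b) => a.right.localeCompare(b.right));
}

const sample = [
  "Medical Movies based on Books",
  "Bounty-Hunter Fantasy Movies Based on Books About Cats",
  "Viral Plague Sci-Fi Movies Based on Children's Books Set in Europe for Ages 8 to 10",
];
for (const hit of sortByRight(concordance(sample, "based", 2))) {
  console.log(`${hit.left.padStart(24)} [${hit.keyword}] ${hit.right}`);
}
```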

There were some patterns that began to emerge: we’d see “about” pop up, followed by some kind of subject, and likewise time periods, or “set in Asia,” “set in Europe.” These were data that we were able to extract and structure with this concordance program. After all that was done, what we ended up with was a big spreadsheet of these chunks of data. It still wasn’t clear exactly how they would get put together, either in Netflix style or in another style.
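Once patterns like “about ___” and “set in ___” are visible, pulling them into spreadsheet columns can be sketched with regular expressions. This swaps in regexes for the concordance workflow described above; the column names (`basedOn`, `about`, `setIn`, `decade`) and the marker list are hypothetical assumptions, and the real altgenre data had more slots than these.

```typescript
// Hypothetical columns modeled on the patterns described above.
interface GenreRow {
  raw: string;
  basedOn: string | null;
  about: string | null;
  setIn: string | null;
  decade: string | null;
}

// Assumed marker phrases: a slot runs until the next marker or the end of the string.
const STOP = "(?= about | set in | from the | based on | for |$)";

function slot(marker: string, raw: string): string | null {
  const match = new RegExp(`\\b${marker} (.+?)${STOP}`, "i").exec(raw);
  return match ? match[1] : null;
}

function structure(raw: string): GenreRow {
  const decade = /\bfrom the (\d{4}s)\b/i.exec(raw);
  return {
    raw,
    basedOn: slot("based on", raw),
    about: slot("about", raw),
    setIn: slot("set in", raw),
    decade: decade ? decade[1] : null,
  };
}

// Quote every cell so commas or quotes inside genre names survive as CSV.
function toCsv(rows: GenreRow[]): string {
  const cols: (keyof GenreRow)[] = ["raw", "basedOn", "about", "setIn", "decade"];
  const cell = (v: string | null) => `"${(v ?? "").replace(/"/g, '""')}"`;
  return [cols.join(","), ...rows.map((r) => cols.map((c) => cell(r[c])).join(","))].join("\n");
}
```

The CSV text can be pasted or imported into any spreadsheet, which matches the "big spreadsheet of chunks" stage without claiming to be how the original data was built.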

Screenshot of the Atlantic article, with a generated Netflix-style genre reading "Blockbuster Dramas Based on Books from the 1910s"

So we started analyzing the actual genres, and I developed some grammars to recreate, first of all, the Netflix-style genre, which you can see here. What’s interesting about generating Netflix genre names is that they might or might not actually exist. The generated names use the same data that we pulled out of Netflix, but we’re rearranging it according to the logic of a grammar that I wrote, based on our analysis of what we thought the altgenre structure looked like. So these may not correspond to any genres that Netflix actually advertises, or to any films, for that matter: Witty Werewolf Mysteries, or Quirky Detective Disney Fairy Tales, or Hit-Man Spy Dramas, or whatever.
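A Netflix-style generator amounts to choosing one value per slot and emitting the slots in a fixed order. The sketch below assumes one such ordering (adjective, subject, genre, then optional “Based on …” and “from the …” tails); the slot lists are illustrative values drawn partly from examples in this talk, and `netflixStyle` is a hypothetical name, not the project's grammar. The random source is a parameter so the output can be made repeatable.

```typescript
type Rng = () => number; // returns a float in [0, 1), e.g. Math.random

function choose<T>(items: readonly T[], rng: Rng): T {
  return items[Math.floor(rng() * items.length)];
}

// Illustrative slot values, not the extracted Netflix data.
const SLOTS = {
  adjective: ["Witty", "Quirky", "Blockbuster"],
  subject: ["Werewolf", "Detective", "Hit-Man"],
  genre: ["Mysteries", "Dramas", "Fairy Tales"],
  source: ["Books", "Children's Books"],
  decade: ["1910s", "1950s"],
} as const;

// Assumed ordering rule; each optional tail is included with probability p.
function netflixStyle(rng: Rng, p = 0.5): string {
  const parts: string[] = [
    choose(SLOTS.adjective, rng),
    choose(SLOTS.subject, rng),
    choose(SLOTS.genre, rng),
  ];
  if (rng() < p) parts.push(`Based on ${choose(SLOTS.source, rng)}`);
  if (rng() < p) parts.push(`from the ${choose(SLOTS.decade, rng)}`);
  return parts.join(" ");
}

console.log(netflixStyle(Math.random));
```

Because the slots are recombined freely, outputs such as “Blockbuster Hit-Man Fairy Tales” need not match any real Netflix genre, which is the point made above.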

That was relatively straightforward, and I’m going to come back to talk about grammars in a second. But once we had that, it occurred to us: what else can we do? What other ways of intersecting this data are there? You could make these Hollywood pitch-room kinds of concepts pretty easily: Heartfelt Tortured Genius Provocative Tearjerkers is not the best Hollywood pitch, but Morality Immigrant-Life Comedies might be, or Prison Post-Apocalyptic Mockumentaries. I would watch that.

But most interesting was just going bonkers with this data in “gonzo mode” [inaudible] and incorporating as much as possible: Viral Plague Sci-Fi Movies Based on Children’s Books Set in Europe for Ages 8 to 10; or First Love Slice of Life Musicals Set in Europe From the 19820s For Hopeless Romantics; Bounty-Hunter Fantasy Movies Based on Books About Cats. This is the stuff you would put in your bot if you were making a bot, and indeed, although it’s not a bot, you can tweet it. And the gonzo ultraniche genres were the ones that people wanted to talk about, or wanted to reflect on.

So that’s what we did and what it looks like at the end. This article got a lot of reads and tweets. I haven’t counted them all. But I want to go back and say a couple of things about the aesthetics of this project.

The first is that this isn’t a bot; it’s a text generator that you can tweet out as a bot, and I think that’s important in this case, because as bot makers we often celebrate the ambiguity of bots, especially on Twitter. Is that real or not? We don’t know, and we love that we don’t know. But sometimes you actually want to know. You want to know that this is text generation, and that it’s meant to be text generation, or that you can see the structure of something else through it, in this case the Netflix altgenres. That’s one observation I would make.

The second observation is that generating your own corpora is sometimes really freeing, and it also forces you to think more deliberately about a smaller set of data and how you can interact with it programmatically. So as much as I love Wordnik and streaming data off of Twitter and just using the essentially infinite amount of content that you get from that channel, I think there’s also a reason to prefer other kinds of methods. There’s nothing new about writing a context-free grammar and operating it on a small data set; that’s all the gonzo grammar is doing. You can find code like this in Python or anywhere, but this is what I wrote quickly for the project in JavaScript. It’s like “10% of the time add a region, and then three or fewer adjectives and the genre name, and half the time add data from the description.” And because this context-free grammar can recurse, I can have things like stars, which build into the thing that we call roles, like “starring Gary Busey,” but it’s actually “starring #star” or “created by #creator,” and then we have three levels of stars based on their popularity or their frequency.
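The gonzo rules described above can be approximated with a small recursive expander. This is a sketch under stated assumptions, not the project's JavaScript: it uses a `#name#` placeholder syntax (an assumption; only `#star` and `#creator` are mentioned), weights the three star tiers 3:2:1 by repeating entries (one hypothetical way to favor popular stars), appends the region at the end of the title (also an assumption), and uses bracketed placeholder names wherever the talk gives no real value.

```typescript
type Rng = () => number; // returns a float in [0, 1), e.g. Math.random
type Grammar = Record<string, string[]>;

function pick<T>(xs: readonly T[], rng: Rng): T {
  return xs[Math.floor(rng() * xs.length)];
}

// Hypothetical rules; bracketed names are placeholders, not real data.
const GRAMMAR: Grammar = {
  adjective: ["Witty", "Quirky", "Heartfelt", "Provocative", "Viral Plague"],
  genre: ["Sci-Fi Movies", "Musicals", "Fantasy Movies", "Mockumentaries"],
  region: ["Set in Europe", "Set in Asia"],
  description: ["Based on Books About Cats", "for Hopeless Romantics", "#role#"],
  role: ["starring #star#", "created by #creator#"],
  // Three tiers, weighted 3:2:1 by repetition (an assumption)
  star: ["#starA#", "#starA#", "#starA#", "#starB#", "#starB#", "#starC#"],
  starA: ["Gary Busey"],
  starB: ["[mid-tier star]"],
  starC: ["[rarely seen star]"],
  creator: ["[some creator]"],
};

// Picks one expansion for `symbol`, then recursively expands every #name# inside it.
function expand(symbol: string, g: Grammar, rng: Rng, depth = 0): string {
  if (depth > 20) throw new Error(`expansion too deep at #${symbol}#`);
  const options = g[symbol];
  if (!options) throw new Error(`unknown symbol #${symbol}#`);
  return pick(options, rng).replace(/#(\w+)#/g, (_match: string, name: string) =>
    expand(name, g, rng, depth + 1),
  );
}

function gonzo(g: Grammar, rng: Rng): string {
  const addRegion = rng() < 0.1; // roughly 10% of the time
  const adjectiveCount = Math.floor(rng() * 4); // zero to three adjectives
  const parts: string[] = [];
  for (let i = 0; i < adjectiveCount; i++) parts.push(expand("adjective", g, rng));
  parts.push(expand("genre", g, rng));
  if (rng() < 0.5) parts.push(expand("description", g, rng)); // half the time
  if (addRegion) parts.push(expand("region", g, rng)); // appended last (assumption)
  return parts.join(" ");
}

console.log(gonzo(GRAMMAR, Math.random));
```

The recursion is what lets a description expand into a role, and a role into a tiered star, without the top-level rule knowing about stars at all; the depth guard only protects against a grammar that refers to itself without end.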

So in this data-versus-process mode that we are always in when we’re writing software, I feel like the bot world has been very, very data-oriented, and there are reasons to be more process-oriented. Grammars are just one example of how you might do it, but focusing on different ways of putting together smaller sets of data is also an option that’s available to us.

The final thing I want to note: again, I don’t know if this is a bot or just a text generator, but if it is a bot, it’s a bot that has a rhetorical function. Someone is meant to interact with it in order to gain purchase on the idea of what these Netflix altgenres are and what they mean. And what they mean for Netflix is that they actually sit people down in front of movies and pay them to write down that this is a wacky movie; then they get that into their database and use it as a way of trying to reconstruct what these movies are about or how they appeal to people. It’s kind of replacing the recommendation systems that were previously used, and it’s also, at least supposedly, informing the original content development that Netflix is doing now. Seeing into that process is interesting, and maybe important, to get a sense of how these kinds of media objects get constructed. This is not the only way that ideas are being considered at Netflix, but it’s one of those ways. So there’s this additional future of rhetorical bots, bots that are trying to show you something about the world rather than trying to be an experience that you have with it.

Further Reference

Mark Sample created a Netflix Genres Twitter bot, inspired by Alexis and Ian’s Atlantic piece.

There’s a (sadly empty) “Ian Bogost Movies” altgenre.

Ian discusses Netflix’s altgenres again as part of “The Cathedral of Computation” for The Atlantic.

Darius Kazemi’s home page for Bot Summit 2014, with YouTube links to individual sessions, and a log of the IRC channel.