Samim Winiger: Welcome to Ethical Machines. We are your hosts…

Roelof Pieters: Roelof.

Winiger: And Samim.

Pieters: Ethical Machines is a series of con­ver­sa­tions about humans, machines, and ethics. It aims at start­ing a deep­er, better-informed debate about impli­ca­tions of intel­li­gent sys­tems for soci­ety and individuals.

Winiger: For this episode, we invit­ed Alex Champandard and Gene Kogan to talk to us about cre­ative AIs. Let’s dive in.

Hi, Alex. Welcome to the podcast.

Alex J. Champandard: Thanks for hav­ing me, it’s a real plea­sure. I think it’s a real­ly amaz­ing­ly dynam­ic and active, boom­ing field, and so it’s great to talk about these topics.

Winiger: I had the pleasure earlier this summer to attend a conference you were organizing by the name of nucl.ai. But I’ve also seen you’ve done so many things online. Maybe you can start off by telling us what your background is, actually.

Champandard: So I got hooked on AI for games from uni­ver­si­ty onwards. And I spent a few years as an AI pro­gram­mer in the games indus­try and did some stuff in the UK and in Vienna for Rockstar. After that, a bit of con­sult­ing also in games, con­tract­ing on mul­ti­play­er bots, those types of things. I’m now focus­ing on the con­fer­ence orga­ni­za­tion. Because I found that was some­thing that was miss­ing a lot, sort of bring­ing all these ideas into this shared melt­ing pot and see what comes out. Yeah, and that’s kind of my focus today. Finding cool things, dig­ging into them and then teach­ing and pass­ing on the information.

Pieters: So, one of these cool things, Alex, has been Deep Forger. So for our lis­ten­ers, can you explain what Deep Forger exact­ly is?

Champandard: So a few weeks ago now there was a new algorithm published called A Neural Algorithm of Artistic Style. So I thought well, that’s kind of an interesting, super powerful, very generic algorithm that could take the ideas or the style or the patterns in one image and then combine them with the contents of the different types of patterns in another image, and then create a final output. And so these kinds of algorithms tend to have so many possibilities, but it’s so difficult to understand them just by reading the algorithm.

So I thought a Twitter bot would be a good way to explore that and first of all let peo­ple sub­mit things, but also let it try to gen­er­ate cer­tain things based on a data­base of dif­fer­ent styles. And so it’s proven to be very inter­est­ing, help­ing explore the field of what’s pos­si­ble by tweet­ing com­bi­na­tions of style images and oth­er peo­ple’s pho­tographs and putting them togeth­er and see­ing what comes out.
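Editor's sketch: the idea of combining the style of one image with the contents of another can be illustrated with a toy loss function. This is a minimal sketch and not Deep Forger's code. It assumes the commonly described formulation in which style is compared through channel correlations (Gram matrices) and content through the feature values themselves. The function names, the weights, and the tiny hand-written feature maps are illustrative assumptions, and the normalization constants of a full implementation are omitted.

```python
from typing import List

# features[c][p]: response of channel c at flattened image position p.
Matrix = List[List[float]]


def gram_matrix(features: Matrix) -> Matrix:
    """Channel-to-channel correlations, summed over every position."""
    return [
        [sum(a * b for a, b in zip(row_i, row_j)) for row_j in features]
        for row_i in features
    ]


def squared_difference(left: Matrix, right: Matrix) -> float:
    """Sum of squared element-wise differences between two matrices."""
    return sum(
        (x - y) ** 2
        for row_l, row_r in zip(left, right)
        for x, y in zip(row_l, row_r)
    )


def content_loss(generated: Matrix, content: Matrix) -> float:
    """Penalize features that differ position by position."""
    return squared_difference(generated, content)


def style_loss(generated: Matrix, style: Matrix) -> float:
    """Penalize differing channel correlations, ignoring where they occur."""
    return squared_difference(gram_matrix(generated), gram_matrix(style))


def total_loss(
    generated: Matrix,
    content: Matrix,
    style: Matrix,
    content_weight: float = 1.0,  # hypothetical default
    style_weight: float = 1.0,    # hypothetical default
) -> float:
    """Weighted sum that an optimizer would try to minimize."""
    return content_weight * content_loss(generated, content) + style_weight * style_loss(
        generated, style
    )


if __name__ == "__main__":
    original = [[1.0, 2.0], [3.0, 4.0]]
    shuffled = [[2.0, 1.0], [4.0, 3.0]]  # same values, positions swapped
    print(style_loss(shuffled, original), content_loss(shuffled, original))
```

Swapping the positions leaves the style term at zero while the content term grows. That is one way to see why the style side carries patterns but no layout.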

Winiger: Yeah I mean, I’ve been watch­ing the hash­tag #stylenet on Twitter prac­ti­cal­ly every day. And Deep Forger things, they seem to be improv­ing over time. At least from the out­side. How are you doing it? Are you opti­miz­ing the sys­tem constantly?

Champandard: So yeah, I think there’s three things to it. The first is I think that the user sub­mis­sions, that peo­ple are sub­mit­ting things based on things that work. And there are some very pas­sion­ate users, and they learn what works and they tweak the para­me­ters so they do two or three sub­mis­sions and they get it right. And so that you’re see­ing the com­mu­ni­ty learn­ing how this algo­rithm works, which is amaz­ing to me. You know, peo­ple are under­stand­ing how this algo­rithm real­ly behaves as a tool.

The sec­ond part is me tweak­ing the code and going over cer­tain fail­ure cas­es and then I add an extra piece of code that will deal with that or cus­tomize cer­tain para­me­ters somewhere.

And the third is also I’m start­ing basi­cal­ly doing some form of learn­ing. Gathering sta­tis­tics on things that work and things that don’t.

Pieters: Yeah, and as you know Samim and I are big fans of what we term “creative AI.” So specifically to StyleNet, there were so many different things actually popping out the last couple of months. DeepDream and StyleNet and all these other people. So what fascinated you in StyleNet specifically?

Champandard: So, the DeepDream stuff I thought was amaz­ing tech­ni­cal­ly, and I real­ly got into the tech side of things but I was­n’t that amazed by the result. So even though I was fas­ci­nat­ed by that I did­n’t real­ly jump into it because I did­n’t see the poten­tial as a tool. But the StyleNet thing was imme­di­ate­ly obvi­ous. I thought it seemed to res­onate that peo­ple will be using this and it’s the kind of tech­nique that will help artists maybe in six months, two years, who knows?

And so I jumped onto that imme­di­ate­ly, like even before there was a first imple­men­ta­tion out I thought, okay how can we take this fur­ther. And so I start­ed the idea of the bot and a data­base of paint­ings, putting stuff togeth­er. Sort of, I assumed the algo­rithm was already imple­ment­ed some­where and then I built this whole infra­struc­ture around that hole and then plugged in what­ev­er was avail­able. We’ve tried about three dif­fer­ent imple­men­ta­tions now of the StyleNet, and they get bet­ter over time. So it’s quite amaz­ing how fast this is moving.

Winiger: So you men­tioned these data­bas­es of paintings.

Champandard: Yeah.

Winiger: And I’m assum­ing these are par­tial­ly copy­right­ed paint­ings, in a sense. Is the copy­right still intact, or…?

Champandard: So the data­base I’m using is from the Metropolitan Museum; it’s called the Collection. It’s avail­able online and oth­er bots are using it. To be hon­est, I’ve been kin­da side­step­ping this issue of copy­right because I think it’s an absolute­ly huge top­ic. If you put in two dif­fer­ent images and mix them up and then have an AI sys­tem that cre­ates some out­put, what is actu­al­ly the terms and con­di­tions on the final image? I mean, is it sub­ject to the orig­i­nal per­son who sub­mit­ted it? But what if the painter’s style…is that copy­rightable in some way? There are so many impli­ca­tions there. Certainly a cou­ple of peo­ple have men­tioned copy­right about the bot but it real­ly has­n’t cre­at­ed a con­tro­ver­sy yet.

As the quality of the images improves I expect more artists to raise these issues. But for any artists that are working in this field now, if I was good at painting I’d probably be looking at how to find styles that work well with these kind of representations and make them easily automatable or transferable so that if I had fans as an artist they could say, “Hey, I would like to have a picture of my cat painted.” And that’s something we’ve seen from the Twitter bot. People submit pictures of their houses or themselves that they want painted in a famous style. So if you were to do that as an artist today, you could say, “Well look, we can do that for you, fully customized art partly using neural network rules.” So a database of content that’s been custom-created for this neural network, and producing amazing results. I think that’s something that we could see more of in the next year or two.

Pieters: Do you keep track of what the most popular art styles are? Is it Picasso or is it Dalí or…? What are the Deep Forger’s fans like?

Champandard: So the bot tweets its most highly-rated forg­eries. So if you have retweets or favorites on a paint­ing it will get retweet­ed. And the ones that have proven the most pop­u­lar I think are the landscapes. 

Winiger: So with DeepDream out, twenty-four hours later there was porn, DeepPorn or whatever you want to call it, online. And with StyleNet it took about ten days and then there was this kind of very abstract one online. Do you think it will get worse, or…?

Champandard: Well, so the first few days I was expect­ing it but it did­n’t hap­pen. Then there was one sub­mis­sion that was let’s say…questionable but still in a taste­ful area. Well, I thought it would be worse, but to be hon­est I don’t think we’ve seen the peak of this quite yet. I think it’s going to still increase.

Winiger: I saw on Twitter that you had a con­ver­sa­tion about bots talk­ing to each other.

Champandard: Yeah.

Winiger: How did that work out, I’m curious?

Champandard: So there’s a Commons Bot which is tweet­ing images from the same data­base that I’m using to match the paint­ings. And so every four hours I think, the bot will tweet a mes­sage to Deep Forger and then it cre­ates a forgery. So yeah, the results have been sur­pris­ing­ly good because that bot tends to tweet greyscale images, which users don’t real­ly sub­mit that much. And so you end up with more sketch­es that match. And so it explores a dif­fer­ent part of space.

Pieters: So the bots already have some kind of agency. Would you already con­sid­er this as being some­thing like cre­ative? Are we there yet? Is Deep Forger going to be the first cre­ative agent?

Champandard: I… I don’t like to use the word “creative” as a boolean. Actually, shortly after I made the bot I met up with an old friend of mine who was a programmer at Rockstar with me and switched into doing art and that side of things. And we had a long discussion about this and he brought up the fact that art is a question of having the agency and then using that to make certain decisions. So if you look at the bot as it is now, it certainly has some agency over which paintings it selects and how it decides on those images. So a lot of that is code that was manually written, and some of it is a random decision, but there’s agency there. So based on the response of certain artists, I think you could say that it’s art just based on that reaction. Like if they really react strongly to it then it certainly is art, right?

Winiger: So we’ve been hav­ing an ongo­ing tweet over the last cou­ple of months about what I call com­pu­ta­tion­al com­e­dy. And com­e­dy is an art form and it must be part of any work­ing the­o­ry of cre­ative AI, I guess. And so you brought up recent­ly this notion that there must be self-reflection, in a sense.

Champandard: Yeah. I think it might be a cul­tur­al thing. My back­ground is not English-speaking. The way I see it is that com­e­dy requires intent. You have to go out there with the inten­tion of enter­tain­ing peo­ple. A stand-up come­di­an is only a stand-up come­di­an if he’s there to make peo­ple laugh on pur­pose. And so in the projects that you’ve been tweet­ing about with the #ComputationalComedy hash­tag, you’re the come­di­an. These sys­tems are just a tool that are an exten­sion of you as the come­di­an, right.

Winiger: That’s a real­ly inter­est­ing angle to explore, when some­thing becomes inten­tion­al or not. I sup­pose it’s a core prob­lem of cre­ativ­i­ty, in a sense. Generative inten­tion, next.

Pieters: So Alex, I want­ed to read a tweet which you sent out. 

And this was actu­al­ly some­thing which you already wrote very ear­ly on, I think even before you pub­licly pub­lished the bot. So is that some­thing that you’ve played with and tried out for Deep Forger?

Champandard: I think the problems with StyleNet have become very quickly obvious. Like for example if you have a painted landscape and a photo of a landscape, then transferring the styles between the two should have some understanding: “this is a piece of sky, this is also a piece of sky, and therefore I should use the colors of the sky in one picture and transfer that to the original photograph.”

And so there’s no direct way for the algorithm to know that. It’s just optimizing all these different layers in the neural network. I expect to see future work in this. I expect that will happen within the next year, possibly. But the way I’ve been approaching it is just trying to find better matches between the paintings and sort of thinking outside of the algorithm. And by finding really good quality matches between the paintings and the style, you get really good results so you can sidestep some of the deficiencies in the algorithm by doing a better job with the selection of the paintings.

Winiger: I’m curi­ous, do you see appli­ca­tion of this or sim­i­lar things in the game indus­try any­time soon?

Champandard: So I haven’t yet tried applying the algorithm to let’s say individual pieces of content and then using traditional game pipelines to produce the final result. I’ve only tried taking screenshots and then applying the process and getting the resulting screenshots. And so when I did this process for Quake, I labeled it like that. I said it’s more of a concept art. It’s helping you understand the space of what the art style could be. But when I released the Quake screenshots the first thing that game developers were asking is “Can we get this version of Quake in a shader? Can we make a mod that’s like Picasso-style Quake?”

The prob­lem is it’s tak­ing about six min­utes— At the time it was six min­utes, I’ve got it down to about three min­utes to get similar-quality 720p screen­shots. It’s nowhere near real-time, three min­utes ver­sus thir­ty frames per sec­ond. It’s not quite there. I think the appli­ca­tions for con­cept art are short term. I think that’s a very promis­ing avenue, just to get ideas on how you can add visu­al ele­ments from let’s say famous painters into your game and see how that turns out.
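Editor's sketch: the gap between minutes per screenshot and real-time rendering can be worked out directly. The snippet below uses only the figures quoted above (six minutes, three minutes, thirty frames per second). The function names are illustrative, and the four-times hardware speedup in the usage lines is a hypothetical figure chosen for illustration, not a measurement.

```python
def slowdown_factor(seconds_per_frame: float, target_fps: float) -> float:
    """How many times too slow one frame is, compared with the target frame rate."""
    return seconds_per_frame * target_fps


def frames_per_second(seconds_per_frame: float, hardware_speedup: float = 1.0) -> float:
    """Achievable frame rate after a (hypothetical) uniform hardware speedup."""
    return hardware_speedup / seconds_per_frame


if __name__ == "__main__":
    print(slowdown_factor(360.0, 30.0))   # six minutes per frame vs. 30 fps
    print(slowdown_factor(180.0, 30.0))   # three minutes per frame vs. 30 fps
    print(frames_per_second(180.0, 4.0))  # assumed 4x faster hardware
```

At three minutes per frame the process is 5,400 times too slow for thirty frames per second. Even an assumed fourfold speedup would still mean 45 seconds per frame, which is well under one frame per second.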

Pieters: Yeah. I mean, it’s one, a matter of just time. Nvidia’s coming out with new graphics cards in a year now, the Pascal or whatever they’re called, which will be like four times faster. So maybe we’ll already get a frame rate of four frames a second? Like, what could be kind of applications when this is real-time? Things like VR?

Champandard: Yeah, I’m not con­vinced about the real-time post-processing in a shad­er. I think there’s a big mar­ket for an inde­pen­dent devel­op­er, but con­vinc­ing the stan­dard AAA or con­sole devel­op­ers to switch to this is going to be dif­fi­cult. For con­tent pro­duc­tion, I think there’s a lot of poten­tial like if you imag­ine VR itself, there’ll be a lot of empha­sis on the qual­i­ty of the envi­ron­ment. And so using these kinds of tech­niques to improve the qual­i­ty of the sky­box­es, for exam­ple, and paint­ing beautiful-looking skies or trees or land­scapes, even if there’s oth­er geom­e­try built in a more tra­di­tion­al way. Maybe using these tools as a way to aug­ment the qual­i­ty of the tex­tures or add a cer­tain style to it or makes things quick­er or eas­i­er to devel­op. Maybe tak­ing the style of an expe­ri­enced artist and then using that style to trans­fer it to the art of a begin­ner. So there might be some poten­tial there in just let­ting artists build these real­ly rough sketch­es and then hav­ing the neur­al net­work fill in the gaps.

Winiger: I mean, at the speed of inno­va­tion in this space, where do you see us in the short term or mid term, or maybe even the long term? It’s very hard to esti­mate but maybe you have some wishes.

Champandard: I hope that there’ll be anoth­er ver­sion of StyleNet that does things a bit more context-sensitively. I think on the imple­men­ta­tion side things are con­stant­ly improv­ing. We’re get­ting to under­stand things bet­ter, so that will be more of an incre­men­tal thing. But I think any­thing fur­ther for­ward will be more a ques­tion of chang­ing the mind­sets of the peo­ple that could be using the tech. Changing mind­sets always takes longer than chang­ing the tech­nol­o­gy. And with machine learn­ing mov­ing as fast as it is now, it’s going to be quite scary, the dif­fer­ence in pace between what the machines can do and what the human mind is com­fort­able with, to put it that way. So from that per­spec­tive it’s hard­er to pre­dict because involved in that is basi­cal­ly pre­dict­ing how reac­tive or respon­sive the com­mu­ni­ty will be, ver­sus how closed off they are to the whole idea.

I think it’s quite fas­ci­nat­ing just how fast things are mov­ing, real­ly. The boom that’s hap­pen­ing in the field is mind-blowing. And I’ve not seen this atmos­phere— Being in the indus­try for many years I’ve not seen it hap­pen like this. It’s amaz­ing how quick­ly you can turn out new things and new pro­to­types. So I found that very mind-blowing and cer­tain­ly a com­plete­ly new field of AI for cre­ative indus­tries, which is boom­ing right now.

Winiger: Welcome, Gene. Congratulations on the super [?] stuff you’ve been doing. There’s prac­ti­cal­ly not a day where there’s not some­thing amaz­ing com­ing from your Twitter account. I’m curi­ous, what’s your back­ground? What do you do at the moment? 

Gene Kogan: I have a mixed back­ground. So I for­mer­ly stud­ied math and did some machine learn­ing. Did some research for a time in music infor­ma­tion retrieval. So that was sort of how I got my feet wet in machine learn­ing. But the last few years I’ve been work­ing as a coder and an artist in new media. So like I do a lot of stuff with pro­jec­tion and Kinect and sen­sors, Leap Motion, what­ev­er sort of new tech­nol­o­gy at the time, try to inte­grate it into performance.

Pieters: Yeah, it’s really awesome you managed to put together a really great motion picture, I would almost call it, “Why is a Raven Like a Writing Desk?” So can you tell us why you were so attracted to StyleNet? How did you get involved in this kind of thing?

Kogan: I had nev­er seen any­thing quite like that before. And you know, I’ve been track­ing it [inaudi­ble] DeepDream stuff. With DeepDream it was like you real­ly only had the con­trol over the con­tent and not the style. With StyleNet you have two degrees of free­dom, so there’s a lot more sort of abil­i­ty to get the behav­ior you want. 

But to be hon­est you know, I do a lot of cod­ing and this project is the one that I did the least amount of cod­ing. I mean, I’m using Justin’s library in Torch. So to me I think my job is more like cura­to­r­i­al than any­thing. You know, the soft­ware he wrote is so good.

Pieters: Yes. Myself being the first guy to cre­ate a video with DeepDream and you being the first one to make a video using StyleNet— So how was the process, and sec­ond­ly what was the reac­tion and the response to your video?

Kogan: For mak­ing the StyleNet video, it takes a lot less inter­ven­tion, it seems, than DeepDream in get­ting sta­ble frames. The only thing that I added was blend­ing the out­put images into the next input. And to answer your sec­ond ques­tion, the reac­tion’s been real­ly great. There’s been a bunch of arti­cles, and that’s been trend­ing— Even now it’s still get­ting a lot of views.
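Editor's sketch: blending each output image into the next input can be pictured as a simple loop. This is a minimal sketch under stated assumptions, not the code used for the video. Frames are flat lists of pixel values, `stylize` is a hypothetical stand-in for whatever style-transfer routine is used, and the `carry` blend ratio is an assumed parameter that the interview does not specify.

```python
from typing import Callable, List, Optional

Frame = List[float]  # flattened pixel values, for illustration only


def blend(previous_output: Frame, next_frame: Frame, carry: float = 0.5) -> Frame:
    """Mix the previous stylized output into the next raw frame."""
    return [carry * p + (1.0 - carry) * n for p, n in zip(previous_output, next_frame)]


def stylize_video(
    frames: List[Frame],
    stylize: Callable[[Frame], Frame],  # hypothetical style-transfer function
    carry: float = 0.5,                 # assumed blend ratio
) -> List[Frame]:
    """Stylize frames in order, seeding each one with the previous result."""
    outputs: List[Frame] = []
    previous: Optional[Frame] = None
    for frame in frames:
        start = frame if previous is None else blend(previous, frame, carry)
        previous = stylize(start)
        outputs.append(previous)
    return outputs


if __name__ == "__main__":
    identity = lambda frame: list(frame)  # placeholder for a real stylizer
    print(stylize_video([[0.0, 0.0], [1.0, 1.0]], identity))
```

Seeding each frame with the previous result is one plausible reason consecutive frames stay visually consistent, since the optimizer starts from something already close to the last output.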

Winiger: You men­tion this cura­to­r­i­al role. And I think this is very mag­i­cal. I mean, you did a phe­nom­e­nal job with select­ing the right inputs, and I guess out­puts. So if you would have to describe this new cre­ative process, what does it look like? I mean how do you select these inputs?

Kogan: Initially, when we were all producing images, I was trying to get a feel for what style images work best. It seems like if you do things that are just textural, like I tried something like with Mark Rothko for example, the effect is more or less kind of “just take the color palette.” But things that have discernible sort of patterns and shapes transfer really really well. So things like dots and lines…that’s why everyone was using “Starry Night,” because it worked so well. Picking Alice in Wonderland, that was kind of a happy accident.

Pieters: Yeah. I mean, there’s been some recent devel­op­ments also I think to the Torch [?] like multi-style input images. Have you played with, do you have any expe­ri­ences there? 

Kogan: Yeah, just the last cou­ple of days I’ve start­ed play­ing with the mul­ti­ple styles. So I just put up a cou­ple short ani­ma­tions on Twitter that do style inter­po­la­tion from one style to anoth­er. Yeah, just been play­ing with those a lit­tle bit.
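Editor's sketch: interpolating from one style to another across an animation can be expressed as a pair of weights that shift a little on every frame. This is an assumption about one straightforward way to do it, not a description of the library mentioned in the interview. The function names and the use of a weighted sum of two style losses are illustrative.

```python
from typing import List, Tuple


def interpolation_weights(num_frames: int) -> List[Tuple[float, float]]:
    """Per-frame (weight_a, weight_b) pairs moving linearly from style A to style B."""
    if num_frames < 2:
        raise ValueError("need at least two frames to interpolate")
    last = num_frames - 1
    return [(1.0 - i / last, i / last) for i in range(num_frames)]


def blended_style_loss(loss_a: float, loss_b: float, weights: Tuple[float, float]) -> float:
    """Weighted sum of the losses measured against each style image."""
    weight_a, weight_b = weights
    return weight_a * loss_a + weight_b * loss_b


if __name__ == "__main__":
    for weights in interpolation_weights(3):
        print(weights, blended_style_loss(2.0, 4.0, weights))
```

The first frame uses only the first style, the last frame only the second, and the frames in between mix the two in proportion.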

And I’m also think­ing a lit­tle bit more about bring­ing in some oth­er stuff that I know. So like you can do pret­ty effec­tive image seg­men­ta­tion. So I could poten­tial­ly apply dif­fer­ent styles to hand-selected regions of an image or a video.
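Editor's sketch: applying different styles to selected regions can be pictured as compositing two stylized versions of the same image through a mask. This is a minimal sketch, assuming the segmentation step has already produced a per-pixel mask. The names and the flat-list image representation are illustrative, and nothing here reflects an actual implementation from the interview.

```python
from typing import List

Image = List[float]  # flattened pixel values, for illustration only


def composite(styled_a: Image, styled_b: Image, mask: Image) -> Image:
    """Take styled_a where mask is 1, styled_b where mask is 0, and a mix in between."""
    if not (len(styled_a) == len(styled_b) == len(mask)):
        raise ValueError("images and mask must have the same number of pixels")
    return [m * a + (1.0 - m) * b for a, b, m in zip(styled_a, styled_b, mask)]


if __name__ == "__main__":
    sky_style = [1.0, 1.0, 1.0]     # image rendered in a first style
    ground_style = [0.0, 0.0, 0.0]  # same image rendered in a second style
    region_mask = [1.0, 0.5, 0.0]   # hand-selected or segmented region
    print(composite(sky_style, ground_style, region_mask))
```

Fractional mask values give a soft edge between regions, which is one simple way to avoid a visible seam where two styles meet.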

Winiger: So I guess more kind of from fantasy-land, if an ani­ma­tion stu­dio approach­es you and offers you a bunch of GPUs and some cash, to do a short ani­mat­ed StyleNet thing, would you do it, A, and do you think the impli­ca­tions for ani­ma­tion stu­dios are [a given?]?

Kogan: I feel like the tech­nol­o­gy still needs to mature a lit­tle bit before the ani­ma­tion stu­dios become real­ly keen on it. Because I mean like, if you think about Pixar, they’re pro­duc­ing breath­tak­ing and very high-resolution imagery. And for now the style trans­fer stuff appeals to us because we know sort of what’s going on. I think for them, the stuff they’re mak­ing is still so much more sort of beau­ti­ful. Not to come off the wrong way but in a super­fi­cial way like in the sense that you don’t nec­es­sar­i­ly need to know as much about what’s going on under­neath the hood.

Pieters: So from your expe­ri­ence work­ing with this in [?], so mak­ing a video, what are the main lim­i­ta­tions right now?

Kogan: Well, the imple­men­ta­tions that I’ve used all of them have some tech­ni­cal con­straints. So it’s very hard to pro­duce images of a high res­o­lu­tion. And then of course it’s very cost­ly. So those are the main tech­ni­cal con­straints. And then you know, I think there’s cer­tain­ly a lot more room to improve the qual­i­ty of their results in the future. I’m sort of wait­ing to see what the machine learn­ing researchers improve on the results in the next months [inaudi­ble] and just doing some­thing total­ly different. 

There’s just so many dif­fer­ent domains this is being applied. Like I was look­ing at Alec Radford’s work. He’s putting out these videos of gen­er­at­ing faces from scratch and inter­po­lat­ing through them. To me they’re incred­i­ble. I real­ly would love to see even a real­ly huge train­ing set of images, what sort of crazy images that you can pro­duce from scratch.

Pieters: What would be the wish­list from your per­spec­tive for AI research to work on?

Kogan: Well I guess the first thing that I would say is that in the last few months or few years some [sites?] of academia are starting to try to make these things a little bit more usable for non-academics. Which is really nice you know, because you get all these fresh perspectives from people who may not necessarily know how to use [?] software in a very savvy way.

All of the neur­al style libraries, they’re all just you know, command-line util­i­ties. You just put in a con­tent image and a style image… When I released the Alice video, I put a Gist that explains how to do it. And if you know your way around the ter­mi­nal you don’t real­ly even have to know that much machine learn­ing to make it work. So that’s how well these libraries are designed. I think that’s real­ly help­ful because a lot of peo­ple have dif­fer­ent sort of domains of exper­tise. You know, a machine learn­ing researcher who’s very focused on expand­ing the accu­ra­cy of the sys­tem may not even be think­ing about all the dif­fer­ent appli­ca­tions that they can inspire. So doing more of this sort of inter­fac­ing with oth­er peo­ple out­side of acad­e­mia is real­ly real­ly cool. And then I think it comes back to them. 

Winiger: Yeah, I mean there’s a sense of com­mu­ni­ty some­how. One way I recent­ly thought about it was that on the one hand the AI research is a dis­cov­ery art, I sup­pose. As a sys­tem they can inter­face with. And on the oth­er hand we’ve got all these artis­tic out­puts com­ing, so that makes it inter­est­ing for cre­ative peo­ple. How do you see this emerg­ing fur­ther? Is this ten years from now, or one year from now?

Kogan: To me the most impor­tant char­ac­ter­is­tic is that I’m inter­fac­ing with this, I’m mak­ing visu­als and so on. But real­ly the most sub­stan­tial thing is you hope that these are sort of vehi­cles to inform the pub­lic about what these machine learn­ing algo­rithms do in oth­er domains. Because maybe it’s hard to sep­a­rate appli­ca­tion from the tech­nol­o­gy, but the same under­ly­ing algo­rithms are found in all sorts of oth­er domains that are much clos­er to peo­ple’s lives. And that you know, it’s cliché to talk about how our tech­nol­o­gy’s become so omnipresent and so on. But it kind of bears repeat­ing that these things are becom­ing increas­ing­ly influ­en­tial and then when it comes time to make—particularly in pol­i­cy deci­sions and so on—the more peo­ple are informed about the exis­tence of these tech­nolo­gies, I think the more demo­c­ra­t­ic that the deci­sion­mak­ing process will be. So when you ask about ten years from now, I’m hop­ing that it kind of leads to that.

Winiger: I mean, do you call it art, or do you call it…creative? What would you call it if you would label it? And I’m ask­ing as a bit of a provo­ca­tion because I see this issue com­ing up more and more. What do you think?

Kogan: I try to the extent pos­si­ble to call it, like to describe it, in as detailed terms as I can and not wor­ry too much about whether it is or isn’t art. I mean, art is sort of an anti­quat­ed term. It car­ries these con­no­ta­tions from most­ly like the 19th cen­tu­ry that may or may not be rel­e­vant in every con­text in which it’s used. So for like the StyleNet stuff there’s so many com­po­nents to it. There’s the actu­al artists that I’m sam­pling from, there’s the soft­ware that’s made by some­body else, and I’m more of a cura­tor and so on. So it’s like…it’s very unclear where the cre­ative process comes from. [inaudi­ble] would­n’t work at all with­out all of those things work­ing in tan­dem. So I guess it’s art. I don’t know. I don’t spend too much time think­ing about it. I don’t lose much sleep over it.

Winiger: I saw you seem to like music.

Kogan: Mm hm, mm hm.

Winiger: You’ve been doing some work I sup­pose with musi­cal ele­ments or [inaudi­ble]. StyleNet for music? What’s your pre­dic­tion? How do you—

Kogan: Ah. I have heard through the grapevine that some of this is being worked on. So yeah, a month or two ago I worked with a library that was using LSTMs to train audio and to pro­duce audio from scratch. I found it a lit­tle chal­leng­ing to get much per­for­mance out of it at the time. I put out a cou­ple of sound sam­ples that worked well. But for the most part I think they were overfitting. 

It’s actu­al­ly sur­pris­ing that the audio stuff is a bit behind video, because I think maybe the biggest bot­tle­neck is that there’s noth­ing quite like ImageNet or some of these visu­al data­bas­es that exist for train­ing. There’s noth­ing quite like that as far as I’m aware of for audio. 

Pieters: No, no there’s not.

Kogan: Yeah. Although I do know that some of the audio peo­ple at both Google and Facebook, some of whom I know, have talked about a so-called DeepDream for audio. I would say that some­thing in the works of pro­duc­ing audio from scratch. I’ve also done some stuff with text also, and that’s also been pret­ty grat­i­fy­ing. So keep­ing up with the dif­fer­ent text gen­er­a­tion imple­men­ta­tions, the first one that I saw was Andrej Karpathy’s.

Winiger: What did you train your char-rnn on?

Kogan: I tried a bunch of different sources. So at first I was doing just authors that I could find. I did Jack Kerouac and Ginsberg. I tried the Bible, Dante’s Inferno? Then I started doing sort of more personal stuff. So I was training it on my Gmail, and I keep a journal that I’ve been keeping for the last three years. So I trained it on my journal. I started making these texts that were like me watching a robot version of myself writing. It was really surreal.

Winiger: If you made it this far, thanks for listening.

Pieters: And also we would really love to hear your comments and any kind of feedback. So drop us a line at info@ethicalmachines.com.

Winiger: See you next time.

Pieters: Adios.