Samim Winiger: Welcome to Ethical Machines. We are your hosts…
Roelof Pieters: Roelof.
Winiger: And Samim.
Pieters: Ethical Machines is a series of conversations about humans, machines, and ethics. It aims to start a deeper, better-informed debate about the implications of intelligent systems for society and individuals.
Winiger: For this episode, we invited Alex Champandard and Gene Kogan to talk to us about creative AIs. Let’s dive in.
Hi, Alex. Welcome to the podcast.
Alex J. Champandard: Thanks for having me, it’s a real pleasure. It’s an amazingly dynamic, active, booming field, so it’s great to talk about these topics.
Winiger: I had the pleasure earlier this summer of attending a conference you were organizing by the name of nucl.ai. But I’ve also seen you’ve done so many things online. Maybe we can start off by asking: what’s your background, actually?
Champandard: So I got hooked on AI for games from university onwards. And I spent a few years as an AI programmer in the games industry and did some stuff in the UK and in Vienna for Rockstar. After that, a bit of consulting also in games, contracting on multiplayer bots, those types of things. I’m now focusing on the conference organization, because I found that was something that was missing a lot—sort of bringing all these ideas into this shared melting pot and seeing what comes out. Yeah, and that’s kind of my focus today: finding cool things, digging into them, and then teaching and passing on the information.
Pieters: So, one of these cool things, Alex, has been Deep Forger. So for our listeners, can you explain what Deep Forger exactly is?
Champandard: So a few weeks ago now there was a new algorithm published called A Neural Algorithm of Artistic Style. And I thought, well, that’s an interesting, super powerful, very generic algorithm that could take the ideas or the style or the patterns in one image, combine them with the content of another image, and then create a final output. And so these kinds of algorithms tend to have so many possibilities, but it’s so difficult to understand them just by reading the algorithm.
So I thought a Twitter bot would be a good way to explore that and first of all let people submit things, but also let it try to generate certain things based on a database of different styles. And so it’s proven to be very interesting, helping explore the field of what’s possible by tweeting combinations of style images and other people’s photographs and putting them together and seeing what comes out.
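For listeners who want a concrete picture of the optimization Alex is describing: it boils down to two losses—one that keeps the feature maps close to the content photo, and one that matches the Gram-matrix statistics of a style painting. Below is a minimal, hypothetical PyTorch sketch of those two terms, with random tensors standing in for real VGG activations; it is an illustration of the idea, not Deep Forger’s actual code.

```python
import torch

def gram_matrix(features):
    # features: (channels, height, width) activation map from a conv layer
    c, h, w = features.shape
    f = features.view(c, h * w)
    return f @ f.t() / (c * h * w)  # channel-by-channel correlations

# Stand-ins for activations of the content photo, the style painting, and the
# image being optimized (in practice these come from a pretrained VGG network).
content_feat = torch.randn(64, 32, 32)
style_feat   = torch.randn(64, 32, 32)
output_feat  = torch.randn(64, 32, 32, requires_grad=True)

content_loss = torch.mean((output_feat - content_feat) ** 2)
style_loss   = torch.mean((gram_matrix(output_feat) - gram_matrix(style_feat)) ** 2)

# The output is updated by gradient descent on a weighted sum; the weights
# trade off "looks like the photo" against "painted in the style".
total_loss = 1.0 * content_loss + 1000.0 * style_loss
total_loss.backward()
```

In the paper the losses are summed over several layers of a pretrained VGG network, and the pixels of the output image are optimized directly, typically with L-BFGS.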
Winiger: Yeah I mean, I’ve been watching the hashtag #stylenet on Twitter practically every day. And Deep Forger’s results seem to be improving over time, at least from the outside. How are you doing it? Are you optimizing the system constantly?
Champandard: So yeah, I think there are three things to it. The first is the user submissions: people are submitting things based on what works. And there are some very passionate users, and they learn what works and they tweak the parameters, so they do two or three submissions and they get it right. And so you’re seeing the community learning how this algorithm works, which is amazing to me. You know, people are understanding how this algorithm really behaves as a tool.
The second part is me tweaking the code and going over certain failure cases and then I add an extra piece of code that will deal with that or customize certain parameters somewhere.
And the third is that I’m basically starting to do some form of learning: gathering statistics on things that work and things that don’t.
Pieters: Yeah, and as you know Samim and I are big fans of what we term “creative AI.” Specifically around StyleNet, there have been so many different things popping up the last couple of months—DeepDream and StyleNet and all these other projects. So what fascinated you about StyleNet specifically?
Champandard: So, the DeepDream stuff I thought was amazing technically, and I really got into the tech side of things but I wasn’t that amazed by the result. So even though I was fascinated by that I didn’t really jump into it because I didn’t see the potential as a tool. But the StyleNet thing was immediately obvious. I thought it seemed to resonate that people will be using this and it’s the kind of technique that will help artists maybe in six months, two years, who knows?
And so I jumped onto that immediately—even before there was a first implementation out, I thought, okay, how can we take this further? And so I started on the idea of the bot and a database of paintings, putting stuff together. Sort of, I assumed the algorithm was already implemented somewhere, and then I built this whole infrastructure around that hole and then plugged in whatever was available. We’ve tried about three different implementations of StyleNet now, and they get better over time. So it’s quite amazing how fast this is moving.
Winiger: So you mentioned these databases of paintings.
Champandard: Yeah.
Winiger: And I’m assuming these are partially copyrighted paintings, in a sense. Is the copyright still intact, or…?
Champandard: So the database I’m using is from the Metropolitan Museum; it’s called the Collection. It’s available online and other bots are using it. To be honest, I’ve been kinda sidestepping this issue of copyright because I think it’s an absolutely huge topic. If you put in two different images and mix them up and then have an AI system that creates some output, what are actually the terms and conditions on the final image? I mean, is it subject to the original person who submitted it? But what about the painter’s style…is that copyrightable in some way? There are so many implications there. Certainly a couple of people have mentioned copyright about the bot, but it really hasn’t created a controversy yet.
As the quality of the images improves I expect more artists to raise these issues. But for any artists working in this field now—if I was good at painting, I’d probably be looking at how to find styles that work well with these kinds of representations and make them easily automatable or transferable, so that if I had fans as an artist they could say, “Hey, I would like to have a picture of my cat painted.” And that’s something we’ve seen from the Twitter bot: people submit pictures of their houses or themselves that they want painted in a famous style. So if you were to do that as an artist today, you could say, “Well look, we can do that for you, fully customized art, partly using neural network rules.” So a database of content that’s been custom-created for this neural network, and producing amazing results. I think that’s something we could see more of in the next year or two.
Pieters: Do you keep track of what the most popular art styles are? Is it Picasso, or is it Dalí, or…? What are Deep Forger’s fans like?
Champandard: So the bot tweets its most highly-rated forgeries. So if you have retweets or favorites on a painting it will get retweeted. And the ones that have proven the most popular I think are the landscapes.
Winiger: So with DeepDream out, twenty-four hours later there was porn—DeepPorn or whatever you want to call it—online. And with StyleNet it took about ten days, and then there was this kind of very abstract one online. Do you think it will get worse, or…?
Champandard: Well, so the first few days I was expecting it but it didn’t happen. Then there was one submission that was let’s say…questionable but still in a tasteful area. Well, I thought it would be worse, but to be honest I don’t think we’ve seen the peak of this quite yet. I think it’s going to still increase.
Winiger: I saw on Twitter that you had a conversation about bots talking to each other.
Champandard: Yeah.
Winiger: How did that work out, I’m curious?
Champandard: So there’s a Commons Bot which is tweeting images from the same database that I’m using to match the paintings. And so every four hours I think, the bot will tweet a message to Deep Forger and then it creates a forgery. So yeah, the results have been surprisingly good because that bot tends to tweet greyscale images, which users don’t really submit that much. And so you end up with more sketches that match. And so it explores a different part of space.
Pieters: So the bots already have some kind of agency. Would you already consider this as being something like creative? Are we there yet? Is Deep Forger going to be the first creative agent?
Champandard: I… I don’t like to use the word “creative” as a boolean. Actually, shortly after I made the bot I met up with an old friend of mine who was a programmer at Rockstar with me and switched into doing art and that side of things. And we had a long discussion about this, and he brought up the fact that art is a question of having agency and then using that to make certain decisions. So if you look at the bot as it is now, it certainly has some agency over which paintings it selects and how it decides on those images. A lot of that is code that was manually written, and some of it is a random decision, but there’s agency there. So based on the response of certain artists, I think you could say that it’s art just based on that reaction. Like if they really react strongly to it, then it certainly is art, right?
Winiger: So we’ve been having an ongoing conversation on Twitter over the last couple of months about what I call computational comedy. And comedy is an art form, and it must be part of any working theory of creative AI, I guess. And so you brought up recently this notion that there must be self-reflection, in a sense.
Champandard: Yeah. I think it might be a cultural thing. My background is not English-speaking. The way I see it is that comedy requires intent. You have to go out there with the intention of entertaining people. A stand-up comedian is only a stand-up comedian if he’s there to make people laugh on purpose. And so in the projects that you’ve been tweeting about with the #ComputationalComedy hashtag, you’re the comedian. These systems are just a tool that are an extension of you as the comedian, right.
Winiger: That’s a really interesting angle to explore, when something becomes intentional or not. I suppose it’s a core problem of creativity, in a sense. Generative intention, next.
Pieters: So Alex, I wanted to read a tweet which you sent out.
The next #StyleNet paper must include semantic information: this is grass vs. a house, then optimize accordingly. pic.twitter.com/bfF0IE7X5X
— Alex J. Champandard 🍂 (@alexjc) September 2, 2015
And this was actually something which you already wrote very early on, I think even before you publicly published the bot. So is that something that you’ve played with and tried out for Deep Forger?
Champandard: I think the problems with StyleNet have become obvious very quickly. For example, if you have a painted landscape and a photo of a landscape, then transferring the styles between the two should involve some understanding that “this is a piece of sky, this is also a piece of sky, and therefore I should use the colors of the sky in one picture and transfer that to the original photograph.”
And so there’s no direct way for the algorithm to know that. It’s just optimizing all these different layers in the neural network. I expect to see future work on this; I expect that will happen within the next year, possibly. But the way I’ve been approaching it is just trying to find better matches between the paintings and sort of thinking outside of the algorithm. And by finding really good-quality matches between the paintings and the style, you get really good results, so you can sidestep some of the deficiencies in the algorithm by doing a better job with the selection of the paintings.
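The semantic idea in the tweet—grass vs. house, sky vs. sky—can be pictured as computing the style statistics per region instead of over the whole image. A rough, hypothetical sketch with placeholder masks and features (not an implementation from the paper or from the bot):

```python
import torch

def masked_gram(features, mask):
    # features: (channels, h, w); mask: (h, w) with 1 inside the region (e.g. "sky")
    c, h, w = features.shape
    f = (features * mask).view(c, h * w)
    n = mask.sum().clamp(min=1.0)
    return f @ f.t() / (c * n)

# Placeholder activations and segmentation masks for a photo and a painting.
photo_feat, painting_feat = torch.randn(64, 32, 32), torch.randn(64, 32, 32)
photo_sky = (torch.rand(32, 32) > 0.5).float()   # hypothetical "sky" mask in the photo
paint_sky = (torch.rand(32, 32) > 0.5).float()   # hypothetical "sky" mask in the painting

# Style statistics are matched region by region instead of over the whole image.
sky_style_loss = torch.mean((masked_gram(photo_feat, photo_sky)
                             - masked_gram(painting_feat, paint_sky)) ** 2)
```

A full version would need matching segmentation masks for both images and one such term per labeled region, added to the usual content loss.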
Winiger: I’m curious, do you see applications of this or similar things in the game industry anytime soon?
Champandard: So I haven’t yet tried applying the algorithm to, let’s say, individual pieces of content and then using traditional game pipelines to produce the final result. I’ve only tried taking screenshots and then applying the process and getting the resulting screenshots. And so when I did this process for Quake, I labeled it like that: I said it’s more of a concept art. It’s helping you understand the space of what the art style could be. But when I released the Quake screenshots, the first thing that game developers were asking is “Can we get this version of Quake in a shader? Can we make a mod that’s like Picasso-style Quake?”
The problem is it’s taking about six minutes— At the time it was six minutes, I’ve got it down to about three minutes to get similar-quality 720p screenshots. It’s nowhere near real-time, three minutes versus thirty frames per second. It’s not quite there. I think the applications for concept art are short term. I think that’s a very promising avenue, just to get ideas on how you can add visual elements from let’s say famous painters into your game and see how that turns out.
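Just to put those timings side by side—a back-of-envelope calculation using Alex’s numbers, not a benchmark:

```python
seconds_per_frame = 3 * 60           # ~3 minutes per 720p stylized frame
realtime_budget = 1.0 / 30           # ~33 ms per frame at 30 fps
print(seconds_per_frame / realtime_budget)  # ≈ 5400, i.e. thousands of times short of real-time
```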
Pieters: Yeah. I mean, for one, it’s just a matter of time. Nvidia’s coming out with new graphics cards in a year now, Pascal or whatever they’re called, which will be like four times faster. So maybe we’ll already get a frame rate of four frames a second? Like, what could the applications be when this is real-time? Things like VR?
Champandard: Yeah, I’m not convinced about the real-time post-processing in a shader. I think there’s a big market for an independent developer, but convincing the standard AAA or console developers to switch to this is going to be difficult. For content production, I think there’s a lot of potential. If you imagine VR itself, there’ll be a lot of emphasis on the quality of the environment. And so using these kinds of techniques to improve the quality of the skyboxes, for example, and painting beautiful-looking skies or trees or landscapes, even if there’s other geometry built in a more traditional way. Maybe using these tools as a way to augment the quality of the textures, or add a certain style to them, or make things quicker or easier to develop. Maybe taking the style of an experienced artist and then using that style to transfer it to the art of a beginner. So there might be some potential there in just letting artists build these really rough sketches and then having the neural network fill in the gaps.
Winiger: I mean, at the speed of innovation in this space, where do you see us in the short term or mid term, or maybe even the long term? It’s very hard to estimate but maybe you have some wishes.
Champandard: I hope that there’ll be another version of StyleNet that does things a bit more context-sensitively. I think on the implementation side things are constantly improving. We’re getting to understand things better, so that will be more of an incremental thing. But I think anything further forward will be more a question of changing the mindsets of the people that could be using the tech. Changing mindsets always takes longer than changing the technology. And with machine learning moving as fast as it is now, it’s going to be quite scary, the difference in pace between what the machines can do and what the human mind is comfortable with, to put it that way. So from that perspective it’s harder to predict because involved in that is basically predicting how reactive or responsive the community will be, versus how closed off they are to the whole idea.
I think it’s quite fascinating just how fast things are moving, really. The boom that’s happening in the field is mind-blowing. And I’ve not seen this atmosphere— Being in the industry for many years I’ve not seen it happen like this. It’s amazing how quickly you can turn out new things and new prototypes. So I found that very mind-blowing and certainly a completely new field of AI for creative industries, which is booming right now.
Winiger: Welcome, Gene. Congratulations on the super [?] stuff you’ve been doing. There’s practically not a day where there’s not something amazing coming from your Twitter account. I’m curious, what’s your background? What do you do at the moment?
Gene Kogan: I have a mixed background. I formerly studied math and did some machine learning, and did research for a time in music information retrieval. So that was sort of how I got my feet wet in machine learning. But the last few years I’ve been working as a coder and an artist in new media. So I do a lot of stuff with projection and Kinect and sensors, Leap Motion—whatever sort of new technology is around at the time—and try to integrate it into performance.
Pieters: Yeah, it’s really awesome you managed to put together a really great motion picture, I would almost call it: “Why is a Raven Like a Writing Desk?” So can you tell us why you were so attracted to StyleNet? How did you get involved in this kind of thing?
Kogan: I had never seen anything quite like that before. And you know, I’ve been tracking it [inaudible] DeepDream stuff. With DeepDream it was like you really only had the control over the content and not the style. With StyleNet you have two degrees of freedom, so there’s a lot more sort of ability to get the behavior you want.
But to be honest, you know, I do a lot of coding, and this project is the one where I did the least amount of coding. I mean, I’m using Justin’s library in Torch. So to me, my job here is more curatorial than anything. You know, the software he wrote is so good.
Pieters: Yes. Me being the first guy to create a video with DeepDream, and you being the first one to make a video using StyleNet—so how was the process? And secondly, what was the reaction and response to your video?
Kogan: For making the StyleNet video, it takes a lot less intervention, it seems, than DeepDream in getting stable frames. The only thing that I added was blending the output images into the next input. And to answer your second question, the reaction’s been really great. There’s been a bunch of articles, and that’s been trending— Even now it’s still getting a lot of views.
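The blending trick Gene mentions can be sketched roughly as follows; `stylize` here is a placeholder for whichever StyleNet implementation is being called, so this is a hypothetical outline rather than his actual pipeline:

```python
import numpy as np

def stylize(frame):
    # placeholder for a real style-transfer call (e.g. a Torch neural-style run)
    return frame

def stylize_video(frames, blend=0.5):
    previous = None
    out = []
    for frame in frames:
        if previous is not None:
            # mix the last stylized frame into the next input to damp flicker
            frame = blend * frame + (1 - blend) * previous
        previous = stylize(frame)
        out.append(previous)
    return out

# e.g. stylize_video([np.random.rand(720, 1280, 3) for _ in range(3)])
```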
Winiger: You mentioned this curatorial role, and I think this is very magical. I mean, you did a phenomenal job with selecting the right inputs, and I guess outputs. So if you had to describe this new creative process, what does it look like? I mean, how do you select these inputs?
Kogan: Initially, when we were all producing images, I was trying to get a feel for what style images work best. It seems like if you do things that are just textural, like I tried something like with Mark Rothko for example, the effect is more or less kind of just “take the color palette.” But things that have discernible sort of patterns and shapes transfer really really well. So things like dots and lines…that’s why everyone was using “Starry Night,” because it worked so well. Picking Alice in Wonderland, that was kind of a happy accident.
Pieters: Yeah. I mean, there have been some recent developments also, I think, to the Torch [?], like multi-style input images. Have you played with that? Do you have any experience there?
Kogan: Yeah, just the last couple of days I’ve started playing with the multiple styles. So I just put up a couple short animations on Twitter that do style interpolation from one style to another. Yeah, just been playing with those a little bit.
And I’m also thinking a little bit more about bringing in some other stuff that I know. So like you can do pretty effective image segmentation. So I could potentially apply different styles to hand-selected regions of an image or a video.
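The style interpolation Gene describes can be thought of as a weighted mix of two style targets, with the weight swept over the frames of the animation. A toy sketch with placeholder tensors, not his pipeline:

```python
import torch

def gram(f):
    c, h, w = f.shape
    f = f.view(c, h * w)
    return f @ f.t() / (c * h * w)

style_a, style_b = torch.randn(64, 32, 32), torch.randn(64, 32, 32)
output = torch.randn(64, 32, 32, requires_grad=True)

# t sweeps from 0 to 1 across the animation, fading the target from style A to style B.
# (In a full pipeline, each frame would then be optimized against this loss.)
for t in torch.linspace(0, 1, steps=5):
    style_loss = (1 - t) * torch.mean((gram(output) - gram(style_a)) ** 2) \
                 + t * torch.mean((gram(output) - gram(style_b)) ** 2)
```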
Winiger: So I guess more kind of from fantasy-land, if an animation studio approaches you and offers you a bunch of GPUs and some cash, to do a short animated StyleNet thing, would you do it, A, and do you think the implications for animation studios are [a given?]?
Kogan: I feel like the technology still needs to mature a little bit before the animation studios become really keen on it. Because I mean, if you think about Pixar, they’re producing breathtaking and very high-resolution imagery. For now the style transfer stuff appeals to us because we know sort of what’s going on. I think for them, the stuff they’re making is still so much more beautiful—not to come off the wrong way, but in a superficial way, in the sense that you don’t necessarily need to know as much about what’s going on underneath the hood.
Pieters: So from your experience working with this in [?], so making a video, what are the main limitations right now?
Kogan: Well, the implementations that I’ve used all have some technical constraints. It’s very hard to produce images of a high resolution, and then of course it’s very costly. So those are the main technical constraints. And then, you know, I think there’s certainly a lot more room to improve the quality of the results in the future. I’m sort of waiting to see how the machine learning researchers improve on the results in the next months [inaudible] and just doing something totally different.
There are just so many different domains this is being applied in. Like, I was looking at Alec Radford’s work. He’s putting out these videos of generating faces from scratch and interpolating through them. To me they’re incredible. I’d really love to see, with an even bigger training set of images, what sort of crazy images you can produce from scratch.
Pieters: What would be the wishlist from your perspective for AI research to work on?
Kogan: Well, I guess the first thing I would say is that in the last few months or years, some [sites?] of academia have started trying to make these things a little bit more usable for non-academics. Which is really nice, you know, because you get all these fresh perspectives from people who may not necessarily know how to use [?] software in a very savvy way.
All of the neural style libraries are just, you know, command-line utilities. You just put in a content image and a style image… When I released the Alice video, I put up a Gist that explains how to do it. And if you know your way around the terminal, you don’t really even have to know that much machine learning to make it work. That’s how well these libraries are designed. I think that’s really helpful, because a lot of people have different sorts of domains of expertise. You know, a machine learning researcher who’s very focused on improving the accuracy of the system may not even be thinking about all the different applications that they can inspire. So doing more of this sort of interfacing with people outside of academia is really, really cool. And then I think it comes back to them.
Winiger: Yeah, I mean there’s a sense of community somehow. One way I recently thought about it was that on the one hand, AI research is a kind of discovery art, I suppose—a system that people can interface with. And on the other hand we’ve got all these artistic outputs coming, so that makes it interesting for creative people. How do you see this emerging further? Is this ten years from now, or one year from now?
Kogan: To me the most important characteristic is that I’m interfacing with this, I’m making visuals and so on. But really the most substantial thing is you hope that these are sort of vehicles to inform the public about what these machine learning algorithms do in other domains. Because maybe it’s hard to separate application from technology, but the same underlying algorithms are found in all sorts of other domains that are much closer to people’s lives. And you know, it’s cliché to talk about how our technology’s become so omnipresent and so on. But it kind of bears repeating that these things are becoming increasingly influential, and when it comes time to make decisions—particularly policy decisions and so on—the more people are informed about the existence of these technologies, the more democratic the decision-making process will be, I think. So when you ask about ten years from now, I’m hoping that it kind of leads to that.
Winiger: I mean, do you call it art, or do you call it…creative? What would you call it if you would label it? And I’m asking as a bit of a provocation because I see this issue coming up more and more. What do you think?
Kogan: I try to the extent possible to call it, like to describe it, in as detailed terms as I can and not worry too much about whether it is or isn’t art. I mean, art is sort of an antiquated term. It carries these connotations from mostly like the 19th century that may or may not be relevant in every context in which it’s used. So for like the StyleNet stuff there’s so many components to it. There’s the actual artists that I’m sampling from, there’s the software that’s made by somebody else, and I’m more of a curator and so on. So it’s like…it’s very unclear where the creative process comes from. [inaudible] wouldn’t work at all without all of those things working in tandem. So I guess it’s art. I don’t know. I don’t spend too much time thinking about it. I don’t lose much sleep over it.
Winiger: I saw you seem to like music.
Kogan: Mm hm, mm hm.
Winiger: You’ve been doing some work I suppose with musical elements or [inaudible]. StyleNet for music? What’s your prediction? How do you—
Kogan: Ah. I have heard through the grapevine that some of this is being worked on. So yeah, a month or two ago I worked with a library that was using LSTMs to train on audio and to produce audio from scratch. I found it a little challenging to get much performance out of it at the time. I put out a couple of sound samples that worked well, but for the most part I think they were overfitting.
It’s actually surprising that the audio stuff is a bit behind video. I think maybe the biggest bottleneck is that there’s nothing quite like ImageNet or some of these visual databases that exist for training—there’s nothing like that for audio, as far as I’m aware.
Pieters: No, no there’s not.
Kogan: Yeah. Although I do know that some of the audio people at both Google and Facebook, some of whom I know, have talked about a so-called DeepDream for audio. So I would say that something is in the works for producing audio from scratch. I’ve also done some stuff with text, and that’s also been pretty gratifying. I’ve been keeping up with the different text generation implementations; the first one that I saw was Andrej Karpathy’s.
Winiger: What did you train your char-rnn on?
Kogan: I tried a bunch of different sources. At first I was doing just authors that I could find. I did Jack Kerouac and Ginsberg. I tried the Bible, Dante’s Inferno? Then I started doing sort of more personal stuff. So I was training it on my Gmail, and I keep a journal that I’ve been keeping for the last three years, so I trained it on my journal. I started making these texts, and it was like watching a robot version of myself writing. It was really surreal.
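For a sense of what “training char-rnn on a journal” involves: Karpathy’s char-rnn is a Torch/Lua command-line tool, but the core idea—an LSTM trained to predict the next character of a text—fits in a few lines. A simplified, hypothetical PyTorch sketch with a toy corpus standing in for personal text:

```python
import torch
import torch.nn as nn

text = "why is a raven like a writing desk? " * 50   # toy corpus standing in for a journal
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}
data = torch.tensor([stoi[ch] for ch in text])

class CharRNN(nn.Module):
    def __init__(self, vocab, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, vocab)

    def forward(self, x):
        h, _ = self.lstm(self.embed(x))
        return self.head(h)

model = CharRNN(len(chars))
optimizer = torch.optim.Adam(model.parameters(), lr=3e-3)
loss_fn = nn.CrossEntropyLoss()

# Train the network to predict each next character from the ones before it.
for step in range(100):
    i = torch.randint(0, len(data) - 65, (1,)).item()
    x = data[i:i + 64].unsqueeze(0)        # input sequence
    y = data[i + 1:i + 65].unsqueeze(0)    # same sequence shifted by one character
    logits = model(x)
    loss = loss_fn(logits.reshape(-1, len(chars)), y.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Sampling then works by feeding the model its own predictions back in, one character at a time.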
Winiger: If you made it this far, thanks for listening.
Pieters: And also we would really love to hear your comments and any kind of feedback. So drop us a line at info@ethicalmachines.com.
Winiger: See you next time.
Pieters: Adios.