A neural algorithm of artistic style

1 Introduction

This paper proposes an algorithm that separates a picture into style and content, which can be used for image synthesis. For example, given a scenery photograph A and a famous artwork B, we can recombine them into one picture that captures both the content of A and the gist of B's style. An example is shown below.

f:id:PDFangeltop1:20160526164151p:plain

This is done with neural networks, which are prevalent nowadays. Concretely, an image can be represented by real-valued feature vectors computed by a deep neural network. In this paper, the authors use a pre-trained VGG convolutional network.

 

2 Model

Since the generated picture has to capture both the content of one picture and the style of another, we define an objective function with a content loss term with respect to the first picture and a style loss term with respect to the second.

f:id:PDFangeltop1:20160526164918p:plain

x is the image that we want to generate; it is initialized as a white-noise image.

p is the picture whose content x has to match.

a is the picture whose style x has to match.

Both the content loss term and the style loss term have a coefficient, so we can control whether the generated image is more faithful to the content or to the style by adjusting these coefficients.
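For reference, the weighted objective described above can be written out as follows. The symbols alpha and beta for the two coefficients follow the notation of the original paper; they are not named in the text of this post, so treat them as an assumption here.

```latex
\mathcal{L}_{total}(\vec{p}, \vec{a}, \vec{x})
  = \alpha \, \mathcal{L}_{content}(\vec{p}, \vec{x})
  + \beta \, \mathcal{L}_{style}(\vec{a}, \vec{x})
```

A larger ratio alpha/beta pushes x toward the content of p, while a smaller ratio pushes it toward the style of a.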

In the model, all parameters are fixed (including the pre-trained VGG weights) except the input image x. We use back-propagation to carry the error from the objective function back to the input image x. That is, the pixel values of x are modified according to the back-propagated error, so that x eventually satisfies both the content constraint of p and the style constraint of a.
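The idea of updating only the input can be illustrated with a minimal sketch. It is not the paper's setup: the frozen "network" is a hypothetical fixed linear map W instead of VGG, the loss is a plain squared error on the features, and the optimizer is ordinary gradient descent with a hand-derived gradient. All names (W, features, optimize_input) are made up for the illustration.

```python
# Toy stand-in for a frozen network: a fixed linear map W (hypothetical, not VGG).
W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, -1.0]]

def features(x):
    """Apply the frozen 'network' to an input vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def loss_and_grad(x, target):
    """Squared-error loss on the features and its gradient with respect to x only."""
    diff = [f - t for f, t in zip(features(x), target)]
    loss = 0.5 * sum(d * d for d in diff)
    # dL/dx = W^T (W x - target); W itself never receives an update.
    grad = [sum(W[r][c] * diff[r] for r in range(len(W))) for c in range(len(x))]
    return loss, grad

def optimize_input(x, target, lr=0.1, steps=200):
    """Plain gradient descent on the input; returns the final x and the loss history."""
    history = []
    for _ in range(steps):
        loss, grad = loss_and_grad(x, target)
        history.append(loss)
        x = [v - lr * g for v, g in zip(x, grad)]
    return x, history

p = [0.5, -1.0, 2.0]    # reference input whose features x should match
x0 = [0.1, 0.2, -0.3]   # arbitrary starting point standing in for the noise image
x_final, history = optimize_input(x0, features(p))
```

The loss in history shrinks as x is updated while W stays untouched, which mirrors how the pixel values change while the network weights remain fixed.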

For content loss, 

f:id:PDFangeltop1:20160526165703p:plain

F(l,i,j) is the activation of the i-th filter at position j in the l-th layer of the VGG net. In other words, we reshape the feature maps of the l-th layer from [C,H,W] to [C,H*W], so that i indexes the C filters and j indexes the H*W positions. This loss term can also be seen as an element-wise constraint, similar to a pixel-wise one, that makes the feature responses of the generated picture match those of p.
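As a small sketch of the reshape and the squared-error comparison just described, the snippet below uses nested Python lists in place of real VGG activations. The factor 0.5 follows the content loss as defined in the original paper; the numbers and function names are hypothetical.

```python
def flatten_feature_maps(maps):
    """Reshape [C][H][W] nested lists into [C][H*W]: one row per filter."""
    return [[v for row in channel for v in row] for channel in maps]

def content_loss(F, P):
    """0.5 * sum over filters i and positions j of (F[i][j] - P[i][j])**2."""
    return 0.5 * sum((f - p) ** 2
                     for f_row, p_row in zip(F, P)
                     for f, p in zip(f_row, p_row))

# Hypothetical activations of one layer with C=2 filters on a 2x2 grid.
gen_maps = [[[1.0, 2.0], [3.0, 4.0]],
            [[0.0, 1.0], [0.0, 1.0]]]
ref_maps = [[[1.0, 2.0], [3.0, 2.0]],
            [[0.0, 0.0], [0.0, 1.0]]]

F = flatten_feature_maps(gen_maps)   # generated image x, shape [2][4]
P = flatten_feature_maps(ref_maps)   # content image p, shape [2][4]
loss = content_loss(F, P)
```

Only the entries where the two sets of activations differ contribute to the loss, so it is zero exactly when the feature responses of x and p coincide in that layer.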

 

For the style loss,

f:id:PDFangeltop1:20160526170719p:plain

f:id:PDFangeltop1:20160526170726p:plain

We measure the style of one layer by taking the inner product between every pair of vectorized filter responses in that layer, which gives the Gram matrix of the layer. The factor wl is equal to one divided by the number of active layers with a non-zero loss-weight wl.
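The Gram matrix and the layer weighting can be sketched in a few lines of plain Python. The normalization 1 / (4 * N^2 * M^2), with N filters and M positions per layer, follows the per-layer style loss of the original paper; the activations and function names are hypothetical, and every listed layer is assumed to be active, so each gets the same weight.

```python
def gram_matrix(F):
    """G[i][j] = inner product of the flattened responses of filters i and j."""
    return [[sum(a * b for a, b in zip(fi, fj)) for fj in F] for fi in F]

def layer_style_loss(F, S):
    """E_l = sum((G - A)**2) / (4 * N**2 * M**2) for N filters and M positions."""
    n, m = len(F), len(F[0])
    G, A = gram_matrix(F), gram_matrix(S)
    sq = sum((g - a) ** 2
             for g_row, a_row in zip(G, A)
             for g, a in zip(g_row, a_row))
    return sq / (4.0 * n ** 2 * m ** 2)

def style_loss(gen_layers, style_layers):
    """Weighted sum of per-layer losses with w_l = 1 / (number of active layers)."""
    w = 1.0 / len(gen_layers)
    return sum(w * layer_style_loss(F, S)
               for F, S in zip(gen_layers, style_layers))

# Hypothetical flattened activations [N][M] for two layers.
gen_layers = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]]
sty_layers = [[[1.0, 1.0], [0.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]]
total = style_loss(gen_layers, sty_layers)
```

Because the Gram matrix sums over all positions, it discards the spatial layout and keeps only which filters respond together, which is why it serves as a description of style rather than content.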