[Study Exchange] [Shanghai Campus] Deep Learning in Practice (1): Quickly Understanding and Implementing Style Transfer

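Preface

Gatys et al. published a paper on painting with style transfer, which gives an ordinary photo the look of a famous painter's work. The result looks like this:

An ordinary picture ends up with Van Gogh's style. Impressive.
Paper: A Neural Algorithm of Artistic Style

We can also use style transfer to produce a style of our own, for example the traditional Chinese painting style implemented in this post. The difficulty is that in many Chinese paintings the color contrast between background and subject is much weaker than in Western paintings; some are done in light ink alone, using only black and white. This makes style transfer noticeably harder to learn, but good results are still achievable by tuning the learning rate and the loss function. For example, here is the Chinese painting style implemented in this post: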

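Style transfer 1. Horse

Style transfer 2. Landscape

Interesting, isn't it? Next, let's work through understanding and implementing style transfer together.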

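The implementation mainly depends on:
• torch7
• loadcaffe
• VGG-19 model

Optional:
CUDA, cuDNN, OpenCL

Installing the platform is fairly tedious: you need the trio of protobuf, loadcaffe and torch. There are also some tricks. For example, the Lua version must be new enough, otherwise some packages fail to install through luarocks. Expect to consult the relevant installation instructions, GitHub issues and FAQs, and to work through the problems patiently.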

The code is collected on my GitHub. If you are interested, clone it and play around with it.

GitHub: https://github.com/TONYCHANBB/Chinese_painting-style

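Principle

Back to the principle. The authors define two loss functions, a style loss and a content loss. Returning to the figure at the start of the post:

Combine the style loss of image a with the content loss of image p, then minimize the total loss function to obtain x: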

$$L_{total}(\vec{p},\vec{a},\vec{x})=\alpha L_{content}(\vec{p},\vec{x})+\beta L_{style}(\vec{a},\vec{x})$$

Here $\alpha$ and $\beta$ are the weights of the two losses; adjusting them gives different results.

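How do we obtain the two loss functions and the content and style reconstructions? Let's go back to the network architecture. The authors use the 16 convolutional layers and 5 pooling layers of the 19-layer VGG network, leave out the fully connected layers, and use average pooling. (A diagram of the VGG architecture is at the end of the post.)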
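To make the pooling choice concrete, here is a minimal pure-Python sketch that compares 2x2 average pooling with the max pooling VGG normally uses. The `pool2x2` helper and the 4x4 feature map are made up for illustration and are not taken from the repository code.

```python
def pool2x2(feature_map, reduce):
    """Apply non-overlapping 2x2 pooling to a 2-D list, using `reduce` per window."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - 1, 2):
        row = []
        for c in range(0, cols - 1, 2):
            window = [feature_map[r][c], feature_map[r][c + 1],
                      feature_map[r + 1][c], feature_map[r + 1][c + 1]]
            row.append(reduce(window))
        pooled.append(row)
    return pooled


def mean(values):
    return sum(values) / len(values)


# Hypothetical 4x4 feature map, chosen only for illustration.
feature_map = [[1.0, 3.0, 0.0, 2.0],
               [5.0, 7.0, 4.0, 6.0],
               [0.0, 0.0, 8.0, 8.0],
               [2.0, 2.0, 8.0, 0.0]]

avg_pooled = pool2x2(feature_map, mean)  # [[4.0, 3.0], [1.0, 6.0]]
max_pooled = pool2x2(feature_map, max)   # [[7.0, 6.0], [2.0, 8.0]]
```

Average pooling keeps a contribution from every value in the window, while max pooling keeps only the largest one.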

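For content reconstruction, five convolutional layers of the original network are used: 'conv1_1' (a), 'conv2_1' (b), 'conv3_1' (c), 'conv4_1' (d) and 'conv5_1' (e), i.e. a, b, c, d and e at the bottom of the figure. The VGG network was trained mainly to recognize image content (object recognition). In practice the authors found that the first three layers a, b and c already reconstruct the content quite well, while layers d and e retain higher-level features and lose some detail.

For style reconstruction, different subsets of the convolutional layers are used:
'conv1_1' (a),
'conv1_1' and 'conv2_1' (b),
'conv1_1', 'conv2_1' and 'conv3_1' (c),
'conv1_1', 'conv2_1', 'conv3_1' and 'conv4_1' (d),
'conv1_1', 'conv2_1', 'conv3_1', 'conv4_1' and 'conv5_1' (e)

Building the representation this way discards the content of the image and keeps its style.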

Content loss function:

$L_{content}$ is a squared-error loss: the per-position losses summed up.

$$L_{content}(\vec{p},\vec{x},l)=\frac{1}{2}\sum_{i,j}\left(F^{l}_{ij}-P^{l}_{ij}\right)^{2}$$

$F^{l}_{ij}$ is the feature representation at position $j$ of the $i$-th filter in layer $l$ and stands for the content; $P$ is the feature representation of a given image at the same position; $x$ is the target image we want to generate.
One way to understand it: first compute the content representation $P$ of the image $p$ whose content we want to extract. We can then construct an image $x$ whose features at each position get arbitrarily close to $P$, which minimizes the content loss. Our goal is to find this $x$ that is as close as possible to $P$ in content.
How do we find it? The authors initialize $x$ as a white-noise image and then use classic gradient descent to find it.
The derivative of the loss function is:

$$\frac{\partial L_{content}}{\partial F^{l}_{ij}}=\begin{cases}\left(F^{l}-P^{l}\right)_{ij} & \text{if } F^{l}_{ij}>0\\ 0 & \text{if } F^{l}_{ij}<0\end{cases}$$

Note that because VGG uses ReLU as its activation layer, the derivative is piecewise: where $F$ is less than 0, the derivative is 0. $L_{content}$ is the sum of the losses over the layers used.
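The content loss and its piecewise derivative can be sketched in a few lines of pure Python. This is only an illustration under simple assumptions: `F` and `P` are small hand-written feature matrices (one row per filter, one column per position), not real VGG activations, and the function names are hypothetical.

```python
def content_loss(F, P):
    """0.5 * sum of squared differences between two feature matrices."""
    return 0.5 * sum((f - p) ** 2
                     for f_row, p_row in zip(F, P)
                     for f, p in zip(f_row, p_row))


def content_grad(F, P):
    """Gradient with respect to F: (F - P) where F > 0, and 0 elsewhere (ReLU)."""
    return [[(f - p) if f > 0 else 0.0 for f, p in zip(f_row, p_row)]
            for f_row, p_row in zip(F, P)]


# Toy feature matrices (assumed values): 2 filters, 3 positions each.
F = [[1.0, 0.0, 2.0],
     [0.5, 3.0, 0.0]]
P = [[0.0, 1.0, 2.0],
     [0.5, 1.0, 1.0]]

loss = content_loss(F, P)   # 0.5 * (1 + 1 + 0 + 0 + 4 + 1) = 3.5
grad = content_grad(F, P)   # [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
```

The entries where `F` is 0 get a zero gradient even though `F` and `P` differ there, which is the ReLU effect described above.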

Style loss function:

The style loss is understood the same way as the content loss, except that it uses a combination of the responses of different layers. For the responses of each layer, the authors build a Gram matrix $G$ that expresses the correlations between the features:

$$G^{l}_{ij}=\sum_{k}F^{l}_{ik}F^{l}_{jk}$$

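The loss for layer $l$ is:

$$E_{l}=\frac{1}{4N_{l}^{2}M_{l}^{2}}\sum_{i,j}\left(G^{l}_{ij}-A^{l}_{ij}\right)^{2}$$

where $A^{l}$ is the representation of the original image $\vec{a}$ (the style image) at layer $l$, $N_{l}$ is the number of filters in layer $l$, and $M_{l}$ is the size of each feature map (height times width).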
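As a rough sketch of the two formulas above, the Gram matrix is the inner product between every pair of flattened feature maps, and the layer loss compares the Gram matrix of the generated image with that of the style image. The feature matrices below are toy values and the function names are hypothetical; nothing here comes from the repository code.

```python
def gram_matrix(F):
    """G[i][j] = sum_k F[i][k] * F[j][k] for a feature matrix F (filters x positions)."""
    return [[sum(a * b for a, b in zip(row_i, row_j)) for row_j in F]
            for row_i in F]


def layer_style_loss(F, A):
    """E_l = sum_ij (G_ij - A_ij)^2 / (4 * N_l^2 * M_l^2)."""
    n_filters, n_positions = len(F), len(F[0])
    G = gram_matrix(F)
    squared = sum((g - a) ** 2
                  for g_row, a_row in zip(G, A)
                  for g, a in zip(g_row, a_row))
    return squared / (4.0 * n_filters ** 2 * n_positions ** 2)


# Toy features (assumed values): 2 filters, 2 positions each.
F_generated = [[1.0, 2.0], [0.0, 1.0]]
F_style = [[1.0, 0.0], [0.0, 1.0]]

G = gram_matrix(F_generated)           # [[5.0, 2.0], [2.0, 1.0]]
A = gram_matrix(F_style)               # [[1.0, 0.0], [0.0, 1.0]]
E = layer_style_loss(F_generated, A)   # (16 + 4 + 4 + 0) / 64 = 0.375
```

Because the sum over positions `k` collapses the spatial layout, the Gram matrix records which features occur together but not where, which is why it captures style rather than content.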

The style loss is then:

$$L_{style}(\vec{a},\vec{x})=\sum_{l=0}^{L}w_{l}E_{l}$$

$w_{l}$ is the weight of each layer.

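The derivative is:

$$\frac{\partial E_{l}}{\partial F^{l}_{ij}}=\begin{cases}\frac{1}{N_{l}^{2}M_{l}^{2}}\left(\left(F^{l}\right)^{T}\left(G^{l}-A^{l}\right)\right)_{ji} & \text{if } F^{l}_{ij}>0\\ 0 & \text{if } F^{l}_{ij}<0\end{cases}$$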
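One way to convince yourself that this derivative is right is to compare it with a finite-difference estimate. The sketch below does that in pure Python on toy feature matrices with strictly positive entries, so the ReLU branch is always the first case. All names and values are assumptions for illustration.

```python
def gram(F):
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in F] for ri in F]


def layer_loss(F, A):
    n, m = len(F), len(F[0])
    G = gram(F)
    total = sum((G[i][j] - A[i][j]) ** 2 for i in range(n) for j in range(n))
    return total / (4.0 * n ** 2 * m ** 2)


def layer_grad(F, A):
    """Analytic gradient: ((F^T (G - A))_ji) / (N^2 M^2) where F_ij > 0, else 0."""
    n, m = len(F), len(F[0])
    G = gram(F)
    D = [[G[i][j] - A[i][j] for j in range(n)] for i in range(n)]
    scale = 1.0 / (n ** 2 * m ** 2)
    grad = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            if F[i][j] > 0:
                # (F^T D)_{ji} = sum_k F[k][j] * D[k][i]
                grad[i][j] = scale * sum(F[k][j] * D[k][i] for k in range(n))
    return grad


def numeric_grad(F, A, eps=1e-6):
    """Central finite differences, one entry of F at a time."""
    n, m = len(F), len(F[0])
    grad = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            plus = [row[:] for row in F]
            minus = [row[:] for row in F]
            plus[i][j] += eps
            minus[i][j] -= eps
            grad[i][j] = (layer_loss(plus, A) - layer_loss(minus, A)) / (2 * eps)
    return grad


# Toy inputs (assumed values); every entry of F is positive.
F = [[1.0, 2.0, 0.5], [0.3, 1.5, 2.0]]
A = gram([[0.5, 1.0, 1.0], [1.0, 0.2, 0.7]])

analytic = layer_grad(F, A)
numeric = numeric_grad(F, A)
max_error = max(abs(a - b)
                for ra, rb in zip(analytic, numeric)
                for a, b in zip(ra, rb))
```

If the formula is implemented correctly, `max_error` should be tiny compared with the gradient values themselves.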

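The total loss function is then

$$L_{total}(\vec{p},\vec{a},\vec{x})=\alpha L_{content}(\vec{p},\vec{x})+\beta L_{style}(\vec{a},\vec{x})$$

which is the formula given at the start of the post. All we need to do is minimize this loss function.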
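Putting the pieces together, the objective is a weighted sum over layers for the style part, then a weighted sum of content and style. The sketch below uses made-up loss values, assumed equal layer weights, and assumed values for the two weights, purely to show how the weights shift the balance.

```python
def style_loss(layer_losses, layer_weights):
    """L_style = sum_l w_l * E_l."""
    if len(layer_losses) != len(layer_weights):
        raise ValueError("need exactly one weight per layer")
    return sum(w * e for w, e in zip(layer_weights, layer_losses))


def total_loss(content, style, alpha, beta):
    """L_total = alpha * L_content + beta * L_style."""
    return alpha * content + beta * style


# Hypothetical per-layer style losses for five layers, conv1_1 ... conv5_1.
layer_losses = [0.5, 0.25, 0.125, 0.0625, 0.0625]
# Assumption: equal weights that sum to one.
layer_weights = [0.2] * 5

l_style = style_loss(layer_losses, layer_weights)   # about 0.2
l_content = 3.0                                     # hypothetical value

# Same losses, different weights: the first objective is dominated by content,
# the second by style.
content_heavy = total_loss(l_content, l_style, alpha=1.0, beta=0.01)  # about 3.002
style_heavy = total_loss(l_content, l_style, alpha=0.01, beta=1.0)    # about 0.23
```

In a real run the two losses differ by orders of magnitude, so only the ratio of the two weights matters and it usually has to be tuned by eye.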

Note that the parameters being optimized here are no longer the network's $w$ and $b$, but the white-noise image $x$ supplied as the initial input.
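The idea of optimizing the input instead of the weights can be shown on a deliberately tiny example. In this sketch a fixed 2x2 matrix `W` followed by ReLU stands in for one frozen network layer, the "image" is a two-element list, and a fixed starting point replaces white noise so the run is reproducible. All of these are assumptions for illustration; the repository code works on real images with VGG.

```python
# Frozen "network" weights: they are never updated below.
W = [[1.0, 0.5],
     [0.5, 1.0]]


def features(x):
    """Toy stand-in for one network layer: ReLU(W x)."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in W]


def loss_and_grad(x, P):
    """Content-style squared loss on features, and its gradient with respect to x."""
    F = features(x)
    diff = [f - p for f, p in zip(F, P)]
    loss = 0.5 * sum(d * d for d in diff)
    # Backpropagate through the ReLU (zero where F == 0), then through W.
    masked = [d if f > 0 else 0.0 for d, f in zip(diff, F)]
    grad = [sum(W[i][j] * masked[i] for i in range(len(W)))
            for j in range(len(x))]
    return loss, grad


content_image = [1.0, 2.0]        # hypothetical two-pixel "image"
P = features(content_image)       # target features, computed once

x = [0.3, 0.1]                    # fixed stand-in for a white-noise start
initial_loss, _ = loss_and_grad(x, P)

learning_rate = 0.1               # assumed step size
for _ in range(500):
    _, grad = loss_and_grad(x, P)
    x = [v - learning_rate * g for v, g in zip(x, grad)]   # update the image only

final_loss, _ = loss_and_grad(x, P)
```

After the loop `x` has moved close to `content_image` while `W` is unchanged, which is the whole trick: the network only supplies the features and the gradients.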

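Now that we understand the principle, we can get to work! First clone or download the code from GitHub and run it once to make sure everything works. Once you understand the principle and the code, you can change the parameters and create your own style!

Tips:
(1) You also need to download the VGG model (put it in the current project), and remember to change the model path to your own path before running.

(2) You can tune the parameters, change the optimization algorithm, or even change the network architecture, and see whether the results improve. You can also apply style transfer to video.

(3) neural style cannot save a trained model; every style conversion has to be run again from scratch, which takes a very long time. I recommend installing the GPU version of TensorFlow.

(4) Fei-Fei Li and co-authors at Stanford published "Perceptual Losses for Real-Time Style Transfer and Super-Resolution". It replaces the per-pixel loss with a perceptual loss computed with a pre-trained VGG model, which simplifies the original loss computation, and adds a transform network that generates the stylized version of the content image directly. If you are interested, look into it and build something fun.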

VGG-Network architecture:

不二晨 · 不二晨

Nice, keep it up!
