Paper: A Neural Algorithm of Artistic Style
• torch7
• loadcaffe
• VGG-19 model
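Installing the required platforms is fairly tedious: you need the trio of protobuf, loadcaffe and torch. There are also some tricks, for example the Lua version must be recent enough or some packages will fail to install (luarocks). Expect to consult the installation instructions, GitHub issues and FAQs and to work through the problems patiently.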
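Principle
Back to the principle: the authors define two loss functions, a style loss and a content loss. Referring to the figure at the start of the paper, the style loss of image a and the content loss of image p are combined, and the total loss function is minimized to obtain x.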
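The combination described above is usually written as a weighted sum. The following is a sketch in the paper's notation; the weights $\alpha$ and $\beta$ are not introduced in this post and are assumed here from the paper.

```latex
\mathcal{L}_{total}(p, a, x) = \alpha \, \mathcal{L}_{content}(p, x) + \beta \, \mathcal{L}_{style}(a, x)
```

A larger ratio $\alpha / \beta$ keeps more of the content of p, while a smaller one emphasizes the style of a.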
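For content reconstruction, five convolutional layers of the original network are used: ‘conv1_1’ (a), ‘conv2_1’ (b), ‘conv3_1’ (c), ‘conv4_1’ (d) and ‘conv5_1’ (e), i.e. a, b, c, d and e in the lower part of the figure. The VGG network was trained mainly for object recognition. In practice the authors found that the first three layers (a, b, c) already reconstruct the content well, while the two higher layers (d, e) keep higher-level features and lose some detail.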
For style reconstruction, progressively larger subsets of layers are used:
‘conv1_1’ (a),
‘conv1_1’ and ‘conv2_1’ (b),
‘conv1_1’, ‘conv2_1’ and ‘conv3_1’ (c),
‘conv1_1’, ‘conv2_1’, ‘conv3_1’ and ‘conv4_1’ (d),
‘conv1_1’, ‘conv2_1’, ‘conv3_1’, ‘conv4_1’ and ‘conv5_1’ (e)
$L_{content}$ uses a squared-error loss, summed over every position of the feature map.
$F^l_{ij}$ is the feature response of the $i$-th filter at position $j$ in layer $l$ and is used to represent content; $P$ is the feature representation of a given image at the same position; $x$ is the target image we want to generate.
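The formula itself did not survive in this post. Using the symbols defined above, the content loss for a single layer $l$ is usually written as follows (a reconstruction from the paper, with $P^l_{ij}$ denoting the response of the content image p in layer $l$):

```latex
\mathcal{L}_{content}(p, x, l) = \frac{1}{2} \sum_{i,j} \left( F^{l}_{ij} - P^{l}_{ij} \right)^{2}
```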
We can understand it this way: first, for the image p whose content is to be extracted, we obtain the content representation $P$ at each position. We can then construct an image $x$ whose features at that position come arbitrarily close to $P$, so that the content loss is minimized. Our goal is to find this $x$ whose content is as close as possible to $P$.
How do we find it? The authors initialize $x$ as a white-noise image and then use classic gradient descent to find it.
Note that because VGG uses ReLU as its activation layer, the derivative is piecewise: where $F$ is less than 0, the derivative is 0. When several layers are used, $L_{content}$ is the sum of the per-layer losses.
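The squared-error loss and its piecewise derivative can be sketched in a few lines of plain Python. This is only an illustration on tiny hand-written matrices, not the code used by neural style; the function names `content_loss` and `content_loss_grad` are made up for this example, and `F` is assumed to hold post-ReLU responses (so an inactive unit shows up as 0).

```python
def content_loss(F, P):
    """Squared-error content loss: 0.5 * sum over (i, j) of (F[i][j] - P[i][j])**2."""
    return 0.5 * sum(
        (f - p) ** 2
        for row_f, row_p in zip(F, P)
        for f, p in zip(row_f, row_p)
    )


def content_loss_grad(F, P):
    """Gradient with respect to F: (F - P) where the ReLU is active, 0 elsewhere."""
    return [
        [(f - p) if f > 0 else 0.0 for f, p in zip(row_f, row_p)]
        for row_f, row_p in zip(F, P)
    ]


# Rows are filters (index i), columns are positions (index j).
F = [[1.0, 0.0], [2.0, 3.0]]  # responses of the generated image x
P = [[0.0, 1.0], [2.0, 1.0]]  # responses of the content image p

loss = content_loss(F, P)
grad = content_loss_grad(F, P)
```

Here the entry `F[0][1]` is inactive, so its gradient is 0 even though it differs from `P[0][1]`.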
The style loss can be understood in the same way as the content loss, except that it uses a combined representation of the responses in different layers. For the responses of each layer, the authors build a Gram matrix $G$ that captures the correlations between the features.
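The formulas for this step are missing from the post. A reconstruction in the paper's notation is sketched below; $A^l$ (the Gram matrix of the style image a), $N_l$ (the number of filters in layer $l$) and $M_l$ (the number of positions in each feature map) are symbols assumed from the paper rather than defined in this post.

```latex
G^{l}_{ij} = \sum_{k} F^{l}_{ik} F^{l}_{jk}
\qquad
E_{l} = \frac{1}{4 N_{l}^{2} M_{l}^{2}} \sum_{i,j} \left( G^{l}_{ij} - A^{l}_{ij} \right)^{2}
\qquad
\mathcal{L}_{style}(a, x) = \sum_{l} w_{l} E_{l}
```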
$w_l$ is the weight of each layer's contribution to the style loss.
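As a rough illustration of the Gram matrix and the weighted per-layer style loss, here is a minimal pure-Python sketch. The function names are hypothetical, the feature maps are tiny hand-written lists, and the normalization by $4 N_l^2 M_l^2$ is taken from the paper rather than from this post.

```python
def gram_matrix(F):
    """G[i][j] = sum over k of F[i][k] * F[j][k] (rows are filters, columns are positions)."""
    return [[sum(a * b for a, b in zip(fi, fj)) for fj in F] for fi in F]


def layer_style_loss(F, A):
    """Squared difference between the Gram matrix of F and a target Gram matrix A."""
    n = len(F)      # number of filters in the layer
    m = len(F[0])   # number of positions in each feature map
    G = gram_matrix(F)
    sq = sum((g - a) ** 2 for row_g, row_a in zip(G, A) for g, a in zip(row_g, row_a))
    return sq / (4.0 * n ** 2 * m ** 2)


def style_loss(features, style_grams, weights):
    """Weighted sum of the per-layer style losses."""
    return sum(
        w * layer_style_loss(F, A)
        for F, A, w in zip(features, style_grams, weights)
    )


F1 = [[1.0, 2.0], [3.0, 4.0]]   # responses of the generated image in one layer
A1 = [[1.0, 0.0], [0.0, 1.0]]   # Gram matrix of the style image in that layer
G1 = gram_matrix(F1)
total = style_loss([F1, F1], [A1, G1], [0.5, 0.5])
```

The second layer in the example uses the image's own Gram matrix as the target, so it contributes nothing to the total.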
It is worth noting that the parameters being optimized here are no longer the network's $w$ and $b$, but the white-noise image $x$ that is fed in as the initial input.
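To make this concrete, the toy sketch below keeps a small "network" fixed and runs gradient descent on the input only. Everything in it is a stand-in chosen for illustration: `W` is a hypothetical 3x3 weight matrix rather than VGG, the "image" has three pixels, only the content loss is used, and plain gradient descent replaces whatever optimizer a real implementation would use.

```python
import random

# Fixed, "pre-trained" weights (hypothetical). They are never updated below.
W = [[1.0, 0.5, 0.0],
     [0.0, 1.0, 0.5],
     [0.5, 0.0, 1.0]]


def features(x):
    """One linear layer followed by ReLU."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in W]


def loss_and_grad(x, P):
    """Content loss and its gradient with respect to the input pixels x."""
    F = features(x)
    loss = 0.5 * sum((f - p) ** 2 for f, p in zip(F, P))
    # dL/dF is (F - P) where the ReLU is active, 0 elsewhere.
    dF = [(f - p) if f > 0 else 0.0 for f, p in zip(F, P)]
    # Chain rule back to the pixels: dL/dx[k] = sum over i of dF[i] * W[i][k].
    grad = [sum(dF[i] * W[i][k] for i in range(len(W))) for k in range(len(x))]
    return loss, grad


def reconstruct(p, steps=500, lr=0.1, seed=0):
    """Start from white noise and move x until its features match those of p."""
    rng = random.Random(seed)
    P = features(p)                     # target representation of the content image
    x = [rng.random() for _ in p]       # white-noise initial image
    for _ in range(steps):
        _, grad = loss_and_grad(x, P)
        x = [xi - lr * g for xi, g in zip(x, grad)]  # update the image, not W
    return x, loss_and_grad(x, P)[0]


p = [0.2, 0.8, 0.5]
x, final_loss = reconstruct(p)
```

In this toy setting the reconstructed `x` ends up very close to `p`, while `W` is left untouched throughout.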
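(1) Note that you also need to download the VGG model (place it in the current project directory). When running, remember to change the model path to your own current path.
(2) You can tune the parameters, change the optimization algorithm or even the network architecture and see whether the results improve. Style transfer can also be applied to video.
(3) neural style cannot save a trained model, so every style transfer has to be run again from scratch, which takes a very long time. Installing the GPU build of TensorFlow is recommended.
(4) Justin Johnson, Alexandre Alahi and Li Fei-Fei at Stanford published "Perceptual Losses for Real-Time Style Transfer and Super-Resolution". It replaces the per-pixel loss with a perceptual loss computed with a pre-trained VGG model, which simplifies the original loss computation, and adds a transformation network that directly generates the stylized version of the content image. If you are interested, it is worth studying and experimenting with.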