Semi-parametric Image Synthesis

Year: CVPR 2018
Author: Xiaojuan Qi
School: CUHK (The Chinese University of Hong Kong)
Code: https://github.com/xjqicuhk/SIMS

Problem Statement / Gap

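Image generation with parametric deep networks does not match how humans paint.
Non-parametric methods for image generation do not take advantage of large datasets.
A semi-parametric image generation method is therefore introduced.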

Contributions

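For each region of the segmentation, the model uses the region's shape and the classes of its surroundings to find the most similar segment of the same class in the dataset. The retrieved segments of the different classes are transformed and stitched together into a canvas, which is fed to the network together with the label map to produce the final result.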

For each connected component, a compatible segment is retrieved from the memory bank M based on shape, location and context. The retrieved segments are transformed and composited onto a canvas. The canvas C and the input layout L are then used as input to a synthesis network.

Method and Solution

External Memory

1. Retrieve the most compatible segment
Compute $L_j^{mask}$ (shape) and $L_j^{cont}$ (surrounding context) for each semantic segment $L_j$, then select the most compatible segment $P_{\sigma(j)}$ in M. Compatibility is measured with IoU, the intersection-over-union score.
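The retrieval step can be sketched with NumPy as below. This is only an illustration: the function names `iou` and `retrieve` are hypothetical, and scoring a candidate by the sum of its mask IoU and its context IoU is an assumption made here, not something these notes spell out.

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two boolean arrays of equal shape."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(a, b).sum() / union)

def retrieve(query_mask, query_cont, memory):
    """Return (best index, all scores) for a query segment.

    memory is a list of (mask, cont) pairs, assumed to hold only segments
    of the same class as the query. The score, mask IoU plus context IoU,
    is an assumed combination used for illustration.
    """
    scores = [iou(m, query_mask) + iou(c, query_cont) for m, c in memory]
    return int(np.argmax(scores)), scores
```

Here a context map can be any boolean array, for example an (H, W, c) one-hot encoding of the classes surrounding the segment.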

2. Match the original mask through the transformation network
The transformation network T is designed to transform the selected object segment $P_{\sigma(j)}$ to match $L_j$ via translation, rotation, scaling and clipping.
Network training: the authors simulate inconsistencies (in shape, scale and location) by applying random affine transformations and cropping to $P_i^{color}$, which gives $\widetilde{P}_i^{color}$.

The color image is used (rather than the mask) because it is more specific and better constrains the transformation.
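A minimal sketch of how such training inconsistencies could be simulated with NumPy. The helper names, the parameter ranges and the nearest-neighbour warp are all assumptions chosen for brevity; the notes only say that random affine transformations and cropping are applied.

```python
import numpy as np

def random_affine(rng, max_shift=4.0, max_angle=0.3, max_log_scale=0.2):
    """Sample a 2x3 affine matrix (rotation, isotropic scale, translation)."""
    angle = rng.uniform(-max_angle, max_angle)
    scale = np.exp(rng.uniform(-max_log_scale, max_log_scale))
    shift = rng.uniform(-max_shift, max_shift, size=2)
    c, s = np.cos(angle) * scale, np.sin(angle) * scale
    return np.array([[c, -s, shift[0]], [s, c, shift[1]]])

def warp(image, affine):
    """Warp an image about its centre using nearest-neighbour inverse mapping.

    Pixels whose source falls outside the image are left at zero.
    """
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    centre = np.array([(w - 1) / 2.0, (h - 1) / 2.0])
    # Output coordinates relative to the centre, as 2 x N (x, y) columns.
    out = np.stack([xs.ravel(), ys.ravel()]).astype(float) - centre[:, None]
    # Invert the forward map out = A @ src + t to find each source pixel.
    src = np.linalg.inv(affine[:, :2]) @ (out - affine[:, 2:3]) + centre[:, None]
    sx, sy = np.rint(src).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    flat = np.zeros((h * w,) + image.shape[2:], dtype=image.dtype)
    flat[valid] = image[sy[valid], sx[valid]]
    return flat.reshape(image.shape)

def random_crop(image, rng, min_keep=0.6):
    """Keep a random window (at least min_keep of each side) and zero the rest."""
    h, w = image.shape[:2]
    ch = rng.integers(int(min_keep * h), h + 1)
    cw = rng.integers(int(min_keep * w), w + 1)
    y0 = rng.integers(0, h - ch + 1)
    x0 = rng.integers(0, w - cw + 1)
    out = np.zeros_like(image)
    out[y0:y0 + ch, x0:x0 + cw] = image[y0:y0 + ch, x0:x0 + cw]
    return out

def simulate_inconsistency(p_color, rng):
    """Return a perturbed copy of a segment image and the affine that produced it."""
    affine = random_affine(rng)
    return random_crop(warp(p_color, affine), rng), affine
```

A call such as `simulate_inconsistency(p_color, np.random.default_rng(0))` yields a perturbed segment together with the affine that produced it, i.e. the distortion a transformation network would have to undo.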

3. Adjust object order through the ordering network
When two segments overlap, the model needs to determine their order, since one of them will occlude the other. For example, with sky and building, the building should be in front and the sky in the background.
Network training: some datasets provide the front-back order. The network's output is a c-dimensional one-hot vector that indicates the semantic label of the segment that should be in front.
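To illustrate how a c-dimensional ordering output could be consumed when compositing, here is a small NumPy sketch. The function `composite_pair` and the dict layout of a segment are hypothetical; the ordering network itself is not modelled, only the use of its output.

```python
import numpy as np

def composite_pair(canvas, seg_a, seg_b, order_onehot):
    """Paste two overlapping segments so that the predicted front one is drawn last.

    Each segment is a dict with 'label' (int), 'mask' ((H, W) bool) and
    'color' ((H, W, 3) array). order_onehot is a length-c vector whose
    argmax is the label predicted to be in front. If neither label
    matches, seg_a is treated as the front segment (an arbitrary fallback).
    """
    front_label = int(np.argmax(order_onehot))
    if seg_b["label"] == front_label:
        back, front = seg_a, seg_b
    else:
        back, front = seg_b, seg_a
    out = canvas.copy()
    for seg in (back, front):  # the front segment overwrites the overlap
        out[seg["mask"]] = seg["color"][seg["mask"]]
    return out
```

With a sky segment (label 0) and a building segment (label 1), an output whose argmax is 1 makes the building overwrite the sky wherever the two masks overlap.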

Image Synthesis

The canvas is inadequate on its own: 1) regions are typically missing; 2) different segments are inconsistently illuminated; 3) colors are imbalanced; 4) there are boundary artifacts.

Aim: canvas and semantic layout --> realistic image

1. Image Synthesis Network
Architecture: an encoder-decoder structure with skip connections.

2. How to train the network?
Method: simulate a canvas $C'$ from a training image by applying stenciling, color transfer and boundary elision, and train the network to recover the original image from it.

Stenciling: simulate missing regions by stenciling each segment in (I,L) using a mask obtained from a different segment in the dataset.
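A minimal sketch of stenciling with NumPy, assuming the mask borrowed from the other segment has already been aligned to the same image grid (the alignment step is not shown, and the function name `stencil` is hypothetical).

```python
import numpy as np

def stencil(image, seg_mask, other_mask):
    """Keep only the part of a segment that another segment's mask also covers.

    image is (H, W, 3); seg_mask and other_mask are (H, W) boolean arrays.
    Pixels of the segment that fall outside other_mask become missing
    (zero), which simulates the holes seen in a real canvas.
    """
    kept = np.logical_and(seg_mask, other_mask)
    out = np.zeros_like(image)
    out[kept] = image[kept]
    return out, kept
```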

Color transfer: transfer the color distribution of a segment in M to the segment in $C'$.
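A simplified sketch of statistics-based color transfer in NumPy. It matches the per-channel mean and standard deviation directly in RGB, which is an assumption made to keep the example short; Reinhard et al. perform the matching in the lαβ color space. The function name `transfer_color` is hypothetical.

```python
import numpy as np

def transfer_color(target, target_mask, source, source_mask, eps=1e-6):
    """Give the masked target pixels the per-channel mean/std of the masked source pixels.

    target and source are (H, W, 3) arrays with values in [0, 255];
    the masks are (H, W) boolean arrays. Returns a float64 image.
    """
    out = target.astype(np.float64).copy()
    t = out[target_mask]                        # (Nt, 3) target pixels
    s = source.astype(np.float64)[source_mask]  # (Ns, 3) source pixels
    t_mean, t_std = t.mean(axis=0), t.std(axis=0)
    s_mean, s_std = s.mean(axis=0), s.std(axis=0)
    # Standardise the target pixels, then rescale to the source statistics.
    out[target_mask] = (t - t_mean) / (t_std + eps) * s_std + s_mean
    return np.clip(out, 0.0, 255.0)
```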

Boundary elision: boundary pixels are replaced by white pixels to force the network to learn to synthesize content near boundaries. Pixels outside an object segment are replaced by black pixels.
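Boundary elision on a single segment could look like the NumPy sketch below. The boundary width and the use of a 4-neighbourhood erosion to find the inner boundary are assumptions; `erode` and `elide_boundary` are hypothetical helper names.

```python
import numpy as np

def erode(mask, width=1):
    """Erode a boolean mask with a 4-neighbourhood, `width` times."""
    m = np.asarray(mask, dtype=bool)
    for _ in range(width):
        p = np.pad(m, 1, constant_values=False)
        m = p[1:-1, 1:-1] & p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]
    return m

def elide_boundary(image, mask, width=1):
    """Paint a segment's inner boundary white and everything outside it black.

    image is an (H, W, 3) uint8 array and mask an (H, W) boolean array.
    """
    mask = np.asarray(mask, dtype=bool)
    boundary = mask & ~erode(mask, width)
    out = image.copy()
    out[~mask] = 0        # outside the segment: black
    out[boundary] = 255   # inner boundary band: white
    return out, boundary
```

Because the mask is padded with `False`, a segment that touches the image border is treated as having a boundary there as well.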

Evaluation

Compared with: pix2pix, CRN
Datasets: Cityscapes, ADE20K, NYU
Evaluation: IoU, Accuracy…

Notes

How is $P_i^{color}$ represented?

Summary

Actually it's a comprehensive work; the authors even use an inpainting network to fill in the missing regions? But the idea of finding the most similar segment in the training dataset by computing a score that combines shape, location or color is almost the same as mine. I hope I can get some new ideas from this paper.

Sometimes using a network for synthesis is quicker than using an analytical approach, and it avoids hard-coding such properties as rules.

What a goddamn network!!

References

Transformation Network: M. Jaderberg, K. Simonyan, A. Zisserman, and K. Kavukcuoglu. Spatial transformer networks. In NIPS, 2015.
Color Transfer: E. Reinhard, M. Ashikhmin, B. Gooch, and P. Shirley. Color transfer between images. IEEE Computer Graphics and Applications, 21(5), 2001.