Large Scale GAN Training for High Fidelity Natural Image Synthesis

Andrew Brock, Jeff Donahue, Karen Simonyan

ICLR 2019. arXiv:1809.11096.

Keywords:
generative image, image synthesis, GAN training, Frechet Inception Distance, Inception Score, …

Abstract:

Despite recent progress in generative image modeling, successfully generating high-resolution, diverse samples from complex datasets such as ImageNet remains an elusive goal. To this end, we train Generative Adversarial Networks at the largest scale yet attempted, and study the instabilities specific to such scale. We find that applying orthogonal regularization to the generator renders it amenable to a simple "truncation trick," allowing fine control over the trade-off between sample fidelity and variety by reducing the variance of the Generator's input. Our modifications lead to models which set the new state of the art in class-conditional image synthesis. When trained on ImageNet at 128×128 resolution, our models (BigGANs) achieve an Inception Score (IS) of 166.5 and Frechet Inception Distance (FID) of 7.4, improving over the previous best IS of 52.52 and FID of 18.65.
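
The "truncation trick" named in the abstract is simple enough to sketch. The following is a minimal illustration, not the authors' code (the function name, dimensions, and the 0.5 threshold are placeholders): latent components falling outside a threshold are resampled, and shrinking the threshold trades sample variety for fidelity.

    import numpy as np

    def truncated_z(batch_size, dim, threshold=0.5, seed=None):
        """Sample z ~ N(0, I), resampling any component with |z_i| > threshold."""
        rng = np.random.default_rng(seed)
        z = rng.standard_normal((batch_size, dim))
        mask = np.abs(z) > threshold
        while mask.any():  # redraw out-of-range components until all pass
            z[mask] = rng.standard_normal(mask.sum())
            mask = np.abs(z) > threshold
        return z

    # Usage (generator is hypothetical): images = generator(truncated_z(8, 128))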

Introduction
  • The state of generative image modeling has advanced dramatically in recent years, with Generative Adversarial Networks (GANs, Goodfellow et al (2014)) at the forefront of efforts to generate high-fidelity, diverse images with models learned directly from data.
  • GAN training is dynamic, and sensitive to nearly every aspect of its setup, but a torrent of research has yielded empirical and theoretical insights enabling stable training in a variety of settings
  • Despite this progress, the current state of the art in conditional ImageNet modeling (Zhang et al, 2018) achieves an Inception Score (Salimans et al, 2016) of 52.5, compared to 233 for real data.
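
For reference, the two metrics quoted here are the Inception Score (Salimans et al (2016)) and the Frechet Inception Distance (Heusel et al (2017)); their standard definitions, which this summary takes as given, are:

    % IS: exponentiated expected KL divergence between the Inception
    % classifier's conditional label distribution p(y|x) and its marginal p(y).
    \mathrm{IS} = \exp\Big( \mathbb{E}_{x \sim p_g} \big[ D_{\mathrm{KL}}\big( p(y \mid x) \,\|\, p(y) \big) \big] \Big)

    % FID: Frechet distance between Gaussians fitted to Inception features
    % of real (r) and generated (g) images.
    \mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \mathrm{Tr}\big( \Sigma_r + \Sigma_g - 2 (\Sigma_r \Sigma_g)^{1/2} \big)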
Highlights
  • The state of generative image modeling has advanced dramatically in recent years, with Generative Adversarial Networks (GANs, Goodfellow et al (2014)) at the forefront of efforts to generate high-fidelity, diverse images with models learned directly from data
  • As our models are able to trade sample variety for quality, it is unclear how best to compare against prior art; we report values at three settings, with complete curves in Appendix D
  • Our models outperform the previous state-of-the-art Inception Score and Frechet Inception Distance scores achieved by Miyato et al (2018) and Zhang et al (2018)
  • We have demonstrated that Generative Adversarial Networks trained to model natural images of multiple categories highly benefit from scaling up, both in terms of fidelity and variety of the generated samples
  • Our models set a new level of performance among ImageNet Generative Adversarial Network models, improving on the state of the art by a large margin
  • We have presented an analysis of the training behavior of large scale Generative Adversarial Networks, characterized their stability in terms of the singular values of their weights, and discussed the interplay between stability and performance
Results
  • Evaluation on ImageNet

    The authors evaluate the models on ImageNet ILSVRC 2012 (Russakovsky et al, 2015) at 128×128, 256×256, and 512×512 resolutions, employing the settings from Table 1, row 8.
  • As the models are able to trade sample variety for quality, it is unclear how best to compare against prior art; the authors report values at three settings, with complete curves in Appendix D.
  • First, the authors report the FID/IS values at the truncation setting which attains the best FID. Second, they report the FID at the truncation setting for which the model's IS is the same as that attained by the real validation data, reasoning that this is a passable measure of maximum sample variety achieved while still achieving a good level of "objectness." Third, they report FID at the maximum IS achieved by each model, to demonstrate how much variety must be traded off to maximize quality (a sketch of this protocol appears after this list).
  • The authors' models outperform the previous state-of-the-art IS and FID scores achieved by Miyato et al (2018) and Zhang et al (2018)
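
A minimal sketch of this three-setting protocol, assuming hypothetical helpers (sample_at_sigma, compute_is, and compute_fid are illustrative names, not the paper's code):

    def evaluate_three_settings(sigmas, sample_at_sigma, compute_is, compute_fid, is_validation):
        """Sweep the truncation level sigma and pick the three reported settings."""
        scores = []
        for sigma in sigmas:
            images = sample_at_sigma(sigma)  # generate a batch at this truncation level
            scores.append((sigma, compute_is(images), compute_fid(images)))
        best_fid = min(scores, key=lambda t: t[2])                        # setting 1: best FID
        match_val = min(scores, key=lambda t: abs(t[1] - is_validation))  # setting 2: IS closest to validation IS
        max_is = max(scores, key=lambda t: t[1])                          # setting 3: maximum IS
        return best_fid, match_val, max_is

    # e.g. evaluate_three_settings([0.1 * k for k in range(21)], sampler, is_fn, fid_fn, 233.0)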
Conclusion
  • The authors have demonstrated that Generative Adversarial Networks trained to model natural images of multiple categories highly benefit from scaling up, both in terms of fidelity and variety of the generated samples.
  • The authors' models set a new level of performance among ImageNet GAN models, improving on the state of the art by a large margin.
  • The authors have presented an analysis of the training behavior of large scale GANs, characterized their stability in terms of the singular values of their weights, and discussed the interplay between stability and performance
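
The singular-value analysis can be reproduced cheaply: the first singular value of each weight matrix is tracked during training using power iteration, as in spectral normalization (Miyato et al (2018)). A minimal NumPy sketch, not the authors' implementation:

    import numpy as np

    def top_singular_value(W, n_iters=10, seed=0):
        """Estimate sigma_1(W) by power iteration (cf. Golub & Van der Vorst (2000))."""
        rng = np.random.default_rng(seed)
        v = rng.standard_normal(W.shape[1])
        for _ in range(n_iters):
            u = W @ v
            u /= np.linalg.norm(u)
            v = W.T @ u
            v /= np.linalg.norm(v)
        return float(u @ (W @ v))  # sigma_1 is approximately u^T W v

    # Tracking sigma_1 per layer over training surfaces the spectral growth
    # that the paper links to instability and collapse.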
Tables
  • Table 1: Frechet Inception Distance (FID, lower is better) and Inception Score (IS, higher is better) for ablations of our proposed modifications. Batch is batch size, Param is total number of parameters, Ch. is the channel multiplier representing the number of units in each layer, Shared is using shared embeddings, Skip-z is using skip connections from the latent to multiple layers, Ortho. is Orthogonal Regularization, and Itr indicates whether the setting is stable to 10^6 iterations or collapses at the given iteration. Other than rows 1-4, results are computed across 8 random initializations (a schematic of the Shared, Skip-z, and Ortho. mechanisms follows this list)
  • Table 2: Evaluation of models at different resolutions. We report scores without truncation (Column 3), scores at the best FID (Column 4), scores at the IS of validation data (Column 5), and scores at the max IS (Column 6). Standard deviations are computed over at least three random initializations
  • Table 3: BigGAN results on JFT-300M at 256×256 resolution. The FID and IS columns report these scores given by the JFT-300M-trained Inception v2 classifier with noise distributed as z ∼ N(0, I) (non-truncated). The (min FID) / IS and FID / (max IS) columns report scores at the best FID and IS from a sweep across truncated noise distributions ranging from σ = 0 to σ = 2. Images from the JFT-300M validation set have an IS of 50.88 and FID of 1.94
  • Table 4: BigGAN architecture for 128×128 images. ch represents the channel width multiplier in each network from Table 1
  • Table 5: BigGAN architecture for 256×256 images. Relative to the 128×128 architecture, we add an additional ResBlock in each network at 16×16 resolution, and move the non-local block in G to 128×128 resolution. Memory constraints prevent us from moving the non-local block in D
  • Table 6: BigGAN architecture for 512×512 images. Relative to the 256×256 architecture, we add an additional ResBlock at the 512×512 resolution. Memory constraints force us to move the non-local block in both networks back to 64×64 resolution as in the 128×128 pixel setting
  • Table 7: BigGAN-deep architecture for 128×128 images
  • Table 8: BigGAN-deep architecture for 256×256 images
  • Table 9: BigGAN-deep architecture for 512×512 images
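
To make the Shared, Skip-z, and Ortho. entries of Table 1 concrete, here is a schematic sketch with hypothetical shapes and names; it illustrates the described mechanisms rather than reproducing the released code:

    import numpy as np

    def conditional_batchnorm(x, cond, W_gain, W_bias, eps=1e-4):
        """Class-conditional BatchNorm over (N, C) activations.

        cond is concat(shared_embedding[y], z_chunk): one embedding table is
        shared across all layers ("Shared"), and z is split into per-block
        chunks ("Skip-z"); a linear projection of cond produces per-sample
        gains and biases.
        """
        x_hat = (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)
        return (1.0 + cond @ W_gain) * x_hat + cond @ W_bias

    def ortho_penalty(W, beta=1e-4):
        """Relaxed Orthogonal Regularization ("Ortho."): penalize only the
        off-diagonal terms of W^T W, i.e. beta * ||W^T W * (1 - I)||_F^2."""
        wtw = W.T @ W
        off_diag = wtw * (1.0 - np.eye(wtw.shape[0]))
        return beta * np.sum(off_diag ** 2)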
Funding
  • We find that we can dramatically improve the state of the art and train models up to 512×512 resolution without need for explicit multiscale methods like Karras et al (2018)
  • In all three cases, our models outperform the previous state-of-the-art IS and FID scores achieved by Miyato et al (2018) and Zhang et al (2018)
  • Our results show that these techniques substantially improve performance even in the setting of this much larger dataset at the same model capacity (64 base channels)
Study subjects and analysis
  • Evaluation cases: 3 (best FID; IS matched to that of the real validation data; maximum IS). In all three cases, the models outperform the previous state-of-the-art IS and FID scores of Miyato et al (2018) and Zhang et al (2018).
  • In addition to the BigGAN model introduced in the first version of the paper and used in the majority of experiments (unless otherwise stated), the authors also present a 4x deeper model (BigGAN-deep) which uses a different configuration of residual blocks.

Reference
  • Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek Murray, Benoit Steiner, Paul Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: A system for large-scale machine learning. In OSDI, 2016.
  • Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein generative adversarial networks. In ICML, 2017.
  • Shane Barratt and Rishi Sharma. A note on the Inception Score. arXiv preprint arXiv:1801.01973, 2018.
  • Marc G. Bellemare, Ivo Danihelka, Will Dabney, Shakir Mohamed, Balaji Lakshminarayanan, Stephan Hoyer, and Rémi Munos. The Cramér distance as a solution to biased Wasserstein gradients. arXiv preprint arXiv:1705.10743, 2017.
  • Mikołaj Bińkowski, Dougal J. Sutherland, Michael Arbel, and Arthur Gretton. Demystifying MMD GANs. In ICLR, 2018.
  • Andrew Brock, Theodore Lim, J.M. Ritchie, and Nick Weston. Neural photo editing with introspective adversarial networks. In ICLR, 2017.
  • Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, and Pieter Abbeel. InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets. In NIPS, 2016.
  • Harm de Vries, Florian Strub, Jérémie Mary, Hugo Larochelle, Olivier Pietquin, and Aaron Courville. Modulating early visual processing by language. In NIPS, 2017.
  • Emily Denton, Soumith Chintala, Arthur Szlam, and Rob Fergus. Deep generative image models using a Laplacian pyramid of adversarial networks. In NIPS, 2015.
  • Vincent Dumoulin, Jonathon Shlens, and Manjunath Kudlur. A learned representation for artistic style. In ICLR, 2017.
  • William Fedus, Mihaela Rosca, Balaji Lakshminarayanan, Andrew M. Dai, Shakir Mohamed, and Ian Goodfellow. Many paths to equilibrium: GANs do not need to decrease a divergence at every step. In ICLR, 2018.
  • Xavier Glorot and Yoshua Bengio. Understanding the difficulty of training deep feedforward neural networks. In AISTATS, 2010.
  • Gene Golub and Henk van der Vorst. Eigenvalue computation in the 20th century. Journal of Computational and Applied Mathematics, 123:35–65, 2000.
  • Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In NIPS, 2014.
  • Google. Cloud TPUs. https://cloud.google.com/tpu/, 2018.
  • Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron C. Courville. Improved training of Wasserstein GANs. In NIPS, 2017.
  • Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, 2016.
  • Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, Günter Klambauer, and Sepp Hochreiter. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In NIPS, 2017.
  • Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML, 2015.
  • Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. Progressive growing of GANs for improved quality, stability, and variation. In ICLR, 2018.
  • Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In ICLR, 2015.
  • Naveen Kodali, Jacob Abernethy, James Hays, and Zsolt Kira. On convergence and stability of GANs. arXiv preprint arXiv:1705.07215, 2017.
  • Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical report, 2009.
  • Jae Hyun Lim and Jong Chul Ye. Geometric GAN. arXiv preprint arXiv:1705.02894, 2017.
  • Xudong Mao, Qing Li, Haoran Xie, Raymond Y. K. Lau, and Zhen Wang. Least squares generative adversarial networks. arXiv preprint arXiv:1611.04076, 2016.
  • Marco Marchesi. Megapixel size image creation using generative adversarial networks. arXiv preprint arXiv:1706.00082, 2017.
  • Lars Mescheder, Andreas Geiger, and Sebastian Nowozin. Which training methods for GANs do actually converge? In ICML, 2018.
  • Mehdi Mirza and Simon Osindero. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784, 2014.
  • Takeru Miyato and Masanori Koyama. cGANs with projection discriminator. In ICLR, 2018.
  • Takeru Miyato, Toshiki Kataoka, Masanori Koyama, and Yuichi Yoshida. Spectral normalization for generative adversarial networks. In ICLR, 2018.
  • Sebastian Nowozin, Botond Cseke, and Ryota Tomioka. f-GAN: Training generative neural samplers using variational divergence minimization. In NIPS, 2016.
  • Augustus Odena, Vincent Dumoulin, and Chris Olah. Deconvolution and checkerboard artifacts. Distill, 2016.
  • Augustus Odena, Christopher Olah, and Jonathon Shlens. Conditional image synthesis with auxiliary classifier GANs. In ICML, 2017.
  • Augustus Odena, Jacob Buckman, Catherine Olsson, Tom B. Brown, Christopher Olah, Colin Raffel, and Ian Goodfellow. Is generator conditioning causally related to GAN performance? In ICML, 2018.
  • Ethan Perez, Florian Strub, Harm de Vries, Vincent Dumoulin, and Aaron Courville. FiLM: Visual reasoning with a general conditioning layer. In AAAI, 2018.
  • Mathijs Pieters and Marco Wiering. Comparing generative adversarial network techniques for image creation and modification. arXiv preprint arXiv:1803.09093, 2018.
  • Alec Radford, Luke Metz, and Soumith Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. In ICLR, 2016.
  • Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. ImageNet large scale visual recognition challenge. IJCV, 115:211–252, 2015.
  • Tim Salimans and Diederik Kingma. Weight normalization: A simple reparameterization to accelerate training of deep neural networks. In NIPS, 2016.
  • Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved techniques for training GANs. In NIPS, 2016.
  • Tim Salimans, Han Zhang, Alec Radford, and Dimitris Metaxas. Improving GANs using optimal transport. In ICLR, 2018.
  • Andrew Saxe, James McClelland, and Surya Ganguli. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. In ICLR, 2014.
  • Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015.
  • Casper Kaae Sønderby, Jose Caballero, Lucas Theis, Wenzhe Shi, and Ferenc Huszár. Amortised MAP inference for image super-resolution. In ICLR, 2017.
  • Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. JMLR, 15:1929–1958, 2014.
  • Chen Sun, Abhinav Shrivastava, Saurabh Singh, and Abhinav Gupta. Revisiting unreasonable effectiveness of data in deep learning era. In ICCV, 2017.
  • Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, and Zbigniew Wojna. Rethinking the Inception architecture for computer vision. In CVPR, 2016.
  • Lucas Theis, Aäron van den Oord, and Matthias Bethge. A note on the evaluation of generative models. arXiv preprint arXiv:1511.01844, 2015.
  • Dustin Tran, Rajesh Ranganath, and David M. Blei. Hierarchical implicit models and likelihood-free variational inference. In NIPS, 2017.
  • Xiaolong Wang, Ross B. Girshick, Abhinav Gupta, and Kaiming He. Non-local neural networks. In CVPR, 2018.
  • Yuhuai Wu, Yuri Burda, Ruslan Salakhutdinov, and Roger B. Grosse. On the quantitative analysis of decoder-based generative models. In ICLR, 2017.
  • Yasin Yazıcı, Chuan-Sheng Foo, Stefan Winkler, Kim-Hui Yap, Georgios Piliouras, and Vijay Chandrasekhar. The unusual effectiveness of averaging in GAN training. arXiv preprint arXiv:1806.04498, 2018.
  • Han Zhang, Ian Goodfellow, Dimitris Metaxas, and Augustus Odena. Self-attention generative adversarial networks. arXiv preprint arXiv:1805.08318, 2018.