FeGAN: Scaling Distributed GANs

Abstract

Existing approaches to distributing Generative Adversarial Networks (GANs) either (i) fail to scale, because they typically place the two components of a GAN (the generator and the discriminator) on different machines, inducing significant communication overhead, or (ii) face GAN-specific training issues that are exacerbated by distribution.

We propose FeGAN, the first middleware for distributing GANs over hundreds of devices while addressing the issues of mode collapse and vanishing gradients. Essentially, we revisit the idea of Federated Learning, co-locating a generator with a discriminator on each device (addressing the scaling problem) and having a server aggregate the devices' models using balanced sampling and Kullback-Leibler (KL) weighting, mitigating training issues and boosting convergence.
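
To make the KL-weighted aggregation step concrete, the following is a minimal sketch of how a server could weight device models by how closely each device's local label distribution matches the global one. The exact weighting scheme used by FeGAN is defined in the paper; the exp(-KL) softmax-style weighting, function names, and data layout below are illustrative assumptions only.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete label distributions."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def aggregate(device_params, device_label_dists, global_label_dist):
    """Weighted average of device models; devices whose local label
    distribution is closer to the global one (lower KL) get more weight.
    (Assumed exp(-KL) weighting, not necessarily FeGAN's exact formula.)"""
    kls = np.array([kl_divergence(d, global_label_dist)
                    for d in device_label_dists])
    weights = np.exp(-kls)
    weights /= weights.sum()
    # device_params: list of dicts mapping parameter name -> np.ndarray
    aggregated = {}
    for name in device_params[0]:
        aggregated[name] = sum(w * params[name]
                               for w, params in zip(weights, device_params))
    return aggregated
```

In this sketch, the server would apply the same weighted average to both the generator and the discriminator parameters gathered from the sampled devices in each round.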

Through extensive experiments, we show that FeGAN generates high-quality dataset samples in a scalable manner that tolerates device heterogeneity. In particular, FeGAN achieves up to 5× throughput gain with 1.5× less bandwidth compared to MD-GAN, the state-of-the-art distributed GAN approach, while scaling to at least one order of magnitude more devices. We also demonstrate that FeGAN boosts training by 2.6× compared to a baseline application of Federated Learning to GANs, while preventing training issues.

Publication
Proceedings of the 21st International Middleware Conference