Abstract
Normalization methods are a central building block in the deep learning toolbox. They accelerate and stabilize training, while decreasing the dependence on manually tuned learning rate schedules. When learning from multi-modal distributions, the effectiveness of batch normalization (BN), arguably the most prominent normalization method, is reduced. As a remedy, we propose a more flexible approach: by extending the normalization to more than a single mean and variance, we detect modes of data on-the-fly, jointly normalizing samples that share common features. We demonstrate that our method outperforms BN and other widely used normalization techniques in several experiments, including single and multi-task datasets.
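The abstract describes extending normalization to multiple sets of statistics, with samples softly assigned to modes on the fly. As a rough illustration of that idea (a minimal sketch, not the authors' released code), the PyTorch snippet below implements a mode-aware normalization layer: a small gating network softly assigns each sample to one of `K` modes, per-mode means and variances are computed as gate-weighted batch statistics, and each sample is normalized by a gate-weighted mixture of the mode statistics. All names (`ModeNorm2d`, `num_modes`) are illustrative, and running estimates for inference are omitted for brevity.

```python
# Illustrative sketch of mode-aware normalization (hypothetical names, not
# the authors' implementation). Training-time statistics only; running
# estimates for inference are omitted for brevity.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ModeNorm2d(nn.Module):
    def __init__(self, num_features: int, num_modes: int = 2, eps: float = 1e-5):
        super().__init__()
        self.num_modes, self.eps = num_modes, eps
        # Gating network: pooled features -> soft assignment over K modes.
        self.gate = nn.Linear(num_features, num_modes)
        # Shared affine parameters, as in standard batch normalization.
        self.weight = nn.Parameter(torch.ones(num_features))
        self.bias = nn.Parameter(torch.zeros(num_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (N, C, H, W)
        n, c, h, w = x.shape
        # Soft mode assignment per sample, reshaped to (K, N, 1, 1, 1).
        g = F.softmax(self.gate(x.mean(dim=(2, 3))), dim=1)
        gk = g.t().reshape(self.num_modes, n, 1, 1, 1)
        # Gate-weighted per-mode, per-channel statistics over (N, H, W).
        denom = gk.sum(dim=1, keepdim=True) * (h * w)          # (K, 1, 1, 1, 1)
        mu = (gk * x).sum(dim=(1, 3, 4), keepdim=True) / denom
        var = (gk * (x - mu) ** 2).sum(dim=(1, 3, 4), keepdim=True) / denom
        # Normalize under each mode, then mix with the gate weights.
        x_hat = (x - mu) / torch.sqrt(var + self.eps)          # (K, N, C, H, W)
        y = (gk * x_hat).sum(dim=0)                            # (N, C, H, W)
        return y * self.weight.view(1, -1, 1, 1) + self.bias.view(1, -1, 1, 1)


# Usage: a batch of 8 feature maps with 16 channels, normalized with 2 modes.
layer = ModeNorm2d(num_features=16, num_modes=2)
out = layer(torch.randn(8, 16, 10, 10))
assert out.shape == (8, 16, 10, 10)
```

With `num_modes=1` the layer reduces to ordinary batch normalization, which is one way to see the method as a strict generalization of BN.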
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the Seventh International Conference on Learning Representations (ICLR 2019) |
| Place of Publication | New Orleans, Louisiana, USA |
| Number of pages | 12 |
| Publication status | E-pub ahead of print - 9 May 2019 |
| Event | Seventh International Conference on Learning Representations, New Orleans, United States. Duration: 6 May 2019 → 9 May 2019. https://iclr.cc/ |
Conference
| Conference | Seventh International Conference on Learning Representations |
|---|---|
| Abbreviated title | ICLR 2019 |
| Country/Territory | United States |
| City | New Orleans |
| Period | 6/05/19 → 9/05/19 |
| Internet address | https://iclr.cc/ |
Keywords
- Deep learning
- Expert models
- Normalization
- Computer vision