Normalization methods are a central building block in the deep learning toolbox. They accelerate and stabilize training while decreasing the dependence on manually tuned learning rate schedules. When learning from multi-modal distributions, however, batch normalization (BN), arguably the most prominent normalization method, becomes less effective. As a remedy, we propose a more flexible approach: by extending the normalization to more than a single mean and variance, we detect modes of the data on the fly and jointly normalize samples that share common features. We demonstrate that our method outperforms BN and other widely used normalization techniques in several experiments, including single- and multi-task datasets.
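The idea of normalizing with more than one mean and variance can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several parts are assumptions. The function `mode_norm` and its parameters `gate_w`, `gate_b`, `gamma` and `beta` are hypothetical names. It operates only on fully connected activations of shape (N, C). The gate that softly assigns each sample to one of K modes is assumed to be a single affine map followed by a softmax. Running statistics for inference are omitted. Each mode gets its own gate-weighted mean and variance, and every sample is normalized by a gate-weighted mixture of the per-mode normalizations.

```python
import numpy as np


def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def mode_norm(x, gate_w, gate_b, gamma, beta, eps=1e-5):
    """Hypothetical mode-normalization sketch (training-time batch statistics only).

    x:      (N, C) batch of activations
    gate_w: (C, K) assumed affine gate weights; gate_b: (K,) gate bias
    gamma, beta: (C,) learned scale and shift, as in batch normalization
    """
    g = softmax(x @ gate_w + gate_b, axis=1)                     # (N, K) soft mode assignments
    n_k = g.sum(axis=0)                                          # (K,) effective samples per mode
    mu = (g.T @ x) / n_k[:, None]                                # (K, C) gate-weighted means
    diff = x[None, :, :] - mu[:, None, :]                        # (K, N, C) deviations per mode
    var = np.einsum("nk,knc->kc", g, diff ** 2) / n_k[:, None]   # (K, C) gate-weighted variances
    x_hat = diff / np.sqrt(var[:, None, :] + eps)                # (K, N, C) per-mode normalization
    y = np.einsum("nk,knc->nc", g, x_hat)                        # (N, C) gate-weighted mixture
    return gamma * y + beta
```

With K = 1, or with a gate that assigns every sample uniformly, all modes share the batch statistics and the sketch reduces to ordinary batch normalization. With a gate that separates two well-separated clusters, each cluster is centered on its own mean, whereas BN would center both on the single pooled mean.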
Original language: English
Title of host publication: Proceedings of the Seventh International Conference on Learning Representations (ICLR 2019)
Place of publication: New Orleans, Louisiana, USA
Number of pages: 12
Publication status: E-pub ahead of print, 9 May 2019
Event: Seventh International Conference on Learning Representations, New Orleans, United States
Duration: 6 May 2019 to 9 May 2019


Conference: Seventh International Conference on Learning Representations
Abbreviated title: ICLR 2019
Country/Territory: United States
City: New Orleans


  • Deep learning
  • expert models
  • Normalization
  • Computer vision


Title: Mode Normalization
