Optimizing Network Performance in Distributed Machine Learning

Luo Mai, Chuntao Hong, Paolo Costa

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

To cope with the ever-growing availability of training data, there have been several proposals to scale machine learning computation beyond a single server and distribute it across a cluster. While this reduces training time, the observed speedup is often limited by network bottlenecks.

To address this, we design MLNET, a host-based communication layer that aims to improve the network performance of distributed machine learning systems. It does so by combining traffic reduction techniques (to diminish network load in the core and at the edges) with traffic management (to reduce average training time). A key feature of MLNET is its compatibility with existing hardware and software infrastructure, so it can be deployed immediately.

We describe the main techniques underpinning MLNET and show through simulation that the overall training time can be reduced by up to 78%. While preliminary, our results indicate the critical role played by the network and the benefits of introducing a new communication layer to increase the performance of distributed machine learning systems.
Original language: English
Title of host publication: Proceedings of the 7th USENIX Conference on Hot Topics in Cloud Computing
Place of Publication: USA
Publisher: USENIX Association
Number of pages: 7
Publication status: Published - 6 Jul 2015
Event: 7th USENIX Workshop on Hot Topics in Cloud Computing - Santa Clara, United States
Duration: 6 Jul 2015 to 7 Jul 2015
https://www.usenix.org/conference/hotcloud15

Publication series

Name: HotCloud '15
Publisher: USENIX Association

Workshop

Workshop: 7th USENIX Workshop on Hot Topics in Cloud Computing
Abbreviated title: HotCloud '15
Country/Territory: United States
City: Santa Clara
Period: 6/07/15 to 7/07/15
