Sparse Communication for Distributed Gradient Descent

Alham Aji, Kenneth Heafield

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We make distributed stochastic gradient descent faster by exchanging sparse updates instead of dense updates. Gradient updates are positively skewed, as most updates are near zero, so we map the 99% smallest updates (by absolute value) to zero and then exchange sparse matrices. This method can be combined with quantization to further improve compression. We explore different configurations and apply them to neural machine translation and MNIST image classification tasks. Most configurations work on MNIST, whereas different configurations reduce convergence rate on the more complex translation task. Our experiments show that we can achieve up to 49% speedup on MNIST and 22% on NMT without damaging the final accuracy or BLEU.
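
The abstract is compact, so the following Python/NumPy sketch makes the communication step concrete. Only the 99% drop ratio and the absolute-value threshold are taken from the abstract; NumPy, the function names, and the index/value encoding are illustrative assumptions, not the authors' implementation.

import numpy as np

def sparsify_gradient(grad, drop_ratio=0.99):
    # Illustrative sketch (not the paper's code): zero out the smallest
    # `drop_ratio` fraction of entries by absolute value and return only
    # the surviving (index, value) pairs for exchange between workers.
    flat = grad.ravel()
    k = max(1, int(round((1.0 - drop_ratio) * flat.size)))  # entries to keep
    threshold = np.partition(np.abs(flat), -k)[-k]           # k-th largest |value|
    keep = np.abs(flat) >= threshold
    indices = np.nonzero(keep)[0]   # sparse indices to transmit
    values = flat[indices]          # corresponding values to transmit
    return indices, values

def apply_sparse_update(param, indices, values):
    # Scatter-add a received sparse update into a dense array.
    param.flat[indices] += values
    return param

# Example: one worker sparsifies its gradient before communication.
rng = np.random.default_rng(0)
grad = rng.normal(size=(1000, 1000)).astype(np.float32)
idx, vals = sparsify_gradient(grad)
print(f"sending {idx.size} of {grad.size} entries "
      f"({100.0 * idx.size / grad.size:.1f}%)")

In this reading, each worker transmits only the (index, value) pairs, and the receiver scatter-adds them into its dense copy before the optimizer step; the exact exchange protocol and any residual handling are details of the paper, not of this sketch.
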
Original language: English
Title of host publication: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP 2017)
Place of Publication: Copenhagen, Denmark
Publisher: Association for Computational Linguistics (ACL)
Pages: 440-445
Number of pages: 6
ISBN (Print): 978-1-945626-83-8
Publication status: Published - 11 Sep 2017
Event: EMNLP 2017: Conference on Empirical Methods in Natural Language Processing - Copenhagen, Denmark
Duration: 7 Sep 2017 - 11 Sep 2017
http://emnlp2017.net/

Conference

Conference: EMNLP 2017: Conference on Empirical Methods in Natural Language Processing
Abbreviated title: EMNLP 2017
Country/Territory: Denmark
City: Copenhagen
Period: 7/09/17 - 11/09/17
Internet address: http://emnlp2017.net/
