## Abstract

We propose a randomized first-order optimization method, SEGA (SkEtched GrAdient), which progressively builds a variance-reduced estimate of the gradient from random linear measurements (sketches) of the gradient throughout its iterations. In each iteration, SEGA updates the current estimate of the gradient through a sketch-and-project operation using the information provided by the latest sketch; this is subsequently used to compute an unbiased estimate of the true gradient through a random relaxation procedure, which in turn drives a gradient step. Unlike standard subspace descent methods, such as coordinate descent, SEGA can be used for optimization problems with a non-separable proximal term. We provide a general convergence analysis and prove linear convergence for strongly convex objectives. In the special case of coordinate sketches, SEGA can be enhanced with various techniques such as importance sampling, minibatching and acceleration, and its rate is, up to a small constant factor, identical to the best-known rate of coordinate descent.
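For intuition, the coordinate-sketch special case described in the abstract can be written in a few lines: the gradient estimate `h` is corrected on one random coordinate per iteration (sketch-and-project), a scaled correction yields an unbiased gradient estimate (random relaxation), and that estimate drives the step. The sketch below is our own minimal illustration, not the paper's reference implementation; the function names, step size, and test problem are assumptions, and the proximal step is omitted for simplicity.

```python
import numpy as np

def sega_coordinate(grad_coord, x0, stepsize, iters, rng):
    """Minimal SEGA sketch with uniform coordinate sketches (illustration only).

    grad_coord(x, i) returns the i-th partial derivative of f at x.
    """
    n = x0.shape[0]
    x = x0.copy()
    h = np.zeros(n)              # running estimate of the gradient
    for _ in range(iters):
        i = rng.integers(n)      # random coordinate sketch
        gi = grad_coord(x, i)    # one measured partial derivative
        g = h.copy()
        g[i] += n * (gi - h[i])  # random relaxation: E[g] = grad f(x)
        h[i] = gi                # sketch-and-project update of the estimate
        x = x - stepsize * g     # gradient step (prox step omitted here)
    return x

# Toy example: minimize f(x) = 0.5 * ||x - b||^2, so grad f(x) = x - b.
rng = np.random.default_rng(0)
b = np.array([1.0, -2.0, 3.0])
x = sega_coordinate(lambda x, i: x[i] - b[i], np.zeros(3), 0.05, 20000, rng)
```

Note that as `x` approaches the minimizer and `h` approaches the true gradient, the variance of `g` vanishes, which is what permits the linear convergence rate claimed for strongly convex objectives.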


| Original language | English |
|---|---|
| Number of pages | 12 |
| Publication status | Accepted/In press - 5 Sep 2018 |
| Event | Thirty-second Conference on Neural Information Processing Systems - Montreal, Canada. Duration: 3 Dec 2018 → 8 Dec 2018. https://nips.cc/ |

### Conference

| Conference | Thirty-second Conference on Neural Information Processing Systems |
|---|---|
| Abbreviated title | NIPS 2018 |
| Country | Canada |
| City | Montreal |
| Period | 3/12/18 → 8/12/18 |
| Internet address | https://nips.cc/ |