Sample-optimal Density Estimation in Nearly-linear Time

Jayadev Acharya, Ilias Diakonikolas, Jerry Li, Ludwig Schmidt

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

We design a new, fast algorithm for agnostically learning univariate probability distributions whose densities are well-approximated by piecewise polynomial functions. Let ƒ be the density function of an arbitrary univariate distribution, and suppose that ƒ is OPT-close in Li- distance to an unknown piecewise polynomial function with t interval pieces and degree d. For any γ > 0, our algorithm draws n = Õγ(t(d + 1)/ ∊2) samples from ƒ, runs in time Õ(n), and with probability at least 9/10 outputs an Ογ (t)-piecewise degree-d hypothesis H that is (3 + γ) · OPT + ∊ close to f. Our approximation factor almost matches the best known information-theoretic (but computationally inefficient) upper bound of 3. Our general algorithm yields (nearly) sample- optimal and nearly-linear time estimators for a wide range of structured distribution families over both continuous and discrete domains in a unified way. For most of our applications, these are the first sample-optimal and nearly-linear time estimators in the literature. As a consequence, our work resolves the sample and computational complexities of a broad class of inference tasks via a single “meta-algorithm”. Moreover, we demonstrate that our algorithm performs very well in experiments. Our algorithm consists of three levels: (i) At the top level, we employ an iterative greedy algorithm for finding a good partition of the real line into the pieces of a piecewise polynomial. (ii) For each piece, we show that the sub-problem of finding a good polynomial fit on the current interval can be solved efficiently with a separation oracle method. (iii) We reduce the task of finding a separating hyperplane to a combinatorial problem and design a nearly-linear algorithm for this problem. Combining these three procedures gives a density estimation algorithm with the claimed guarantees.
Original languageEnglish
Title of host publicationProceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms
Place of PublicationPhiladelphia, PA, USA
PublisherSociety for Industrial and Applied Mathematics
Pages1278-1289
Number of pages12
ISBN (Electronic)978-1-61197-478-2
DOIs
Publication statusPublished - 19 Jan 2017
EventTwenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms - Hotel Porta Fira, Barcelona, Spain
Duration: 16 Jan 201719 Jan 2017
http://www.siam.org/meetings/da17/

Publication series

NameSODA '17
PublisherSociety for Industrial and Applied Mathematics

Conference

ConferenceTwenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms
Abbreviated titleSIAM 2017
Country/TerritorySpain
CityBarcelona
Period16/01/1719/01/17
Internet address

Fingerprint

Dive into the research topics of 'Sample-optimal Density Estimation in Nearly-linear Time'. Together they form a unique fingerprint.

Cite this