Projects per year
Abstract / Description of output
We design a new, fast algorithm for agnostically learning univariate probability distributions whose densities are well-approximated by piecewise polynomial functions. Let ƒ be the density function of an arbitrary univariate distribution, and suppose that ƒ is OPT-close in Li- distance to an unknown piecewise polynomial function with t interval pieces and degree d. For any γ > 0, our algorithm draws n = Õγ(t(d + 1)/ ∊2) samples from ƒ, runs in time Õ(n), and with probability at least 9/10 outputs an Ογ (t)-piecewise degree-d hypothesis H that is (3 + γ) · OPT + ∊ close to f. Our approximation factor almost matches the best known information-theoretic (but computationally inefficient) upper bound of 3.
Our general algorithm yields (nearly) sample- optimal and nearly-linear time estimators for a wide range of structured distribution families over both continuous and discrete domains in a unified way. For most of our applications, these are the first sample-optimal and nearly-linear time estimators in the literature. As a consequence, our work resolves the sample and computational complexities of a broad class of inference tasks via a single “meta-algorithm”. Moreover, we demonstrate that our algorithm performs very well in experiments.
Our algorithm consists of three levels: (i) At the top level, we employ an iterative greedy algorithm for finding a good partition of the real line into the pieces of a piecewise polynomial. (ii) For each piece, we show that the sub-problem of finding a good polynomial fit on the current interval can be solved efficiently with a separation oracle method. (iii) We reduce the task of finding a separating hyperplane to a combinatorial problem and design a nearly-linear algorithm for this problem. Combining these three procedures gives a density estimation algorithm with the claimed guarantees.
Original language | English |
---|---|
Title of host publication | Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms |
Place of Publication | Philadelphia, PA, USA |
Publisher | Society for Industrial and Applied Mathematics |
Pages | 1278-1289 |
Number of pages | 12 |
ISBN (Electronic) | 978-1-61197-478-2 |
DOIs | |
Publication status | Published - 19 Jan 2017 |
Event | Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms - Hotel Porta Fira, Barcelona, Spain Duration: 16 Jan 2017 → 19 Jan 2017 http://www.siam.org/meetings/da17/ |
Publication series
Name | SODA '17 |
---|---|
Publisher | Society for Industrial and Applied Mathematics |
Conference
Conference | Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms |
---|---|
Abbreviated title | SIAM 2017 |
Country/Territory | Spain |
City | Barcelona |
Period | 16/01/17 → 19/01/17 |
Internet address |
Fingerprint
Dive into the research topics of 'Sample-optimal Density Estimation in Nearly-linear Time'. Together they form a unique fingerprint.Projects
- 1 Finished
-
Sublinear Algorithms for Approximating Probability Distribution
Diakonikolas, I.
1/09/14 → 31/08/15
Project: Research