Projects per year
Abstract
We design a new, fast algorithm for agnostically learning univariate probability distributions whose densities are wellapproximated by piecewise polynomial functions. Let ƒ be the density function of an arbitrary univariate distribution, and suppose that ƒ is OPTclose in Li distance to an unknown piecewise polynomial function with t interval pieces and degree d. For any γ > 0, our algorithm draws n = Õγ(t(d + 1)/ ∊2) samples from ƒ, runs in time Õ(n), and with probability at least 9/10 outputs an Ογ (t)piecewise degreed hypothesis H that is (3 + γ) · OPT + ∊ close to f. Our approximation factor almost matches the best known informationtheoretic (but computationally inefficient) upper bound of 3.
Our general algorithm yields (nearly) sample optimal and nearlylinear time estimators for a wide range of structured distribution families over both continuous and discrete domains in a unified way. For most of our applications, these are the first sampleoptimal and nearlylinear time estimators in the literature. As a consequence, our work resolves the sample and computational complexities of a broad class of inference tasks via a single “metaalgorithm”. Moreover, we demonstrate that our algorithm performs very well in experiments.
Our algorithm consists of three levels: (i) At the top level, we employ an iterative greedy algorithm for finding a good partition of the real line into the pieces of a piecewise polynomial. (ii) For each piece, we show that the subproblem of finding a good polynomial fit on the current interval can be solved efficiently with a separation oracle method. (iii) We reduce the task of finding a separating hyperplane to a combinatorial problem and design a nearlylinear algorithm for this problem. Combining these three procedures gives a density estimation algorithm with the claimed guarantees.
Original language  English 

Title of host publication  Proceedings of the TwentyEighth Annual ACMSIAM Symposium on Discrete Algorithms 
Place of Publication  Philadelphia, PA, USA 
Publisher  Society for Industrial and Applied Mathematics 
Pages  12781289 
Number of pages  12 
ISBN (Electronic)  9781611974782 
DOIs  
Publication status  Published  19 Jan 2017 
Event  TwentyEighth Annual ACMSIAM Symposium on Discrete Algorithms  Hotel Porta Fira, Barcelona, Spain Duration: 16 Jan 2017 → 19 Jan 2017 http://www.siam.org/meetings/da17/ 
Publication series
Name  SODA '17 

Publisher  Society for Industrial and Applied Mathematics 
Conference
Conference  TwentyEighth Annual ACMSIAM Symposium on Discrete Algorithms 

Abbreviated title  SIAM 2017 
Country/Territory  Spain 
City  Barcelona 
Period  16/01/17 → 19/01/17 
Internet address 
Fingerprint
Dive into the research topics of 'Sampleoptimal Density Estimation in Nearlylinear Time'. Together they form a unique fingerprint.Projects
 1 Finished

Sublinear Algorithms for Approximating Probability Distribution
Diakonikolas, I.
1/09/14 → 31/08/15
Project: Research