We give a computationally efficient semi-agnostic algorithm for learning univariate probability distributions that are well approximated by piecewise polynomial density functions. Let p be an arbitrary distribution over an interval I, and suppose that p is τ-close (in total variation distance) to an unknown probability distribution q that is defined by an unknown partition of I into t intervals and t unknown degree d polynomials specifying q over each of the intervals. We give an algorithm that draws Õ(t(d + 1)/ε2) samples from p, runs in time poly(t, d + 1, 1/ε), and with high probability outputs a piecewise polynomial hypothesis distribution h that is (14τ + ε)-close to p in total variation distance. Our algorithm combines tools from real approximation theory, uniform convergence, linear programming, and dynamic programming. Its sample complexity is simultaneously near optimal in all three parameters t, d and ε; we show that even for τ = 0, any algorithm that learns an unknown t-piecewise degree-d probability distribution over I to accuracy ε must use [EQUATION] samples from the distribution, regardless of its running time.
We apply this general algorithm to obtain a wide range of results for many natural density estimation problems over both continuous and discrete domains. These include state-of-the-art results for learning mixtures of log-concave distributions; mixtures of t-modal distributions; mixtures of Monotone Hazard Rate distributions; mixtures of Poisson Binomial Distributions; mixtures of Gaussians; and mixtures of k-monotone densities. Our general technique gives improved results, with provably optimal sample complexities (up to logarithmic factors) in all parameters in most cases, for all these problems via a single unified algorithm.
|Title of host publication||Proceedings of the 46th Annual ACM Symposium on Theory of Computing|
|Place of Publication||New York, NY, USA|
|Number of pages||10|
|Publication status||Published - 2014|
- computational learning theory
- unsupervised learning
- learning distributions