Near-Optimal Density Estimation in Near-Linear Time Using Variable-Width Histograms

Siu On Chan, Ilias Diakonikolas, Rocco A Servedio, Xiaorui Sun

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Let be an unknown and arbitrary probability distribution over . We consider the problem of \emph{density estimation}, in which a learning algorithm is given i.i.d. draws from and must (with high probability) output a hypothesis distribution that is close to . The main contribution of this paper is a highly efficient density estimation algorithm for learning using a variable-width histogram, i.e., a hypothesis distribution with a piecewise constant probability density function. In more detail, for any and , we give an algorithm that makes draws from , runs in time, and outputs a hypothesis distribution that is piecewise constant with pieces. With high probability the hypothesis satisfies , where denotes the total variation distance (statistical distance), is a universal constant, and is the smallest total variation distance between and any -piecewise constant distribution. The sample size and running time of our algorithm are both optimal up to logarithmic factors. The approximation factor'' that is present in our result is inherent in the problem, as we prove that no algorithm with sample size bounded in terms of and can achieve regardless of what kind of hypothesis distribution it uses.
Original languageEnglish
Title of host publicationAdvances in Neural Information Processing Systems 27
EditorsZ. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, K. Q. Weinberger
PublisherCurran Associates Inc
Pages1844-1852
Number of pages9
Publication statusPublished - 2014

Fingerprint

Dive into the research topics of 'Near-Optimal Density Estimation in Near-Linear Time Using Variable-Width Histograms'. Together they form a unique fingerprint.

Cite this