Learning k-Modal Distributions via Testing

Constantinos Daskalakis, Ilias Diakonikolas, Rocco A. Servedio

Research output: Contribution to journalArticlepeer-review

Abstract

A k-modal probability distribution over the discrete domain {1,...,n} is one whose histogram has at most k "peaks" and "valleys." Such distributions are natural generalizations of monotone (k=0) and unimodal (k=1) probability distributions, which have been intensively studied in probability theory and statistics.In this paper we consider the problem of learning (i.e., performing density estimation of) an unknown k-modal distribution with respect to the L1 distance. The learning algorithm is given access to independent samples drawn from an unknown k-modal distribution p, and it must output a hypothesis distribution pˆ such that with high probability the total variation distance between p and pˆ is at most ϵ. Our main goal is to obtain computationally efficient algorithms for this problem that use (close to) an information-theoretically optimal number of samples.We give an efficient algorithm for this problem that runs in time poly(k,log(n),1/ϵ). For k≤O~(logn), the number of samples used by our algorithm is very close (within an O~(log(1/ϵ)) factor) to being information-theoretically optimal. Prior to this work computationally efficient algorithms were known only for the cases k=0,1 (Birgé 1987, 1997).A novel feature of our approach is that our learning algorithm crucially uses a new algorithm for property testing of probability distributions as a key subroutine. The learning algorithm uses the property tester to efficiently decompose the k-modal distribution into k (near-)monotone distributions, which are easier to learn.
Original languageEnglish
Pages (from-to)535-570
Number of pages36
JournalTheory of Computing
Volume10
Issue number20
DOIs
Publication statusPublished - Dec 2014

Fingerprint

Dive into the research topics of 'Learning k-Modal Distributions via Testing'. Together they form a unique fingerprint.

Cite this