Projects per year
Abstract
Histograms are among the most popular structures for the succinct summarization of data in a variety of database applications. In this work, we provide fast and nearoptimal algorithms for approximating arbitrary one dimensional data distributions by histograms.
A khistogram is a piecewise constant function with k pieces. We consider the following natural problem, previously studied by Indyk, Levi, and Rubinfeld in PODS 2012: given samples from a distribution p over {1,...,n}, compute a k histogram that minimizes the l2distance from p, up to an additive ε. We design an algorithm for this problem that uses the informationtheoretically minimal sample size of m = O(1/ε2), runs in samplelinear time O(m), and outputs an O(k)histogram whose l2distance from p is at most O(optk) +ε, where optk is the minimum l2distance between p and any khistogram. Perhaps surprisingly, the sample size and running time of our algorithm are independent of the universe size.
We generalize our approach to obtain fast algorithms for multiscale histogram construction, as well as approximation by piecewise polynomial distributions. We experimentally demonstrate one to two orders of magnitude im rovement in terms of empirical running times over previous stateoftheart algorithms.
A khistogram is a piecewise constant function with k pieces. We consider the following natural problem, previously studied by Indyk, Levi, and Rubinfeld in PODS 2012: given samples from a distribution p over {1,...,n}, compute a k histogram that minimizes the l2distance from p, up to an additive ε. We design an algorithm for this problem that uses the informationtheoretically minimal sample size of m = O(1/ε2), runs in samplelinear time O(m), and outputs an O(k)histogram whose l2distance from p is at most O(optk) +ε, where optk is the minimum l2distance between p and any khistogram. Perhaps surprisingly, the sample size and running time of our algorithm are independent of the universe size.
We generalize our approach to obtain fast algorithms for multiscale histogram construction, as well as approximation by piecewise polynomial distributions. We experimentally demonstrate one to two orders of magnitude im rovement in terms of empirical running times over previous stateoftheart algorithms.
Original language  English 

Title of host publication  Proceedings of the 34th ACM SIGMODSIGACTSIGAI Symposium on Principles of Database Systems 
Publisher  Association for Computing Machinery (ACM) 
Pages  249263 
Number of pages  15 
ISBN (Electronic)  9781450327572 
DOIs  
Publication status  Published  20 May 2015 
Publication series
Name  PODS '15 

Fingerprint
Dive into the research topics of 'Fast and NearOptimal Algorithms for Approximating Distributions by Histograms'. Together they form a unique fingerprint.Projects
 1 Finished

Sublinear Algorithms for Approximating Probability Distribution
Diakonikolas, I.
1/09/14 → 31/08/15
Project: Research