On Scale Independence for Querying Big Data

Wenfei Fan, Floris Geerts, Leonid Libkin

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

To make query answering feasible in big datasets, practitioners have been looking into the notion of scale independence of queries. Intuitively, such queries require only a relatively small subset of the data, whose size is determined by the query and access methods rather than the size of the dataset itself. This paper aims to formalize this notion and study its properties. We start by defining what it means to be scale-independent, and provide matching upper and lower bounds for checking scale independence, for queries in various languages, and for combined and data complexity. Since the complexity turns out to be rather high, and since scale-independent queries cannot be captured syntactically, we develop sufficient conditions for scale independence. We formulate them based on access schemas, which combine indexing and constraints together with bounds on the sizes of retrieved data sets. We then study two variations of scale-independent query answering, inspired by existing practical systems. One concerns incremental query answering: we check when query answers can be maintained in response to updates scale-independently. The other explores scale-independent query rewriting using views.
Original languageEnglish
Title of host publicationProceedings of the 33rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems
Place of PublicationNew York, NY, USA
PublisherACM
Pages51-62
Number of pages12
ISBN (Print)978-1-4503-2375-8
DOIs
Publication statusPublished - 2014

Keywords / Materials (for Non-textual outputs)

  • big data, query answering, scale independence

Fingerprint

Dive into the research topics of 'On Scale Independence for Querying Big Data'. Together they form a unique fingerprint.

Cite this