Making Queries Tractable on Big Data with Preprocessing

Wenfei Fan, Floris Geerts, Frank Neven

Research output: Contribution to journal › Article › peer-review

Abstract / Description of output

A query class is traditionally considered tractable if there exists a polynomial-time (PTIME) algorithm to answer its queries. When it comes to big data, however, PTIME algorithms often become infeasible in practice. A traditional and effective approach to coping with this is to preprocess data off-line, so that queries in the class can be subsequently evaluated on the data efficiently. This paper aims to provide a formal foundation for this approach in terms of computational complexity. (1) We propose a set of Π-tractable queries, denoted by ΠT^0_Q, to characterize classes of queries that can be answered in parallel poly-logarithmic time (NC) after PTIME preprocessing. (2) We show that several natural query classes are Π-tractable and are feasible on big data. (3) We also study a set ΠT_Q of query classes that can be effectively converted to Π-tractable queries by re-factorizing its data and queries for preprocessing. We introduce a form of NC reductions to characterize such conversions. (4) We show that a natural query class is complete for ΠT_Q. (5) We also show that ΠT^0_Q ⊂ P unless P = NC, i.e., the set ΠT^0_Q of all Π-tractable queries is properly contained in the set P of all PTIME queries. Nonetheless, ΠT_Q = P, i.e., all PTIME query classes can be made Π-tractable via proper refactorizations. This work is a step towards understanding the tractability of queries in the context of big data.
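The preprocess-then-query idea described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names `preprocess` and `answer` are hypothetical, and sorting followed by binary search stands in for whatever index a real system would build. The one-off sorting step runs in polynomial time, after which each membership query needs only a logarithmic number of comparisons, which is within the poly-logarithmic bound the abstract refers to.

```python
from bisect import bisect_left


def preprocess(data):
    """One-off, off-line step (hypothetical name): sort the data in O(n log n)."""
    return sorted(data)


def answer(index, value):
    """Membership query on the preprocessed data using O(log n) comparisons."""
    i = bisect_left(index, value)  # leftmost position where value could be inserted
    return i < len(index) and index[i] == value


# Preprocess once, then answer many queries against the same sorted index.
index = preprocess([42, 7, 19, 3, 88])
print(answer(index, 19))
print(answer(index, 20))
```

Without the preprocessing step, each query would have to scan the whole data set, which is linear in its size and is the kind of cost the abstract describes as infeasible at scale.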
Original language: English
Pages (from-to): 685-696
Number of pages: 12
Journal: Proceedings of the VLDB Endowment (PVLDB)
Issue number: 9
Publication status: Published - 2013

