TY - JOUR
T1 - Distributed Graph Simulation: Impossibility and Possibility
AU - Fan, Wenfei
AU - Wang, Xin
AU - Wu, Yinghui
AU - Deng, Dong
PY - 2014
Y1 - 2014
N2 - This paper studies fundamental problems for distributed graph simulation. Given a pattern query Q and a graph G that is fragmented and distributed, a graph simulation algorithm A is to compute the matches Q(G) of Q in G. We say that A is parallel scalable in (a) response time if its parallel computational cost is determined by the largest fragment Fm of G and the size |Q| of query Q, and (b) data shipment if its total amount of data shipped is determined by |Q| and the number of fragments of G, independent of the size of graph G. (1) We prove an impossibility theorem: there exists no distributed graph simulation algorithm that is parallel scalable in either response time or data shipment. (2) However, we show that distributed graph simulation is partition bounded, i.e., its response time depends only on |Q|, |Fm| and the number |Vf| of nodes in G with edges across different fragments; and its data shipment depends on |Q| and the number |Ef| of crossing edges only. We provide the first algorithms with these performance guarantees. (3) We also identify special cases of patterns and graphs when parallel scalability is possible. (4) We experimentally verify the scalability and efficiency of our algorithms.
AB - This paper studies fundamental problems for distributed graph simulation. Given a pattern query Q and a graph G that is fragmented and distributed, a graph simulation algorithm A is to compute the matches Q(G) of Q in G. We say that A is parallel scalable in (a) response time if its parallel computational cost is determined by the largest fragment Fm of G and the size |Q| of query Q, and (b) data shipment if its total amount of data shipped is determined by |Q| and the number of fragments of G, independent of the size of graph G. (1) We prove an impossibility theorem: there exists no distributed graph simulation algorithm that is parallel scalable in either response time or data shipment. (2) However, we show that distributed graph simulation is partition bounded, i.e., its response time depends only on |Q|, |Fm| and the number |Vf| of nodes in G with edges across different fragments; and its data shipment depends on |Q| and the number |Ef| of crossing edges only. We provide the first algorithms with these performance guarantees. (3) We also identify special cases of patterns and graphs when parallel scalability is possible. (4) We experimentally verify the scalability and efficiency of our algorithms.
M3 - Article
VL - 7
SP - 1083
EP - 1094
JO - Proceedings of the VLDB Endowment (PVLDB)
JF - Proceedings of the VLDB Endowment (PVLDB)
SN - 2150-8097
IS - 12
ER -