TY - JOUR
T1 - Diversified Top-k Graph Pattern Matching
AU - Fan, Wenfei
AU - Wang, Xin
AU - Wu, Yinghui
PY - 2013
Y1 - 2013
N2 - Graph pattern matching has been widely used in e.g., social data analysis. A number of matching algorithms have been developed that, given a graph pattern Q and a graph G, compute the set M(Q;G) of matches of Q in G. However, these algorithms often return an excessive number of matches, and are expensive on large real-life social graphs. Moreover, in practice many social queries are to find matches of a specific pattern node, rather than the entire M(Q;G). This paper studies top-k graph pattern matching. (1) We revise graph pattern matching defined in terms of simulation, by supporting a designated output node uo. Given G and Q, it is to find those nodes in M(Q;G) that match uo, instead of the large set M(Q;G). (2)We study two classes of functions for ranking the matches: relevance functions r() based on, e.g., social impact, and distance functions d() to cover diverse elements. (3) We develop two algorithms for computing top-k matches of uo based on r(), with the early termination property, i.e., they find top-k matches without computing the entire M(Q;G). (4)We also study diversifiedtop-k matching, a bi-criteria optimization problem based on both r() and d(). We show that its decision problem is NP-complete. Nonetheless, we provide an approximation algorithm with performance guarantees and a heuristic one with the early termination property. (5) Using real-life and synthetic data, we experimentally verify that our (diversified) top-k matching algorithms are effective, and outperform traditional matching algorithms in efficiency.
AB - Graph pattern matching has been widely used in e.g., social data analysis. A number of matching algorithms have been developed that, given a graph pattern Q and a graph G, compute the set M(Q;G) of matches of Q in G. However, these algorithms often return an excessive number of matches, and are expensive on large real-life social graphs. Moreover, in practice many social queries are to find matches of a specific pattern node, rather than the entire M(Q;G). This paper studies top-k graph pattern matching. (1) We revise graph pattern matching defined in terms of simulation, by supporting a designated output node uo. Given G and Q, it is to find those nodes in M(Q;G) that match uo, instead of the large set M(Q;G). (2)We study two classes of functions for ranking the matches: relevance functions r() based on, e.g., social impact, and distance functions d() to cover diverse elements. (3) We develop two algorithms for computing top-k matches of uo based on r(), with the early termination property, i.e., they find top-k matches without computing the entire M(Q;G). (4)We also study diversifiedtop-k matching, a bi-criteria optimization problem based on both r() and d(). We show that its decision problem is NP-complete. Nonetheless, we provide an approximation algorithm with performance guarantees and a heuristic one with the early termination property. (5) Using real-life and synthetic data, we experimentally verify that our (diversified) top-k matching algorithms are effective, and outperform traditional matching algorithms in efficiency.
M3 - Article
VL - 6
SP - 1510
EP - 1521
JO - Proceedings of the VLDB Endowment (PVLDB)
JF - Proceedings of the VLDB Endowment (PVLDB)
SN - 2150-8097
IS - 13
ER -