Massive graphs emerge in many real-world applications. Practitioners often find relational databases are inefficient in graph data management. In this paper, we investigate the efficiency issue by analyzing both I/O and CPU costs. First, we find the storage of a graph in relational DBMS violates the locality principle: graph queries will always reference neighbors; however, the data locations of neighbors are almost random. To solve this problem, we introduce partitioned graph storage as a new database design option. It combines database partitioning with available graph-partitioning algorithms to restructure the storage such that neighbors are located close to each other. Second, we find graph queries expressed with SQL introduce unnecessary overheads. To overcome the CPU costs, we propose a new storage access method, which we call graph scan, to retrieve neighbors in one single operation. We show experimentally that partitioned graph storage and graph scan can significantly reduce I/O and CPU costs. We conclude that a relational DBMS could be a good graph store, as long as the storage respects the locality principle and SQL overheads are eliminated.
|Title of host publication||Proceedings of the 2013 IEEE International Conference on Big Data, 6-9 October 2013, Santa Clara, CA, USA|
|Publisher||Institute of Electrical and Electronics Engineers (IEEE)|
|Number of pages||8|
|Publication status||Published - 2013|