Managing massive graphs in relational DBMS

Ruiwen Chen

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Massive graphs emerge in many real-world applications. Practitioners often find relational databases are inefficient in graph data management. In this paper, we investigate the efficiency issue by analyzing both I/O and CPU costs. First, we find the storage of a graph in relational DBMS violates the locality principle: graph queries will always reference neighbors; however, the data locations of neighbors are almost random. To solve this problem, we introduce partitioned graph storage as a new database design option. It combines database partitioning with available graph-partitioning algorithms to restructure the storage such that neighbors are located close to each other. Second, we find graph queries expressed with SQL introduce unnecessary overheads. To overcome the CPU costs, we propose a new storage access method, which we call graph scan, to retrieve neighbors in one single operation. We show experimentally that partitioned graph storage and graph scan can significantly reduce I/O and CPU costs. We conclude that a relational DBMS could be a good graph store, as long as the storage respects the locality principle and SQL overheads are eliminated.
Original languageEnglish
Title of host publicationProceedings of the 2013 IEEE International Conference on Big Data, 6-9 October 2013, Santa Clara, CA, USA
PublisherInstitute of Electrical and Electronics Engineers (IEEE)
Pages1-8
Number of pages8
DOIs
Publication statusPublished - 2013

Fingerprint

Dive into the research topics of 'Managing massive graphs in relational DBMS'. Together they form a unique fingerprint.

Cite this