Asymmetric signature schemes for efficient exact edit similarity query processing

Jianbin Qin, Wei Wang, Chuan Xiao, Yifei Lu, Xuemin Lin, Haixun Wang

Research output: Contribution to journalArticlepeer-review

Abstract / Description of output

Given a query string Q, an edit similarity search finds all strings in a database whose edit distance with Q is no more than a given threshold τ . Most existing methods answering edit similarity queries employ schemes to generate string subsequences as signatures and generate candidates by set overlap queries on query and data signatures.
In this article, we show that for any such signature scheme, the lower bound of the minimum number of signatures is τ + 1, which is lower than what is achieved by existing methods. We then propose several asymmetric signature schemes, that is, extracting different numbers of signatures for the data and query strings, which achieve this lower bound. A basic asymmetric scheme is first established on the basis of matching q-chunks and q-grams between two strings. Two efficient query processing algorithms (IndexGram and IndexChunk) are developed on top of this scheme. We also propose novel candidate pruning methods to further improve the efficiency. We then generalize the basic scheme by incorporating novel ideas of floating q-chunks, optimal selection of q-chunks, and reducing the number of signatures using global ordering. As a result, the Super and Turbo families of schemes are developed together with their corresponding query processing algorithms. We have conducted a comprehensive experimental study using the six asymmetric algorithms and nine previous state-of-the-art algorithms. The experiment results clearly showcase the efficiency of our methods and demonstrate space and time characteristics of our proposed algorithms.
Original languageEnglish
Article number16
Pages (from-to)16:1-16:44
Number of pages44
JournalACM Transactions on Database Systems
Issue number3
Publication statusPublished - 1 Aug 2013


Dive into the research topics of 'Asymmetric signature schemes for efficient exact edit similarity query processing'. Together they form a unique fingerprint.
  • Efficient exact edit similarity query processing with the asymmetric signature scheme

    Qin, J., Wang, W., Lu, Y., Xiao, C. & Lin, X., 12 Jun 2011, Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2011, Athens, Greece, June 12-16, 2011. Athens, Greece: ACM, p. 1033-1044 12 p.

    Research output: Chapter in Book/Report/Conference proceedingConference contribution

Cite this