Pigeonring: A Principle for Faster Thresholded Similarity Search

Jianbin Qin, Chuan Xiao

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

The pigeonhole principle states that if n items are contained in m boxes, then at least one box has no more than n/m items. It is utilized to solve many data management problems, especially for thresholded similarity searches. Despite many pigeonhole principle-based solutions proposed in the last few decades, the condition stated by the principle is weak. It only constrains the number of items in a single box. By organizing the boxes in a ring, we propose a new principle, called the pigeonring principle, which constrains the number of items in multiple boxes and yields stronger conditions. To utilize the new principle, we focus on problems defined in the form of identifying data objects whose similarities or distances to the query is constrained by a threshold. Many solutions to these problems utilize the pigeonhole principle to find candidates that satisfy a filtering condition. By the new principle, stronger filtering conditions can be established. We show that the pigeonhole principle is a special case of the new principle. This suggests that all the pigeonhole principle-based solutions are possible to be accelerated by the new principle. A universal filtering framework is introduced to encompass the solutions to these problems based on the new principle. Besides, we discuss how to quickly find candidates specified by the new principle. The implementation requires only minor modifications on top of existing pigeonhole principle-based algorithms. Experimental results on real datasets demonstrate the applicability of the new principle as well as the superior performance of the algorithms based on the new principle
Original languageEnglish
Title of host publicationProceedings of 45th International Conference on Very Large Data Bases
Place of PublicationLos Angeles, California, USA
PublisherVery Large Data Base Endowment Inc.
Pages28-42
Number of pages15
Volume12
DOIs
Publication statusE-pub ahead of print - Sep 2018
Event45th International Conference on Very Large Data Bases - Los Angeles, United States
Duration: 26 Aug 201930 Aug 2019
http://vldb.org/2019/

Publication series

NameProceedings of the VLDB Endowment
PublisherVLDB Endowment
Number1
Volume12
ISSN (Electronic)2150-8097

Conference

Conference45th International Conference on Very Large Data Bases
Abbreviated titleVLDB 2019
Country/TerritoryUnited States
CityLos Angeles
Period26/08/1930/08/19
Internet address

Fingerprint

Dive into the research topics of 'Pigeonring: A Principle for Faster Thresholded Similarity Search'. Together they form a unique fingerprint.

Cite this