Bag-of-visual-words (BOVW) image representation has received intense attention in recent years and has significantly improved content-based image retrieval (CBIR). However, BOVW ignores the spatial correlation between visual words in natural images. As a result, the generated visual words are biased towards noise when the corresponding visual features are unstable. In this paper, we construct a visual word co-occurrence table by exploring the co-occurrence of visual words extracted from small affine-invariant regions in a large collection of natural images. Based on this table, we first present a novel high-order predictor that accelerates the generation of neighboring visual words. Second, we introduce a co-occurrence matrix to refine the similarity measure used for image ranking. Like inverse document frequency (idf), it down-weights the contribution of words that are less discriminative because they co-occur frequently. We conduct experiments on the Oxford and Paris Building datasets and use the ImageNet dataset for a large-scale evaluation. The experimental results suggest that our method outperforms state-of-the-art methods, especially when the vocabulary size is comparatively small. In addition, our method is only slightly more costly than the BOVW model.
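The abstract does not give the paper's formulas. The Python sketch below only illustrates the general idea of an idf-like co-occurrence penalty used in BOVW ranking, and every part of it is an assumption. That includes the region-based input format, the helper names (`build_cooccurrence_table`, `cooccurrence_weights`, `weighted_vector`, `rank`) and the penalty formula `1 / (1 + log(1 + total co-occurrence count))`. None of these come from the authors. Each image is scored by cosine similarity over tf × idf × co-occurrence-weight vectors.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical input format (not from the paper): an image is a list of regions,
# and each region lists the visual-word ids of the features inside one small region.
Image = list[list[int]]


def build_cooccurrence_table(images: list[Image]) -> dict[tuple[int, int], int]:
    """Count, for each unordered word pair (a < b), the regions that contain both."""
    table = defaultdict(int)
    for image in images:
        for region in image:
            for a, b in combinations(sorted(set(region)), 2):
                table[(a, b)] += 1
    return dict(table)


def cooccurrence_weights(table: dict[tuple[int, int], int]) -> dict[int, float]:
    """Assumed idf-like penalty: the more often a word co-occurs, the smaller its weight."""
    totals = Counter()
    for (a, b), count in table.items():
        totals[a] += count
        totals[b] += count
    return {w: 1.0 / (1.0 + math.log1p(t)) for w, t in totals.items()}


def idf_weights(images: list[Image]) -> dict[int, float]:
    """Standard idf, log(N / df), with document frequency counted per image."""
    df = Counter()
    for image in images:
        df.update({w for region in image for w in region})
    return {w: math.log(len(images) / d) for w, d in df.items()}


def weighted_vector(image: Image, idf: dict[int, float], co_w: dict[int, float]) -> dict[int, float]:
    """tf * idf * co-occurrence weight; words never seen co-occurring keep weight 1."""
    tf = Counter(w for region in image for w in region)
    return {w: n * idf.get(w, 0.0) * co_w.get(w, 1.0) for w, n in tf.items()}


def cosine(u: dict[int, float], v: dict[int, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank(query: Image, database: list[Image], idf, co_w) -> list[int]:
    """Return database indices sorted by decreasing similarity to the query."""
    q = weighted_vector(query, idf, co_w)
    scores = [cosine(q, weighted_vector(img, idf, co_w)) for img in database]
    return sorted(range(len(database)), key=lambda i: scores[i], reverse=True)


images = [[[0, 1], [0, 2], [0, 3]], [[0, 1], [0, 4]], [[5, 6], [0, 7]]]
table = build_cooccurrence_table(images)
co_w = cooccurrence_weights(table)
idf = idf_weights(images)
print(rank(images[0], images, idf, co_w))
```

In this toy collection, word `0` co-occurs with many other words, so it receives the smallest co-occurrence weight. Because it also appears in every image, its idf is zero. Image 2 therefore shares no useful evidence with image 0 and is ranked last. Multiplying the two weights is one simple way to combine them. The paper's matrix-based refinement may differ.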
| Field | Value |
|---|---|
| Title of host publication | MM '12: Proceedings of the 20th ACM International Conference on Multimedia |
| Place of publication | New York, NY, USA |
| Number of pages | 10 |
| Publication status | Published, 2012 |
- BOVW, co-occurrence matrix, high-order predictor