Interpretable text-guided image clustering via iterative search

Bingchen Zhao, Oisin Mac Aodha

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Traditional clustering methods aim to group unlabeled data points based on their similarity to each other. However, clustering in the absence of additional information is an ill-posed problem, as there may be many different, yet equally valid, ways to partition a dataset. Distinct users may want to use different criteria to form clusters in the same data, e.g. shape vs. color. Recently introduced text-guided image clustering methods aim to address this ambiguity by allowing users to specify the criteria of interest using natural language instructions. These instructions provide the context and control needed to obtain clusters that are better aligned with the users' intent. We propose a new text-guided clustering approach named ITGC that uses an iterative discovery process, guided by an unsupervised clustering objective, to generate interpretable visual concepts that better capture the criteria expressed in a user's instructions. We report superior performance compared to existing methods across a wide variety of image clustering and fine-grained classification benchmarks.
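The general pattern the abstract describes, proposing candidate concept sets, clustering images under each, and keeping the set that scores best under an unsupervised objective, can be illustrated with a generic toy sketch. This is not the paper's ITGC algorithm: the random embeddings, random concept directions, and the silhouette objective below are all illustrative assumptions standing in for vision-language features and the method's actual search.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

# Stand-in "image embeddings": two well-separated toy groups in 8-D,
# playing the role of features from a vision-language model.
images = np.vstack([
    rng.normal(0.0, 0.3, size=(20, 8)),
    rng.normal(2.0, 0.3, size=(20, 8)),
])

def evaluate(images, concepts):
    # Represent each image by its similarity to each candidate concept,
    # then cluster in that interpretable concept space.
    feats = images @ concepts.T
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(feats)
    return silhouette_score(feats, labels), labels

# Iterative search: propose concept sets, keep the one whose clustering
# scores best under the unsupervised objective. A real system would
# propose concepts from the user's text instruction, not at random.
best = None
for _ in range(5):
    concepts = rng.normal(size=(4, 8))  # 4 hypothetical concept directions
    score, labels = evaluate(images, concepts)
    if best is None or score > best[0]:
        best = (score, labels)

print(f"best silhouette: {best[0]:.2f}")
```

The key design point the sketch captures is that the clustering objective itself steers which concept set is retained, so the final clusters are expressed in terms of interpretable concept similarities rather than raw features.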
Original language: English
Title of host publication: Proceedings of the 36th British Machine Vision Conference
Publisher: BMVA Press
Pages: 1-20
Number of pages: 20
Publication status: Accepted/In press - 25 Jul 2025
Event: The 36th British Machine Vision Conference - Cutlers' Hall, Sheffield, United Kingdom
Duration: 24 Nov 2025 - 27 Nov 2025
Conference number: 36
https://bmvc2025.bmva.org/

Conference

Conference: The 36th British Machine Vision Conference
Abbreviated title: BMVC 2025
Country/Territory: United Kingdom
City: Sheffield
Period: 24/11/25 - 27/11/25
Internet address: https://bmvc2025.bmva.org/
