Abstract
Traditional clustering methods aim to group unlabeled data points based on their similarity to each other. However, in the absence of additional information, clustering is an ill-posed problem, as there may be many different, yet equally valid, ways to partition a dataset. Distinct users may want to apply different criteria to form clusters in the same data, e.g., shape vs. color. Recently introduced text-guided image clustering methods aim to resolve this ambiguity by allowing users to specify the criterion of interest using natural language instructions. The instruction provides the context and control needed to obtain clusters that are better aligned with the user's intent. We propose a new text-guided clustering approach named ITGC that uses an iterative discovery process, guided by an unsupervised clustering objective, to generate interpretable visual concepts that better capture the criteria expressed in a user's instructions. We report superior performance compared to existing methods across a wide variety of image clustering and fine-grained classification benchmarks.
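The general idea behind text-guided clustering can be illustrated with a minimal sketch. This is not the paper's ITGC method; it is a simplified illustration under stated assumptions: image features and instruction-derived concept directions are assumed to live in a shared embedding space (as a CLIP-like model would provide), and are simulated here with synthetic data where dimensions 0-1 encode a "shape" criterion and dimensions 2-3 a "color" criterion. Projecting features onto the instruction-selected concept subspace before running ordinary k-means steers the clustering toward the user's criterion.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Simulated image features with two latent criteria:
# dims 0-1 encode "shape", dims 2-3 encode "color".
n = 200
shape_label = rng.integers(0, 2, n)
color_label = rng.integers(0, 2, n)
feats = np.zeros((n, 4))
feats[:, :2] = shape_label[:, None] * 2.0 + rng.normal(0, 0.2, (n, 2))
feats[:, 2:] = color_label[:, None] * 2.0 + rng.normal(0, 0.2, (n, 2))

# Hypothetical concept direction a text encoder might yield for the
# instruction "cluster by shape": it emphasises the shape dimensions.
concept_dirs = np.array([[1.0, 1.0, 0.0, 0.0]])

# Project features onto the instruction-selected concept subspace,
# then cluster the projected features with ordinary k-means.
proj = feats @ concept_dirs.T
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(proj)

# Cluster assignments should align with shape, not color
# (cluster indices are arbitrary, so check both labelings).
agree = max((clusters == shape_label).mean(),
            (clusters != shape_label).mean())
print(f"agreement with shape criterion: {agree:.2f}")
```

With a "color" concept direction instead (e.g. `[[0, 0, 1, 1]]`), the same pipeline would recover the color partition, illustrating how the instruction disambiguates an otherwise ill-posed clustering problem.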
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 36th British Machine Vision Conference |
| Publisher | BMVA Press |
| Pages | 1-20 |
| Number of pages | 20 |
| Publication status | Accepted/In press - 25 Jul 2025 |
| Event | The 36th British Machine Vision Conference, Cutlers' Hall, Sheffield, United Kingdom; 24 Nov 2025 → 27 Nov 2025; Conference number: 36; https://bmvc2025.bmva.org/ |
Conference
| Conference | The 36th British Machine Vision Conference |
|---|---|
| Abbreviated title | BMVC 2025 |
| Country/Territory | United Kingdom |
| City | Sheffield |
| Period | 24/11/25 → 27/11/25 |
| Internet address | https://bmvc2025.bmva.org/ |