We consider the problem of grounding the meaning of words in the physical world and focus on the visual modality, which we represent by visual attributes. We create a new large-scale taxonomy of visual attributes covering more than 500 concepts and their corresponding 688K images. We use this dataset to train attribute classifiers and integrate their predictions with text-based distributional models of word meaning. We show that these bimodal models give a better fit to human word association data than amodal models and word representations based on hand-crafted norming data.
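As a rough illustration of the kind of bimodal combination the abstract describes, and not the paper's actual model, the Python sketch below fuses a textual distributional vector with a visual attribute vector by weighted concatenation of the normalized modalities, then compares two concepts with cosine similarity. The concept names, vectors, attribute dimensions, and the `alpha` mixing weight are all hypothetical placeholders.

```python
import numpy as np

def l2_normalize(v):
    """Scale a vector to unit length (no-op for the zero vector)."""
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

def bimodal_vector(text_vec, attr_vec, alpha=0.5):
    """Fuse a textual distributional vector with a visual attribute
    vector by weighted concatenation; alpha balances the modalities."""
    t = l2_normalize(np.asarray(text_vec, dtype=float))
    a = l2_normalize(np.asarray(attr_vec, dtype=float))
    return np.concatenate([alpha * t, (1.0 - alpha) * a])

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy example with made-up vectors: text co-occurrence features on the
# left, visual attribute predictions (e.g. has_tail, has_wings) on the right.
text = {"dog": np.array([0.9, 0.1, 0.4]),
        "cat": np.array([0.8, 0.2, 0.5])}
attrs = {"dog": np.array([1.0, 0.0, 1.0, 0.0]),
         "cat": np.array([1.0, 0.0, 1.0, 1.0])}

dog = bimodal_vector(text["dog"], attrs["dog"])
cat = bimodal_vector(text["cat"], attrs["cat"])
print(f"bimodal similarity(dog, cat) = {cosine(dog, cat):.3f}")
```

In an evaluation against word association norms, similarities like the one printed above would be correlated with human association scores across many concept pairs; the weighted concatenation here is just one simple fusion scheme among several possibilities.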
| Title of host publication | Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) |
| Place of publication | Sofia, Bulgaria |
| Publisher | Association for Computational Linguistics |
| Number of pages | 11 |
| Publication status | Published - 1 Aug 2013 |