Abstract
Recent research in language and vision has developed models for predicting and disambiguating verbs from images. Here, we ask whether the predictions made by such models correspond to human intuitions about visual verbs. We show that the image regions a verb prediction model identifies as salient for a given verb correlate with the regions fixated by human observers performing a verb classification task.
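The abstract says that model-salient regions "correlate" with human fixations, but it does not say which agreement measure was used. The sketch below is one common way such a comparison is made in saliency research, offered only as an assumption: the human fixation points are blurred into a density map with a Gaussian, and that map is compared with the model's saliency map using Pearson's linear correlation coefficient (CC). The function names `fixation_density` and `pearson_cc` and the `sigma` parameter are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

# A map is a 2D grid of non-negative scores over image locations (rows x cols).
Map = List[List[float]]


def fixation_density(fixations: Sequence[Tuple[int, int]], height: int,
                     width: int, sigma: float = 1.0) -> Map:
    """Blur (row, col) fixation points into a density map.

    Assumption: an isotropic Gaussian with a hypothetical width `sigma`;
    the paper's own preprocessing is not described in the abstract.
    """
    density = [[0.0] * width for _ in range(height)]
    for fr, fc in fixations:
        for r in range(height):
            for c in range(width):
                d2 = (r - fr) ** 2 + (c - fc) ** 2
                density[r][c] += math.exp(-d2 / (2 * sigma ** 2))
    return density


def pearson_cc(a: Map, b: Map) -> float:
    """Pearson linear correlation between two maps of identical shape."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    if len(xs) != len(ys):
        raise ValueError("maps must have the same shape")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        # CC is undefined when either map is constant; this sketch reports 0.
        return 0.0
    return cov / math.sqrt(vx * vy)


# Usage: a toy 3x3 model saliency map vs. one human fixation at the centre.
saliency = [[0.0, 0.1, 0.0],
            [0.1, 0.9, 0.2],
            [0.0, 0.2, 0.1]]
human = fixation_density([(1, 1)], height=3, width=3)
print(round(pearson_cc(saliency, human), 3))
```

CC is symmetric and insensitive to the overall scale of either map, so a model's raw saliency scores can be compared directly with blurred fixation counts. In practice, CC is often reported alongside other metrics, because no single number fully captures how well a saliency map matches human fixations.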
Original language | English |
---|---|
Title of host publication | The 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies |
Place of Publication | New Orleans, Louisiana |
Publisher | Association for Computational Linguistics |
Pages | 758-763 |
Number of pages | 6 |
DOIs | |
Publication status | Published - 30 Jun 2018 |
Event | 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Hyatt Regency New Orleans Hotel, New Orleans, United States. Duration: 1 Jun 2018 → 6 Jun 2018. Website: http://naacl2018.org/ |
Conference
Conference | 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies |
---|---|
Abbreviated title | NAACL HLT 2018 |
Country/Territory | United States |
City | New Orleans |
Period | 1 Jun 2018 → 6 Jun 2018 |
Internet address | http://naacl2018.org/ |