Abstract
Visual relationship detection, i.e., discovering the interactions between pairs of objects in an image, plays a significant role in image understanding. However, most recent works consider only visual features, ignoring the implicit effect of common sense. Motivated by iterative visual reasoning in image recognition, we propose a novel model, Iterative Visual Relationship Detection with Commonsense Knowledge Graph (IVRDC), which takes advantage of common sense in the form of a knowledge graph for visual relationship detection. Our model consists of two modules: a feature module that predicts predicates from visual and semantic features with a bi-directional RNN, and a commonsense knowledge module that constructs a specific commonsense knowledge graph for predicate prediction. IVRDC iteratively combines the predictions from both modules and then updates the memory and the commonsense knowledge graph. The final predictions are made by taking the result of each iteration into account with an attention mechanism. Our experiments on the Visual Relationship Detection (VRD) dataset and the Visual Genome (VG) dataset demonstrate that our proposed model is competitive.
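The last step described in the abstract, weighting the result of each iteration with an attention mechanism, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `softmax` and `combine_iterations`, the use of a softmax over one scalar score per iteration, and the example numbers are all assumptions made for illustration only.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def combine_iterations(per_iter_logits, attn_scores):
    """Attention-weighted average of per-iteration predicate distributions.

    per_iter_logits: one list of predicate logits per iteration (hypothetical
        stand-in for the combined output of the two modules).
    attn_scores: one scalar score per iteration (hypothetical; how these
        scores are produced is not specified in the abstract).
    """
    weights = softmax(attn_scores)                  # attention over iterations
    probs = [softmax(logits) for logits in per_iter_logits]
    num_predicates = len(probs[0])
    return [
        sum(w * p[k] for w, p in zip(weights, probs))
        for k in range(num_predicates)
    ]


# Illustrative values only: three iterations, three candidate predicates.
logits = [[2.0, 0.5, 0.1], [1.0, 1.5, 0.2], [0.3, 2.5, 0.1]]
scores = [0.1, 0.4, 1.2]
final = combine_iterations(logits, scores)
best_predicate_index = max(range(len(final)), key=final.__getitem__)
```

Because each per-iteration distribution sums to one and the attention weights also sum to one, the combined output is itself a valid probability distribution over predicates.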
Original language | English |
---|---|
Title of host publication | Semantic Technology |
Subtitle of host publication | 9th Joint International Conference, JIST 2019, Hangzhou, China, November 25–27, 2019, Proceedings |
Editors | Xin Wang, Francesca Alessandra Lisi, Guohui Xiao, Elena Botoeva |
Place of Publication | Cham |
Publisher | Springer |
Pages | 210-225 |
Number of pages | 16 |
ISBN (Electronic) | 978-3-030-41407-8 |
ISBN (Print) | 978-3-030-41406-1 |
Publication status | Published - 14 Feb 2020 |
Event | The 9th Joint International Semantic Technology Conference, Hangzhou, China. Duration: 25 Nov 2019 → 27 Nov 2019. http://jist2019.openkg.cn/ |
Publication series
Name | Lecture Notes in Computer Science |
---|---|
Publisher | Springer, Cham |
Volume | 12032 |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | The 9th Joint International Semantic Technology Conference |
---|---|
Abbreviated title | JIST 2019 |
Country/Territory | China |
City | Hangzhou |
Period | 25/11/19 → 27/11/19 |
Keywords / Materials (for Non-textual outputs)
- Commonsense knowledge graph
- Visual relationship detection
- Visual Genome