Semantic Regularisation for Recurrent Image Annotation

F. Liu, T. Xiang, Timothy Hospedales, W. Yang, C. Sun

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


The “CNN-RNN” design pattern is increasingly widely applied in a variety of image annotation tasks, including multi-label classification and captioning. Existing models use the weakly semantic CNN hidden layer or its transform
as the image embedding that provides the interface between the CNN and RNN. This leaves the RNN overstretched with two jobs: predicting the visual concepts and modelling their correlations for generating structured annotation output.
Importantly, this makes the end-to-end training of the CNN and RNN slow and ineffective due to the difficulty of back-propagating gradients through the RNN to train the CNN. We propose a simple modification to the design pattern that
makes learning more effective and efficient. Specifically, we propose to use a semantically regularised embedding layer as the interface between the CNN and RNN. Regularising the interface can partially or completely decouple the learning
problems, allowing each to be trained more effectively and making joint training much more efficient. Extensive experiments show that state-of-the-art performance is achieved on multi-label classification as well as image captioning.
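
The idea of a semantically regularised interface can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes the interface layer outputs per-concept probabilities, that the regulariser is a binary cross-entropy against ground-truth concept labels, and that the embedding initialises the RNN hidden state. All names (`semantic_embedding`, `semantic_regularisation_loss`, `rnn_initial_state`) and all dimensions are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def semantic_embedding(cnn_features, W, b):
    """Map CNN features to per-concept probabilities (assumed interface layer)."""
    return sigmoid(cnn_features @ W + b)

def semantic_regularisation_loss(probs, concept_labels, eps=1e-7):
    """Binary cross-entropy supervising the interface layer directly (assumed regulariser)."""
    p = np.clip(probs, eps, 1.0 - eps)
    bce = concept_labels * np.log(p) + (1.0 - concept_labels) * np.log(1.0 - p)
    return float(-np.mean(bce))

def rnn_initial_state(embedding, W_init):
    """Initialise the RNN hidden state from the semantic embedding (assumed coupling)."""
    return np.tanh(embedding @ W_init)

# Toy data standing in for CNN outputs and concept annotations (hypothetical sizes).
rng = np.random.default_rng(0)
batch, feat_dim, n_concepts, hidden = 4, 16, 5, 8
cnn_features = rng.normal(size=(batch, feat_dim))
W = rng.normal(scale=0.1, size=(feat_dim, n_concepts))
b = np.zeros(n_concepts)
W_init = rng.normal(scale=0.1, size=(n_concepts, hidden))
labels = rng.integers(0, 2, size=(batch, n_concepts)).astype(float)

embedding = semantic_embedding(cnn_features, W, b)       # interface between CNN and RNN
reg_loss = semantic_regularisation_loss(embedding, labels)
h0 = rnn_initial_state(embedding, W_init)                 # what the RNN would start from
```

Because `reg_loss` depends only on the CNN side, its gradient reaches the CNN without passing through the RNN, which is one way to read the decoupling the abstract describes. In a full model it would be added to the RNN's sequence loss during joint training.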
Original language: English
Title of host publication: Computer Vision and Pattern Recognition (CVPR 2017)
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Number of pages: 12
ISBN (Electronic): 978-1-5386-0457-1
ISBN (Print): 978-1-5386-0458-8
Publication status: Published - 9 Nov 2017
Event: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu 2017. - Hawaii, Honolulu, United States
Duration: 21 Jul 2017 → 26 Jul 2017

Publication series

ISSN (Print): 1063-6919


Conference: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu 2017.
Country: United States