Abstract
The objective of this work is to determine if people are interacting in TV video by detecting whether they are looking at each other or not. We determine both the temporal period of the interaction and also spatially localize the relevant people. We make the following three contributions:
(i) estimate head pose in unconstrained scenarios (TV video) using Gaussian Process regression;
(ii) propose and evaluate several methods for assessing whether and when pairs of people are looking at each other in a video shot;
and
(iii) introduce new ground truth annotation for this task, extending the TV Human Interactions Dataset [22]. The performance of the methods is evaluated on this dataset, which consists of 300 video clips extracted from TV shows. Despite the variety and difficulty of this video material, our best method obtains an average precision of 86.2%.
Original language | English |
---|---|
Title of host publication | Proceedings of the British Machine Vision Conference (BMVC) |
Subtitle of host publication | Dundee, September 2011 |
Publisher | BMVA Press |
Pages | 22.1-22.12 |
Number of pages | 12 |
ISBN (Print) | 1-901725-43-X |
Publication status | Published - 2011 |