Abstract
This paper investigates the automatic segmentation of meetings into a sequence of group actions or phases. Our work is based on a corpus of multiparty meetings collected in a meeting room instrumented with video cameras, lapel microphones and a microphone array. We have extracted a set of feature streams, in this case from the audio data, based on speaker turns, prosody and a transcript of what was spoken. We have related these signals to the higher-level semantic categories via a multistream statistical model based on dynamic Bayesian networks (DBNs). We report on a set of experiments in which different DBN architectures and feature streams are compared. The resulting system achieves an action error rate of 9%.
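For illustration only, the sketch below shows one simple way such a multistream segmentation could be realised: per-frame log-likelihoods from several feature streams are combined by a weighted sum and the most likely sequence of group actions is recovered by Viterbi decoding over a "sticky" transition matrix. This is a plain HMM stand-in, not the DBN architectures compared in the paper; the action labels, stream weights and all quantities in the demo are hypothetical.

```python
import numpy as np

# Hypothetical group-action labels; the actual label set comes from the corpus annotation.
ACTIONS = ["discussion", "monologue", "presentation", "note-taking"]


def combine_streams(stream_loglikes, weights):
    """Multistream combination: weighted sum of per-stream log-likelihoods
    (equivalent to a product of independently modelled streams)."""
    return sum(w * ll for w, ll in zip(weights, stream_loglikes))


def viterbi_segment(log_emission, log_trans, log_init):
    """Most likely action sequence given per-frame action log-likelihoods.

    log_emission: (T, K) log p(features_t | action k)
    log_trans:    (K, K) log transition probabilities (prev -> current)
    log_init:     (K,)   log initial-state probabilities
    """
    T, K = log_emission.shape
    delta = np.full((T, K), -np.inf)
    back = np.zeros((T, K), dtype=int)
    delta[0] = log_init + log_emission[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_trans   # score of each prev -> current move
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_emission[t]
    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):                   # backtrace
        path[t] = back[t + 1, path[t + 1]]
    return path


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, K = 200, len(ACTIONS)
    # Toy per-stream log-likelihoods standing in for speaker-turn, prosodic
    # and lexical streams (random here, purely for illustration).
    streams = [rng.normal(size=(T, K)) for _ in range(3)]
    log_emission = combine_streams(streams, weights=[1.0, 0.5, 0.5])
    # Sticky transitions favour long contiguous segments of the same action.
    stay, leave = 0.98, 0.02 / (K - 1)
    log_trans = np.log(np.full((K, K), leave) + np.eye(K) * (stay - leave))
    log_init = np.log(np.full(K, 1.0 / K))
    path = viterbi_segment(log_emission, log_trans, log_init)
    print([ACTIONS[i] for i in path[:10]])
```

The weighted-sum combination amounts to assuming the streams are conditionally independent given the current action; the per-stream weights would in practice be tuned on held-out data.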
Original language | English |
---|---|
Title of host publication | 2004 IEEE 6th Workshop on Multimedia Signal Processing |
Subtitle of host publication | MMSP 2004 |
Publisher | Institute of Electrical and Electronics Engineers |
Pages | 167-170 |
ISBN (Print) | 0-7803-8578-0 |
DOIs | |
Publication status | Published - 2004 |
Event | IEEE 6th Workshop on Multimedia Signal Processing (MMSP 2004) - Siena, Italy. Duration: 29 Sept 2004 → 1 Oct 2004 |
Workshop
Workshop | IEEE 6th Workshop on Multimedia Signal Processing (MMSP 2004) |
---|---|
Country/Territory | Italy |
City | Siena |
Period | 29/09/04 → 1/10/04 |