Using the Voice Spectrum for Improved Tracking of People in a Joint Audio-video Scheme

Eleonora D'Arca, Neil Robertson, James Hopgood

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this paper we present a new solution to the problem of tracking speakers among people in the presence of occlusions (disappearance and non-speaking intervals). In a normal conversation between two or more people, we learn speaker mel-frequency cepstral coefficients (MFCCs) and incorporate this information into a sequential Bayesian audio-video position tracker. The joint video-to-audio data association step is thus improved, and we achieve robust person recognition, which in turn aids tracking performance. We provide comprehensive evaluation via simulations and real data, quoting tracking accuracy, precision and diarisation error rate (DER) against ground truth. For simulated and real experiments in an open space, trajectory tracking performance measured against ground truth increases by 20% using our approach. As a further enhancement over the state of the art, speaker identity recognition at a distance is improved by 20% by exploiting audio-video localisation cues.
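The MFCC features the abstract refers to follow the standard pipeline: windowed power spectrum, triangular mel filterbank, log, then a DCT to decorrelate. A minimal NumPy sketch of that pipeline (generic textbook MFCCs, not the authors' exact implementation; frame size, hop, and coefficient counts below are illustrative defaults):

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, n_fft=512, hop=256, n_mels=26, n_ceps=13):
    """Return an (n_frames, n_ceps) array of MFCCs for a mono signal."""
    # Slice the signal into overlapping Hann-windowed frames
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    # Per-frame power spectrum
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    # Triangular mel filterbank, equally spaced on the mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # Log mel energies, then DCT-II -> cepstral coefficients
    log_mel = np.log(power @ fbank.T + 1e-10)
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2.0 * n_mels)))
    return log_mel @ dct.T
```

Per-frame MFCC vectors like these can serve as the speaker-specific observations that a sequential Bayesian tracker fuses with video position measurements during data association.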
Original language: English
Title of host publication: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
DOIs
Publication status: Published - 21 Oct 2013
Event: 38th IEEE International Conference on Acoustics, Speech, and Signal Processing - Vancouver, Canada
Duration: 26 May 2013 - 31 May 2013
https://www2.securecms.com/ICASSP2013/default.asp

Conference

Conference: 38th IEEE International Conference on Acoustics, Speech, and Signal Processing
Abbreviated title: ICASSP 2013
Country/Territory: Canada
City: Vancouver
Period: 26/05/13 - 31/05/13
