A model-based approach towards practical blind enhancement of audio signals acquired in real acoustic environments.

Project Details

Description

Acoustic reverberation arises when an audio signal radiated in a confined acoustic space is reflected from walls and other surfaces. The removal of reverberation – dereverberation – is required whenever an audio signal is acquired by a sensor placed away from the source. This is true both for humans with impaired hearing and for computer-based signal processing algorithms. This research project uses statistical signal processing methods to design enhancement algorithms for reducing the effect of reverberation.
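A reverberant observation is commonly modelled as the clean signal convolved with a room impulse response (RIR). The sketch below illustrates that model only; the 440 Hz tone, the 0.5 s reverberation time, and the exponentially decaying noise RIR are all illustrative assumptions, not parameters from this project.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 16000                       # sample rate (Hz), assumed

# A short "dry" test signal: a 440 Hz tone standing in for clean speech.
t = np.arange(fs) / fs
dry = np.sin(2 * np.pi * 440 * t)

# Synthetic room impulse response: exponentially decaying white noise,
# a common statistical model of late reverberation.
rt60 = 0.5                       # reverberation time (s), hypothetical
n = int(rt60 * fs)
decay = np.exp(-3 * np.log(10) * np.arange(n) / (rt60 * fs))
rir = rng.standard_normal(n) * decay
rir /= np.max(np.abs(rir))

# Reverberant observation = dry signal convolved with the RIR.
wet = np.convolve(dry, rir)

print(len(dry), len(rir), len(wet))  # wet length = len(dry) + len(rir) - 1
```

Blind dereverberation is hard precisely because only `wet` is observed: both `dry` and `rir` must be estimated from it.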

Reverberation can be physically reduced by incorporating acoustic dampeners into an environment. For example, the acoustical properties of dedicated teleconferencing rooms can be modified by including softer materials. However, for many applications such as hearing-aids, automatic speech recognition, and surveillance, it is not possible to physically make these alterations: it is necessary to manipulate the reverberant signals directly.

Signal processing manipulates digitised representations of real-world signals and is extremely prevalent in modern electronic devices. The ubiquity of digital audio in broadcasting, domestic, and multimedia applications, each offering crystal-clear quality, has heightened public awareness of what audio applications can achieve. Nevertheless, there is less public awareness of the contribution signal processing has made to the revolution in modern digital devices; the main contributors are often assumed to be increased storage space, computing speed, and battery life, whereas it is signal processing algorithms that put this memory and computing power to use.

Blind speech dereverberation (BSD) is an important and challenging area for the signal processing community. This research project aimed to investigate various approaches for BSD of a single moving talker using just one microphone. The original proposal presented a set of specific models for investigation. Midway through the project we decided to address more fundamental questions regarding how to solve BSD, rather than to investigate particular models that are only applicable to specific scenarios.

We started by noting that Western speech is composed of a number of sounds which are generally classified as either voiced or unvoiced. Voiced speech is characteristic of vowels, and unvoiced speech of consonants. There are a number of existing algorithms for BSD which are optimised for a particular speech sound. For example, our Bayesian algorithm enhances unvoiced speech, while a competing algorithm based on harmonic analysis enhances voiced speech. We combined the Bayesian and harmonic analysis approaches into a hybrid algorithm, and also dealt with the issue of periods of silence in speech. The results of this research therefore include:

a) Demonstrating how existing BSD methods can be improved by classifying speech into voiced, unvoiced, and silent segments, and developing a hybrid BSD algorithm which accounts for all of these.

b) Our hybrid approach requires speech classification, which proves difficult for reverberant speech. We compared the performance of a number of common classification methods when applied to reverberant speech.

c) We began developing a fundamentally different approach to BSD which attempts to enhance features of the signal directly rather than simply minimising a squared error of the signal model.

d) Finally, this research grant enabled the PI and his research students to investigate the original objectives of the project, primarily a Bayesian solution to BSD. The aims addressed include:

- a complete speech model that accounts for both voiced and unvoiced speech;
- a more realistic room acoustic model;
- system models that can account for varying source-sensor geometries;
- models that can be estimated using batch and sequential Monte Carlo methods.
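The voiced/unvoiced/silent segmentation underpinning the hybrid algorithm can be illustrated with two classic short-time features: frame energy and zero-crossing rate (ZCR). The sketch below is a minimal, hypothetical classifier for clean signals, not the project's actual method; the frame length and both thresholds are assumptions, and (as finding b notes) such simple features degrade on reverberant speech.

```python
import numpy as np

def classify_frames(x, frame_len=320, energy_thresh=1e-3, zcr_thresh=0.25):
    """Label each frame 'silent', 'voiced', or 'unvoiced' using
    short-time energy and zero-crossing rate (illustrative thresholds)."""
    labels = []
    for start in range(0, len(x) - frame_len + 1, frame_len):
        frame = x[start:start + frame_len]
        energy = np.mean(frame ** 2)
        # Fraction of sample pairs whose sign flips (per-sample ZCR).
        zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2
        if energy < energy_thresh:
            labels.append('silent')      # too quiet to classify
        elif zcr < zcr_thresh:
            labels.append('voiced')      # periodic, low ZCR
        else:
            labels.append('unvoiced')    # noise-like, high ZCR
    return labels

# Synthetic check: a low tone (voiced-like), white noise (unvoiced-like),
# and a stretch of zeros (silence), each 0.5 s at 16 kHz.
fs = 16000
t = np.arange(fs // 2) / fs
tone = 0.5 * np.sin(2 * np.pi * 150 * t)
noise = 0.1 * np.random.default_rng(1).standard_normal(fs // 2)
quiet = np.zeros(fs // 2)
labels = classify_frames(np.concatenate([tone, noise, quiet]))
```

A hybrid BSD system would then route each frame to the enhancement algorithm best suited to its class, for example the Bayesian method for unvoiced frames and the harmonic-analysis method for voiced frames.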

Key findings

The key findings of this research include:
a) demonstrating how existing blind speech dereverberation (BSD) methods can be improved by classifying speech into voiced, unvoiced, and silent segments, and developing a hybrid BSD algorithm which accounts for all of these;
b) a comparison of the performance of a number of common speech classification methods when applied to reverberant speech;
c) the development of a fundamentally different approach to BSD which attempts to enhance features of the signal directly rather than simply minimising an error criterion of the signal model.
Status: Finished
Effective start/end date: 1/09/06 – 31/08/08

Funding

  • EPSRC: £124,180.00
