Abstract
We tackle the multi-party speech recovery problem by modeling the
acoustics of reverberant chambers. Our approach exploits structured
sparsity models to perform room modeling and speech recovery. We propose
a scheme for characterizing the room acoustics from the unknown competing
speech sources, relying on localization of the early images of the
speakers via sparse approximation of the spatial spectra of the virtual
sources in a free-space model. The images are then clustered by exploiting
the low-rank structure of the spectro-temporal components belonging to
each source. This enables us to identify the early support of the room
impulse response function and its unique mapping to the room geometry. To
further resolve the ambiguity of the reflection ratios, we propose a
novel formulation of the reverberation model and estimate the absorption
coefficients through convex optimization, exploiting a joint sparsity
model formulated on the spatio-spectral sparsity of the concurrent speech
representation. The acoustic parameters are then incorporated for
separating the individual speech signals through either structured sparse
recovery or inverse filtering of the acoustic channels. Experiments
conducted on real data recordings demonstrate the effectiveness of the
proposed approach for multi-party speech recovery and recognition.
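To make the localization step concrete, the sketch below shows one plausible reading of it: the spatial spectrum observed at a microphone array is sparsely approximated over a grid of candidate source/image positions using a free-space propagation dictionary. The array geometry, grid resolution, frequency, and the greedy (OMP-style) solver are illustrative assumptions, not the authors' exact formulation.

```python
# Illustrative sketch only (assumptions: linear 8-mic array, 2-D grid of
# candidate image positions, single narrowband snapshot, OMP-style solver).
import numpy as np

def freespace_dictionary(mic_pos, grid_pos, freq, c=343.0):
    """Free-space steering vectors from each candidate grid point to each mic."""
    # Pairwise distances, shape (n_mics, n_grid)
    d = np.linalg.norm(mic_pos[:, None, :] - grid_pos[None, :, :], axis=2)
    k = 2 * np.pi * freq / c
    A = np.exp(-1j * k * d) / np.maximum(d, 1e-3)   # attenuation + phase delay
    return A / np.linalg.norm(A, axis=0, keepdims=True)

def sparse_spatial_spectrum(x, A, n_sources):
    """Greedy sparse approximation of the snapshot x ~ A s over the grid."""
    residual, support = x.copy(), []
    for _ in range(n_sources):
        idx = int(np.argmax(np.abs(A.conj().T @ residual)))  # best-matching point
        support.append(idx)
        As = A[:, support]
        s, *_ = np.linalg.lstsq(As, x, rcond=None)           # refit on support
        residual = x - As @ s
    return support, s

# Toy usage: two synthetic "images" mixed into one array snapshot.
mics = np.c_[np.linspace(0.0, 0.7, 8), np.zeros(8), np.zeros(8)]
gx, gy = np.meshgrid(np.linspace(-3, 3, 31), np.linspace(0.5, 6, 28))
grid = np.c_[gx.ravel(), gy.ravel(), np.zeros(gx.size)]
A = freespace_dictionary(mics, grid, freq=1000.0)
x = A[:, 200] + 0.4 * A[:, 550]
support, gains = sparse_spatial_spectrum(x, A, n_sources=2)
print("recovered grid indices:", support)
```

In the paper's pipeline these recovered grid points would correspond to the early images of the speakers, which are subsequently clustered per source and mapped back to the room geometry; the sketch stops at the localization stage.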
| Original language  | English          |
|--------------------|------------------|
| Publication status | Published - 2012 |
Keywords
- Computer Science - Learning
- Computer Science - Sound