Abstract
Deep neural networks (DNNs) have become a standard component in supervised ASR, used in both data-driven feature extraction and acoustic modelling. Supervision is typically obtained from a forced alignment that provides phone class targets, requiring transcriptions and pronunciations. We propose a novel unsupervised DNN-based feature extractor that can be trained without these resources in zero-resource settings. Using unsupervised term discovery, we find pairs of isolated word examples of the same unknown type; these provide weak top-down supervision. For each pair, dynamic programming is used to align the feature frames of the two words. Matching frames are presented as input-output pairs to a deep autoencoder (AE) neural network. Using this AE as a feature extractor in a word discrimination task, we achieve a 64% relative improvement over a previous state-of-the-art system, a 57% improvement relative to a bottom-up trained deep AE, and come to within 23% of a supervised system.
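The alignment step described in the abstract can be illustrated with a minimal sketch. It assumes standard dynamic time warping with a Euclidean frame distance; the abstract names neither the exact dynamic programming variant nor the distance, and the function names `dtw_align` and `frame_pairs` are hypothetical, chosen only for illustration. Given two word examples as lists of feature frames, the sketch returns the matched frames as (input, target) pairs of the kind that would be presented to the autoencoder.

```python
import math


def dtw_align(x, y):
    """Align two frame sequences with dynamic time warping (DTW).

    x, y: non-empty lists of frames, each frame a list of floats.
    Returns the warping path as a list of (i, j) index pairs.
    """
    def dist(a, b):
        # Assumption: Euclidean distance between frames.
        return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] is the minimal cumulative cost of aligning x[:i] with y[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(x[i - 1], y[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1]
            )

    # Backtrack from the end of both words to recover the path.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
    path.reverse()
    return path


def frame_pairs(x, y):
    """Return matched frames of an aligned word pair as (input, target) pairs."""
    return [(x[i], y[j]) for i, j in dtw_align(x, y)]


# Toy one-dimensional "frames" standing in for real acoustic features.
word_a = [[0.0], [1.0], [2.0]]
word_b = [[0.0], [0.0], [1.0], [2.0]]
pairs = frame_pairs(word_a, word_b)
```

Each pair holds a frame from the first word as input and its aligned frame from the second word as target. Whether pairs are also used in the reverse direction is not stated in the abstract, so the sketch emits one direction only.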
Original language | English |
---|---|
Title of host publication | 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) |
Place of Publication | Brisbane, QLD, Australia |
Publisher | Institute of Electrical and Electronics Engineers (IEEE) |
Pages | 5818-5822 |
Number of pages | 5 |
ISBN (Electronic) | 978-1-4673-6997-8 |
DOIs | |
Publication status | Published - 6 Aug 2015 |
Event | 40th IEEE International Conference on Acoustics, Speech and Signal Processing - Brisbane Convention & Exhibition Centre, Brisbane, Australia; Duration: 19 Apr 2015 → 24 Apr 2015 |
Publication series
Name | |
---|---|
Publisher | IEEE |
ISSN (Print) | 1520-6149 |
ISSN (Electronic) | 2379-190X |
Conference
Conference | 40th IEEE International Conference on Acoustics, Speech and Signal Processing |
---|---|
Abbreviated title | ICASSP 2015 |
Country/Territory | Australia |
City | Brisbane |
Period | 19/04/15 → 24/04/15 |
Fingerprint
Dive into the research topics of 'Unsupervised neural network based feature extraction using weak top-down constraints'. Together they form a unique fingerprint.
Profiles
Sharon Goldwater
- School of Informatics - Personal Chair of Computational Language Learning
- Institute of Language, Cognition and Computation
- Language, Interaction and Robotics
Person: Academic: Research Active