Unsupervised neural network based feature extraction using weak top-down constraints

Herman Kamper, Micha Elsner, Aren Jansen, Sharon Goldwater

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Deep neural networks (DNNs) have become a standard component in supervised ASR, used in both data-driven feature extraction and acoustic modelling. Supervision is typically obtained from a forced alignment that provides phone class targets, requiring transcriptions and pronunciations. We propose a novel unsupervised DNN-based feature extractor that can be trained without these resources in zero-resource settings. Using unsupervised term discovery, we find pairs of isolated word examples of the same unknown type; these provide weak top-down supervision. For each pair, dynamic programming is used to align the feature frames of the two words. Matching frames are presented as input-output pairs to a deep autoencoder (AE) neural network. Using this AE as feature extractor in a word discrimination task, we achieve 64% relative improvement over a previous state-of-the-art system, 57% improvement relative to a bottom-up trained deep AE, and come to within 23% of a supervised system.
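
The alignment step described in the abstract can be illustrated with a minimal sketch. It assumes standard dynamic time warping (DTW) with Euclidean frame distances; the abstract only says "dynamic programming", so the distance measure, the step pattern, and the function names `dtw_align` and `frame_pairs` are assumptions made for illustration and are not taken from the paper. Each word is represented as a NumPy array of shape (frames, feature dimension).

```python
import numpy as np


def dtw_align(x, y):
    """Return the (i, j) frame index pairs on a minimum-cost DTW path.

    x and y are arrays of shape (n_frames, dim). Euclidean distance and the
    three-way step pattern are illustrative assumptions.
    """
    n, m = len(x), len(y)
    # Pairwise Euclidean distances between every frame of x and every frame of y.
    dist = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=2)

    # cost[i, j] holds the best accumulated cost of aligning x[:i] with y[:j].
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i, j] = dist[i - 1, j - 1] + min(
                cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]
            )

    # Backtrack from the end of both words to recover the path.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1, j - 1], i - 1, j - 1),
            (cost[i - 1, j], i - 1, j),
            (cost[i, j - 1], i, j - 1),
        ]
        _, i, j = min(steps, key=lambda s: s[0])
    path.reverse()
    return path


def frame_pairs(x, y):
    """Turn an aligned word pair into (input, target) frame arrays."""
    idx_x, idx_y = zip(*dtw_align(x, y))
    return x[list(idx_x)], y[list(idx_y)]


# Toy example: the second "word" repeats the first frame of the first word.
word_a = np.arange(6, dtype=float).reshape(3, 2)
word_b = word_a[[0, 0, 1, 2]]
path = dtw_align(word_a, word_b)
inputs, targets = frame_pairs(word_a, word_b)
```

In a setup like the one the abstract outlines, rows of `inputs` would be presented to the autoencoder with the matching rows of `targets` as the desired outputs; the network itself is left out here because the abstract gives no architectural details.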
Original language: English
Title of host publication: 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Place of Publication: Brisbane, QLD, Australia
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 5818-5822
Number of pages: 5
ISBN (Electronic): 978-1-4673-6997-8
DOIs
Publication status: Published - 6 Aug 2015
Event: 40th IEEE International Conference on Acoustics, Speech and Signal Processing - Brisbane Convention & Exhibition Centre, Brisbane, Australia
Duration: 19 Apr 2015 - 24 Apr 2015

Publication series

Name
Publisher: IEEE
ISSN (Print): 1520-6149
ISSN (Electronic): 2379-190X

Conference

Conference: 40th IEEE International Conference on Acoustics, Speech and Signal Processing
Abbreviated title: ICASSP 2015
Country/Territory: Australia
City: Brisbane
Period: 19/04/15 - 24/04/15
