Abstract / Description of output
Unsupervised segmentation and clustering of unlabelled speech are core problems in zero-resource speech processing. Most approaches lie at methodological extremes: some use probabilistic Bayesian models with convergence guarantees, while others opt for more efficient heuristic techniques. Despite competitive performance in previous work, the full Bayesian approach is difficult to scale to large speech corpora. We introduce an approximation to a recent Bayesian model that still has a clear objective function but improves efficiency by using hard clustering and segmentation rather than full Bayesian inference. Like its Bayesian counterpart, this embedded segmental K-means model (ES-KMeans) represents arbitrary-length word segments as fixed-dimensional acoustic word embeddings. We first compare ES-KMeans to previous approaches on common English and Xitsonga data sets (5 and 2.5 hours of speech): ES-KMeans outperforms a leading heuristic method in word segmentation, giving similar scores to the Bayesian model while being 5 times faster with fewer hyperparameters. However, its clusters are less pure than those of the other models. We then show that ES-KMeans scales to larger corpora by applying it to the 5 languages of the Zero Resource Speech Challenge 2017 (up to 45 hours), where it performs competitively compared to the challenge baseline.
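The alternation between hard segmentation and hard clustering described in the abstract can be sketched roughly as follows. The abstract does not give the algorithmic details, so this is only an illustration under stated assumptions: segments are embedded by uniformly subsampling frames, the objective is a duration-weighted squared Euclidean distance to the nearest cluster mean, and means are updated as duration-weighted averages. The names `embed`, `segment`, and `es_kmeans_sketch` are hypothetical and are not taken from the paper.

```python
import numpy as np

def embed(frames, n_samples=3):
    """Map a (T, D) segment to a fixed-size vector by uniformly sampling frames.
    Assumption: a simple stand-in for an acoustic word embedding."""
    idx = np.linspace(0, len(frames) - 1, n_samples).round().astype(int)
    return frames[idx].ravel()

def segment(features, means, max_len):
    """Dynamic programme over boundaries: minimise the sum of
    duration * squared distance from each segment embedding to its nearest mean."""
    T = len(features)
    best = np.full(T + 1, np.inf)
    best[0] = 0.0
    back = np.zeros(T + 1, dtype=int)
    label = np.zeros(T + 1, dtype=int)
    for t in range(1, T + 1):
        for s in range(max(0, t - max_len), t):
            d = ((means - embed(features[s:t])) ** 2).sum(axis=1)
            k = int(d.argmin())
            cost = best[s] + (t - s) * d[k]
            if cost < best[t]:
                best[t], back[t], label[t] = cost, s, k
    # Trace back the chosen segments as (start, end, cluster) triples.
    segs, t = [], T
    while t > 0:
        segs.append((int(back[t]), t, int(label[t])))
        t = int(back[t])
    return segs[::-1], float(best[T])

def es_kmeans_sketch(features, n_clusters, max_len=4, n_iter=5, seed=0):
    """Alternate hard segmentation and hard cluster-mean updates."""
    rng = np.random.default_rng(seed)
    T = len(features)
    # Initialise means from randomly placed segments of maximal length.
    starts = rng.integers(0, T - max_len + 1, size=n_clusters)
    means = np.stack([embed(features[s:s + max_len]) for s in starts])
    costs = []
    for _ in range(n_iter):
        segs, cost = segment(features, means, max_len)
        costs.append(cost)
        for k in range(n_clusters):
            members = [(e - s, embed(features[s:e])) for s, e, c in segs if c == k]
            if members:
                # Duration-weighted mean: the exact minimiser of the weighted cost.
                w = np.array([m[0] for m in members], dtype=float)
                x = np.stack([m[1] for m in members])
                means[k] = (w[:, None] * x).sum(axis=0) / w.sum()
    return segs, costs

# Toy usage on random "features" (30 frames, 2 dimensions).
toy = np.random.default_rng(1).normal(size=(30, 2))
toy_segs, toy_costs = es_kmeans_sketch(toy, n_clusters=3)
```

Because each step (segmentation given the means, then means given the segmentation) can only lower or keep this assumed objective, the recorded costs do not increase from one iteration to the next, which is the sense in which such a hard-assignment scheme still has a clear objective function.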
Original language | English |
---|---|
Title of host publication | 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) |
Publisher | Institute of Electrical and Electronics Engineers |
Pages | 719-726 |
Number of pages | 8 |
ISBN (Electronic) | 978-1-5090-4788-8, 978-1-5090-4787-1 |
ISBN (Print) | 978-1-5090-4789-5 |
Publication status | Published - 25 Jan 2018 |
Event | 2017 IEEE Automatic Speech Recognition and Understanding Workshop - Okinawa, Japan. Duration: 16 Dec 2017 → 20 Dec 2017. https://asru2017.org/ |
Conference
Conference | 2017 IEEE Automatic Speech Recognition and Understanding Workshop |
---|---|
Abbreviated title | ASRU 2017 |
Country/Territory | Japan |
City | Okinawa |
Period | 16/12/17 → 20/12/17 |
Internet address | https://asru2017.org/ |
Keywords / Materials (for Non-textual outputs)
- Bayes methods
- linguistics
- natural language processing
- pattern clustering
- speech processing
- speech recognition
- unsupervised learning
- competitive performance
- Bayesian approach
- speech corpora
- hard clustering
- Bayesian inference
- fixed-dimensional acoustic word embeddings
- word segmentation
- ES-KMeans scales
- zero-resource speech processing
- convergence guarantees
- heuristic techniques
- probabilistic Bayesian models
- Xitsonga data sets
- English data sets
- time 5.0 hour
- time 2.5 hour
- time 45.0 hour
- Speech
- Clustering algorithms
- Acoustics
- Standards
- Speech processing
- Probabilistic logic
- Zero-resource speech processing
- language acquisition
Fingerprint
Dive into the research topics of 'An embedded segmental K-means model for unsupervised segmentation and clustering of speech'. Together they form a unique fingerprint.