Abstract / Description of output
Unsupervised speech processing methods are essential for applications ranging from zero-resource speech technology to modelling child language acquisition. One challenging problem is discovering the word inventory of the language: the lexicon. Lexical clustering is the task of grouping unlabelled acoustic word tokens according to type. We propose a novel lexical clustering model: variable-length word segments are embedded in a fixed-dimensional acoustic space in which clustering is then performed. We evaluate several clustering algorithms and find that the best methods produce clusters with wide variation in sizes, as observed in natural language. The best probabilistic approach is an infinite Gaussian mixture model (IGMM), which automatically chooses the number of clusters. Performance is comparable to that of nonprobabilistic Chinese Whispers and average-linkage hierarchical clustering. We conclude that IGMM clustering of fixed-dimensional embeddings holds promise as the lexical clustering component in unsupervised speech processing systems.
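The embed-then-cluster idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the `embed` function uses plain uniform downsampling as a stand-in embedding (an assumption), the toy one-dimensional "frames" and the distance threshold are invented for illustration, and only average-linkage hierarchical clustering (one of the methods named above) is shown, not the IGMM.

```python
import math

def embed(frames, n_samples=3):
    """Map a variable-length list of feature frames to one fixed-length vector.

    Assumption: uniform downsampling, i.e. pick n_samples (>= 2) evenly
    spaced frames and concatenate them. This is only a stand-in embedding.
    """
    last = len(frames) - 1
    picked = [frames[round(i * last / (n_samples - 1))] for i in range(n_samples)]
    return [value for frame in picked for value in frame]

def average_linkage(vectors, threshold):
    """Merge the two closest clusters until the smallest mean
    pairwise Euclidean distance between clusters exceeds threshold."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(a, b):
        total = sum(math.dist(vectors[i], vectors[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        dist, x, y = min(
            (linkage(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        if dist > threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]

# Hypothetical toy data: four word tokens of different lengths, two word types.
segments = [
    [[0.0], [0.1], [0.0], [0.1]],
    [[0.1], [0.0], [0.0], [0.1], [0.0], [0.1]],
    [[1.0], [0.9], [1.0]],
    [[0.9], [1.0], [1.0], [0.9], [1.0]],
]
embeddings = [embed(s) for s in segments]
clusters = average_linkage(embeddings, threshold=0.5)
print(clusters)
```

Because every token becomes a vector of the same length, any standard clustering algorithm can be applied to the embeddings; here the number of clusters follows from the threshold rather than being fixed in advance.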
Original language | English |
---|---|
Title of host publication | Proceedings of the IEEE Spoken Language Technology Workshop |
Publisher | Institute of Electrical and Electronics Engineers |
Number of pages | 6 |
Publication status | Published - 2014 |