Abstract / Description of output
Self-supervised learning (SSL) methods have seen wide popularity for speech representation learning. Early methods, such as wav2vec, were causal, whilst more recent approaches, notably wav2vec 2.0 and data2vec, have employed masking strategies together with a Transformer architecture. Many SSL methods use contrastive learning; however, non-contrastive methods, while susceptible to representational collapse, have recently seen success in other fields and have been successfully applied to speech in data2vec. This work returns to the study of causal models, comparing them with equivalent non-causal variants, motivated by our observation that non-contrastive SSL models have never been investigated in this setting. To this end, we propose a novel approach, Bootstrapped Predictive Coding (BPC), a causal non-contrastive SSL model. We find that causal SSL models outperform their non-causal counterparts in both contrastive and non-contrastive training setups, and that representations obtained with BPC give the best overall performance when evaluated on phone frame classification.
Original language | English |
---|---|
Title of host publication | ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) |
Publisher | Institute of Electrical and Electronics Engineers |
Pages | 11541-11545 |
Number of pages | 5 |
ISBN (Electronic) | 979-8-3503-4485-1 |
DOIs | |
Publication status | Published - 18 Mar 2024 |
Event | 2024 IEEE International Conference on Acoustics, Speech and Signal Processing, Seoul, Korea, Republic of. Duration: 14 Apr 2024 → 19 Apr 2024. https://2024.ieeeicassp.org/ |
Publication series
Name | International Conference on Acoustics, Speech, and Signal Processing (ICASSP) |
---|---|
Publisher | IEEE |
ISSN (Print) | 1520-6149 |
ISSN (Electronic) | 2379-190X |
Conference
Conference | 2024 IEEE International Conference on Acoustics, Speech and Signal Processing |
---|---|
Abbreviated title | ICASSP 2024 |
Country/Territory | Korea, Republic of |
City | Seoul |
Period | 14/04/24 → 19/04/24 |
Internet address | https://2024.ieeeicassp.org/ |
Keywords / Materials (for Non-textual outputs)
- self-supervised learning
- speech recognition
- non-contrastive methods
- masked acoustic modelling
Fingerprint
Dive into the research topics of 'BOOTSTRAP PREDICTIVE CODING: INVESTIGATING A NON-CONTRASTIVE SELF-SUPERVISED LEARNING APPROACH'. Together they form a unique fingerprint.
Projects
- 2 Finished
- Unmute: Opening Spoken Language Interaction to the Currently Unheard
  Bell, P., Goldwater, S. & Renals, S.
  1/12/20 → 30/11/23
  Project: Research
-