Abstract / Description of output
State-of-the-art models in automatic speech recognition have shown remarkable improvements due to modern self-supervised learning (SSL) transformer-based architectures such as wav2vec 2.0 (Baevski et al., 2020). However, how these models encode phonetic information is still not well understood. We explore whether SSL speech models display a linguistic property that characterizes human speech perception: language specificity. We show that while wav2vec 2.0 displays an overall language specificity effect when tested on Hindi vs. English, it does not resemble human speech perception when tested on finer-grained differences in Hindi speech contrasts.
Original language | English |
---|---|
Title of host publication | Proceedings of the 28th Conference on Computational Natural Language Learning |
Publisher | ACL Anthology |
Pages | 1-6 |
Number of pages | 6 |
Publication status | Accepted/In press - 24 Sept 2024 |
Event | The 28th Conference on Computational Natural Language Learning, Hyatt Regency Miami Hotel, Miami, United States. Duration: 15 Nov 2024 → 16 Nov 2024. Conference number: 28. https://conll.org/2024 |
Conference
Conference | The 28th Conference on Computational Natural Language Learning |
---|---|
Abbreviated title | CoNLL 2024 |
Country/Territory | United States |
City | Miami |
Period | 15/11/24 → 16/11/24 |
Internet address |