Self-supervised speech representations display some human-like cross-linguistic perceptual abilities

Joselyn Rodriguez*, Kamala Sreepada, Ruolan Leslie Famularo, Sharon Goldwater, Naomi H. Feldman

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

State-of-the-art models in automatic speech recognition have shown remarkable improvements due to modern self-supervised learning (SSL) transformer-based architectures such as wav2vec 2.0 (Baevski et al., 2020). However, how these models encode phonetic information is still not well understood. We explore whether SSL speech models display a linguistic property that characterizes human speech perception: language specificity. We show that while wav2vec 2.0 displays an overall language-specificity effect when tested on Hindi vs. English, it does not resemble human speech perception when tested on finer-grained differences in Hindi speech contrasts.
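Cross-linguistic perception studies of this kind are often operationalized with an ABX discrimination test over model representations: given two exemplars A and B from different phonetic categories and a probe X from A's category, the model "discriminates" the contrast if X lies closer to A than to B. The toy sketch below illustrates that metric on synthetic vectors standing in for pooled wav2vec 2.0 features; it is a minimal illustration of the general paradigm, not the authors' exact evaluation, and all names and values are illustrative.

```python
import numpy as np

def cosine_distance(u, v):
    """1 - cosine similarity between two vectors."""
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def abx_score(a_exemplars, b_exemplars):
    """Fraction of (A, B, X) triplets where the probe X (drawn from
    A's category) is closer to A than to B. Chance level is 0.5."""
    correct, total = 0, 0
    for i, a in enumerate(a_exemplars):
        for j, x in enumerate(a_exemplars):
            if j == i:  # probe must be a different token than A
                continue
            for b in b_exemplars:
                total += 1
                if cosine_distance(x, a) < cosine_distance(x, b):
                    correct += 1
    return correct / total

# Synthetic stand-ins for pooled speech representations of two
# phonetic categories, separated along the first dimension.
rng = np.random.default_rng(0)
offset = np.array([5.0] + [0.0] * 15)
cat_a = rng.normal(size=(5, 16)) + offset
cat_b = rng.normal(size=(5, 16)) - offset

print(abx_score(cat_a, cat_b))  # well-separated categories score near 1.0
```

A score near 1.0 indicates the representation space separates the two categories; scores near 0.5 indicate the contrast is not encoded, which is how a failure to discriminate a non-native (e.g. Hindi) contrast would surface.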
Original language: English
Title of host publication: Proceedings of the 28th Conference on Computational Natural Language Learning
Publisher: ACL Anthology
Pages: 1-6
Number of pages: 6
Publication status: Accepted/In press - 24 Sept 2024
Event: The 28th Conference on Computational Natural Language Learning - Hyatt Regency Miami Hotel, Miami, United States
Duration: 15 Nov 2024 - 16 Nov 2024
Conference number: 28
https://conll.org/2024

Conference

Conference: The 28th Conference on Computational Natural Language Learning
Abbreviated title: CoNLL 2024
Country/Territory: United States
City: Miami
Period: 15/11/24 - 16/11/24
