Catch Me If You Can: Blackbox Adversarial Attacks on Automatic Speech Recognition using Frequency Masking

Xiaoliang Wu, Ajitha Rajan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Automatic speech recognition (ASR) models are widely used in applications for voice navigation and voice control of domestic appliances. Attackers have misused ASRs to generate malicious outputs by attacking the deep learning component within them. To assess the security and robustness of ASRs, we propose techniques within our framework SPAT that generate blackbox (agnostic to the DNN) adversarial attacks that are portable across ASRs. This is in contrast to existing work, which focuses on whitebox attacks that are time-consuming and lack portability.
Our techniques generate adversarial attacks that have no human-audible difference by manipulating the input speech signal using a psychoacoustic model that maintains the audio perturbations below the thresholds of human perception. We propose a framework, SPAT, with three attack generation techniques based on the psychoacoustic concept and frame selection techniques to selectively target the attack. We evaluate the portability and effectiveness of our techniques using three popular ASRs and two input audio datasets, with four metrics: Word Error Rate (WER) of the output transcription, Similarity to the original audio, attack Success Rate on different ASRs, and Detection score by a defense system. We found that our adversarial attacks were portable across ASRs, were not easily detected by a state-of-the-art defense system, and produced significantly different output transcriptions while sounding similar to the original audio.
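For readers unfamiliar with the first metric, the sketch below computes Word Error Rate in its standard form: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcription and an ASR output, divided by the number of reference words. The function name `word_error_rate`, the whitespace tokenization, and the sample sentences are illustrative assumptions; the abstract does not say how the authors tokenize or normalize text before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level edit distance / number of reference words.

    Assumption: words are separated by whitespace and compared exactly
    (no case folding or punctuation stripping).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcriptions: one substituted word out of five.
    clean = "turn on the kitchen light"
    attacked = "turn off the kitchen light"
    print(word_error_rate(clean, attacked))
```

Under this definition, a successful attack in the sense described above would raise the WER of the transcription of the perturbed audio relative to the transcription of the original audio, while the audio itself stays perceptually similar.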
Original language: English
Title of host publication: Proceedings of 2022 29th Asia-Pacific Software Engineering Conference (APSEC)
Number of pages: 10
Publication status: Accepted/In press - 25 Aug 2022
Event: 29th Asia-Pacific Software Engineering Conference, 2022 - Online
Duration: 6 Dec 2022 to 9 Dec 2022
Conference number: 29


Conference: 29th Asia-Pacific Software Engineering Conference, 2022
Abbreviated title: APSEC 2022


  • Automatic Speech Recognition
  • Adversarial Attack
  • Blackbox
  • Frequency Masking

