Bad Characters: Imperceptible NLP Attacks

Nicholas Boucher, Ilia Shumailov, Ross Anderson, Nicolas Papernot

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract / Description of output

Several years of research have shown that machine-learning systems are vulnerable to adversarial examples, both in theory and in practice. Until now, such attacks have primarily targeted visual models, exploiting the gap between human and machine perception. Although text-based models have also been attacked with adversarial examples, such attacks have struggled to preserve semantic meaning and indistinguishability. In this paper, we explore a large class of adversarial examples that can be used to attack text-based models in a black-box setting without making any human-perceptible visual modification to inputs. We use encoding-specific perturbations that are imperceptible to the human eye to manipulate the outputs of a wide range of Natural Language Processing (NLP) systems from neural machine-translation pipelines to web search engines. We find that with a single imperceptible encoding injection – representing one invisible character, homoglyph, reordering, or deletion – an attacker can significantly reduce the performance of vulnerable models, and with three injections most models can be functionally broken. Our attacks work against currently deployed commercial systems, including those produced by Microsoft and Google, in addition to open-source models published by Facebook, IBM, and HuggingFace. This novel series of attacks presents a significant threat to many language processing systems: an attacker can affect systems in a targeted manner without any assumptions about the underlying model. We conclude that text-based NLP systems require careful input sanitization, just like conventional applications, and that given that such systems are now being deployed rapidly at scale, the urgent attention of architects and operators is required.
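The abstract names four kinds of imperceptible injection (invisible characters, homoglyphs, reorderings, and deletions) and calls for input sanitization. The following is a minimal sketch of what such a check might look like, not the authors' method. The code point sets, the function names `find_suspicious` and `strip_controls`, and the homoglyph heuristic (flagging any letter that is not Latin-script) are all assumptions made for illustration; the heuristic presumes Latin-script English input and would misfire on legitimately multilingual text.

```python
import unicodedata

# Illustrative, non-exhaustive code point sets (assumptions, not taken from the paper).
INVISIBLE = {"\u200b", "\u200c", "\u200d"}  # zero-width space, non-joiner, joiner
BIDI_CONTROLS = {chr(c) for c in (*range(0x202A, 0x202F), *range(0x2066, 0x206A))}
DELETION_CONTROLS = {"\u0008", "\u007f"}  # backspace, delete


def find_suspicious(text):
    """Return (index, code point, kind) for each character worth a closer look."""
    findings = []
    for i, ch in enumerate(text):
        if ch in INVISIBLE:
            kind = "invisible"
        elif ch in BIDI_CONTROLS:
            kind = "reordering"
        elif ch in DELETION_CONTROLS:
            kind = "deletion"
        elif ch.isalpha() and not unicodedata.name(ch, "").startswith("LATIN"):
            # Crude heuristic: assumes the input should be Latin-script only.
            kind = "possible homoglyph"
        else:
            continue
        findings.append((i, f"U+{ord(ch):04X}", kind))
    return findings


def strip_controls(text):
    """Drop invisible, bidi, and deletion control characters; letters are kept."""
    blocked = INVISIBLE | BIDI_CONTROLS | DELETION_CONTROLS
    return "".join(ch for ch in text if ch not in blocked)


if __name__ == "__main__":
    # Cyrillic "a" (U+0430), a zero-width space, and a right-to-left override.
    sample = "p\u0430y\u200bpal\u202e"
    for finding in find_suspicious(sample):
        print(finding)
    print(repr(strip_controls(sample)))
```

Stripping control characters is lossy for text that genuinely needs them (for example, zero-width joiners in some scripts and emoji sequences), so a deployed filter would likely need a policy per input channel rather than a single blocklist.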
Original language: English
Title of host publication: Proceedings of the 43rd IEEE Symposium on Security and Privacy, SP 2022
Publisher: IEEE
Pages: 1987-2004
Number of pages: 16
ISBN (Electronic): 978-1-6654-1316-9
ISBN (Print): 978-1-6654-1317-6
Publication status: Published - 27 Jul 2022
Event: 43rd IEEE Symposium on Security and Privacy - San Francisco, United States
Duration: 23 May 2022 to 26 May 2022
https://www.ieee-security.org/TC/SP2022/index.html

Publication series

Name: 2022 IEEE Symposium on Security and Privacy (SP)
Publisher: IEEE
ISSN (Print): 1081-6011
ISSN (Electronic): 2375-1207

Conference

Conference: 43rd IEEE Symposium on Security and Privacy
Abbreviated title: SP 2022
Country/Territory: United States
City: San Francisco
Period: 23/05/22 to 26/05/22

Keywords / Materials (for Non-textual outputs)

  • adversarial machine learning
  • NLP
  • text-based models
  • text encodings
  • search engines