TY - GEN
T1 - Adversarial robustness of VAEs through the lens of local geometry
AU - Khan, Asif
AU - Storkey, Amos
N1 - Funding Information:
This work was partly supported by an unconditional gift from Huawei Noah’s Ark Lab, London.
Publisher Copyright:
Copyright © 2023 by the author(s)
PY - 2023/4/25
Y1 - 2023/4/25
AB - In an unsupervised attack on variational autoencoders (VAEs), an adversary finds a small perturbation in an input sample that significantly changes its latent space encoding, thereby compromising the reconstruction for a fixed decoder. A known reason for such vulnerability is the distortions in the latent space resulting from a mismatch between approximated latent posterior and a prior distribution. Consequently, a slight change in an input sample can move its encoding to a low/zero density region in the latent space resulting in an unconstrained generation. This paper demonstrates that an optimal way for an adversary to attack VAEs is to exploit a directional bias of a stochastic pullback metric tensor induced by the encoder and decoder networks. The pullback metric tensor of an encoder measures the change in infinitesimal latent volume from an input to a latent space. Thus, it can be viewed as a lens to analyse the effect of input perturbations leading to latent space distortions. We propose robustness evaluation scores using the eigenspectrum of a pullback metric tensor. Moreover, we empirically show that the scores correlate with the robustness parameter β of the β-VAE. Since increasing β also degrades reconstruction quality, we demonstrate a simple alternative using mixup training to fill the empty regions in the latent space, thus improving robustness with improved reconstruction.
M3 - Conference contribution
VL - 206
T3 - Proceedings of Machine Learning Research
SP - 8954
EP - 8967
BT - Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
A2 - Ruiz, Francisco
A2 - Dy, Jennifer
A2 - van de Meent, Jan-Willem
PB - PMLR
ER -