Leveraging variational autoencoders for multiple data imputation

Breeshey Roskams-Hieter, Jude Wells, Sara K Wade

Research output: Contribution to journal › Article › peer-review

Abstract / Description of output

Missing data persists as a major barrier to data analysis across numerous applications. Recently, deep generative models have been used for imputation of missing data, motivated by their ability to learn complex and non-linear relationships. In this work, we investigate the ability of variational autoencoders (VAEs) to account for uncertainty in missing data through multiple imputation. We find that VAEs provide poor empirical coverage of missing data, with underestimation and overconfident imputations. To overcome this, we employ β-VAEs, which, viewed from a generalized Bayes framework, provide robustness to model misspecification. Assigning a good value of β is critical for uncertainty calibration, and we demonstrate how this can be achieved using cross-validation.
We assess three alternative methods for sampling from the posterior distribution of missing values and apply the approach to transcriptomics datasets with various simulated missingness scenarios. Finally, we show that single imputation in transcriptomic data can cause false discoveries in downstream tasks and that employing multiple imputation with β-VAEs can effectively mitigate these inaccuracies.
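As a rough illustration of what "empirical coverage" of missing data can mean, the sketch below checks how often held-out true values fall inside the central 95% interval of the imputed draws. It is not the authors' code or evaluation protocol: the function names (`central_interval`, `empirical_coverage`, `simulate`), the Gaussian toy data, and the quantile-based interval are all assumptions made for the example. An overconfident imputer is mimicked by drawing imputations with too small a spread.

```python
import random


def central_interval(draws, level=0.95):
    """Empirical quantiles bounding the central `level` mass of the draws."""
    s = sorted(draws)
    alpha = 1.0 - level
    lower = s[round((alpha / 2) * (len(s) - 1))]
    upper = s[round((1 - alpha / 2) * (len(s) - 1))]
    return lower, upper


def empirical_coverage(imputations, truth, level=0.95):
    """Fraction of true values that fall inside their imputation interval.

    imputations: one list of imputed draws per missing entry
    truth: the held-out true value of each missing entry
    """
    hits = 0
    for draws, x in zip(imputations, truth):
        lower, upper = central_interval(draws, level)
        hits += lower <= x <= upper
    return hits / len(truth)


def simulate(spread, n_missing=1000, n_draws=200, seed=0):
    """Toy setting (assumption): truth is N(0, 1), imputations are N(0, spread)."""
    rng = random.Random(seed)
    truth = [rng.gauss(0.0, 1.0) for _ in range(n_missing)]
    imputations = [
        [rng.gauss(0.0, spread) for _ in range(n_draws)]
        for _ in range(n_missing)
    ]
    return empirical_coverage(imputations, truth)


calibrated = simulate(spread=1.0)     # imputation spread matches the truth
overconfident = simulate(spread=0.3)  # imputation spread is too narrow
print(f"calibrated coverage:    {calibrated:.2f}")
print(f"overconfident coverage: {overconfident:.2f}")
```

In this toy setting the well-matched draws cover the truth at close to the nominal rate, while the narrow draws cover it far less often, which is the kind of miscalibration the abstract describes.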
Original language: English
Journal: Proceedings of the European Conference of the PHM Society
Publication status: Accepted/In press - 1 Jun 2023
