Abstract
We focus on the problem of unsupervised cell outlier detection (OD) and repair in mixed-type tabular data. Traditional methods are concerned only with detecting which rows in the dataset are outliers. However, identifying which cells corrupt a specific row is an important problem in practice, and the very first step towards repairing them. We introduce the Robust Variational Autoencoder (RVAE), a deep generative model that learns the joint distribution of the clean data while identifying the outlier cells, allowing their imputation (repair). RVAE explicitly learns the probability of each cell being an outlier, balancing different likelihood models in the row outlier score, which makes the method suitable for OD in mixed-type datasets. We show experimentally that not only does RVAE perform better than several state-of-the-art methods in cell OD and repair for tabular data, but also that it is robust to the initial hyper-parameter selection.
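The cell-level idea in the abstract can be illustrated with a small sketch. Each cell is modelled as a two-component mixture of a clean distribution and a broad outlier distribution, the posterior probability that the cell is clean follows from Bayes' rule, and cells flagged as outliers are imputed from the clean model. This is not the RVAE implementation: the learned VAE decoder is replaced by fixed per-column distributions, and the column names, parameters, prior `ALPHA`, outlier model, and row score (a sum of cell outlier probabilities) are all hypothetical assumptions chosen for illustration.

```python
import math

# Hypothetical stand-ins for the learned clean model (RVAE uses a VAE decoder):
# a Gaussian per numeric column and a categorical distribution per categorical column.
CLEAN_NUMERIC = {"age": (40.0, 10.0)}  # column -> (mean, std)
CLEAN_CATEGORICAL = {"color": {"red": 0.6, "blue": 0.35, "green": 0.05}}

# Hypothetical broad outlier model: a flat density over an assumed numeric range
# and a uniform distribution over an assumed vocabulary of 10 categories.
NUMERIC_RANGE = {"age": (0.0, 1000.0)}
N_OUTLIER_CATEGORIES = 10

ALPHA = 0.9  # assumed prior probability that any single cell is clean


def gaussian_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def cell_inlier_prob(column, value):
    """Posterior probability (Bayes' rule) that a cell came from the clean component."""
    if column in CLEAN_NUMERIC:
        mean, std = CLEAN_NUMERIC[column]
        p_clean = gaussian_pdf(value, mean, std)
        lo, hi = NUMERIC_RANGE[column]
        p_out = 1.0 / (hi - lo)  # flat density; support is not enforced
    else:
        p_clean = CLEAN_CATEGORICAL[column].get(value, 0.0)  # unseen category -> 0
        p_out = 1.0 / N_OUTLIER_CATEGORIES
    clean_mass = ALPHA * p_clean
    return clean_mass / (clean_mass + (1.0 - ALPHA) * p_out)


def score_and_repair(row, threshold=0.5):
    """Return per-cell inlier probabilities, a row score, and a repaired copy of the row."""
    probs = {col: cell_inlier_prob(col, val) for col, val in row.items()}
    # Row score: sum of cell outlier probabilities. Every term lies in [0, 1],
    # so numeric and categorical cells contribute on the same scale.
    row_score = sum(1.0 - p for p in probs.values())
    repaired = dict(row)
    for col, p in probs.items():
        if p < threshold:  # flagged as an outlier cell: impute from the clean model
            if col in CLEAN_NUMERIC:
                repaired[col] = CLEAN_NUMERIC[col][0]  # Gaussian mean
            else:
                dist = CLEAN_CATEGORICAL[col]
                repaired[col] = max(dist, key=dist.get)  # most probable category
    return probs, row_score, repaired


if __name__ == "__main__":
    probs, score, fixed = score_and_repair({"age": 38.0, "color": "purple"})
    print(probs, score, fixed)
```

With these assumed parameters, the row `{"age": 38.0, "color": "purple"}` keeps its age, while the unseen category `"purple"` receives an inlier probability of 0 and is replaced with `"red"`. Using probabilities rather than raw log-likelihoods in the row score is one simple way to keep a Gaussian density and a categorical probability from dominating each other.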
Original language | English |
---|---|
Title of host publication | Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics |
Publisher | PMLR |
Pages | 4056-4066 |
Number of pages | 10 |
Publication status | Published - 28 Aug 2020 |
Event | 23rd International Conference on Artificial Intelligence and Statistics, Teatro Politeama, Online, Italy. Duration: 26 Aug 2020 → 28 Aug 2020. Conference number: 23. https://www.aistats.org/ |
Publication series
Name | Proceedings of Machine Learning Research |
---|---|
Publisher | PMLR |
Volume | 108 |
ISSN (Electronic) | 2640-3498 |
Conference
Conference | 23rd International Conference on Artificial Intelligence and Statistics |
---|---|
Abbreviated title | AISTATS 2020 |
Country/Territory | Italy |
City | Online |
Period | 26/08/20 → 28/08/20 |
Internet address | https://www.aistats.org/ |