Abstract
Type inference is the task of inferring the data type of a given column of data. Current approaches often fail when a column contains missing values and anomalies, which are common in real-world data sets. In this paper, we propose ptype, a robust probabilistic type inference method that detects such entries and infers data types. We further show that the proposed method outperforms existing methods.
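To make the idea concrete, the following is a minimal sketch of robust column type inference, not the ptype model itself. It assumes that each entry in a column is explained by one of three components (the candidate type, a missing-value marker, or an anomaly) and picks the type under which the whole column is most likely. Regular expressions stand in for the paper's probabilistic finite-state machines, and every name, pattern, token, and weight below is a hypothetical choice made for illustration.

```python
import math
import re

# Hypothetical stand-ins for per-type models. ptype uses probabilistic
# finite-state machines; here a full regex match simply means "fits the type".
TYPE_PATTERNS = {
    "integer": re.compile(r"[+-]?\d+"),
    "float": re.compile(r"[+-]?\d*\.\d+"),
    "date": re.compile(r"\d{4}-\d{2}-\d{2}"),
}

# Assumed set of strings that encode a missing value.
MISSING_TOKENS = {"", "NA", "N/A", "null", "-", "?"}

# Assumed mixture weights for the three components, and a small flat
# likelihood that the anomaly component gives to any string.
W_TYPE, W_MISSING, W_ANOMALY = 0.90, 0.08, 0.02
P_ANOMALY = 1e-3


def component_scores(value, pattern):
    """Weighted likelihood of one entry under each of the three components."""
    s = value.strip()
    return {
        "type": W_TYPE * (1.0 if pattern.fullmatch(s) else 0.0),
        "missing": W_MISSING * (1.0 if s in MISSING_TOKENS else 0.0),
        "anomaly": W_ANOMALY * P_ANOMALY,
    }


def infer_column_type(values):
    """Return (best_type, labels), with one label per entry:
    'type', 'missing', or 'anomaly'."""
    best_type, best_ll = None, -math.inf
    for name, pattern in TYPE_PATTERNS.items():
        # Column log-likelihood: entries are treated as independent, and
        # each entry's likelihood sums over the three components.
        ll = sum(
            math.log(sum(component_scores(v, pattern).values()))
            for v in values
        )
        if ll > best_ll:
            best_type, best_ll = name, ll

    # Label each entry by its highest-scoring component under the chosen type.
    labels = []
    for v in values:
        scores = component_scores(v, TYPE_PATTERNS[best_type])
        labels.append(max(scores, key=scores.get))
    return best_type, labels


column = ["1", "2", "NA", "3", "abc", "42"]
column_type, entry_labels = infer_column_type(column)
print(column_type, entry_labels)
```

Because the anomaly component gives every string a small nonzero likelihood, a single malformed entry lowers a type's score without ruling the type out, which is the robustness property the abstract describes.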
| Original language | English |
|---|---|
| Pages (from-to) | 870-904 |
| Number of pages | 35 |
| Journal | Data Mining and Knowledge Discovery |
| Volume | 34 |
| Issue number | 3 |
| Early online date | 16 Mar 2020 |
| Publication status | Published - 31 May 2020 |
Keywords / Materials (for Non-textual outputs)
- Type inference
- Robustness
- Probabilistic finite-state machine