TY - CHAP
T1 - When Training and Test Sets Are Different
T2 - Characterizing Learning Transfer
AU - Storkey, Amos
A2 - Quiñonero-Candela, J
A2 - Sugiyama, M
A2 - Schwaighofer, A
A2 - Lawrence, ND
PY - 2008/12
Y1 - 2008/12
AB - In this chapter, a number of common forms of dataset shift are introduced, and each is related to a particular form of causal probabilistic model. Examples are given for the different types of shift, and some corresponding modeling approaches. By characterizing dataset shift in this way, there is potential for the development of models which capture the specific types of variations, combine different modes of variation, or do model selection to assess whether dataset shift is an issue in particular circumstances. As an example of how such models can be developed, an illustration is provided for one approach to adapting Gaussian process methods for a particular type of dataset shift called mixture component shift. After the issue of dataset shift is introduced, the distinction between conditional and unconditional models is elaborated in section 1.2. This difference is important in the context of dataset shift, as it will be argued in section 1.4 that dataset shift makes no difference for causally conditional models. This form of dataset shift has been called covariate shift. In section 1.5, another simple form of dataset shift is introduced: prior probability shift. This is followed by section 1.6 on sample selection bias, section 1.7 on imbalanced data, and section 1.8 on domain shift. Finally, three different types of source component shift are given in section 1.9. One example of modifying Gaussian process models to apply to one form of source component shift is given in section 1.10. A brief discussion on the issue of determining whether shift occurs (section 1.11) and on the relationship to transfer learning (section 1.12) concludes the chapter.
UR - https://mitpress.mit.edu/books/dataset-shift-machine-learning
M3 - Chapter
SN - 9780262170055
T3 - Neural Information Processing Series
SP - 3
EP - 28
BT - Dataset Shift in Machine Learning
PB - MIT Press
CY - Cambridge
ER -