When Training and Test Sets Are Different: Characterizing Learning Transfer

Amos Storkey*, J Quiñonero-Candela, M Sugiyama, A Schwaighofer, ND Lawrence

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Chapter

Abstract / Description of output

In this chapter, a number of common forms of dataset shift are introduced, and each is related to a particular form of causal probabilistic model. Examples are given for the different types of shift, along with some corresponding modeling approaches. By characterizing dataset shift in this way, there is potential for the development of models which capture the specific types of variations, combine different modes of variation, or perform model selection to assess whether dataset shift is an issue in particular circumstances. As an example of how such models can be developed, an illustration is provided for one approach to adapting Gaussian process methods for a particular type of dataset shift called mixture component shift. After the issue of dataset shift is introduced, the distinction between conditional and unconditional models is elaborated in section 1.2. This difference is important in the context of dataset shift, as it will be argued in section 1.4 that dataset shift makes no difference for causally conditional models. This form of dataset shift, in which only the distribution of the covariates changes, has been called covariate shift. In section 1.5, another simple form of dataset shift is introduced: prior probability shift. This is followed by section 1.6 on sample selection bias, section 1.7 on imbalanced data, and section 1.8 on domain shift. Finally, three different types of source component shift are given in section 1.9. One example of modifying Gaussian process models to apply to one form of source component shift is given in section 1.10. The chapter concludes with a brief discussion of how to determine whether shift occurs (section 1.11) and of the relationship to transfer learning (section 1.12).

Original language: English
Title of host publication: Dataset Shift in Machine Learning
Place of Publication: Cambridge
Publisher: MIT Press
Number of pages: 26
ISBN (Electronic): 9780262257725
ISBN (Print): 9780262170055
Publication status: Published - Dec 2008

Publication series

Name: Neural Information Processing Series
Publisher: MIT Press

