Multiple-source cross-validation

Krzysztof Geras, Charles Sutton

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Cross-validation is an essential tool in machine learning and statistics. The typical procedure, in which data points are randomly assigned to one of the test sets, makes an implicit assumption that the data are exchangeable. A common case in which this does not hold is when the data come from multiple sources, in the sense used in transfer learning. In this case it is common to arrange the cross-validation procedure in a way that takes the source structure into account. Although common in practice, this procedure does not appear to have been theoretically analysed. We present new estimators of the variance of the cross-validation estimate, both in the multiple-source setting and in the standard i.i.d. setting. These new estimators allow for much more accurate confidence intervals and hypothesis tests to compare algorithms.
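The abstract's "cross-validation procedure that takes the source structure into account" is commonly realised as leave-one-source-out cross-validation: each fold holds out all data from a single source rather than assigning points to folds at random. A minimal sketch (the function name and data below are illustrative, not from the paper):

```python
def multiple_source_folds(sources):
    """Yield (train_idx, test_idx) pairs, holding out one source per fold.

    `sources` is a list giving the source label of each data point.
    Unlike random assignment, every fold's test set contains exactly the
    points from one source, so the source structure is respected.
    """
    for held_out in sorted(set(sources)):
        test_idx = [i for i, s in enumerate(sources) if s == held_out]
        train_idx = [i for i, s in enumerate(sources) if s != held_out]
        yield train_idx, test_idx

# Example: six data points drawn from three sources.
sources = ["A", "A", "B", "B", "C", "C"]
folds = list(multiple_source_folds(sources))
# Three folds, one per source; the first holds out source "A".
```

This is the grouping scheme that scikit-learn exposes as `LeaveOneGroupOut`; the paper's contribution is the variance analysis of estimates produced this way, not the splitting itself.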
Original language: English
Title of host publication: Proceedings of The 30th International Conference on Machine Learning
Publisher: Journal of Machine Learning Research: Workshop and Conference Proceedings
Pages: 1292-1300
Number of pages: 9
Volume: 28
ISBN (Print): 1938-7228
Publication status: Published - 2013
