On Tree-Based Methods for Similarity Learning

Stephan Clémençon, Robin Vogel

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

In many situations, the choice of an adequate similarity measure or metric on the feature space dramatically determines the performance of machine learning methods. Automatically building such measures is the specific purpose of metric/similarity learning. In [21], similarity learning is formulated as a pairwise bipartite ranking problem: ideally, the larger the probability that two observations in the feature space belong to the same class (or share the same label), the higher the similarity measure between them. From this perspective, the ROC curve is an appropriate performance criterion, and the goal of this article is to extend recursive tree-based ROC optimization techniques in order to propose efficient similarity learning algorithms. The validity of such iterative partitioning procedures in the pairwise setting is established by means of results pertaining to the theory of U-processes. From a practical angle, it is discussed at length how to implement them by means of splitting rules specifically tailored to the similarity learning task. Beyond these theoretical/methodological contributions, numerical experiments are displayed and provide strong empirical evidence of the performance of the algorithmic approaches we propose.
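The pairwise bipartite ranking view described in the abstract can be illustrated with a small sketch. It is not the authors' tree-based algorithm: it only shows how a candidate similarity function might be scored by the area under its ROC curve over pairs, where a pair counts as positive when both observations share a label. The function names `pairwise_auc` and `neg_abs_distance`, and the toy data, are assumptions made for illustration.

```python
from itertools import combinations


def pairwise_auc(points, labels, similarity):
    """Empirical AUC of a similarity function over all pairs of observations.

    A pair is positive when both observations share a label, negative otherwise.
    """
    pos, neg = [], []
    for i, j in combinations(range(len(points)), 2):
        score = similarity(points[i], points[j])
        (pos if labels[i] == labels[j] else neg).append(score)
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative pair")
    # Fraction of (positive pair, negative pair) comparisons won by the
    # positive pair; ties count for one half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def neg_abs_distance(x, y):
    # Hypothetical similarity for scalars: closer points score higher.
    return -abs(x - y)


# Toy data (assumed): two well-separated classes on the real line.
points = [0.0, 0.1, 1.0, 1.1]
labels = [0, 0, 1, 1]
auc = pairwise_auc(points, labels, neg_abs_distance)
print(auc)
```

On this toy data every same-label pair is closer than every different-label pair, so the sketch reports an AUC of 1.0; a constant similarity would give 0.5. Because the average runs over pairs of observations rather than independent samples, it is a U-statistic, which is where the U-process theory mentioned in the abstract enters.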
Original language: English
Title of host publication: Machine Learning, Optimization, and Data Science
Editors: Giuseppe Nicosia, Panos Pardalos, Renato Umeton, Giovanni Giuffrida, Vincenzo Sciacca
Place of Publication: Cham
Publisher: Springer International Publishing
Pages: 676-688
Number of pages: 13
ISBN (Electronic): 978-3-030-37599-7
ISBN (Print): 978-3-030-37598-0
Publication status: Published - 3 Jan 2020
Event: Fifth International Conference on Machine Learning, Optimization, and Data Science - Tuscany, Italy
Duration: 10 Sep 2019 to 13 Sep 2019
https://lod2019.icas.xyz/

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer
Volume: 11943
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: Fifth International Conference on Machine Learning, Optimization, and Data Science
Abbreviated title: LOD 2019
Country/Territory: Italy
City: Tuscany
Period: 10/09/19 to 13/09/19

Keywords

  • Metric-learning
  • Rate bound analysis
  • Similarity learning
  • Tree-based algorithms
  • U-processes
