On Tree-Based Methods for Similarity Learning

Stephan Clémençon, Robin Vogel

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

In many situations, the choice of an adequate similarity measure or metric on the feature space largely determines the performance of machine learning methods. Building such measures automatically is the specific purpose of metric/similarity learning. In [21], similarity learning is formulated as a pairwise bipartite ranking problem: ideally, the larger the probability that two observations in the feature space belong to the same class (or share the same label), the higher the similarity measure between them. From this perspective, the ROC curve is an appropriate performance criterion, and the goal of this article is to extend recursive tree-based ROC optimization techniques in order to propose efficient similarity learning algorithms. The validity of such iterative partitioning procedures in the pairwise setting is established by means of results from the theory of U-processes; from a practical angle, the article discusses at length how to implement them by means of splitting rules specifically tailored to the similarity learning task. Beyond these theoretical/methodological contributions, numerical experiments are presented and provide strong empirical evidence of the performance of the proposed algorithmic approaches.
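
To make the pairwise bipartite ranking view concrete, the following is a minimal sketch of how the area under the ROC curve of a candidate similarity function could be estimated from labeled data. It is an illustration only, not the tree-based algorithm of the paper: the function names `pairwise_auc` and `neg_abs_distance`, the toy one-dimensional data, and the choice of the AUC as a scalar summary of the ROC curve are all assumptions made for this example.

```python
from itertools import combinations


def pairwise_auc(points, labels, similarity):
    """Empirical AUC of a similarity function over all pairs of observations.

    A pair is 'positive' when both observations share the same label and
    'negative' otherwise (hypothetical helper, not from the paper).
    """
    pos, neg = [], []
    for i, j in combinations(range(len(points)), 2):
        score = similarity(points[i], points[j])
        (pos if labels[i] == labels[j] else neg).append(score)
    if not pos or not neg:
        raise ValueError("need at least one same-label and one different-label pair")
    # Mann-Whitney statistic: fraction of (positive, negative) pair couples
    # in which the positive pair gets the higher score; ties count 1/2.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def neg_abs_distance(x, y):
    """Toy similarity: the closer two scalars are, the higher the score."""
    return -abs(x - y)


# Toy data: two well-separated one-dimensional classes (assumed for illustration).
points = [0.0, 0.1, 0.2, 5.0, 5.1]
labels = [0, 0, 0, 1, 1]
auc = pairwise_auc(points, labels, neg_abs_distance)
constant_auc = pairwise_auc(points, labels, lambda x, y: 0.0)
```

On this toy data every same-label pair is scored above every different-label pair, so the distance-based similarity reaches an AUC of 1, whereas a constant similarity carries no ranking information and yields 0.5. Because the statistic averages over pairs of observations rather than over independent observations, it is a U-statistic, which is why the abstract appeals to the theory of U-processes.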
Original language: English
Title of host publication: Machine Learning, Optimization, and Data Science
Editors: Giuseppe Nicosia, Panos Pardalos, Renato Umeton, Giovanni Giuffrida, Vincenzo Sciacca
Place of Publication: Cham
Publisher: Springer International Publishing
Number of pages: 13
ISBN (Electronic): 978-3-030-37599-7
ISBN (Print): 978-3-030-37598-0
Publication status: Published - 3 Jan 2020
Event: Fifth International Conference on Machine Learning, Optimization, and Data Science - Tuscany, Italy
Duration: 10 Sept 2019 to 13 Sept 2019

Publication series

Name: Lecture Notes in Computer Science
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Conference: Fifth International Conference on Machine Learning, Optimization, and Data Science
Abbreviated title: LOD 2019

Keywords / Materials (for Non-textual outputs)

  • Metric-learning
  • Rate bound analysis
  • Similarity-learning
  • Tree-based algorithms
  • U-processes

