Relevance, Redundancy and Complementarity Trade-off (RRCT): a Principled, Generic, Robust Feature Selection Tool

Research output: Contribution to journal, Article, peer-review

Abstract

We present a new heuristic feature-selection (FS) algorithm that integrates the three key FS components (relevance, redundancy, and complementarity) in a principled algorithmic framework. We therefore call it relevance, redundancy, and complementarity trade-off (RRCT). The association strength between each feature and the response, and between feature pairs, is quantified via an information-theoretic transformation of rank correlation coefficients, and feature complementarity is quantified using partial correlation coefficients. We empirically benchmark the performance of RRCT against 19 FS algorithms across four synthetic and eight real-world datasets in indicative, challenging settings, evaluating (1) how well the selected features match the true feature set and (2) out-of-sample performance in binary and multi-class classification problems when the selected features are presented to a random forest. RRCT is very competitive in both tasks, and we tentatively make suggestions on the generalizability and application of the best-performing FS algorithms across settings where they may operate effectively.
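
The three quantities named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state which transformation is used, so the code assumes the mutual information of a bivariate Gaussian, -0.5 ln(1 - r^2), applied to Spearman rank correlations, and the standard first-order partial correlation formula. All function names and the toy data are hypothetical.

```python
import math

def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out

def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))

def info_transform(r):
    """ASSUMED transform: Gaussian mutual information (nats) for correlation r."""
    r2 = min(r * r, 1.0)  # guard against floating-point overshoot
    if r2 == 1.0:
        return math.inf
    return -0.5 * math.log(1.0 - r2)

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

if __name__ == "__main__":
    # Hypothetical toy data: two features and a binary response.
    x1 = [0.1, 0.4, 0.9, 1.3, 1.1, 0.2]
    x2 = [0.3, 0.2, 1.0, 1.5, 0.8, 0.5]
    y = [0, 0, 1, 1, 1, 0]

    r_1y, r_2y, r_12 = spearman(x1, y), spearman(x2, y), spearman(x1, x2)
    print("relevance of x1:", info_transform(r_1y))
    print("relevance of x2:", info_transform(r_2y))
    print("redundancy between x1 and x2:", info_transform(r_12))
    # Association of x2 with y that remains after accounting for x1.
    print("partial correlation of x2 and y given x1:", partial_corr(r_2y, r_12, r_1y))
```

How these terms are weighted and combined into a selection criterion is specified in the article itself and is not reproduced here.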

Original language: English
Article number: 100471
Pages (from-to): 100471
Journal: Patterns
Volume: 3
Issue number: 5
Early online date: 31 Mar 2022
Publication status: Published - 13 May 2022

Keywords

  • DSML3: Development/pre-production: Data science output has been rolled out/validated across multiple domains/problems
  • curse of dimensionality
  • dimensionality reduction
  • feature selection
  • information theory
  • principle of parsimony
  • statistical learning
  • variable selection
