Identifying the best data-driven feature selection method for boosting reproducibility in classification tasks

Nicolas Georges, Islem Mhiri, Islem Rekik (Lead / Corresponding author), Alzheimer's Disease Neuroimaging Initiative

Research output: Contribution to journal › Article › peer-review

5 Citations (Scopus)

Abstract

Considering the proliferation of extremely high-dimensional data in many domains, including computer vision and healthcare applications such as computer-aided diagnosis (CAD), advanced techniques are needed for reducing data dimensionality and identifying the most relevant features for a given classification task, such as distinguishing between healthy and disordered brain states. Despite the existence of many works on boosting classification accuracy using a particular feature selection (FS) method, choosing the best one from a large pool of existing FS techniques for boosting feature reproducibility within a dataset of interest remains a formidable challenge. Notably, good performance of a particular FS method does not necessarily imply that the experiment is reproducible and that the features identified are optimal for the entirety of the samples. This paper presents the first attempt to address the following challenge: "Given a set of different feature selection methods {FS_1, ..., FS_K} and a dataset of interest, how to identify the most reproducible and 'trustworthy' connectomic features that would produce reliable biomarkers capable of accurately differentiating between two specific conditions?" To this end, we propose the FS-Select framework, which explores the relationships among the different FS methods using a multi-graph architecture based on the feature reproducibility power, average accuracy, and feature stability of each FS method. By extracting the 'central' graph node, we identify the most reliable and reproducible FS method for the target brain state classification task along with the most discriminative features fingerprinting these brain states.
To evaluate the reproducibility power of FS-Select, we perturbed the training set by using different cross-validation strategies on a multi-view small-scale connectomic dataset (late mild cognitive impairment vs Alzheimer's disease) and a large-scale dataset including autistic vs healthy subjects. Our experiments revealed reproducible connectional features fingerprinting disordered brain states.
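
The idea of picking a 'central' FS method from a multi-graph can be illustrated with a minimal sketch. This is not the authors' implementation: the top-k overlap used as an edge weight, the averaging of graph layers, and node strength as the centrality measure are all assumptions made for illustration, and the function names `topk_overlap_graph` and `central_node` are hypothetical.

```python
import numpy as np

def topk_overlap_graph(rankings, k):
    """Build a K x K graph over FS methods (assumed edge weight:
    fraction of shared features among each pair's top-k features).

    rankings: list of K sequences of feature indices, best feature first.
    """
    tops = [set(r[:k]) for r in rankings]
    n = len(tops)
    weights = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                weights[i, j] = len(tops[i] & tops[j]) / k
    return weights

def central_node(graphs):
    """Average the graph layers and return the node with the largest
    strength (sum of incident edge weights), plus all strengths.
    Both the averaging and the centrality choice are assumptions."""
    merged = np.mean(np.stack(graphs), axis=0)
    strength = merged.sum(axis=1)
    return int(np.argmax(strength)), strength

# Toy example: three hypothetical FS methods ranking six features.
rankings = [
    [0, 1, 2, 3, 4, 5],
    [1, 0, 2, 5, 4, 3],
    [5, 4, 3, 2, 1, 0],
]
overlap = topk_overlap_graph(rankings, k=3)
best, strength = central_node([overlap])
print(best, strength)
```

In this sketch only a single reproducibility-style layer is passed in; further layers (for example, ones derived from accuracy or stability) would be appended to the list given to `central_node`, provided they share the same K x K shape.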
Original language: English
Article number: 107183
Pages (from-to): 1-14
Number of pages: 14
Journal: Pattern Recognition
Volume: 101
Early online date: 9 Jan 2020
Publication status: Published - May 2020

Keywords

  • Biomarker discovery
  • Connectomics
  • Cross-validation
  • Feature reproducibility
  • Feature selection methods
  • Morphological brain network
  • Multi-graph topological analysis
  • Neurological disorders
