Challenges of clustering multimodal clinical data: a review of applications in asthma subtyping

Elsie Horne, Holly Tibble, Aziz Sheikh, Thanasis Tsanas

Research output: Contribution to journal > Article > peer-review

Abstract

Background
In the current era of personalised medicine, there is increasing interest in understanding the heterogeneity in disease populations. Cluster analysis is commonly used to identify subtypes in heterogeneous disease populations. The clinical data used in such applications are typically multimodal, which can make traditional cluster analysis methods challenging to apply.
Objectives
To review the research literature on applications of clustering multimodal clinical data to identify asthma subtypes. We assess common problems and shortcomings in the application of cluster analysis methods to determine asthma subtypes, so that they can be brought to the attention of the research community and avoided in future studies.
Methods
We searched PubMed and Scopus bibliographic databases with terms related to cluster analysis and asthma to identify studies applying dissimilarity-based cluster analysis methods. We recorded the analytic methods used by each study at each step of the cluster analysis process.
Results
Our literature search identified 63 studies that applied cluster analysis to multimodal clinical data to identify asthma subtypes. The features fed into the clustering algorithms were mixed-type in 47 (75%) studies, continuous in 12 (19%) studies, and of unclear type in the remaining four (6%) studies. Twenty-three (37%) studies used hierarchical clustering with Ward’s linkage and 22 (35%) studies used k-means. Of these 45 studies, 39 had mixed-type features, but only five specified dissimilarity measures that could handle mixed-type features. Nine (14%) studies used a pre-clustering step to create small clusters to feed to a hierarchical method; the original sample sizes in these nine studies ranged from 84 to 349. The remaining studies used hierarchical clustering with other linkages (n=3), medoid-based methods (n=3), spectral clustering (n=1), or multiple kernel k-means clustering (n=1); in one study the methods were unclear. Fifty-four (86%) studies explained the methods used for determining the number of clusters; 24 (38%) studies tested whether their cluster solution was reproducible and 11 (17%) studies tested the stability of their solution. Reporting of the cluster analysis was generally poor in terms of the methods employed and their justification.
Conclusions
This review highlights common issues in the application of cluster analysis to multimodal clinical data to identify asthma subtypes. Some of these issues were related to the multimodal nature of the data, but many were more general issues in the application of cluster analysis. While cluster analysis may be a useful tool for investigating disease subtypes, we recommend that future studies carefully consider the implications of clustering multimodal data, the cluster analysis process itself, and the reporting of methods to facilitate replication and interpretation of findings.
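To illustrate what a dissimilarity measure that can handle mixed-type features might look like, the following is a minimal sketch of Gower's dissimilarity, one established option for data combining continuous and categorical variables. It is not taken from the review or from any of the studies it covers: the function name, the patient records, and the feature choices are all hypothetical, and missing values and feature weights are ignored for brevity.

```python
def gower_dissimilarity(a, b, is_categorical, ranges):
    """Gower dissimilarity between two records with mixed-type features.

    a, b: sequences of feature values (numbers or category labels).
    is_categorical: sequence of bools, True where the feature is categorical.
    ranges: per-feature range (max - min) for continuous features; the
        entries for categorical features are ignored.
    """
    total = 0.0
    for x, y, cat, r in zip(a, b, is_categorical, ranges):
        if cat:
            # Simple matching: 0 if the categories agree, 1 otherwise.
            total += 0.0 if x == y else 1.0
        else:
            # Range-normalised absolute difference, which lies in [0, 1].
            total += abs(x - y) / r if r > 0 else 0.0
    return total / len(a)


# Hypothetical patients (illustrative values only):
# (age at onset in years, FEV1 % predicted, atopy, sex)
patients = [
    (5, 92.0, "yes", "F"),
    (34, 61.0, "no", "M"),
    (7, 88.0, "yes", "M"),
]
is_categorical = [False, False, True, True]

# Ranges are computed from the sample, for continuous features only.
ranges = [
    0.0 if cat else max(p[j] for p in patients) - min(p[j] for p in patients)
    for j, cat in enumerate(is_categorical)
]

# Full pairwise dissimilarity matrix (symmetric, zero on the diagonal).
matrix = [
    [gower_dissimilarity(p, q, is_categorical, ranges) for q in patients]
    for p in patients
]
```

A matrix of this kind can be passed to methods that accept precomputed dissimilarities, such as medoid-based methods or hierarchical clustering with average linkage. Ward's linkage and k-means, by contrast, are formulated around Euclidean distances, which is one reason the choice of dissimilarity measure deserves explicit justification when features are mixed-type.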
Original language: English
Journal: JMIR Medical Informatics
Publication status: Published - 28 May 2020
