OBJECTIVES: Automated whole brain segmentation from magnetic resonance images is of great interest for the development of clinically relevant volumetric markers for various neurological diseases. Although deep learning methods have demonstrated remarkable potential in this area, they may perform poorly in nonoptimal conditions, such as limited training data availability. Manual whole brain segmentation is an incredibly tedious process, so minimizing the data set size required for training segmentation algorithms may be of wide interest. The purpose of this study was to compare the performance of the prototypical deep learning segmentation architecture (U-Net) with a previously published atlas-free traditional machine learning method, Classification using Derivative-based Features (C-DEF) for whole brain segmentation, in the setting of limited training data.
MATERIALS AND METHODS: C-DEF and U-Net models were evaluated after training on manually curated data from 5, 10, and 15 participants in 2 research cohorts, each acquired at a separate institution: (1) people living with clinically diagnosed HIV infection and (2) people with relapsing-remitting multiple sclerosis. Both models were also evaluated after training on data from between 5 and 295 participants in a large, publicly available, annotated data set of glioblastoma and lower grade glioma (brain tumor segmentation). Statistical analysis was performed on the Dice similarity coefficient using repeated-measures analysis of variance and Dunnett-Hsu pairwise comparison.
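For readers unfamiliar with the evaluation metric, the Dice similarity coefficient for a tissue class is twice the number of voxels on which the predicted and reference segmentations agree, divided by the total number of voxels assigned to that class in either segmentation. The following is a minimal sketch only, not the study's evaluation code: the function name `dice_coefficient`, the flat-list representation of label maps, and the convention of returning 1.0 when a class is absent from both segmentations are assumptions made for illustration.

```python
def dice_coefficient(pred, truth, label):
    """Dice similarity coefficient for one class label.

    `pred` and `truth` are equal-length flat sequences of integer class
    labels (one entry per voxel). Hypothetical helper for illustration.
    """
    if len(pred) != len(truth):
        raise ValueError("label maps must contain the same number of voxels")

    # Voxels assigned to this class in each segmentation.
    pred_count = sum(1 for p in pred if p == label)
    truth_count = sum(1 for t in truth if t == label)

    # Voxels assigned to this class in both segmentations.
    overlap = sum(1 for p, t in zip(pred, truth) if p == label and t == label)

    denominator = pred_count + truth_count
    if denominator == 0:
        # Assumed convention: a class absent from both maps counts as perfect agreement.
        return 1.0
    return 2.0 * overlap / denominator


# Toy example with three classes (0, 1, 2) over six voxels.
predicted = [0, 1, 1, 2, 2, 2]
reference = [0, 1, 2, 2, 2, 0]
dice_class_2 = dice_coefficient(predicted, reference, label=2)
```

In the toy example, class 2 is assigned to three voxels in each map and the two maps agree on two of them, giving a Dice value of 2 × 2 / (3 + 3) ≈ 0.67. A value of 1 indicates perfect overlap and 0 indicates none.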
RESULTS: C-DEF produced better segmentation than U-Net in lesion (29.2%-38.9%) and cerebrospinal fluid (5.3%-11.9%) classes when trained with data from 15 or fewer participants. Unlike C-DEF, U-Net showed significant improvement when increasing the size of the training data (24%-30% higher than baseline). In the brain tumor segmentation data set, C-DEF produced equivalent or better segmentations than U-Net for enhancing tumor and peritumoral edema regions across all training data sizes explored. However, U-Net was more effective than C-DEF for segmentation of necrotic/non-enhancing tumor when trained on 10 or more participants, probably because of the inconsistent signal intensity of the tissue class.
CONCLUSIONS: These results demonstrate that classical machine learning methods can produce more accurate brain segmentation than the far more complex deep learning methods when only small or moderate amounts of training data are available (n ≤ 15). The magnitude of this advantage varies by tissue and cohort, while U-Net may be preferable for deep gray matter and necrotic/non-enhancing tumor segmentation, particularly with larger training data sets (n ≥ 20). Given that segmentation models often need to be retrained for application to novel imaging protocols or pathology, the bottleneck associated with large-scale manual annotation could be avoided with classical machine learning algorithms, such as C-DEF.
- Brain/diagnostic imaging
- Brain Neoplasms/diagnostic imaging
- Deep Learning
- HIV Infections
- Image Processing, Computer-Assisted/methods
- Logistic Models
- Magnetic Resonance Imaging/methods