Speech analysis could help develop clinical tools for automatic detection of Alzheimer's disease (AD) and monitoring of its progression. However, datasets containing both clinical information and spontaneous speech suitable for statistical learning are relatively scarce. In addition, speech data are often collected under different conditions, such as monologue and dialogue recording protocols. Therefore, there is a need for methods that allow these scarce resources to be combined. In this paper, we propose two feature extraction and representation models, based on neural networks and trained on monologue and dialogue data recorded in clinical settings. These models are evaluated not only for AD recognition, but also with respect to their potential to generalise across both datasets. They provide good results when trained and tested on the same dataset (72.56 % unweighted average recall (UAR) for monologue data and 85.21 % for dialogue data). A decrease in UAR is observed in transfer training, where feature extraction models trained on dialogues provide better average UAR on monologues (63.72 %) than the other way around (58.94 %). When the choice of classifiers is independent of feature extraction, transfer from monologue models to dialogues results in a maximum UAR of 81.04 % and transfer from dialogue features to monologues achieves a maximum UAR of 70.73 %, evidencing the generalisability of the feature model.
|Title of host publication||42nd Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)|
|Publisher||Institute of Electrical and Electronics Engineers (IEEE)|
|Publication status||Published - 27 Aug 2020|
|Event||42nd Annual International Conference of the IEEE Engineering in Medicine and Biology Society : EMBC'20 - Palais des congrès de Montréal, Montréal, Québec, Canada (20 Jul 2020 → 24 Jul 2020)|