Abstract
Automatic recognition of human eating conditions could be a useful technology in health monitoring. Audio-visual information can be used to automate this process, and feature engineering approaches can reduce its dimensionality. Reduced data dimensionality (particularly through feature subset selection) can assist in designing a system for eating condition recognition with lower power, cost, memory and computation requirements than a system designed on the full dimensionality of the data. This paper presents the Active Feature Transformation (AFT) and Active Feature Selection (AFS) methods and applies them to all three tasks of the ICMI 2018 EAT Challenge for the recognition of user eating conditions from audio and visual features. The AFT method is used to transform the Mel-Frequency Cepstral Coefficient and ComParE features for the classification task, while the AFS method selects a feature subset. Transformation by Principal Component Analysis (PCA) is also used for comparison. Using the AFS method, we find feature subsets of audio features (422 for Food Type, 104 for Likability and 68 for Difficulty, out of 988 features) that provide better results than the full feature set. Our results show that AFS outperforms PCA and AFT in terms of accuracy for the recognition of user eating conditions from audio features. The AFT of visual features (facial landmarks) provides less accurate results than the AFS and AFT sets of audio features. However, the weighted score fusion of all the feature sets improves the results.
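The abstract does not spell out the AFT and AFS algorithms, but the overall pipeline it describes (reduce 988-dimensional audio features, classify, then fuse scores across modalities) can be sketched. The Python/scikit-learn sketch below is a hypothetical illustration only: PCA stands in for the transformation step the paper uses as its baseline, and a mutual-information filter is merely a stand-in for AFS; the feature counts (988 inputs, a 104-feature subset for the Likability task) come from the abstract, while all data, the visual feature dimensionality, and the fusion weights are synthetic placeholders.

```python
# Hypothetical sketch of the pipeline described in the abstract:
# dimensionality reduction of 988-d audio features, classification,
# and weighted score fusion with a visual classifier. PCA is the
# paper's stated comparison method; SelectKBest is NOT the paper's
# AFS method, only a generic subset-selection stand-in.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n_train, n_test = 200, 50
X_audio = rng.normal(size=(n_train, 988))   # ComParE-style audio features
X_visual = rng.normal(size=(n_train, 136))  # e.g. 68 landmarks x (x, y); assumed
y = rng.integers(0, 2, size=n_train)        # synthetic binary labels

# Transformation (PCA baseline) vs. subset selection (stand-in for AFS);
# k=104 mirrors the Likability subset size reported in the abstract.
pca_clf = make_pipeline(PCA(n_components=104),
                        LogisticRegression(max_iter=1000))
afs_clf = make_pipeline(SelectKBest(mutual_info_classif, k=104),
                        LogisticRegression(max_iter=1000))
vis_clf = LogisticRegression(max_iter=1000)

pca_clf.fit(X_audio, y)
afs_clf.fit(X_audio, y)
vis_clf.fit(X_visual, y)

# Weighted score fusion: combine per-class posterior scores across
# modalities. These weights are arbitrary placeholders; the abstract
# does not state how the fusion weights are chosen.
Xa_test = rng.normal(size=(n_test, 988))
Xv_test = rng.normal(size=(n_test, 136))
w_audio, w_visual = 0.7, 0.3
fused = (w_audio * afs_clf.predict_proba(Xa_test)
         + w_visual * vis_clf.predict_proba(Xv_test))
y_pred = fused.argmax(axis=1)
```

Fusing at the score level, as sketched here, lets each modality keep its own reduced representation, which matches the abstract's finding that combining audio and visual scores improves on either modality alone.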
| Original language | English |
| --- | --- |
| Title of host publication | 20th ACM International Conference on Multimodal Interaction (ICMI 2018) |
| Publisher | Association for Computing Machinery (ACM) |
| Pages | 564-568 |
| Number of pages | 5 |
| DOIs | |
| Publication status | Published - 16 Oct 2018 |
Keywords
- audio-visual processing
- dimensionality reduction
- eating condition
- feature extraction
- feature selection
- feature transformation