Vision2Sensor: Knowledge Transfer Across Sensing Modalities for Human Activity Recognition

Valentin Radu, Maximilian Henne

Research output: Contribution to journalArticlepeer-review

Abstract / Description of output

Mobile and wearable sensing devices are pervasive, coming packed with a growing number of sensors. These are supposed to provide direct observations about user activity and context to intelligent systems, and are envisioned to be at the core of smart buildings, towards habitat automation to suit user needs. However, much of this enormous sensing capability is currently wasted, instead of being tapped into, because developing context recognition systems requires substantial amount of labeled sensor data to train models on. Sensor data is hard to interpret and annotate after collection, making it difficult and costly to generate large training sets, which is now stalling the adoption of mobile sensing at scale. We address this fundamental problem in the ubicomp community (not having enough training data) by proposing a knowledge transfer framework, Vision2Sensor, which opportunistically transfers information from an easy to interpret and more advanced sensing modality, vision, to other sensors on mobile devices. Activities recognised by computer vision in the camera field of view are synchronized with inertial sensor data to produce labels, which are then used to dynamically update a mobile sensor based recognition model. We show that transfer learning is also beneficial to identifying the best Convolutional Neural Network for vision based human activity recognition for our task. The performance of a proposed network is first evaluated on a larger dataset, followed by transferring the pre-trained model to be fine-tuned on our five class activity recognition task. Our sensor based Deep Neural Network is robust to withstand substantial degradation of label quality, dropping just 3% in accuracy on induced degradation of 15% to vision generated labels. This indicates that knowledge transfer between sensing modalities is achievable even with significant noise introduced by the labeling modality. Our system operates in real-time on embedded computing devices, ensuring user data privacy by performing all the computations in the local network.
Original languageEnglish
Article number84
Pages (from-to)84:1-84:21
Number of pages21
JournalProceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies
Issue number3
Publication statusPublished - 9 Sept 2019


Dive into the research topics of 'Vision2Sensor: Knowledge Transfer Across Sensing Modalities for Human Activity Recognition'. Together they form a unique fingerprint.

Cite this