Development of a low-cost, noninvasive, portable visual speech recognition program

Gavriel D. Kohlberg, Yakov Gal, Anil K. Lalwani

Research output: Contribution to journal › Article › peer-review

Abstract

Objectives: Loss of speech following tracheostomy and laryngectomy severely limits communication to simple gestures and facial expressions, which are largely ineffective. To facilitate communication in these patients, we sought to develop a low-cost, noninvasive, portable, and simple visual speech recognition program (VSRP) that converts articulatory facial movements into speech.
Methods: A Microsoft Kinect–based VSRP was developed to capture spatial coordinates of lip movements and translate them into speech. The articulatory speech movements associated with 12 sentences were used to train an artificial neural network classifier. The accuracy of the classifier was then evaluated on a separate, previously unseen set of articulatory speech movements.
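The abstract does not describe how the lip coordinates were encoded or how the network was structured. The sketch below is therefore a minimal, hypothetical illustration of the general approach, not the authors' implementation. It assumes each utterance arrives as a variable-length sequence of frames holding (x, y, z) coordinates for a few lip points, linearly resamples that sequence to a fixed number of frames, flattens it into one feature vector, and trains a one-hidden-layer neural network with stochastic gradient descent. Accuracy is then measured on a separately generated, unseen set. Synthetic prototype trajectories stand in for Kinect captures (no Kinect SDK is used). The sizes `N_POINTS`, `N_FRAMES`, and `N_HIDDEN`, and the helpers `resample`, `features`, `make_utterance`, and `run_demo`, are all illustrative assumptions.

```python
import math
import random

# Hypothetical sizes; the abstract does not report the feature layout or network shape.
N_POINTS = 3    # tracked lip points per frame, each stored as (x, y, z)
N_FRAMES = 5    # frames per utterance after time normalization
N_HIDDEN = 16   # hidden units
N_CLASSES = 12  # one class per sentence, matching the 12-sentence data set


def resample(seq, n):
    """Linearly interpolate a list of equal-length frames to exactly n frames (n >= 2)."""
    if len(seq) == 1:
        return [list(seq[0]) for _ in range(n)]
    out = []
    for i in range(n):
        t = i * (len(seq) - 1) / (n - 1)
        lo = min(int(t), len(seq) - 2)
        w = t - lo
        out.append([(1 - w) * a + w * b for a, b in zip(seq[lo], seq[lo + 1])])
    return out


def features(seq):
    """Time-normalize an utterance and flatten it into one fixed-length vector."""
    return [v for frame in resample(seq, N_FRAMES) for v in frame]


class MLP:
    """One hidden tanh layer, softmax output, trained by plain SGD on cross-entropy."""

    def __init__(self, n_in, n_hidden, n_out, rng):
        s1, s2 = 1 / math.sqrt(n_in), 1 / math.sqrt(n_hidden)
        self.w1 = [[rng.uniform(-s1, s1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-s2, s2) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        h = [math.tanh(b + sum(w * xi for w, xi in zip(row, x)))
             for row, b in zip(self.w1, self.b1)]
        z = [b + sum(w * hj for w, hj in zip(row, h)) for row, b in zip(self.w2, self.b2)]
        m = max(z)  # subtract the max for a numerically stable softmax
        e = [math.exp(v - m) for v in z]
        total = sum(e)
        return h, [v / total for v in e]

    def predict(self, x):
        _, p = self.forward(x)
        return max(range(len(p)), key=p.__getitem__)

    def train_step(self, x, y, lr):
        h, p = self.forward(x)
        # Gradient of cross-entropy with respect to the softmax inputs.
        dz = [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]
        # Backpropagate through w2 (before updating it) and the tanh derivative.
        dh = [sum(self.w2[k][j] * dz[k] for k in range(len(dz))) * (1 - h[j] ** 2)
              for j in range(len(h))]
        for k, g in enumerate(dz):
            for j, hj in enumerate(h):
                self.w2[k][j] -= lr * g * hj
            self.b2[k] -= lr * g
        for j, g in enumerate(dh):
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * g * xi
            self.b1[j] -= lr * g


def make_utterance(proto, rng, noise=0.05):
    """Synthetic stand-in for one capture: a time-warped, noisy copy of a class prototype."""
    length = rng.randint(8, 14)
    return [[v + rng.gauss(0, noise) for v in frame] for frame in resample(proto, length)]


def run_demo(seed=0, train_per_class=4, test_per_class=2, epochs=40, lr=0.1):
    """Train on one synthetic set, then report accuracy on a separate, unseen set."""
    rng = random.Random(seed)
    protos = [[[rng.uniform(-1, 1) for _ in range(3 * N_POINTS)] for _ in range(N_FRAMES)]
              for _ in range(N_CLASSES)]
    train = [(features(make_utterance(p, rng)), c)
             for c, p in enumerate(protos) for _ in range(train_per_class)]
    test = [(features(make_utterance(p, rng)), c)
            for c, p in enumerate(protos) for _ in range(test_per_class)]
    net = MLP(N_FRAMES * 3 * N_POINTS, N_HIDDEN, N_CLASSES, rng)
    for _ in range(epochs):
        rng.shuffle(train)
        for x, y in train:
            net.train_step(x, y, lr)
    correct = sum(net.predict(x) == y for x, y in test)
    return correct / len(test)


if __name__ == "__main__":
    print(f"held-out accuracy on synthetic data: {run_demo():.3f}")
```

Linear time normalization is only one simple way to present variable-length utterances to a feedforward network; the authors may have used a different representation. Because the data here are synthetic, the printed accuracy says nothing about the performance reported in the Results.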
Results: The VSRP was successfully implemented and tested in 5 subjects. It achieved an accuracy of 77.2% (65.0%-87.6% across the 5 speakers) on the 12-sentence data set. The mean time to classify an individual sentence was 2.03 milliseconds (1.91-2.16 ms).
Conclusion: We have demonstrated the feasibility of a low-cost, noninvasive, portable, Kinect-based VSRP that accurately predicts speech from articulatory movements in clinically negligible time. This VSRP could serve as a novel communication device for aphonic patients.
Original language: English
Pages (from-to): 752-757
Number of pages: 6
Journal: Annals of Otology, Rhinology & Laryngology
Volume: 125
Issue number: 9
Early online date: 19 May 2016
Publication status: Published - 1 Sept 2016
