Abstract

CSTR NAM TIMIT Plus (Version 0.8)
Release: May 2012
The Centre for Speech Technology Research, University of Edinburgh
Copyright (c) 2012 Junichi Yamagishi [email protected]

Overview

The CSTR NAM TIMIT Plus corpus contains parallel whispered speech recorded simultaneously through two microphones: a non-audible murmur (NAM) microphone, which uses a urethane elastomer to maintain close contact with the skin, and an omnidirectional headset-mounted condenser microphone (a DPA 4035).

The NAM microphone is a special body-conductive microphone (see Nakajima et al., ICASSP 2003 and Toda et al., ICASSP 2009). It can serve as the sensing device of a silent speech interface (SSI) system, in which an alternative signal is acquired without the user speaking in the normal way. It can detect extremely quiet speech (NAM) that even listeners near the speaker can hardly hear. NAM speech tends to be unvoiced, like whispering. The best position for the NAM microphone is just behind the ear. It can detect various kinds of speech, including whispering and normal speech, conducted through the soft tissue of the head, and it is more robust to environmental noise than an ordinary microphone. Compared with other kinds of SSI systems, which may involve electrodes or other sensing devices, a NAM microphone-based SSI system is non-intrusive, cheap and convenient.

The corpus comprises 421 sentences selected from newspaper text, 460 sentences selected from the TIMIT texts, and 18 isolated words intended for the open-source voice command recogniser "kiku", all uttered by a young female speaker. The 421 newspaper texts were randomly taken from Herald Glasgow, with permission from Herald & Times Group. The TIMIT texts are identical to those of the MOCHA-TIMIT corpus: http://www.cstr.ed.ac.uk/research/projects/artic/mocha.html

The newspaper recording comprises two sections: one recorded in clean conditions and the other in pre-recorded cafeteria noise played over a loudspeaker at 65 dB(A) (resulting in an SNR of approximately 10 dB). The TIMIT and isolated-word recordings were made only in the noise condition. Both sections of the corpus were recorded in a soundproof hemi-anechoic chamber (noise floor around 25 dB(A)) at a 96 kHz sampling rate and 24-bit sample depth into a Pro Tools HD system.

Acknowledgements

We are grateful to Dr Tomoki Toda (Nara Institute of Science and Technology) for providing the NAM microphone and to Dr Mark Gales and Dr Federico Flego (University of Cambridge) for providing the VTS decoder for this corpus. This corpus was constructed whilst ChenYu Yang was a visitor at the Centre for Speech Technology Research, University of Edinburgh, UK.
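For readers planning storage or wanting to interpret the stated noise condition, the recording parameters given in the abstract (96 kHz, 24-bit, an SNR of approximately 10 dB) can be turned into concrete numbers. The following is a minimal Python sketch and is not part of the corpus distribution. The function names are hypothetical, the data rate assumes raw uncompressed PCM for a single channel, and the SNR helper assumes the conventional RMS-amplitude definition; the abstract does not say how the corpus authors measured the SNR.

```python
import math


def pcm_bytes_per_second(sample_rate_hz: int, bit_depth: int, channels: int = 1) -> int:
    """Raw (uncompressed) PCM data rate in bytes per second."""
    return sample_rate_hz * (bit_depth // 8) * channels


def snr_db(signal_rms: float, noise_rms: float) -> float:
    """Signal-to-noise ratio in dB from two RMS amplitudes (assumed definition)."""
    return 20.0 * math.log10(signal_rms / noise_rms)


def amplitude_ratio(snr_in_db: float) -> float:
    """RMS amplitude ratio (signal / noise) corresponding to an SNR in dB."""
    return 10.0 ** (snr_in_db / 20.0)


# One channel at 96 kHz, 24-bit: 96000 samples/s * 3 bytes/sample.
rate = pcm_bytes_per_second(96_000, 24)
print(rate)  # 288000

# An SNR of 10 dB corresponds to a signal RMS about 3.16 times the noise RMS.
ratio = amplitude_ratio(10.0)
print(round(ratio, 2))  # 3.16
```

Under these assumptions, one minute of a single channel occupies roughly 17 MB before any file-format overhead, and storing both microphone signals doubles that figure.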

Data Citation

Yamagishi, Junichi; Brown, Georgina; Yang, ChenYu; Clark, Rob; King, Simon. (2021). CSTR NAM TIMIT Plus, [dataset]. University of Edinburgh. Centre for Speech Technology Research. https://doi.org/10.7488/ds/2994.
Date made available: 2 Mar 2021
Publisher: Edinburgh DataShare
