Edinburgh Research Explorer

Statistical parametric speech synthesis for Ibibio

Research output: Contribution to journal › Article

Documents

  • Submitted manuscript, 4 MB, PDF-document

    Rights statement: © Ekpenyong, M., Urua, E-A., Watts, O., King, S., & Yamagishi, J. (2014). Statistical parametric speech synthesis for Ibibio. Speech Communication, 56, 243-251. doi:10.1016/j.specom.2013.02.003

http://www.sciencedirect.com/science/article/pii/S016763931300023X
Original language: English
Pages (from-to): 243-251
Number of pages: 9
Journal: Speech Communication
Volume: 56
Early online date: 24 Feb 2013
DOI: 10.1016/j.specom.2013.02.003
Publication status: Published - Jan 2014

Abstract

Ibibio is a tone language spoken in the south-east coastal region of Nigeria. Like most African languages, it is resource-limited. This presents a major challenge to conventional approaches to speech synthesis, which typically require the training of numerous predictive models of linguistic features such as the phoneme sequence (i.e., a pronunciation dictionary plus a letter-to-sound model) and prosodic structure (e.g., a phrase break predictor). This training is invariably supervised, requiring a corpus of training data labelled with the linguistic feature to be predicted. In this paper, we investigate what can be achieved in the absence of many of these expensive resources, and with only a limited amount of recorded speech. We employ a statistical parametric method, because this has been found to offer good performance even on small corpora, and because it can directly learn the relationship between acoustics and whatever linguistic features are available, potentially mitigating the absence of explicit representations of intermediate linguistic layers such as prosody. We present an evaluation that compares systems with access to varying degrees of linguistic structure. The simplest system uses only phonetic context (quinphones); it is compared to systems with access to a richer set of context features, with or without tone marking. We find that tone marking contributes significantly to the quality of synthetic speech. Future work should therefore address the problem of tone assignment using a dictionary and the building of a prediction module for out-of-vocabulary words.
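
To make the notion of quinphone context concrete, here is a minimal Python sketch of how one context label per phone might be built from the two preceding and two following phones, with an optional tone feature appended. It is an illustration only, not the authors' pipeline: the function name `quinphone_contexts`, the padding symbol `sil`, the label punctuation, the `/T:` tone field, and the phone and tone symbols in the usage note are all assumptions made for this example.

```python
from typing import List, Optional

# Assumed padding symbol for positions beyond the utterance edges.
PAD = "sil"


def quinphone_contexts(
    phones: List[str], tones: Optional[List[str]] = None
) -> List[str]:
    """Build one context label per phone.

    Each label holds the two left neighbours, the current phone, and the
    two right neighbours. If `tones` is given (one hypothetical tone
    symbol per phone), the tone of the current phone is appended as an
    extra feature.
    """
    if tones is not None and len(tones) != len(phones):
        raise ValueError("tones must have the same length as phones")

    # Pad both ends so every phone has two neighbours on each side.
    padded = [PAD, PAD] + list(phones) + [PAD, PAD]

    labels = []
    for i in range(len(phones)):
        ll, l, c, r, rr = padded[i : i + 5]
        # Illustrative label layout: ll^l-c+r=rr
        label = f"{ll}^{l}-{c}+{r}={rr}"
        if tones is not None:
            label += f"/T:{tones[i]}"
        labels.append(label)
    return labels
```

For example, `quinphone_contexts(["a", "b", "a"])` yields labels that carry phonetic context only, while `quinphone_contexts(["a", "b", "a"], ["H", "0", "L"])` yields the same labels with a tone field added. This mirrors, in a simplified way, the contrast between systems with and without tone marking described in the abstract.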

ID: 7599666