Grapheme-to-phoneme conversion methods for minority language conditions

Mengxue Cao, Steve Renals, Peter Bell, Aijun Li, Qiang Fang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This study investigates grapheme-to-phoneme conversion approaches for minority language conditions. Whereas major languages typically provide isolated-word training data, sentence-form data is taken here to be the appropriate form of training data for minority languages. Two models were examined: the Joint-multigram Model and the Hidden Markov Model. The 'treat-sentence-as-word' training method and a forced-alignment process are proposed to extend the Joint-multigram Model and the Hidden Markov Model, respectively, to meet minority language conditions. Results obtained from sentence-form training data using the proposed methods are as good as those obtained from isolated-word training data using previously proposed methods. The Joint-multigram Model performs better on well-designed training data, while the Hidden Markov Model is more tolerant of errors and is better suited to minority language conditions.
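The abstract does not spell out how the 'treat-sentence-as-word' method prepares its data, so the following is only a minimal sketch of one plausible reading: each sentence-form entry is collapsed into a single long pseudo-word, so that a word-level grapheme-to-phoneme trainer can consume it unchanged. The function names, the `_` boundary symbol, and the tab-separated output format are all assumptions made for illustration and are not taken from the paper.

```python
from typing import List, Tuple

# Hypothetical boundary symbol marking where orthographic words met.
# This is an illustrative choice, not something the paper specifies.
BOUNDARY = "_"


def sentence_as_word(sentence: str, phonemes: str,
                     boundary: str = BOUNDARY) -> Tuple[List[str], List[str]]:
    """Turn one sentence-form entry into a single pseudo-word entry.

    `sentence` is the orthographic text; `phonemes` is a space-separated
    phoneme string for the whole sentence, with no word boundaries.
    """
    # Join the words with the boundary symbol, then split into graphemes.
    letters = list(boundary.join(sentence.lower().split()))
    phones = phonemes.split()
    return letters, phones


def to_lexicon_line(letters: List[str], phones: List[str]) -> str:
    """Format the entry as 'graphemes<TAB>phonemes' (an assumed layout)."""
    return " ".join(letters) + "\t" + " ".join(phones)


if __name__ == "__main__":
    letters, phones = sentence_as_word("The cat", "dh ax k ae t")
    print(to_lexicon_line(letters, phones))
```

Under this reading, the phoneme side carries no word boundaries, so the model has to learn the letter-to-sound alignment across the whole sentence rather than word by word.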

Original language: English
Title of host publication: Proceedings of the 2012 International Conference on Speech Database and Assessments, Oriental COCOSDA 2012
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 151-156
Number of pages: 6
ISBN (Print): 9781467328104
Publication status: Published - 1 Feb 2013
Event: 2012 15th International Conference on Speech Database and Assessments, Oriental COCOSDA 2012 - Macau, China
Duration: 9 Dec 2012 to 12 Dec 2012

Conference

Conference: 2012 15th International Conference on Speech Database and Assessments, Oriental COCOSDA 2012
Country: China
City: Macau
Period: 9/12/12 to 12/12/12

Keywords

  • forced-alignment
  • Grapheme-to-phoneme
  • HMM
  • Joint-multigram Model
  • treat-sentence-as-word
