Multilingual number transcription for text-to-speech conversion

Rubén San-Segundo, Juan Manuel Montero, Mircea Giurgiu, Ioana Muresan, Simon King

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

This paper describes the text normalization module of a fully trainable text-to-speech conversion system and its application to number transcription. The main goal is a language-independent text normalization module based on data rather than on expert rules. The paper proposes a general architecture based on statistical machine translation techniques, composed of three main modules: a tokenizer that splits the input text into a token graph, a phrase-based translation module for token translation, and a post-processing module that removes some tokens. The architecture has been evaluated for number transcription, an important aspect of the text normalization problem, in several languages: English, Spanish and Romanian.
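The three-module flow in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The digit_position token format, the hand-written `PHRASE_TABLE`, the `<none>` placeholder, and all function names are assumptions made for illustration. A linear token sequence and a greedy longest-match lookup stand in for the token graph and the trained phrase-based translation module.

```python
# Toy stand-in for a learned phrase table: source token phrases -> target words.
# "<none>" is a hypothetical placeholder that the post-processing step removes.
PHRASE_TABLE = {
    ("1_1", "0_0"): ["ten"],
    ("1_1", "2_0"): ["twelve"],
    ("2_1",): ["twenty"],
    ("4_1",): ["forty"],
    ("0_0",): ["<none>"],
    ("2_0",): ["two"],
    ("5_0",): ["five"],
}


def tokenize(number):
    """Split a digit string into digit_position tokens, e.g. "42" -> ["4_1", "2_0"]."""
    n = len(number)
    return [f"{d}_{n - 1 - i}" for i, d in enumerate(number)]


def translate(tokens, table, max_len=2):
    """Greedy longest-match lookup (assumption: replaces a trained SMT decoder)."""
    out, i = [], 0
    while i < len(tokens):
        for span in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = tuple(tokens[i:i + span])
            if phrase in table:
                out.extend(table[phrase])
                i += span
                break
        else:
            out.append(tokens[i])  # pass unknown tokens through unchanged
            i += 1
    return out


def postprocess(words):
    """Remove placeholder tokens that carry no spoken content."""
    return [w for w in words if w != "<none>"]


def transcribe(number):
    """Run tokenizer, translation, and post-processing in sequence."""
    return " ".join(postprocess(translate(tokenize(number), PHRASE_TABLE)))
```

In this sketch, `transcribe("20")` first yields the words `twenty` and `<none>`, and post-processing drops the placeholder. In a data-driven system the table entries would be learned from parallel examples rather than written by hand.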
Original language: English
Title of host publication: 8th ISCA Workshop on Speech Synthesis (SSW 8)
Number of pages: 5
Publication status: Published - 2 Sept 2013

Keywords / Materials (for Non-textual outputs)

  • multilingual number transcription
  • text normalization
  • fully-trainable text conversion

