Multilingual number transcription for text-to-speech conversion

Rubén San-Segundo, Juan Manuel Montero, Mircea Giurgiu, Ioana Muresan, Simon King

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract / Description of output

This paper describes the text normalization module of a fully trainable text-to-speech conversion system and its application to number transcription. The main goal is a language-independent text normalization module that is based on data rather than on expert rules. The paper proposes a general architecture based on statistical machine translation techniques. The architecture is composed of three main modules: a tokenizer that splits the input text into a token graph, a phrase-based translation module for token translation, and a post-processing module that removes some tokens. The architecture has been evaluated on number transcription, an important aspect of the text normalization problem, in three languages: English, Spanish and Romanian.
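The three-module pipeline described above can be illustrated with a small toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the `digit_position` token format, the hand-written `PHRASE_TABLE` (which stands in for a phrase table learned from data), the `<null>` placeholder, and the greedy longest-match lookup (which stands in for a statistical phrase-based decoder). The tokenizer here also produces a flat token list rather than a token graph.

```python
from typing import Dict, List, Tuple

NULL = "<null>"  # hypothetical placeholder removed in post-processing


def tokenize(number: str) -> List[str]:
    """Split a digit string into 'digit_position' tokens (assumed format)."""
    n = len(number)
    return [f"{digit}_{n - i - 1}" for i, digit in enumerate(number)]


# Toy phrase table, written by hand for illustration only. In a trainable
# system these entries would be learned from parallel data instead.
PHRASE_TABLE: Dict[Tuple[str, ...], str] = {
    ("1_2",): "one hundred",
    ("2_1",): "twenty",
    ("1_1", "0_0"): "ten",
    ("0_1",): NULL,
    ("0_0",): NULL,
    ("3_0",): "three",
    ("5_0",): "five",
}


def translate(tokens: List[str]) -> List[str]:
    """Greedy longest-match lookup, a stand-in for phrase-based decoding."""
    max_len = max(len(key) for key in PHRASE_TABLE)
    output: List[str] = []
    i = 0
    while i < len(tokens):
        for span in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + span])
            if key in PHRASE_TABLE:
                output.append(PHRASE_TABLE[key])
                i += span
                break
        else:
            raise KeyError(f"no phrase covers token {tokens[i]!r}")
    return output


def postprocess(words: List[str]) -> str:
    """Remove placeholder tokens and join the remaining words."""
    return " ".join(word for word in words if word != NULL)


def transcribe(number: str) -> str:
    return postprocess(translate(tokenize(number)))


if __name__ == "__main__":
    for example in ["123", "110", "105", "20"]:
        print(example, "->", transcribe(example))
```

In this sketch the multi-token entry `("1_1", "0_0")` shows why phrases rather than single tokens are translated: "10" cannot be read out digit by digit. The `<null>` outputs show one possible role for a post-processing step that removes tokens.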
Original language: English
Title of host publication: 8th ISCA Workshop on Speech Synthesis (SSW 8)
Publisher: ISCA
Pages: 65-69
Number of pages: 5
Publication status: Published - 2 Sept 2013

Keywords / Materials (for Non-textual outputs)

  • multilingual number transcription
  • text normalization
  • fully-trainable text conversion
