Edinburgh Research Explorer

Detecting Acronyms from Capital Letter Sequences in Spanish

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

  • Ruben San-Segundo
  • Juan M. Montero
  • Veronica Lopez-Luden
  • Simon King

Open Access permissions

Open

Documents

    Rights statement: © San-Segundo, R., Montero, J. M., Lopez-Luden, V., & King, S. (2012). Detecting Acronyms from Capital Letter Sequences in Spanish. In Proc. Interspeech.

    Accepted author manuscript, 136 KB, PDF document

http://interspeech2012.org/accepted-abstract.html?id=245
Original language: English
Title of host publication: Proc. Interspeech
Publication status: Published - 1 Sep 2012

Abstract

This paper presents an automatic strategy for deciding how to pronounce a Capital Letter Sequence (CLS) in a Text-to-Speech (TTS) system. If the CLS is known to the TTS system, it can be expanded into several words. When the CLS is unknown, the system has two alternatives: spelling it out letter by letter (abbreviation) or pronouncing it as a new word (acronym). In Spanish, letters and phonemes correspond closely, so a CLS that resembles other Spanish words tends to be pronounced as a standard word. This paper proposes an automatic method for detecting acronyms. It also analyses the discrimination capability of several features, and several strategies for combining them to obtain the best classifier. For the best classifier, the classification error is 8.45%. In the feature analysis, the best features were the Letter Sequence Perplexity and the Average N-gram order.
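
To illustrate the idea behind a letter-sequence perplexity feature, here is a minimal sketch. It is not the authors' implementation: the toy lexicon `WORDS`, the bigram order, the add-one smoothing, and the function names `train_bigrams` and `perplexity` are all assumptions made for this example. The sketch trains a letter bigram model on a few Spanish words and scores a CLS by its perplexity under that model; a lower value suggests the sequence is more word-like and therefore more likely to be read as an acronym.

```python
import math
from collections import Counter

# Hypothetical toy lexicon; a real system would train on a large Spanish word list.
WORDS = ["casa", "mesa", "pato", "tapa", "nota", "tono", "rana", "lana", "sopa", "ropa"]
ALPHABET = "abcdefghijklmnopqrstuvwxyzñ"
BOS, EOS = "^", "$"  # word-start and word-end markers


def train_bigrams(words):
    """Count letter bigrams (with start/end markers) over a word list."""
    pair_counts = Counter()
    context_counts = Counter()
    for word in words:
        symbols = [BOS] + list(word.lower()) + [EOS]
        for prev, cur in zip(symbols, symbols[1:]):
            pair_counts[(prev, cur)] += 1
            context_counts[prev] += 1
    return pair_counts, context_counts


def perplexity(sequence, pair_counts, context_counts):
    """Perplexity of a letter sequence under an add-one smoothed bigram model."""
    vocab_size = len(ALPHABET) + 1  # every letter plus the end marker
    symbols = [BOS] + list(sequence.lower()) + [EOS]
    log_prob = 0.0
    for prev, cur in zip(symbols, symbols[1:]):
        # Add-one smoothing is an assumption of this sketch, chosen for brevity.
        p = (pair_counts[(prev, cur)] + 1) / (context_counts[prev] + vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / (len(symbols) - 1))


pairs, contexts = train_bigrams(WORDS)
# A word-like sequence should score lower than a consonant cluster.
print(perplexity("OTAN", pairs, contexts) < perplexity("PSDB", pairs, contexts))
```

In a classifier, such a score would be one feature among several, with any decision threshold tuned on labelled data rather than fixed by hand.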

ID: 5855659