Automatic Phonetic Transcription of Words Based On Sparse Data

Maria Wolters, Antal Van Den Bosch

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

The relation between the orthography and the phonology of a language has traditionally been modelled by hand-crafted rule sets. Machine-learning (ML) approaches offer a means to gather this knowledge automatically. Problems arise when the training material is sparse: generalising from sparse data is a well-known problem for many ML algorithms. We present experiments in which connectionist, instance-based, and decision-tree learning algorithms are applied to a small corpus of Scottish Gaelic. We find that instance-based learning with the IB1-IG algorithm yields the best generalisation performance, and that most algorithms tested perform tolerably well. Given the availability of a lexicon, even if it is sparse, ML is a valuable and efficient tool for automatic phonetic transcription of written text.
Original language: English
Title of host publication: Workshop Notes of the ECML/MLnet Workshop on Empirical Learning of Natural Language Processing Tasks
Place of Publication: Prague, Czech Republic
Pages: 61-70
Publication status: Published - 1997
Event: ECML/MLnet Workshop on Empirical Learning of Natural Language Processing Tasks - Prague, Czech Republic
Duration: 26 Apr 1997 to 26 Apr 1997

