Experiments with Cross-Language Speech Retrieval for Lower-Resource Languages

Suraj Nair, Anton Ragni, Ondrej Klejch, Petra Galuščáková, Douglas Oard

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

Cross-language speech retrieval systems face a cascade of errors due to transcription and translation ambiguity. Using 1-best speech recognition and 1-best translation in such a scenario could adversely affect recall if those 1-best system guesses are not correct. Accurately representing transcription and translation probabilities could therefore improve recall, although possibly at some cost in precision. The difficulty of the task is exacerbated when working with languages for which limited resources are available, since both recognition and translation probabilities may be less accurate in such cases. This paper explores the combination of expected term counts from recognition with expected term counts from translation to perform cross-language speech retrieval in which the queries are in English and the spoken content to be retrieved is in Tagalog or Swahili. Experiments were conducted using two query types, one focused on term presence and the other focused on topical retrieval. Overall, the results show that significant improvements in ranking quality result from modeling transcription and recognition ambiguity, even in lower-resource settings, and that adapting the ranking model to specific query types can yield further improvements.
Original language: English
Title of host publication: Information Retrieval Technology
Subtitle of host publication: 15th Asia Information Retrieval Societies Conference, AIRS 2019, Hong Kong, China, November 7–9, 2019, Proceedings
Editors: Fu Lee Wang, Haoran Xie, Wai Lam, Aixin Sun, Lun-Wei Ku, Tianyong Hao, Wei Chen, Tak-Lam Wong, Xiaohui Tao
Place of Publication: Cham
Publisher: Springer International Publishing
Number of pages: 13
ISBN (Electronic): 978-3-030-42835-8
Publication status: Published - 27 Feb 2020

Publication series

Name: Lecture Notes in Computer Science
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

