Cultivating Spoken Language Technologies for Unwritten Languages

Thomas Reitmaier, Dani Kalarikalayil Raju, Ondrej Klejch, Electra Wallington, Nina Markl, Jennifer Pearson, Matt Jones, Peter Bell, Simon Robinson

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

We report on community-centered, collaborative research that weaves together HCI, natural language processing, linguistic, and design insights to develop spoken language technologies for unwritten languages. Across three visits to a Banjara farming community in India, we use participatory, technical, and creative methods to engage community members, collect spoken language photo annotations, and develop an information retrieval (IR) system. Drawing on orality theory, we interrogate assumptions and biases of current speech interfaces and create a simple application that leverages our IR system to match fluidly spoken queries with recorded annotations and surface corresponding photos. In-situ evaluations show how our novel approach returns reliable results and inspired the co-creation of media retrieval use-cases that are more appropriate in oral contexts. The very low (< 4h) spoken data requirements make our approach adaptable to other contexts where languages are unwritten or have no digital language resources available.
Original language: English
Title of host publication: CHI '24: Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems
Publication status: Accepted/In press - 9 Mar 2024
Event: ACM CHI Conference on Human Factors in Computing Systems 2024, Honolulu, United States
Duration: 11 May 2024 to 16 May 2024


Conference: ACM CHI Conference on Human Factors in Computing Systems 2024
Abbreviated title: CHI 2024
Country/Territory: United States

Keywords / Materials (for Non-textual outputs)

  • speech/language
  • zero-resource information retrieval
  • co-creation field study

