On the Difficulty of Segmenting Words with Attention

Ramon Sanabria, Hao Tang, Sharon Goldwater

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

Word segmentation, the problem of finding word boundaries in speech, is of interest for a range of tasks. Previous papers have suggested that for sequence-to-sequence models trained on tasks such as speech translation or speech recognition, attention can be used to locate and segment the words. We show, however, that even on monolingual data this approach is brittle. In our experiments with different input types, data sizes, and segmentation algorithms, only models trained to predict phones from words succeed in the task. Models trained to predict words from either phones or speech (i.e., the opposite direction, which is the one needed to generalize to new data) yield much worse results, suggesting that attention-based segmentation is only useful in limited scenarios.
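The abstract does not spell out how an attention matrix is turned into word boundaries. A simple rule, assumed here purely for illustration and not necessarily one of the segmentation algorithms evaluated in the paper, assigns each input position (phone or frame) to the output word that attends to it most strongly and places a boundary wherever that assignment changes. The function names `segment_by_attention` and `split_at` and the toy attention weights below are hypothetical, not taken from the paper.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def segment_by_attention(attention: Sequence[Sequence[float]]) -> List[int]:
    """Return boundary positions in the input sequence (hypothetical helper).

    attention[i][j] is assumed to be the weight that output word i places on
    input position j. Each input position is assigned to the word with the
    highest weight (argmax over words); a boundary is placed wherever that
    assignment changes between adjacent positions.
    """
    if not attention:
        return []
    num_words = len(attention)
    num_inputs = len(attention[0])
    # For each input position j, pick the output word that attends to it most.
    owners = [
        max(range(num_words), key=lambda i: attention[i][j])
        for j in range(num_inputs)
    ]
    # A boundary at j means a new word starts at input position j.
    return [j for j in range(1, num_inputs) if owners[j] != owners[j - 1]]


def split_at(tokens: Sequence[T], boundaries: List[int]) -> List[List[T]]:
    """Cut a token sequence at the given boundary positions."""
    cuts = [0] + boundaries + [len(tokens)]
    return [list(tokens[a:b]) for a, b in zip(cuts, cuts[1:])]


if __name__ == "__main__":
    # Toy example (made-up weights): phones for "the cat", two output words.
    phones = ["dh", "ax", "k", "ae", "t"]
    attention = [
        [0.8, 0.7, 0.1, 0.0, 0.1],  # word 0 ("the")
        [0.2, 0.3, 0.9, 1.0, 0.9],  # word 1 ("cat")
    ]
    print(split_at(phones, segment_by_attention(attention)))
    # prints [['dh', 'ax'], ['k', 'ae', 't']]
```

Because every boundary in this sketch comes directly from an argmax, the segmentation can only be as good as the alignment the attention happens to learn, which is the property the abstract reports as unreliable for models predicting words from phones or speech.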
Original language: English
Title of host publication: Proceedings of the Second Workshop on Insights from Negative Results in NLP
Publisher: Association for Computational Linguistics (ACL)
Number of pages: 7
ISBN (Electronic): 978-1-954085-93-0
Publication status: Published, 10 Nov 2021
Event: Workshop on Insights from Negative Results in NLP 2021 (Online, Punta Cana, Dominican Republic)
Duration: 10 Nov 2021 to 10 Nov 2021


Conference: Workshop on Insights from Negative Results in NLP 2021
Country/Territory: Dominican Republic
City: Punta Cana