Abstract / Description of output
Many sequence-to-sequence tasks in natural language processing are roughly monotonic in the alignment between source and target sequence, and previous work has facilitated or enforced learning of monotonic attention behavior via specialized attention functions or pretraining. In this work, we introduce a monotonicity loss function that is compatible with standard attention mechanisms and test it on several sequence-to-sequence tasks: grapheme-to-phoneme conversion, morphological inflection, transliteration, and dialect normalization. Experiments show that we can achieve largely monotonic behavior. Performance is mixed, with larger gains on top of RNN baselines. General monotonicity does not benefit transformer multihead attention; however, we see isolated improvements when only a subset of heads is biased towards monotonic behavior.
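The abstract does not reproduce the loss itself. As a rough, hedged illustration of the general idea rather than the paper's exact formulation, a monotonicity penalty can be computed directly from standard (soft) attention weights by penalizing any leftward movement of the expected source position across target steps. The function name `monotonicity_loss` and the tensor layout below are assumptions made for this sketch.

```python
import torch

def monotonicity_loss(attn: torch.Tensor) -> torch.Tensor:
    """Illustrative monotonicity penalty on attention weights (not the paper's exact loss).

    attn: (batch, tgt_len, src_len) attention weights whose rows sum to 1.
    Returns a scalar that is zero when the expected source position is
    non-decreasing over target steps, and grows with backward jumps.
    """
    src_len = attn.size(-1)
    positions = torch.arange(src_len, dtype=attn.dtype, device=attn.device)
    # Expected source position attended to at each target step.
    expected_pos = (attn * positions).sum(dim=-1)  # (batch, tgt_len)
    # Positive wherever the centre of attention moves back towards the source start.
    backward = (expected_pos[:, :-1] - expected_pos[:, 1:]).clamp(min=0.0)
    return backward.mean()
```

In training, such a penalty would typically be added to the task loss with a small weight (e.g. `loss = ce_loss + lam * monotonicity_loss(attn)`); for a transformer it could be applied to only a subset of attention heads, in line with the abstract's finding that biasing all heads is not beneficial.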
Original language | English |
---|---|
Title of host publication | Proceedings of the 2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies |
Place of Publication | Online |
Publisher | Association for Computational Linguistics |
Pages | 4474-4488 |
Number of pages | 15 |
ISBN (Electronic) | 978-1-954085-46-6 |
DOIs | |
Publication status | Published - 6 Jun 2021 |
Event | 2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics, 6 Jun 2021 → 11 Jun 2021, https://2021.naacl.org/ |
Conference
Conference | 2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics |
---|---|
Abbreviated title | NAACL 2021 |
Period | 6/06/21 → 11/06/21 |
Internet address | https://2021.naacl.org/ |