On Biasing Transformer Attention Towards Monotonicity

Annette Rios, Chantal Amrhein, Noëmi Aepli, Rico Sennrich

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

Many sequence-to-sequence tasks in natural language processing are roughly monotonic in the alignment between source and target sequence, and previous work has facilitated or enforced learning of monotonic attention behavior via specialized attention functions or pretraining. In this work, we introduce a monotonicity loss function that is compatible with standard attention mechanisms and test it on several sequence-to-sequence tasks: grapheme-to-phoneme conversion, morphological inflection, transliteration, and dialect normalization. Experiments show that we can achieve largely monotonic behavior. Performance is mixed, with larger gains on top of RNN baselines. General monotonicity does not benefit transformer multihead attention; however, we see isolated improvements when only a subset of heads is biased towards monotonic behavior.
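The abstract describes adding a monotonicity loss on top of standard attention weights rather than changing the attention function itself. The paper's exact loss is not given here, but one minimal sketch of the general idea, in PyTorch, is to penalize target steps whose expected attended source position moves backwards (the function name and formulation below are illustrative assumptions, not the authors' definition):

```python
import torch

def monotonicity_loss(attn: torch.Tensor) -> torch.Tensor:
    """Hypothetical monotonicity penalty on attention weights.

    attn: shape (target_len, source_len); each row is a softmax
    distribution over source positions for one target step.
    Penalizes consecutive target steps whose expected source
    position decreases, i.e. backwards movement of attention.
    """
    source_len = attn.size(1)
    positions = torch.arange(source_len, dtype=attn.dtype)
    # Expected source position attended to at each target step.
    expected = attn @ positions              # shape: (target_len,)
    # Change in expected position between consecutive steps.
    deltas = expected[1:] - expected[:-1]
    # Only backwards movement (negative deltas) contributes.
    return torch.relu(-deltas).mean()
```

A perfectly monotonic diagonal attention matrix (e.g. `torch.eye(4)`) incurs zero loss, while attention that walks backwards through the source is penalized. Such a term can simply be added to the training objective with a weight, which is what makes this style of bias compatible with unmodified attention mechanisms; restricting the penalty to a subset of heads would correspond to the partial biasing the abstract reports as most helpful.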
Original language: English
Title of host publication: Proceedings of the 2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Place of publication: Online
Publisher: Association for Computational Linguistics
Pages: 4474-4488
Number of pages: 15
ISBN (Electronic): 978-1-954085-46-6
DOIs
Publication status: Published - 6 Jun 2021
Event: 2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics - Online
Duration: 6 Jun 2021 - 11 Jun 2021
https://2021.naacl.org/

Conference

Conference: 2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Abbreviated title: NAACL 2021
Period: 6/06/21 - 11/06/21
Internet address: https://2021.naacl.org/

