LSTMs Compose — and Learn — Bottom-Up

Naomi Saphra, Adam Lopez

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Recent work in NLP shows that LSTM language models capture hierarchical structure in language data. In contrast to existing work, we consider the learning process that leads to their compositional behavior. For a closer look at how an LSTM’s sequential representations are composed hierarchically, we present a measure of Decompositional Interdependence (DI) between word meanings in an LSTM, based on their gate interactions. We connect this measure to syntax with experiments on English-language data, finding that DI is higher for pairs of words with lower syntactic distance. To explore the inductive biases that cause these compositional representations to arise during training, we conduct simple experiments on synthetic data. These synthetic experiments support a specific hypothesis about how hierarchical structures are discovered over the course of training: LSTM constituent representations are learned bottom-up, relying on effective representations of their shorter children, rather than learning the longer-range relations independently of those children.
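The abstract relates DI to syntactic distance but does not define that distance on this page. Purely as an illustration, and as an assumption rather than the paper's stated definition, syntactic distance between two words can be operationalized as the number of edges on the path between them in a dependency parse. The minimal sketch below computes that quantity with a breadth-first search; the function name `tree_distance` and the head-index encoding (`heads[k]` is token k's head, `-1` for the root) are hypothetical choices made for this example. It does not compute DI itself.

```python
from collections import deque


def tree_distance(heads, i, j):
    """Return the number of edges on the path between tokens i and j.

    Hypothetical encoding for this sketch: heads[k] is the index of
    token k's head in a dependency tree, or -1 if token k is the root.
    """
    # Build an undirected adjacency list from the head pointers.
    adj = [[] for _ in range(len(heads))]
    for child, head in enumerate(heads):
        if head >= 0:
            adj[child].append(head)
            adj[head].append(child)

    # Breadth-first search from i; the first time j is popped,
    # its recorded distance is the shortest path length.
    dist = {i: 0}
    queue = deque([i])
    while queue:
        node = queue.popleft()
        if node == j:
            return dist[node]
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    raise ValueError("tokens are not connected in the given tree")


if __name__ == "__main__":
    # "the cat sat on the mat", with an assumed dependency analysis:
    # the->cat, cat->sat, sat=root, on->mat, the->mat, mat->sat
    heads = [1, 2, -1, 5, 5, 2]
    print(tree_distance(heads, 0, 1))  # determiner and its noun
    print(tree_distance(heads, 0, 5))  # first "the" and "mat"
```

Under this assumed analysis, the first call returns 1 and the second returns 3. If syntactic distance were measured this way, the abstract's finding would predict higher DI, on average, for pairs like the first than for pairs like the second.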
Original language: English
Title of host publication: Findings of the Association for Computational Linguistics: EMNLP 2020
Publisher: Association for Computational Linguistics
Number of pages: 13
ISBN (Print): 978-1-952148-90-3
Publication status: Published, 16 Nov 2020
Event: The 2020 Conference on Empirical Methods in Natural Language Processing, Virtual conference
Duration: 16 Nov 2020 to 20 Nov 2020


Conference: The 2020 Conference on Empirical Methods in Natural Language Processing
Abbreviated title: EMNLP 2020
City: Virtual conference

