Quantifying the vanishing gradient and long distance dependency problem in recursive neural networks and recursive LSTMs

Phong Le, Willem Zuidema

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

Recursive neural networks (RNNs) and their recently proposed extension, recursive long short-term memory networks (RLSTMs), are models that compute representations for sentences by recursively combining word embeddings according to an externally provided parse tree. Both models, unlike recurrent networks, thus explicitly make use of the hierarchical structure of a sentence. In this paper, we demonstrate that RNNs nevertheless suffer from the vanishing gradient and long distance dependency problems, and that RLSTMs greatly improve over RNNs on both. We present an artificial learning task that allows us to quantify the severity of these problems for both models. We further show that a ratio of gradients (at the root node and a focal leaf node) is highly indicative of the success of backpropagation at optimizing the relevant weights low in the tree. This paper thus provides an explanation for existing, superior results of RLSTMs on tasks such as sentiment analysis, and suggests that the benefits of including hierarchical structure and of including LSTM-style gating are complementary.
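To make the gradient-ratio diagnostic concrete, the following is a minimal sketch, not the authors' code: a plain recursive network unrolled along one deep branch of a parse tree, comparing the gradient norm at the root with the norm that reaches a focal leaf. All names and settings (dim, depth, the left-branching tree, the toy sum loss) are illustrative assumptions.

```python
# Sketch only: a recursive neural network composed along one deep branch
# of a parse tree, used to measure the root-vs-leaf gradient ratio.
# dim, depth, the left-branching structure and the toy loss are assumptions,
# not the paper's exact setup.
import numpy as np

rng = np.random.default_rng(0)
dim, depth = 50, 15                      # embedding size, tree depth

W = rng.normal(0, 0.1, (dim, 2 * dim))  # composition weights
b = np.zeros(dim)

# Forward pass: repeatedly compose the current node with a sibling
# embedding, p = tanh(W [child; sibling] + b), as in a left-branching tree.
siblings = [rng.normal(0, 0.1, dim) for _ in range(depth)]
h = rng.normal(0, 0.1, dim)              # focal leaf embedding
hs = []
for s in siblings:
    h = np.tanh(W @ np.concatenate([h, s]) + b)
    hs.append(h)

# Backward pass: take dL/d(root) = 1 for a toy loss L = sum(root) and
# push it down to the focal leaf through the tanh compositions.
grad = np.ones(dim)                      # gradient at the root node
root_norm = np.linalg.norm(grad)
for h_t in reversed(hs):
    grad = W[:, :dim].T @ ((1 - h_t**2) * grad)  # tanh' = 1 - tanh^2

leaf_norm = np.linalg.norm(grad)
print(f"||grad at root|| / ||grad at leaf|| = {root_norm / leaf_norm:.2e}")
```

With small random weights, the printed ratio grows rapidly with depth, illustrating the vanishing-gradient effect the abstract describes; swapping the tanh composition for an LSTM-style gated cell would be the corresponding RLSTM variant.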
Original language: English
Title of host publication: Proceedings of the 1st Workshop on Representation Learning for NLP
Place of publication: Berlin, Germany
Publisher: Association for Computational Linguistics
Pages: 87-93
Number of pages: 7
Publication status: Published - 11 Aug 2016
Event: 1st Workshop on Representation Learning for NLP - Berlin, Germany
Duration: 11 Aug 2016 - 11 Aug 2016
https://sites.google.com/site/repl4nlp2016/

Conference

Conference: 1st Workshop on Representation Learning for NLP
Abbreviated title: RepL4NLP 2016
Country/Territory: Germany
City: Berlin
Period: 11/08/16 - 11/08/16
Internet address: https://sites.google.com/site/repl4nlp2016/
