Abstract
Various common deep learning architectures, such as LSTMs, GRUs, ResNets and Highway Networks, employ state passthrough connections that support training with high feed-forward depth or recurrence over many time steps. These “Passthrough Network” architectures also enable the decoupling of the network state size from the number of parameters of the network, a possibility that has been studied by Sak et al. (2014) with their low-rank parametrization of the LSTM. In this work we extend this line of research, proposing effective low-rank and low-rank plus diagonal matrix parametrizations for Passthrough Networks that exploit this decoupling property, reducing the data complexity and memory requirements of the network while preserving its memory capacity. This is particularly beneficial in low-resource settings, as it supports expressive models with a compact parametrization that is less susceptible to overfitting. We present competitive experimental results on several tasks, including language modeling and a near state-of-the-art result on sequential randomly-permuted MNIST classification, a hard task on natural data.
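The decoupling of state size from parameter count can be illustrated with a minimal sketch. It assumes a low-rank plus diagonal form W = diag(d) + UV, with U of size n × r and V of size r × n, used inside a highway-style passthrough update s' = t ⊙ h + (1 − t) ⊙ s. The class names `LowRankPlusDiagonal` and `HighwayLayer`, the rank `r`, and the gate-bias initialization are hypothetical illustrations, not the authors' implementation. Each matrix costs n + 2nr parameters instead of n², while the state keeps all n dimensions.

```python
import math
import random


def matvec(m, v):
    """Dense matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LowRankPlusDiagonal:
    """Hypothetical helper: W = diag(d) + U @ V, with U (n x r) and V (r x n).

    Setting d to zeros gives the pure low-rank variant (2nr parameters).
    """

    def __init__(self, n, r, rng):
        scale = 1.0 / math.sqrt(n)
        self.d = [rng.uniform(-scale, scale) for _ in range(n)]
        self.u = [[rng.uniform(-scale, scale) for _ in range(r)] for _ in range(n)]
        self.v = [[rng.uniform(-scale, scale) for _ in range(n)] for _ in range(r)]

    def num_params(self):
        n, r = len(self.d), len(self.v)
        return n + 2 * n * r

    def apply(self, x):
        # W x = d * x + U (V x), computed without building the n x n matrix.
        vx = matvec(self.v, x)    # length r
        uvx = matvec(self.u, vx)  # length n
        return [di * xi + yi for di, xi, yi in zip(self.d, x, uvx)]

    def dense(self):
        # Materialize W only for checking; this is what the factorization avoids.
        n, r = len(self.d), len(self.v)
        return [[(self.d[i] if i == j else 0.0)
                 + sum(self.u[i][k] * self.v[k][j] for k in range(r))
                 for j in range(n)] for i in range(n)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class HighwayLayer:
    """Highway-style passthrough layer: s' = t * h + (1 - t) * s."""

    def __init__(self, n, r, rng):
        self.w_h = LowRankPlusDiagonal(n, r, rng)  # candidate transform
        self.w_t = LowRankPlusDiagonal(n, r, rng)  # transform gate
        self.b_h = [0.0] * n
        # Assumption: a negative gate bias favors passthrough at initialization.
        self.b_t = [-1.0] * n

    def __call__(self, s):
        h = [math.tanh(z + b) for z, b in zip(self.w_h.apply(s), self.b_h)]
        t = [sigmoid(z + b) for z, b in zip(self.w_t.apply(s), self.b_t)]
        # The (1 - t) * s term carries the state through unchanged.
        return [ti * hi + (1.0 - ti) * si for ti, hi, si in zip(t, h, s)]


if __name__ == "__main__":
    rng = random.Random(0)
    n, r = 256, 16
    layers = [HighwayLayer(n, r, rng) for _ in range(3)]
    state = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for layer in layers:
        state = layer(state)
    print("state size:", len(state))
    print("params per matrix:", layers[0].w_h.num_params(), "vs dense:", n * n)
```

With n = 256 and r = 16, each factored matrix has 256 + 2·256·16 = 8448 parameters against 65536 for a dense 256 × 256 matrix, while the state passed between layers stays 256-dimensional.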
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the Workshop on Deep Learning Approaches for Low-Resource NLP |
| Subtitle of host publication | Melbourne, Australia July 19, 2018 |
| Publisher | Association for Computational Linguistics (ACL) |
| Pages | 77-86 |
| Number of pages | 10 |
| Publication status | Published - Jul 2018 |
| Event | 1st Workshop on Deep Learning Approaches for Low-Resource Natural Language Processing, Melbourne, Australia. Duration: 19 Jul 2018 → 19 Jul 2018. https://sites.google.com/view/deeplo18/home |
Workshop
| Workshop | 1st Workshop on Deep Learning Approaches for Low-Resource Natural Language Processing |
|---|---|
| Abbreviated title | DeepLo |
| Country/Territory | Australia |
| City | Melbourne |
| Period | 19/07/18 → 19/07/18 |
| Internet address | https://sites.google.com/view/deeplo18/home |
Fingerprint
Dive into the research topics of 'Low-rank passthrough neural networks'. Together they form a unique fingerprint.