Low-rank passthrough neural networks

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Various common deep learning architectures, such as LSTMs, GRUs, ResNets and Highway Networks, employ state passthrough connections that support training with high feed-forward depth or recurrence over many time steps. These "Passthrough Network" architectures also enable the decoupling of the network state size from the number of parameters of the network, a possibility that has been studied by Sak et al. (2014) with their low-rank parametrization of the LSTM. In this work we extend this line of research, proposing effective low-rank and low-rank plus diagonal matrix parametrizations for Passthrough Networks which exploit this decoupling property, reducing the data complexity and memory requirements of the network while preserving its memory capacity. This is particularly beneficial in low-resource settings, as it supports expressive models with a compact parametrization that is less susceptible to overfitting. We present competitive experimental results on several tasks, including language modeling and a near-state-of-the-art result on sequential randomly-permuted MNIST classification, a hard task on natural data.
Original language: English
Title of host publication: Proceedings of the Workshop on Deep Learning Approaches for Low-Resource NLP
Subtitle of host publication: Melbourne, Australia, July 19, 2018
Publisher: Association for Computational Linguistics (ACL)
Pages: 77-86
Number of pages: 10
Publication status: Published - Jul 2018
Event: 1st Workshop on Deep Learning Approaches for Low-Resource Natural Language Processing - Melbourne, Australia
Duration: 19 Jul 2018 - 19 Jul 2018
https://sites.google.com/view/deeplo18/home

Workshop

Workshop: 1st Workshop on Deep Learning Approaches for Low-Resource Natural Language Processing
Abbreviated title: DeepLo
Country: Australia
City: Melbourne
Period: 19/07/18 - 19/07/18
Internet address: https://sites.google.com/view/deeplo18/home
