Hierarchical recurrent neural network for story segmentation using fusion of lexical and acoustic features

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

A broadcast news stream consists of a number of stories, and automatically finding the boundaries between stories is an important task in news analysis. We capture the topic structure using a hierarchical model based on a Recurrent Neural Network (RNN) sentence modeling layer and a bidirectional Long Short-Term Memory (LSTM) topic modeling layer, with a fusion of acoustic and lexical features. Both feature streams are accumulated with RNNs and trained jointly within the model so that they are fused at the sentence level. We conduct experiments on the Topic Detection and Tracking (TDT4) task, comparing combinations of the two modalities trained with a limited amount of parallel data. Furthermore, we utilize additional text data for training to refine our model. Experimental results indicate that the hierarchical RNN topic modeling benefits from the fusion scheme, especially with additional text training data, achieving a higher F1-measure than conventional state-of-the-art methods.
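The abstract describes the architecture only at a high level. The sketch below illustrates one way such a model could be wired up, assuming PyTorch, GRU cells for the sentence-level encoders, concatenation as the sentence-level fusion operator, and arbitrary layer sizes; none of these specifics are stated in this record, so treat it as an illustration rather than the authors' implementation.

```python
# Minimal sketch (not the authors' code): per-sentence RNN encoders for lexical
# and acoustic features, sentence-level fusion by concatenation, a bidirectional
# LSTM over the sentence sequence, and a per-sentence story-boundary classifier.
import torch
import torch.nn as nn


class HierarchicalSegmenter(nn.Module):
    def __init__(self, vocab_size, word_dim=128, acoustic_dim=40, sent_dim=256, topic_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, word_dim)
        # Sentence-level encoders: one RNN per modality (GRU chosen here for brevity).
        self.lex_rnn = nn.GRU(word_dim, sent_dim, batch_first=True)
        self.ac_rnn = nn.GRU(acoustic_dim, sent_dim, batch_first=True)
        # Topic-level bidirectional LSTM over the fused sentence vectors.
        self.topic_lstm = nn.LSTM(2 * sent_dim, topic_dim, batch_first=True, bidirectional=True)
        # Binary classifier: does a story boundary occur at this sentence?
        self.boundary = nn.Linear(2 * topic_dim, 1)

    def forward(self, word_ids, acoustic_frames):
        # word_ids:        (num_sentences, max_words)  token ids per sentence
        # acoustic_frames: (num_sentences, max_frames, acoustic_dim) frame features per sentence
        _, h_lex = self.lex_rnn(self.embed(word_ids))        # (1, S, sent_dim)
        _, h_ac = self.ac_rnn(acoustic_frames)                # (1, S, sent_dim)
        fused = torch.cat([h_lex[-1], h_ac[-1]], dim=-1)      # sentence-level fusion: (S, 2*sent_dim)
        topic_out, _ = self.topic_lstm(fused.unsqueeze(0))    # (1, S, 2*topic_dim)
        return torch.sigmoid(self.boundary(topic_out)).squeeze(-1)  # (1, S) boundary probability per sentence


# Usage with random inputs: 10 sentences, up to 20 words and 50 acoustic frames each.
model = HierarchicalSegmenter(vocab_size=5000)
probs = model(torch.randint(0, 5000, (10, 20)), torch.randn(10, 50, 40))
print(probs.shape)  # torch.Size([1, 10])
```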
Original language: English
Title of host publication: 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 525-532
Number of pages: 9
ISBN (Electronic): 978-1-5090-4788-8
ISBN (Print): 978-1-5090-4789-5
DOIs
Publication status: Published - 25 Jan 2018
Event: 2017 IEEE Automatic Speech Recognition and Understanding Workshop - Okinawa, Japan
Duration: 16 Dec 2017 - 20 Dec 2017
https://asru2017.org/

Conference

Conference: 2017 IEEE Automatic Speech Recognition and Understanding Workshop
Abbreviated title: ASRU 2017
Country/Territory: Japan
City: Okinawa
Period: 16/12/17 - 20/12/17
Internet address: https://asru2017.org/
