We present an extended study of content-free topic segmentation of conversational (meeting) data based on the classification of vocalization events. In previous work, content-free topic segmentation achieved good accuracy using a modified naive Bayes classifier and vocalization horizon features. In this study, we attempted to improve on those results by incorporating temporal (sequential) dependency information into the topic boundary detection process through conditional random fields and ensemble classifiers. We expected that incorporating such information would help reduce the number of false positives generated by the naive Bayes method. In addition to the usual Pk and WindowDiff (WD) metrics, we introduce a new performance metric to account for the under-detection bias of the segmentation task. Although a boosting model showed fairly good performance using a simple base classifier and limited contextual features, the more elaborate methods still trailed the Bayesian method.
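For readers unfamiliar with the two standard metrics named in the abstract, the following is a minimal sketch of Pk and WindowDiff in their common sliding-window formulation. It is not the evaluation code used in the study, and it does not attempt the new metric, which the abstract does not define. The function names, the 0/1 boundary encoding (1 meaning a topic boundary follows that vocalization event), and the explicit window size `k` are all assumptions made for illustration; `k` is conventionally set to about half the mean reference segment length.

```python
from typing import Sequence


def _check(reference: Sequence[int], hypothesis: Sequence[int], k: int) -> int:
    """Validate inputs and return the number of sliding windows."""
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")
    if not 1 <= k <= len(reference):
        raise ValueError("k must be between 1 and the sequence length")
    return len(reference) - k + 1


def pk(reference: Sequence[int], hypothesis: Sequence[int], k: int) -> float:
    """Fraction of windows where the two segmentations disagree on
    whether the window contains at least one boundary."""
    n_windows = _check(reference, hypothesis, k)
    errors = 0
    for i in range(n_windows):
        ref_has_boundary = any(reference[i:i + k])
        hyp_has_boundary = any(hypothesis[i:i + k])
        if ref_has_boundary != hyp_has_boundary:
            errors += 1
    return errors / n_windows


def window_diff(reference: Sequence[int], hypothesis: Sequence[int], k: int) -> float:
    """Fraction of windows where the two segmentations contain a
    different number of boundaries."""
    n_windows = _check(reference, hypothesis, k)
    errors = 0
    for i in range(n_windows):
        if sum(reference[i:i + k]) != sum(hypothesis[i:i + k]):
            errors += 1
    return errors / n_windows


if __name__ == "__main__":
    # Hypothetical toy data: eight events, two reference boundaries.
    ref = [0, 0, 1, 0, 0, 1, 0, 0]
    hyp = [0, 0, 0, 0, 0, 0, 0, 0]  # a segmenter that detects nothing
    print(pk(ref, hyp, k=2), window_diff(ref, hyp, k=2))
```

Both scores are error rates, so lower is better and identical segmentations score 0. In the toy data, a hypothesis that detects no boundaries at all is penalized only in the windows that overlap a reference boundary.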
|Title of host publication||Cognitive Infocommunications (CogInfoCom), 2013 IEEE 4th International Conference on|
|Number of pages||6|
|Publication status||Published - 23 Jan 2014|