Abstract / Description of output
This article considers the problem of automatic paragraph segmentation. The task is relevant for speech-to-text applications, whose output transcripts usually contain neither punctuation nor paragraph indentation and are therefore difficult to read and process. Text-to-text generation applications (e.g., summarization) could also benefit from an automatic paragraph segmentation mechanism that indicates topic shifts and provides visual targets to the reader. We present a paragraph segmentation model that exploits a variety of knowledge sources (including textual cues and syntactic and discourse-related information) and evaluate its performance in different languages and domains. Our experiments demonstrate that the proposed approach significantly outperforms our baselines and in many cases comes to within a few percent of human performance. Finally, we integrate our method with a single-document summarizer and show that it is useful for structuring the output of automatically generated text.
Original language | English |
---|---|
Pages (from-to) | 1-35 |
Number of pages | 35 |
Journal | ACM Transactions on Speech and Language Processing |
Volume | 3 |
Issue number | 2 |
Publication status | Published - 2006 |
Title | Broad coverage paragraph segmentation across languages and domains |