Detecting Subject Boundaries Within Text: A Language-independent Statistical Approach

K. Richmond, A. Smith, E. Amitay

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract / Description of output

We describe an algorithm for detecting subject boundaries within text, based on a statistical lexical similarity measure. Hearst has already tackled this problem with good results (Hearst, 1994). One of her main assumptions is that a change in subject is accompanied by a change in vocabulary. Using this assumption, but introducing a new measure of word significance, we have built a robust and reliable algorithm that exhibits improved accuracy without sacrificing language independence.
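
The assumption that a change in subject is accompanied by a change in vocabulary can be illustrated with a minimal sketch. This is not the algorithm of the paper: the abstract does not specify the new word significance measure, so raw word counts and cosine similarity stand in for it here. The function names and the `window` parameter are hypothetical, chosen only for illustration. The sketch compares the vocabulary just before and just after each gap between sentences and reports the gap where the overlap is lowest.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercased runs of word characters; \w is Unicode-aware in Python 3,
    # so no language-specific resources are assumed.
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    # Cosine similarity between two word-count vectors (Counters).
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def gap_similarities(sentences, window=2):
    # For each gap between consecutive sentences, compare the pooled
    # vocabulary of up to `window` sentences on either side.
    bags = [Counter(tokenize(s)) for s in sentences]
    sims = []
    for gap in range(1, len(bags)):
        left = sum(bags[max(0, gap - window):gap], Counter())
        right = sum(bags[gap:gap + window], Counter())
        sims.append(cosine(left, right))
    return sims


def likeliest_boundary(sentences, window=2):
    # Index of the sentence that opens the new subject, taken as the gap
    # with the lowest similarity. Assumes at least two sentences.
    sims = gap_similarities(sentences, window)
    return min(range(len(sims)), key=sims.__getitem__) + 1


if __name__ == "__main__":
    text = [
        "the cat sat on the mat",
        "the cat chased the mouse",
        "stock prices fell sharply",
        "stock markets and prices rose",
    ]
    print(gap_similarities(text))
    print(likeliest_boundary(text))
```

Raw counts treat every word as equally informative, so frequent function words can mask a real change in subject. That weakness is the kind of thing a word significance measure would be expected to address.
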
Original language: English
Title of host publication: Proc. The Second Conference on Empirical Methods in Natural Language Processing
Publisher: Association for Computational Linguistics
Number of pages: 8
Publication status: Published - Aug 1997

