Detecting Novel Compounds: The Role of Distributional Evidence

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Research on the discovery of terms from corpora has focused on word sequences whose recurrent occurrence in a corpus is indicative of their terminological status, and has not addressed the issue of discovering terms when data is sparse. This becomes apparent in the case of noun compounding, which is extremely productive: more than half of the candidate compounds extracted from a corpus are attested only once. We show how evidence about established (i.e., frequent) compounds can be used to estimate features that can discriminate rare valid compounds from rare nonce terms, in addition to a variety of linguistic features that can be easily gleaned from corpora without relying on parsed text.
Original language: English
Title of host publication: 10th Conference of the European Chapter of the Association for Computational Linguistics
Publisher: Association for Computational Linguistics
Pages: 235-242
Number of pages: 8
Publication status: Published - 2003
Event: 10th Conference of the European Chapter of the Association for Computational Linguistics (EACL) 2003 - Agro Hotel, Budapest, Hungary
Duration: 12 Apr 2003 → 17 Apr 2003

Conference

Conference: 10th Conference of the European Chapter of the Association for Computational Linguistics (EACL) 2003
Country/Territory: Hungary
City: Budapest
Period: 12/04/03 → 17/04/03
