Techniques that compare short text segments using dependency paths (or simply paths) appear in a wide range of automated language processing applications, including question answering (QA). However, few models in ad hoc information retrieval (IR) use paths for document ranking, owing to the prohibitive cost of parsing a retrieval collection. In this paper, we introduce a flexible notion of paths that describes chains of words on a dependency path. These chains, or catenae, are readily applied in standard IR models. Informative catenae are selected using supervised machine learning with linguistically informed features, and are compared both to non-linguistic terms and to catenae selected heuristically with filters derived from work on paths. Automatically selected catenae of 1-2 words deliver significant performance gains on three TREC collections.
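To make the notion of a short catena concrete, the following minimal sketch lists the one-word and two-word chains in a small dependency tree. It is an illustration only, not the paper's selection method: the function name `short_catenae`, the head-index encoding of the tree, and the hand-annotated example sentence are all assumptions introduced here, and no parser or learned selection model is involved.

```python
from typing import List, Tuple


def short_catenae(words: List[str], heads: List[int]) -> List[Tuple[str, ...]]:
    """List catenae of one or two words from a dependency tree.

    Assumed encoding (hypothetical): heads[i] is the index of the head
    of words[i], or -1 if words[i] is the root of the tree.
    """
    # Every single word is a catena of length one.
    catenae: List[Tuple[str, ...]] = [(w,) for w in words]

    # Every head-dependent pair is a catena of length two.
    for child, head in enumerate(heads):
        if head >= 0:
            # Emit the pair in surface (sentence) order.
            first, second = sorted((head, child))
            catenae.append((words[first], words[second]))
    return catenae


# Hand-annotated parse of "dogs chase cats" (an assumption for illustration):
# "chase" is the root; "dogs" and "cats" both depend on it.
words = ["dogs", "chase", "cats"]
heads = [1, -1, 1]
result = short_catenae(words, heads)
```

Under this encoding, "dogs" and "cats" do not form a two-word chain, because neither is the head of the other; they are linked only through "chase".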

| Field | Value |
|---|---|
| Title of host publication | Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) |
| Place of Publication | Sofia, Bulgaria |
| Publisher | Association for Computational Linguistics |
| Number of pages | 10 |
| Publication status | Published - 1 Aug 2013 |