Abstract
This paper shows that the web can be used to obtain frequencies for bigrams that are unseen in a given corpus. We describe a method for retrieving counts for adjective-noun, noun-noun, and verb-object bigrams from the web by querying a search engine. We evaluate this method by demonstrating that web frequencies correlate with frequencies obtained from a carefully edited, balanced corpus. We also perform a task-based evaluation, showing that web frequencies can reliably predict human plausibility judgments.
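The correlation-based evaluation described in the abstract can be illustrated with a small sketch. It assumes the comparison is made with Pearson's r over log-transformed counts, and it adds one to each count before taking logs so that zero (unseen) bigram counts stay defined; both choices are illustrative assumptions, not details confirmed by this record. The function names `log_counts`, `pearson_r`, and `web_corpus_correlation` are hypothetical.

```python
import math
from typing import Sequence


def log_counts(counts: Sequence[int]) -> list[float]:
    # Assumed add-one smoothing: log10(c + 1) keeps zero counts finite.
    return [math.log10(c + 1) for c in counts]


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    # Pearson product-moment correlation of two equal-length samples.
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def web_corpus_correlation(web: Sequence[int], corpus: Sequence[int]) -> float:
    # Correlate hypothetical web hit counts with corpus counts for the same
    # bigrams, after the assumed log transform.
    return pearson_r(log_counts(web), log_counts(corpus))


if __name__ == "__main__":
    # Made-up counts for five bigrams, for illustration only.
    web_hits = [120000, 3400, 560, 25, 0]
    corpus_freq = [310, 12, 4, 1, 0]
    print(round(web_corpus_correlation(web_hits, corpus_freq), 3))
```

The same `pearson_r` helper could, under the same assumptions, be applied to log web counts and mean human plausibility ratings for the task-based evaluation. Python 3.10 and later also provide `statistics.correlation`, which computes Pearson's r directly.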
| Original language | English |
|---|---|
| Title of host publication | EMNLP '02 Proceedings of the ACL-02 conference on Empirical methods in natural language processing |
| Place of Publication | Stroudsburg, PA |
| Publisher | Association for Computational Linguistics |
| Pages | 230-237 |
| Number of pages | 8 |
| Volume | 10 |
| Publication status | Published - 2002 |
| Event | 7th Conference on Empirical Methods in Natural Language Processing (EMNLP 2002), University of Pennsylvania, Philadelphia, PA, United States; 6 Jul 2002 → 7 Jul 2002 |
Conference
| Conference | 7th Conference on Empirical Methods in Natural Language Processing (EMNLP 2002) |
|---|---|
| Country/Territory | United States |
| City | Philadelphia, PA |
| Period | 6 Jul 2002 → 7 Jul 2002 |
Fingerprint
Dive into the research topics of 'Using the Web to Overcome Data Sparseness'. Together they form a unique fingerprint.