Abstract
In recent years, error mining approaches have been developed to help identify the most likely sources of parsing failures in parsing systems that use handcrafted grammars and lexicons. However, the techniques they use to enumerate and count n-grams build on the sequential nature of a text corpus and do not easily extend to structured data. In this paper, we propose an algorithm for mining trees and apply it to detect the most likely sources of generation failure. We show that this tree mining algorithm permits identifying not only errors in the generation system (grammar, lexicon) but also mismatches between the structures contained in the input and the input structures expected by our generator, as well as a few idiosyncrasies and errors in the input data.
Original language | English |
---|---|
Title of host publication | Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) |
Place of Publication | Jeju Island, Korea |
Publisher | Association for Computational Linguistics |
Pages | 592-600 |
Number of pages | 9 |
Publication status | Published - 1 Jul 2012 |
Event | 50th Annual Meeting of the Association for Computational Linguistics - Jeju Island, Korea, Republic of; Duration: 8 Jul 2012 → 14 Jul 2012 |
Conference
Conference | 50th Annual Meeting of the Association for Computational Linguistics |
---|---|
Country/Territory | Korea, Republic of |
City | Jeju Island |
Period | 8/07/12 → 14/07/12 |