Abstract
We introduce a method for unsupervised parsing that relies on bootstrapping classifiers to identify whether a node dominates a specific span in a sentence. There are two types of classifiers: an inside classifier that acts on a span, and an outside classifier that acts on everything outside a given span. Through self-training and co-training with the two classifiers, we show that the interplay between them helps improve the accuracy of both and, as a result, yields an effective parser. A seed bootstrapping technique prepares the data used to train these classifiers. Our analyses further validate that this approach, in conjunction with weak supervision from prior branching knowledge of a known language (left/right-branching) and minimal heuristics, injects a strong inductive bias into the parser, achieving 63.1 F$_1$ on the English (PTB) test set. In addition, we demonstrate the effectiveness of our architecture by evaluating on treebanks for Chinese (CTB) and Japanese (KTB), achieving new state-of-the-art results.
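The co-training loop the abstract describes can be sketched roughly as follows. This is a toy illustration, not the paper's method: `ToyClassifier`, its length-based features, the shared pseudo-label pool, and the confidence threshold are all illustrative assumptions standing in for the paper's actual inside/outside classifiers.

```python
# Hedged sketch of a co-training loop in the spirit of the abstract:
# two classifiers over complementary "views" (inside vs. outside of a
# span) take turns pseudo-labeling unlabeled spans they are confident
# about, and those labels augment the shared training set.

class ToyClassifier:
    """Stand-in span classifier: predicts the majority label seen in
    training for spans with the same (view-dependent) feature."""

    def __init__(self, view):
        self.view = view   # "inside" or "outside" -- an illustrative view split
        self.stats = {}    # feature -> {label: count}

    def feature(self, span):
        # span = (start, end, sentence_length); the inside view looks at
        # the span length, the outside view at everything else.
        i, j, n = span
        return (j - i) if self.view == "inside" else (n - (j - i))

    def fit(self, labeled):
        self.stats = {}
        for span, label in labeled:
            counts = self.stats.setdefault(self.feature(span), {})
            counts[label] = counts.get(label, 0) + 1

    def predict(self, span):
        counts = self.stats.get(self.feature(span))
        if not counts:
            return 0, 0.0
        label = max(counts, key=counts.get)
        return label, counts[label] / sum(counts.values())


def co_train(inside_clf, outside_clf, seed, pool, rounds=2, threshold=0.8):
    """Alternate between the two views: each classifier is trained on
    the current labeled set, then its high-confidence predictions on the
    unlabeled pool are added as pseudo-labels for the next round."""
    labeled, pool = list(seed), list(pool)
    for _ in range(rounds):
        for teacher in (inside_clf, outside_clf):
            teacher.fit(labeled)
            remaining = []
            for span in pool:
                label, conf = teacher.predict(span)
                if conf >= threshold:
                    labeled.append((span, label))  # pseudo-label
                else:
                    remaining.append(span)
            pool = remaining
    inside_clf.fit(labeled)
    outside_clf.fit(labeled)
    return labeled
```

Under this sketch, seeding the loop with a few labeled spans lets each view's confident decisions gradually extend the training data for the other, which is the interplay the abstract attributes the accuracy gains to.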
Original language | English |
---|---|
Title of host publication | Findings of the Association for Computational Linguistics: ACL 2022 |
Place of Publication | Dublin, Ireland |
Publisher | Association for Computational Linguistics |
Pages | 1274-1291 |
Number of pages | 18 |
DOIs | |
Publication status | Published - 1 May 2022 |
Event | 60th Annual Meeting of the Association for Computational Linguistics, The Convention Centre Dublin, Dublin, Ireland. Duration: 22 May 2022 → 27 May 2022. https://www.2022.aclweb.org |
Publication series
Name | Proceedings of the conference - Association for Computational Linguistics |
---|---|
Publisher | Association for Computational Linguistics |
ISSN (Electronic) | 0736-587X |
Conference
Conference | 60th Annual Meeting of the Association for Computational Linguistics |
---|---|
Abbreviated title | ACL 2022 |
Country/Territory | Ireland |
City | Dublin |
Period | 22/05/22 → 27/05/22 |
Internet address |