Co-training an Unsupervised Constituency Parser with Weak Supervision

Nickil Maveli, Shay Cohen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We introduce a method for unsupervised parsing that relies on bootstrapping classifiers to identify whether a node dominates a specific span in a sentence. There are two types of classifiers: an inside classifier that acts on a span, and an outside classifier that acts on everything outside of a given span. Through self-training and co-training with the two classifiers, we show that the interplay between them helps improve the accuracy of both and, as a result, to parse effectively. A seed bootstrapping technique prepares the data to train these classifiers. Our analyses further validate that such an approach, in conjunction with weak supervision using prior branching knowledge of a known language (left/right-branching) and minimal heuristics, injects strong inductive bias into the parser, achieving 63.1 F$_1$ on the English (PTB) test set. In addition, we show the effectiveness of our architecture by evaluating on treebanks for Chinese (CTB) and Japanese (KTB) and achieve new state-of-the-art results.
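The co-training loop the abstract describes can be sketched in miniature: two classifiers, each seeing a different view of the same candidate span (its inside features vs. its outside features), take turns labelling an unlabeled pool, and each one's most confident labels become training data for the other. The sketch below is hypothetical — the feature vectors, the nearest-centroid scorer, and all names are illustrative assumptions, not the paper's actual classifiers or seed-bootstrapping procedure.

```python
# Hypothetical sketch of inside/outside co-training; the nearest-centroid
# "classifier" and toy feature vectors stand in for the paper's real models.

def train_centroid(examples):
    """Fit a nearest-centroid model: mean feature vector per label."""
    sums, counts = {}, {}
    for vec, label in examples:
        s = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            s[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: [v / counts[lab] for v in s] for lab, s in sums.items()}

def predict(model, vec):
    """Return (label, confidence); confidence is the margin between
    the nearest and second-nearest centroid distances."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(vec, c)) ** 0.5
    ranked = sorted(model, key=lambda lab: dist(model[lab]))
    best = ranked[0]
    margin = dist(model[ranked[1]]) - dist(model[best]) if len(ranked) > 1 else 1.0
    return best, margin

def co_train(inside_seed, outside_seed, pool, rounds=3, per_round=2):
    """pool holds unlabeled (inside_vec, outside_vec) pairs for candidate spans."""
    inside_data, outside_data = list(inside_seed), list(outside_seed)
    pool = list(pool)
    for _ in range(rounds):
        m_in = train_centroid(inside_data)
        m_out = train_centroid(outside_data)
        # Each view labels the pool with its own features; its most
        # confident labels become training data for the *other* view.
        for model, view, target in ((m_in, 0, outside_data),
                                    (m_out, 1, inside_data)):
            scored = sorted(((predict(model, ex[view]), ex) for ex in pool),
                            key=lambda t: -t[0][1])
            for (label, _), ex in scored[:per_round]:
                target.append((ex[1 - view], label))
                pool.remove(ex)
    return train_centroid(inside_data), train_centroid(outside_data)
```

In this toy setup, "yes" could mean a node dominates the span and "no" that it does not; the key design point is that the two views are trained on disjoint feature sets, so each classifier's confident predictions carry information the other has not already exploited.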
Original language: English
Title of host publication: Findings of the Association for Computational Linguistics: ACL 2022
Place of publication: Dublin, Ireland
Publisher: Association for Computational Linguistics
Pages: 1274-1291
Number of pages: 18
Publication status: Published - 1 May 2022
Event: 60th Annual Meeting of the Association for Computational Linguistics - The Convention Centre Dublin, Dublin, Ireland
Duration: 22 May 2022 - 27 May 2022
https://www.2022.aclweb.org

Publication series

Name: Proceedings of the conference - Association for Computational Linguistics
Publisher: Association for Computational Linguistics
ISSN (electronic): 0736-587X

Conference

Conference: 60th Annual Meeting of the Association for Computational Linguistics
Abbreviated title: ACL 2022
Country/Territory: Ireland
City: Dublin
Period: 22/05/22 - 27/05/22
Internet address: https://www.2022.aclweb.org
