Semi-Automatic Construction of Text-to-SQL Dataset for Domain Transfer

Tianyi Li, Sujian Li, Mark Steedman

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Strong and affordable in-domain data is a desirable asset when transferring trained semantic parsers to novel domains. As previous methods for semi-automatically constructing such data cannot handle the complexity of realistic SQL queries, we propose to construct SQL queries via context-dependent sampling, and introduce the concept of topic. Along with our SQL query construction method, we propose a novel pipeline of semi-automatic Text-to-SQL dataset construction that covers the broad space of SQL queries. We show that the created dataset is comparable with expert annotation along multiple dimensions, and is capable of improving domain transfer performance for SOTA semantic parsers.
Original language: English
Title of host publication: Proceedings of the 17th International Conference on Parsing Technologies and the IWPT 2021 Shared Task on Parsing into Enhanced Universal Dependencies (IWPT 2021)
Editors: Stephan Oepen, Kenji Sagae, Reut Tsarfaty, Gosse Bouma, Djamé Seddah, Daniel Zeman
Place of Publication: Stroudsburg, PA, United States
Publisher: Association for Computational Linguistics (ACL)
Pages: 38-49
Number of pages: 12
ISBN (Electronic): 978-1-954085-80-0
Publication status: Published - 6 Aug 2021
Event: The 17th International Conference on Parsing Technologies - Bangkok, Thailand
Duration: 6 Aug 2021 to 6 Aug 2021
Conference number: 17
https://iwpt21.sigparse.org/

Conference

Conference: The 17th International Conference on Parsing Technologies
Abbreviated title: IWPT 2021
Country/Territory: Thailand
City: Bangkok
Period: 6/08/21 to 6/08/21
