UniArk: Improving generalisation and consistency for factual knowledge extraction through debiasing

Yijun Yang, Jie He, Pinzhen Chen, Victor Gutiérrez-Basulto, Jeff Z. Pan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

Several recent papers have investigated the potential of language models as knowledge bases as well as the existence of severe biases when extracting factual knowledge. In this work, we focus on the factual probing performance over unseen prompts from tuning, and using a probabilistic view we show the inherent misalignment between pre-training and downstream tuning objectives in language models for probing knowledge. We hypothesize that simultaneously debiasing these objectives can be the key to generalisation over unseen prompts. We propose an adapter-based framework, **UniArk**, for generalised and consistent factual knowledge extraction through simple methods without introducing extra parameters. Extensive experiments show that UniArk can significantly improve the model's out-of-domain generalisation as well as consistency under various prompts. Additionally, we construct **ParaTrex**, a large-scale and diverse dataset for measuring the inconsistency and out-of-domain generation of models. Further, ParaTrex offers a reference method for constructing paraphrased datasets using large language models.
Original language: English
Title of host publication: Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics
Subtitle of host publication: Human Language Technologies
Editors: Kevin Duh, Helena Gomez, Steven Bethard
Publisher: Association for Computational Linguistics
Pages: 7018-7035
Number of pages: 18
Volume: 1
ISBN (Electronic): 9798891761148
Publication status: Published - 21 Jun 2024
Event: 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics - Mexico City, Mexico
Duration: 16 Jun 2024 to 21 Jun 2024
https://2024.naacl.org/

Conference

Conference: 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Abbreviated title: NAACL 2024
Country/Territory: Mexico
City: Mexico City
Period: 16/06/24 to 21/06/24