AsyLex: A Dataset for Legal Language Processing of Refugee Claims

Claire Barale, Mark Klaisoongnoen, Pasquale Minervini, Michael Rovatsos, Nehal Bhuta

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

Advancements in natural language processing (NLP) and language models have demonstrated immense potential in the legal domain, enabling automated analysis and comprehension of legal texts. However, developing robust models in Legal NLP is significantly challenged by the scarcity of resources. This paper presents AsyLex, the first dataset specifically designed for Refugee Law applications to address this gap. The dataset introduces 59,112 documents on refugee status determination in Canada from 1996 to 2022, providing researchers and practitioners with essential material for training and evaluating NLP models for legal research and case review. Case review is defined as entity extraction and outcome prediction tasks. The dataset includes 19,115 gold-standard human-labeled annotations for 20 legally relevant entity types curated with the help of legal experts and 1,682 gold-standard labeled documents for the case outcome. Furthermore, we supply the corresponding trained entity extraction models and the resulting labeled entities generated through the inference process on AsyLex. Four supplementary features are obtained through rule-based extraction. We demonstrate the usefulness of our dataset on the legal judgment prediction task to predict the binary outcome and test a set of baselines using the text of the documents and our annotations. We observe that models pretrained on similar legal documents reach better scores, suggesting that acquiring more datasets for specialized domains such as law is crucial. The dataset is available at https://huggingface.co/datasets/clairebarale/AsyLex.
Original language: English
Title of host publication: Proceedings of the Natural Legal Language Processing Workshop (NLLP 23)
Publisher: Association for Computational Linguistics (ACL)
Number of pages: 14
ISBN (Electronic): 979-8-89176-054-7
Publication status: Published - 7 Dec 2023
Event: The 5th Natural Legal Language Processing Workshop 2023, Singapore
Duration: 7 Dec 2023 → …
Conference number: 5


Workshop: The 5th Natural Legal Language Processing Workshop 2023
Abbreviated title: NLLP 2023
Period: 7/12/23 → …