Abstract / Description of output
In this paper, we introduce an end-to-end pipeline for retrieving, processing, and extracting targeted information from legal cases. We investigate an under-studied legal domain with a case study on refugee law in Canada.
Searching case law for past similar cases is a key part of legal work for both lawyers and judges, the potential end-users of our prototype. While traditional named-entity recognition labels such as dates provide meaningful information in legal work, we propose to extend existing models and retrieve a total of 19 useful categories of items from refugee cases.
After creating a novel data set of cases, we perform information extraction based on state-of-the-art neural named-entity recognition (NER). We test different architectures including two transformer models, using contextual and noncontextual embeddings, and compare general-purpose versus domain-specific pre-training.
The results demonstrate that models pre-trained on legal data perform best despite their smaller size, suggesting that domain matching had a larger effect than network architecture. We achieve an F1 score above 90% on five of the targeted categories and over 80% on four further categories.
Original language | English |
---|---|
Title of host publication | Findings of the Association for Computational Linguistics: ACL 2023 |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 2992–3005 |
Number of pages | 13 |
ISBN (Electronic) | 9781959429623 |
Publication status | Published - 9 Jul 2023 |
Event | 61st Annual Meeting of the Association for Computational Linguistics, Toronto, Canada. Duration: 9 Jul 2023 → 14 Jul 2023. Conference number: 61. https://2023.aclweb.org/ |
Conference
Conference | 61st Annual Meeting of the Association for Computational Linguistics |
---|---|
Abbreviated title | ACL 2023 |
Country/Territory | Canada |
City | Toronto |
Period | 9/07/23 → 14/07/23 |
Internet address |