TY - GEN
T1 - A Deep Learning Approach to Geographical Candidate Selection through Toponym Matching
AU - Ardanuy, Mariona Coll
AU - Hosseini, Kasra
AU - McDonough, Katherine
AU - Krause, Amrey
AU - Van Strien, Daniel
AU - Nanni, Federico
N1 - Funding Information:
This work was supported by Living with Machines (AHRC grant AH/S01179X/1) and The Alan Turing Institute (EPSRC grant EP/N510129/1). Newspaper data was kindly shared by Findmypast.
Publisher Copyright:
© 2020 Owner/Author.
PY - 2020/11/3
Y1 - 2020/11/3
N2 - Recognizing toponyms and resolving them to their real-world referents is required to provide advanced semantic access to textual data. This process is often hindered by the high degree of variation in toponyms. Candidate selection is the task of identifying the potential entities that can be referred to by a previously recognized toponym. While it has traditionally received little attention, candidate selection has a significant impact on downstream tasks (i.e. entity resolution), especially in noisy or non-standard text. In this paper, we introduce a deep learning method for candidate selection through toponym matching, using state-of-the-art neural network architectures. We perform an intrinsic toponym matching evaluation based on several datasets, which cover various challenging scenarios (cross-lingual and regional variations, as well as OCR errors) and assess its performance in the context of geographical candidate selection in English and Spanish.
AB - Recognizing toponyms and resolving them to their real-world referents is required to provide advanced semantic access to textual data. This process is often hindered by the high degree of variation in toponyms. Candidate selection is the task of identifying the potential entities that can be referred to by a previously recognized toponym. While it has traditionally received little attention, candidate selection has a significant impact on downstream tasks (i.e. entity resolution), especially in noisy or non-standard text. In this paper, we introduce a deep learning method for candidate selection through toponym matching, using state-of-the-art neural network architectures. We perform an intrinsic toponym matching evaluation based on several datasets, which cover various challenging scenarios (cross-lingual and regional variations, as well as OCR errors) and assess its performance in the context of geographical candidate selection in English and Spanish.
KW - Candidate selection
KW - Deep learning
KW - Fuzzy String Matching
KW - Toponym matching
UR - http://www.scopus.com/inward/record.url?scp=85097298252&partnerID=8YFLogxK
U2 - 10.1145/3397536.3422236
DO - 10.1145/3397536.3422236
M3 - Conference contribution
AN - SCOPUS:85097298252
T3 - GIS: Proceedings of the ACM International Symposium on Advances in Geographic Information Systems
SP - 385
EP - 388
BT - Proceedings of the 28th International Conference on Advances in Geographic Information Systems, SIGSPATIAL GIS 2020
A2 - Lu, Chang-Tien
A2 - Wang, Fusheng
A2 - Trajcevski, Goce
A2 - Huang, Yan
A2 - Newsam, Shawn
A2 - Xiong, Li
PB - Association for Computing Machinery, Inc
T2 - 28th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, SIGSPATIAL GIS 2020
Y2 - 3 November 2020 through 6 November 2020
ER -