A System for Aligning Geographical Entities from Large Heterogeneous Sources

André Melo, Btissam Er-Rahmadi, Jeff Z. Pan

Research output: Contribution to journal › Article › peer-review

Abstract

Aligning points of interest (POIs) from heterogeneous geographical data sources is an important task that helps extend map data with information from different datasets. The task poses several challenges, including differences in type hierarchies, in labels (different formats, languages, and levels of detail), and in coordinates. Scalability is another major issue, as global-scale datasets may contain tens or hundreds of millions of entities. In this paper, we propose the GeographicaL Entities AligNment (GLEAN) system for efficiently matching large geographical datasets based on spatial partitioning with an adaptable margin. In particular, we introduce a text similarity measure based on the local-context relevance of tokens, used in combination with sentence embeddings. We also propose a scalable type embedding model. Finally, we demonstrate that the proposed system can efficiently align large datasets while improving the quality of the alignments through the proposed entity similarity measure.
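The abstract names spatial partitioning with a margin as the basis for scaling. The following Python sketch shows one generic way such margin-based blocking can generate candidate pairs. It is not the GLEAN implementation: it assumes a uniform latitude/longitude grid and a single fixed margin in degrees (GLEAN's margin is adaptable), and the names `cell_of`, `candidate_pairs`, `cell_size` and `margin` are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical entity representation: (entity_id, lat, lon), in degrees.
Point = Tuple[str, float, float]


def cell_of(lat: float, lon: float, cell_size: float) -> Tuple[int, int]:
    # Integer grid coordinates of the cell that contains (lat, lon).
    return (math.floor(lat / cell_size), math.floor(lon / cell_size))


def candidate_pairs(source: List[Point], target: List[Point],
                    cell_size: float, margin: float) -> Set[Tuple[str, str]]:
    # Index each source entity under the single cell that contains it.
    grid: Dict[Tuple[int, int], List[Point]] = defaultdict(list)
    for p in source:
        grid[cell_of(p[1], p[2], cell_size)].append(p)

    pairs: Set[Tuple[str, str]] = set()
    for tid, lat, lon in target:
        # Visit every cell overlapped by the target's box expanded by `margin`,
        # so matches just across a cell boundary are not lost.
        r0, c0 = cell_of(lat - margin, lon - margin, cell_size)
        r1, c1 = cell_of(lat + margin, lon + margin, cell_size)
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                for sid, slat, slon in grid.get((r, c), []):
                    # Keep pairs whose coordinates differ by at most `margin`
                    # on both axes (simplification: ignores longitude wrap and
                    # the shrinking of longitude degrees away from the equator).
                    if abs(slat - lat) <= margin and abs(slon - lon) <= margin:
                        pairs.add((sid, tid))
    return pairs
```

For example, with `src = [("a1", 55.95, -3.19), ("a2", 55.96, -3.20)]` and `tgt = [("b1", 55.9501, -3.1899), ("b2", 51.5, -0.12)]`, calling `candidate_pairs(src, tgt, cell_size=0.01, margin=0.001)` yields only `("a1", "b1")`. Only pairs that survive this cheap spatial filter would then be scored with more expensive text and type similarity measures, which is what keeps the comparison count far below the full cross product.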
Original language: English
Article number: 96
Number of pages: 24
Journal: ISPRS International Journal of Geo-Information
Volume: 11
Issue number: 2
Publication status: Published, 28 Jan 2022

Keywords

  • geographic information systems
  • data integration
  • entity alignment
  • points of interest
  • attributes-based matching
  • heterogeneous large-scale
