Efficient Approximate Entity Matching Using Jaro-Winkler Distance

Yaoshu Wang, Jianbin Qin, Wei Fang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

Jaro-Winkler distance is a measure of the similarity between two strings. Because it performs well at matching personal and entity names, it is widely used in record linkage, entity linking, and information extraction. Given a query string q, Jaro-Winkler distance similarity search finds all strings in a dataset D whose Jaro-Winkler distance similarity with q is no more than a given threshold τ. As datasets grow, performing Jaro-Winkler distance similarity search efficiently becomes a challenging problem. In this paper, we propose an index-based method that relies on a filter-and-verify framework to support efficient Jaro-Winkler distance similarity search on large datasets. We leverage e-variants methods to build the index structure and the pigeonhole principle to perform the search. The experimental results demonstrate the efficiency of our methods.
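
To make the search problem in the abstract concrete, the following is a minimal Python sketch of the standard Jaro-Winkler similarity together with a brute-force linear scan. It is a baseline for illustration only, not the index-based filter-and-verify method the paper proposes. The function names, the prefix scale of 0.1, the prefix cap of 4, and the convention that distance equals 1 minus similarity are all assumptions made here and are not taken from the paper.

```python
from typing import List


def jaro_similarity(s1: str, s2: str) -> float:
    """Standard Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are equal and at most `window` positions apart.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order in the two strings.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3.0


def jaro_winkler_similarity(s1: str, s2: str, scale: float = 0.1) -> float:
    """Jaro similarity boosted by the length of the common prefix (capped at 4)."""
    jaro = jaro_similarity(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return jaro + prefix * scale * (1.0 - jaro)


def jaro_winkler_distance(s1: str, s2: str) -> float:
    """Assumed convention: distance = 1 - similarity."""
    return 1.0 - jaro_winkler_similarity(s1, s2)


def similarity_search(query: str, dataset: List[str], tau: float) -> List[str]:
    """Brute-force baseline: keep every string whose distance to `query` is at most tau."""
    return [s for s in dataset if jaro_winkler_distance(query, s) <= tau]


if __name__ == "__main__":
    names = ["MARHTA", "MARTA", "DIXON", "DICKSONX"]
    print(similarity_search("MARTHA", names, tau=0.1))
```

The linear scan computes one Jaro-Winkler comparison per string in D, which is the cost an index with a filtering step is meant to avoid: a filter first discards strings that cannot satisfy the threshold, and only the surviving candidates are verified with the full computation.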
Original language: English
Title of host publication: 18th International Conference on Web Information Systems Engineering
Place of Publication: Puschino, Russia
Publisher: Springer, Cham
Pages: 231-239
Number of pages: 9
ISBN (Electronic): 978-3-319-68783-4
ISBN (Print): 978-3-319-68782-7
Publication status: Published - 4 Oct 2017
Event: Web Information Systems Engineering 2017 - Moscow, Russian Federation
Duration: 7 Oct 2017 → 11 Oct 2017
http://www.wise-conferences.org/2017/index.html

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer, Cham
Volume: 10569
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349
Name: Information Systems and Applications, incl. Internet/Web, and HCI
Volume: 10569

Conference

Conference: Web Information Systems Engineering 2017
Abbreviated title: WISE 2017
Country/Territory: Russian Federation
City: Moscow
Period: 7/10/17 → 11/10/17
