Edinburgh Research Explorer

The Grouped Author-Topic Model for Unsupervised Entity Resolution

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Open Access permissions: Open

Documents

http://link.springer.com/chapter/10.1007%2F978-3-642-21735-7_30
Original language: English
Title of host publication: ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2011, PT I
Editors: T Honkela, W Duch, M Girolami, S Kaski
Place of Publication: BERLIN
Publisher: Springer-Verlag Berlin Heidelberg
Pages: 241-249
Number of pages: 9
ISBN (Print): 978-3-642-21734-0
Publication status: Published - 2011
Event: 21st International Conference on Artificial Neural Networks, ICANN 2011 - Espoo, Finland
Duration: 14 Jun 2011 to 17 Jun 2011

Publication series

Name: Lecture Notes in Computer Science
Publisher: SPRINGER-VERLAG BERLIN
Volume: 6791
ISSN (Print): 0302-9743

Conference

Conference: 21st International Conference on Artificial Neural Networks, ICANN 2011
Country: Finland
Period: 14/06/11 to 17/06/11

Abstract

This paper describes a generative approach to the problem of entity resolution in a completely unsupervised context, with no fixed assumption regarding the true number of identities. Entity resolution involves associating different references to authors (in a paper's author list, for example) with real underlying identities. The references may be written in differing forms or may contain errors, and identical references may refer to different real identities. The approach taken here uses a generative model of both the abstract of a document and its list of authors to resolve identities in a corpus of documents. In the model, authors and topics are associated with latent groups. For each document, an abstract and an author list are generated conditioned on a given group. Results on real-world datasets are presented, and the model outperforms the best-performing unsupervised methods.
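The generative story in the abstract (a latent group per document, with the abstract and the author list both drawn conditioned on that group) can be illustrated with a toy sampler. This is a minimal sketch under simplifying assumptions, not the paper's model: it fixes the number of groups, topics and authors in advance, whereas the paper makes no fixed assumption about the number of identities, and it does not model noisy or variant name forms. All function names, parameters and hyperparameter values below are hypothetical.

```python
import random


def sample_dirichlet(rng, alpha, size):
    """Symmetric Dirichlet draw obtained by normalising Gamma variates."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(rng, probs):
    """Draw an index from a discrete distribution given as a list of probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_corpus(n_docs=5, n_groups=2, n_topics=3, n_authors=6,
                    vocab_size=10, words_per_doc=8, authors_per_doc=2, seed=0):
    """Toy grouped generative process (assumed, finite simplification)."""
    rng = random.Random(seed)

    # Corpus-level parameters: every group has its own topic and author mix.
    group_weights = sample_dirichlet(rng, 1.0, n_groups)
    group_topics = [sample_dirichlet(rng, 0.5, n_topics) for _ in range(n_groups)]
    group_authors = [sample_dirichlet(rng, 0.5, n_authors) for _ in range(n_groups)]
    topic_words = [sample_dirichlet(rng, 0.5, vocab_size) for _ in range(n_topics)]

    corpus = []
    for _ in range(n_docs):
        # 1. Pick the latent group for this document.
        g = sample_categorical(rng, group_weights)

        # 2. Generate the abstract: each word picks a topic from the
        #    group's topic mix, then a word id from that topic.
        abstract = []
        for _ in range(words_per_doc):
            z = sample_categorical(rng, group_topics[g])
            abstract.append(sample_categorical(rng, topic_words[z]))

        # 3. Generate the author list from the group's author mix
        #    (drawn with replacement, so repeats are possible in this toy).
        authors = [sample_categorical(rng, group_authors[g])
                   for _ in range(authors_per_doc)]

        corpus.append({"group": g, "abstract": abstract, "authors": authors})
    return corpus


if __name__ == "__main__":
    for doc in generate_corpus():
        print(doc)
```

Resolving identities corresponds to inverting this process: given only the observed words and author references, infer which latent group and which underlying author identity produced each one. The sketch covers only the forward (sampling) direction.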

Research areas

  • Bayesian nonparametrics, Dirichlet processes, nested Dirichlet processes, author disambiguation, Dirichlet process mixture

ID: 20027297