Author Disambiguation: A Nonparametric Topic and Co-authorship Model

Andrew M. Dai, Amos J. Storkey

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

A fully generative model is provided for the problem of author disambiguation. This approach infers the topics for each author and combines them with co-author information. The problems involved are similar to other entity resolution problems, where differing references may refer to one author entity and identical references may refer to different author entities. We extend the hierarchical Dirichlet process and nonparametric latent Dirichlet allocation models to tackle this problem in a nonparametric, generative manner, making no prior assumptions about the number of author entities, topics or research groups in the corpus. The model develops a hierarchical Dirichlet process for author-topic combinations. It conditions this model at document level on another hierarchical Dirichlet process for research groups. This enables the authors and topics to be suitably coupled. We perform joint inference to sample the author entities, topics and their group memberships. We present results from our approach on real-world datasets.
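
The abstract's point about making no prior assumption on the number of author entities can be illustrated with a Chinese restaurant process, the clustering prior that underlies a Dirichlet process. The sketch below is not the paper's model: it is a single-level process rather than the coupled hierarchical Dirichlet processes described above, and the function name `crp_assignments` and the concentration parameter `alpha` are hypothetical names chosen for illustration.

```python
import random


def crp_assignments(n_items, alpha, seed=0):
    """Sample cluster assignments from a Chinese restaurant process.

    Illustrative only: a single-level prior, not the hierarchical model
    from the paper. `alpha` (> 0) controls how readily new clusters open.
    """
    rng = random.Random(seed)
    counts = []       # counts[k] = number of items currently in cluster k
    assignments = []  # assignments[i] = cluster index chosen for item i
    for _ in range(n_items):
        # An existing cluster k is chosen with weight counts[k];
        # a brand-new cluster is chosen with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1    # join an existing cluster
        assignments.append(k)
    return assignments, counts


if __name__ == "__main__":
    assignments, counts = crp_assignments(n_items=50, alpha=1.0)
    print("number of clusters:", len(counts))
    print("cluster sizes:", counts)
```

The number of clusters is not fixed in advance; it is a random outcome that tends to grow slowly as more items arrive, which is the property a nonparametric model relies on when the number of entities is unknown.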
Original language: English
Title of host publication: Proceedings of NIPS Workshop on Applications for Topic Models Text and Beyond
Number of pages: 4
Publication status: Published - 2009
