Abstract
We present a fully generative model for author disambiguation.
The approach infers the topics of each author and combines them with co-author
information. The problem resembles other entity resolution problems,
in which differing references may refer to one author entity and identical references
may refer to different author entities. We extend the hierarchical Dirichlet process
and nonparametric latent Dirichlet allocation models to tackle this problem in a
nonparametric, generative manner, making no prior assumptions about the number
of author entities, topics or research groups in the corpus. The model defines
a hierarchical Dirichlet process over author-topic combinations and conditions it,
at the document level, on a second hierarchical Dirichlet process over research
groups. This conditioning couples the authors and topics. We perform
joint inference to sample the author entities, topics and their group memberships.
We present results from our approach on real-world datasets.

Field | Value |
---|---|
Original language | English |
Title of host publication | Proceedings of NIPS Workshop on Applications for Topic Models: Text and Beyond |
Number of pages | 4 |
Publication status | Published - 2009 |